Tutorial: Using Nextstrain for SARS-CoV-2

This project is maintained by nextstrain

Configuration parameters for Nextstrain SARS-CoV-2 workflow

S3_DST_BUCKET

S3_DST_COMPRESSION

S3_DST_ORIGINS

active_builds

ancestral

inference

build_sizes

builds

builds:
  global:
    region: global
    subsampling_scheme: global

  washington:
    region: North America
    country: USA
    division: Washington
    subsampling_scheme: all

Builds support any named attributes that can be referenced by subsampling schemes. Builds also support the following specific attributes.

auspice_config

colors

description

region

subclades

subsampling_scheme

title

combine_sequences_for_subsampling

warn_about_duplicates

conda_environment

custom_rules

default_build_name

exposure

default

files

include

exclude

reference

alignment_reference

annotation

outgroup

ordering

color_schemes

auspice_config

lat_longs

description

clades

emerging_lineages

filter

min_length

exclude_where

exclude_ambiguous_dates_by

min_date
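As a rough sketch, the four `filter` attributes above might be combined as follows. The values are illustrative assumptions rather than confirmed workflow defaults, and the comments assume that each attribute is forwarded to the `augur filter` option of the matching name.

```yaml
filter:
  # Drop genomes shorter than this many bases (illustrative threshold).
  min_length: 27000
  # Assumed to be forwarded to augur filter's --exclude-where option;
  # the host condition here is a hypothetical example.
  exclude_where: "host!='Human'"
  # Assumed to be forwarded to --exclude-ambiguous-dates-by, which
  # accepts any, day, month, or year.
  exclude_ambiguous_dates_by: "any"
  # Assumed to be forwarded to --min-date; drops earlier samples.
  min_date: "2020-01-01"
```

Check the values against the defaults shipped with your copy of the workflow before relying on them.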

frequencies

min_date

pivot_interval

pivot_interval_units

narrow_bandwidth

proportion_wide

minimal_frequency

stiffness

inertia

genes

inputs

inputs:
  - name: example-data
    metadata: data/example_metadata.tsv.xz
    sequences: data/example_sequences.fasta.xz
  - name: prealigned-data
    metadata: data/other_metadata.tsv.xz
    aligned: data/other_aligned.fasta.xz
  - name: prealigned-and-masked-data
    metadata: data/other_metadata.tsv.xz
    masked: data/other_masked.fasta.xz
  - name: prealigned-masked-and-filtered-data
    metadata: data/other_metadata.tsv.xz
    filtered: data/other_filtered.fasta.xz

Valid attributes for list entries in inputs are provided below.

name

metadata

sequences

aligned

masked

filtered

localrules

logistic_growth

delta_pivots

min_tips

min_frequency

max_frequency

mask

mask_from_beginning

mask_from_end

mask_sites

partition_sequences

reference_node_name

refine

root

clock_rate

clock_std_dev

coalescent

date_inference

divergence_unit

clock_filter_iqd

keep_polytomies

no_timetree

run_pangolin

deploy_url

slack_channel

slack_token

strip_strain_prefixes

sanitize_metadata

parse_location_field

rename_fields

    - "Virus name=strain"
    - "Type=type"
    - "Accession ID=gisaid_epi_isl"
    - "Collection date=date"
    - "Additional location information=additional_location_information"
    - "Sequence length=length"
    - "Host=host"
    - "Patient age=patient_age"
    - "Gender=sex"
    - "Clade=GISAID_clade"
    - "Pango lineage=pango_lineage"
    - "Pangolin version=pangolin_version"
    - "Variant=variant"
    - "AA Substitutions=aa_substitutions"
    - "aaSubtitutions=aa_substitutions"
    - "Submission date=date_submitted"
    - "Is reference?=is_reference"
    - "Is complete?=is_complete"
    - "Is high coverage?=is_high_coverage"
    - "Is low coverage?=is_low_coverage"
    - "N-Content=n_content"
    - "GC-Content=gc_content"
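Each entry above has the form `old=new`: the text before `=` is the column name as it appears in the input metadata, and the text after it is the name the workflow uses. The sketch below shows how the two attributes of this section might sit together in a config file. The `Location` value and the shortened rule list are assumptions for illustration, not confirmed defaults.

```yaml
sanitize_metadata:
  # Assumed name of the metadata column holding the combined location
  # string; adjust it to match your own metadata.
  parse_location_field: Location
  # Each rule renames one input column: "<input column>=<new column>".
  rename_fields:
    - "Virus name=strain"
    - "Accession ID=gisaid_epi_isl"
    - "Collection date=date"
```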

subsampling

Each named subsampling scheme supports the following attributes that the workflow passes to augur filter.

group_by

seq_per_group

max_sequences

sampling_scheme

exclude

include

query

exclude_ambiguous_dates_by

min_date

max_date

priorities

subsampling:
  my-scheme:
    my-first-rule:
      max_sequences: 10
    my-second-rule:
      max_sequences: 10
      # Prioritize sequences that are genetically similar to
      # the sequences selected by the `my-first-rule` rule.
      priorities:
        type: proximity
        focus: my-first-rule

title

traits

traits:
  default:
    sampling_bias_correction: 2.5
    columns: ["country_exposure"]
  washington:
    # Override default sampling bias correction for
    # "washington" build and continue to use default
    # trait columns.
    sampling_bias_correction: 5.0

Each named traits configuration (default or build-named) supports the following attributes.

sampling_bias_correction

columns

tree

tree-builder-args

auspice_json_prefix