The following steps will prepare you to run complete analyses of SARS-CoV-2 data by installing required software and running a simple example workflow.
The following instructions use Conda to install the tools you'll need for this tutorial. Conda is a package and environment management system that allows you to install Python and other software into controlled environments without disrupting other software you have installed (e.g., on your computer, your shared cluster, etc.).
Install Miniconda with Python 3 for your operating system, update Conda to the latest version, and install Mamba for faster installation.
conda update -n base conda conda install -n base -c conda-forge mamba
mamba create -n nextstrain -c conda-forge -c bioconda \ augur auspice nextstrain-cli nextalign snakemake awscli git pip
Confirm that the installation worked.
conda activate nextstrain nextstrain check-setup --set-default
The final output from the last command should look like this:
Setting default environment to native.
Update your environment regularly to the latest versions of these tools.
# Update Conda and Mamba. conda update -n base conda mamba # Update tools in the Nextstrain environment. conda activate nextstrain conda update --all
Use Git to download a copy of the ncov repository containing the workflow and this tutorial.
git clone https://github.com/nextstrain/ncov.git cd ncov
Alternately, download a compressed copy of the ncov repository called
Open this file to decompress it and create a directory called
ncov-master/ with the contents of the workflow in it.
Navigate to this directory from the command line.
We update the official workflow regularly with:
Update your local copy of the workflow, to benefit from these changes.
# Download and apply changes from the Nextstrain team. # This only works if there is no conflict with your local repository. git pull --ff-only origin master # OR: # Alternately, download and apply changes from the Nextstrain team # and then replay your local changes on top of those incoming changes. git pull --rebase origin master
Alternately, download a specific version of the workflow that you know works for you. We create new releases of the workflow any time we introduce breaking changes, so you can choose when to update based on what has changed.
# Download version 7 (v7) of the workflow. curl -OL https://github.com/nextstrain/ncov/archive/refs/tags/v7.zip # Uncompress the workflow. unzip v7.zip # Change into the workflow's directory. cd ncov-7/
Run a basic workflow with example data, to confirm that your Nextstrain environment is properly configured.
nextstrain build . --cores 4 --use-conda \ --configfile ./my_profiles/getting_started/builds.yaml
nextstrain build command runs a pathogen analysis or "build" defined by a Snakemake workflow in a specific directory.
Snakefile is in the current directory, we specify the build directory as
All other arguments pass through to Snakemake.
--use-conda flag instructs Snakemake to maintain a workflow-specific Conda environment defined in
Snakemake will activate this environment before it runs each rule, ensuring each rule runs with the necessary software.
The first time you use this flag, Snakemake will download and install all Nextstrain tools required to run the workflow.
This can take a little time.
The next time you use this flag, Snakemake will detect the existing environment and use it immediately.
getting_started build produces a minimal global phylogeny for visualization in Auspice.
This workflow should complete in about 10 minutes on a MacBook Pro (2.7 GHz Intel Core i5) with four cores, including 5 minutes to create the Conda environment and 5 minutes to run the workflow.
Open http://auspice.us in your browser.
Drag and drop the JSON file
auspice/ncov_global.json anywhere on the http://auspice.us landing page, to visualize the resulting phylogeny.
The resulting phylogeny should look something like this.