1 Requirements and installation
1.1 Requirements
This pipeline is written in Nextflow DSL2. You need Nextflow >= 23.04.4 available on your system; please refer to the corresponding Nextflow documentation. Nextflow requires Java to be available. This pipeline has been tested with Java v 17.0.4.
While Nextflow will download and make available the software used in this pipeline, you still need to download some external databases and provide their paths to use this pipeline.
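The version requirements above can be checked before anything is launched. The sketch below assumes GNU `sort -V` is available and that `nextflow -v` prints a line of the form `nextflow version X.Y.Z.build` (an assumption about the output format; adjust the `sed` pattern if yours differs). The helper name `version_ge` is ours, not part of the pipeline.

```shell
#!/bin/sh
# Minimum Nextflow version stated in the requirements above.
MIN_NEXTFLOW="23.04.4"

# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nextflow >/dev/null 2>&1; then
    # Assumed output format: "nextflow version 23.04.4.5881"
    found=$(nextflow -v 2>/dev/null | sed -E 's/.*version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/')
    if version_ge "$found" "$MIN_NEXTFLOW"; then
        echo "Nextflow $found is recent enough"
    else
        echo "Nextflow $found is older than $MIN_NEXTFLOW"
    fi
else
    echo "nextflow not found in PATH"
fi

# Java is required by Nextflow; print its version for a manual check.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found in PATH"
fi
```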
1.2 Installation
1.3 Basic installation
To run the pipeline, [Nextflow](https://www.nextflow.io/) v > XX must be installed.
We use containers to run each piece of software. A container system (Docker, Singularity or Apptainer) must be available on your system.
Notes:
We use Apptainer on our HPC, so that is what the pipeline has been tested with.
Running the pipeline with Conda is currently not available.
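To see which of the three container systems is present on a machine, a small check like the following can help. It is only a sketch: the function name `find_container_runtime` is ours, and the preference order (Apptainer first) simply mirrors the setup the pipeline was tested with.

```shell
#!/bin/sh
# Print the first container runtime found in PATH, preferring Apptainer
# (the runtime the pipeline was tested with). Returns 1 if none is found.
find_container_runtime() {
    for runtime in apptainer singularity docker; do
        if command -v "$runtime" >/dev/null 2>&1; then
            echo "$runtime"
            return 0
        fi
    done
    return 1
}

if runtime=$(find_container_runtime); then
    echo "Container runtime available: $runtime"
else
    echo "No container runtime found (need Docker, Singularity or Apptainer)" >&2
fi
```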
1.3.1 Obtain necessary external databases
This pipeline makes use of several external databases. You will have to modify their paths in the nextflow.config file so that they correspond to your system (or you can provide them in the run command, but that is certainly more cumbersome).
We unfortunately cannot provide an image of those databases, nor create them ready to use for the pipeline, as they would require too much space or need to be adapted to the organism you are working with.
Moreover, internet access might not be available at run time on HPC clusters, so downloading those databases beforehand is the only option.
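For reference, this is roughly what passing the database paths on the command line looks like. Nextflow accepts pipeline parameters with a double-dash prefix (`--name value`), and such values override those set in nextflow.config for that run. The four parameter names are the ones described in the subsections that follow; the paths are placeholders, and `main.nf` as the entry script is an assumption. The sketch only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Placeholder location: replace with the real path on your system.
DB_ROOT="${DB_ROOT:-/path/to/databases}"

# Assemble the launch command; each --name overrides the matching
# parameter from nextflow.config for this run only.
# "main.nf" is an assumed entry script name.
launch_cmd="nextflow run main.nf \
--blastDB $DB_ROOT/blast \
--ranked_taxo_file $DB_ROOT/taxonomy/rankedlineage.dmp \
--krakenDB $DB_ROOT/kraken2 \
--busco_download_path $DB_ROOT/busco_downloads"

# Print the command for review instead of running it.
echo "$launch_cmd"
```

Setting the same four values once in nextflow.config avoids retyping them for every run.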
1.3.1.1 Blastn database (FILTER_CONTIGS track / option in COMPASS track)
A local blastn database must be downloaded before using this track (no internet queries are made during computation). For more information on downloading, please see NCBI here. Please provide the path of the BLAST database to the parameter blastDB in nextflow.config.
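One possible way to pre-download a database is the `update_blastdb.pl` script that ships with BLAST+. The sketch below is a dry run by default (it prints the commands rather than executing them). The target directory and the choice of the `nt` database are assumptions, as is whether blastDB should point at the directory or at the database prefix; check the config template for the expected form.

```shell
#!/bin/sh
# Hypothetical target directory; "nt" is one possible database choice.
BLAST_DIR="${BLAST_DIR:-/path/to/databases/blast}"
BLAST_NAME="${BLAST_NAME:-nt}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

fetch_blast_db() {
    run mkdir -p "$BLAST_DIR" &&
    # update_blastdb.pl downloads into the current directory.
    run cd "$BLAST_DIR" &&
    run update_blastdb.pl --decompress "$BLAST_NAME" &&
    # Sanity check: print a summary of the downloaded database.
    run blastdbcmd -db "$BLAST_DIR/$BLAST_NAME" -info
}

fetch_blast_db
```

Set `DRY_RUN=0` to actually perform the download, on a node that has internet access and enough disk space.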
1.3.1.2 NCBI taxonomic rank file (FILTER_CONTIGS track / option in COMPASS track)
The taxonomic rank file rankedlineage.dmp from the NCBI new taxonomy dump is used to link the taxon_id (number) to the scientific name.
This file must be downloaded and its path must be provided as the parameter ranked_taxo_file in nextflow.config at installation.
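To illustrate what this file contains, here is a small lookup sketch. It relies on the file's field separator being `<tab>|<tab>`, with the taxon id in the first field and the scientific name in the second. The function name `lookup_taxon_name` is ours, and the download commands are left commented out because they need internet access.

```shell
#!/bin/sh
# lookup_taxon_name FILE TAXID: print the scientific name for a taxon id.
# rankedlineage.dmp separates fields with "<tab>|<tab>"; the first field
# is the taxon id and the second is the scientific name.
lookup_taxon_name() {
    awk -F '\t[|]\t' -v id="$2" '$1 == id { print $2; exit }' "$1"
}

# Download step (commented out: needs internet access, run once at installation).
# curl -O https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz
# tar -xzf new_taxdump.tar.gz rankedlineage.dmp

# Usage, once the file is in place:
# lookup_taxon_name rankedlineage.dmp 9606
```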
1.3.1.3 KRAKEN2 (COMPASS track)
If you are working with an organism that is not (or only very poorly) represented in the database, its sequences will be classified as belonging to other taxonomic groups.
If you decide to download a pre-built database, please ensure that the organism you are interested in is included in it. Note that the Kraken mini database is likely to be too restrictive, which will lead to spurious results.
For better classification results, we recommend that you build your own database, ensuring that the organism of interest is included.
Please have a look at the Kraken2 documentation for database creation.
Please provide the path of the Kraken2 database to the parameter krakenDB in nextflow.config.
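As a rough outline of a custom build, the sketch below lists the usual `kraken2-build` steps. It is a dry run by default and prints the commands instead of running them. The `fungi` library, the thread count and all paths are placeholder assumptions; pick the libraries relevant to your organism. Sequences added with `--add-to-library` need a taxon id that Kraken2 can resolve (for example a `kraken:taxid|NNN` tag in the FASTA header).

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_kraken_db DB_DIR GENOME_FASTA
build_kraken_db() {
    db="$1"        # output database directory
    genome="$2"    # FASTA of the organism of interest
    run kraken2-build --download-taxonomy --db "$db" &&
    # "fungi" is only an example of a reference library.
    run kraken2-build --download-library fungi --db "$db" &&
    # Add your own organism so that it is represented in the database.
    run kraken2-build --add-to-library "$genome" --db "$db" &&
    run kraken2-build --build --db "$db" --threads "${THREADS:-4}"
}

build_kraken_db /path/to/databases/kraken2 /path/to/my_organism.fasta

# After a real build, the content can be checked with, for example:
# kraken2-inspect --db /path/to/databases/kraken2 | grep -i "Genus species"
```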
1.3.1.4 Busco (COMPASS track)
Please refer to the BUSCO documentation for downloading the lineage marker sets that you want to work with, and provide the path of the busco_downloads directory (created by BUSCO at download) to the parameter busco_download_path in nextflow.config.
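A quick way to confirm that the directory you point busco_download_path at actually contains lineage datasets is sketched below. It assumes the usual layout in which BUSCO stores datasets under `busco_downloads/lineages/`. The function name `list_busco_lineages` is ours, and the commented download command (with `fungi_odb10` as an example lineage) should be checked against the documentation of your BUSCO version.

```shell
#!/bin/sh
# list_busco_lineages DIR: print the lineage datasets found under DIR/lineages.
list_busco_lineages() {
    if [ ! -d "$1/lineages" ]; then
        echo "no lineages/ directory under $1" >&2
        return 1
    fi
    ls "$1/lineages"
}

# Download step (commented out: needs internet access); lineage is an example.
# busco --download fungi_odb10

BUSCO_DOWNLOADS="${BUSCO_DOWNLOADS:-./busco_downloads}"
if lineages=$(list_busco_lineages "$BUSCO_DOWNLOADS"); then
    echo "Available lineages:"
    echo "$lineages"
else
    echo "Run the BUSCO download step first"
fi
```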
1.3.2 Configuration on your system
Clone the diplotopia repository. Create a configuration file to run on your system, using one of the templates in ./config as a starting point.
1.3.3 Configuration of your bashrc for nextflow
As of today, this pipeline only runs with containers. You will need to configure the paths that Nextflow uses to cache and store those containers, according to your system. This relates to Nextflow usage in general, not to diplotopia specifically.
Please have a look at the Nextflow documentation to configure containers.
NB: SAGA users, please look HERE to get your system (.bashrc) ready to use Nextflow.
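As an illustration of the kind of lines that end up in a .bashrc, the sketch below sets the Nextflow environment variables that control where Apptainer and Singularity images are cached. The base directory is a placeholder of ours; on an HPC a project or work area is usually a better choice than the home directory because images are large.

```shell
# Lines to append to ~/.bashrc (the base directory is a placeholder).
NXF_CONTAINER_BASE="${NXF_CONTAINER_BASE:-$HOME/nextflow_containers}"

# Where Nextflow stores the Apptainer / Singularity images it pulls.
# Both directories must be writable by the user running the pipeline.
export NXF_APPTAINER_CACHEDIR="$NXF_CONTAINER_BASE/apptainer"
export NXF_SINGULARITY_CACHEDIR="$NXF_CONTAINER_BASE/singularity"
```

Open a new shell (or run `source ~/.bashrc`) for the settings to take effect.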