Introduction
Analyses on bacterial or viral isolates available in the Cohesive Information System are organized into categories. These categories are named following a specific nomenclature system, which describes the analysis type and its execution level inside pipelines.
Nomenclature system
Prefixes
Analysis type and execution level are summarized by a short prefix code. The table below lists such prefixes:
Prefix | Analysis type |
---|---|
1PP | preprocessing analyses |
2AS | assembly tools |
2MG | metagenomics analyses |
3TX | taxonomical classification |
4TY | in silico typing |
4AN | genome annotation |
The first element of the prefix is a number, indicating the usual execution level: preprocessing is usually performed before any other analysis, thus is assigned level 1; taxonomical analyses require files from assembly as input and as such they are assigned level 3, after preprocessing and assembly.
An additional class is represented by the code "0SQ", which identifies Sequence Quality checks performed automatically on all new reads in Cohesive.
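As an illustration of how the leading digit encodes the usual execution level, the prefixes can be kept in a lookup table and sorted by that digit. This is only a minimal sketch: the names `PREFIXES` and `execution_level` are hypothetical and not part of Cohesive.

```python
# Prefix codes and descriptions, copied from the table above plus "0SQ".
# PREFIXES and execution_level are illustrative names, not a Cohesive API.
PREFIXES = {
    "0SQ": "sequence quality checks",
    "1PP": "preprocessing analyses",
    "2AS": "assembly tools",
    "2MG": "metagenomics analyses",
    "3TX": "taxonomical classification",
    "4TY": "in silico typing",
    "4AN": "genome annotation",
}


def execution_level(prefix: str) -> int:
    """Return the usual execution level, i.e. the first character of the prefix."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return int(prefix[0])


# Sort a few prefixes by their usual execution level.
ordered = sorted(["4TY", "1PP", "3TX", "2AS", "0SQ"], key=execution_level)
print(ordered)  # ['0SQ', '1PP', '2AS', '3TX', '4TY']
```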
Analysis names
The analysis name follows the prefix and describes the kind of data handling performed by available bioinformatic tools.
For example, the full name of the "trimming" analysis is "1PP_trimming", since it is classified as preprocessing. Similarly, de novo assembly is called "2AS_denovo", and this name stays the same regardless of the bioinformatic tool used to perform it.
The sections below list all analyses and their groups, together with brief descriptions and links to the appropriate Wiki pages.
Suffixes
Many of the available analyses can be executed with more than one software tool, also called "bioinformatic tool" or "method".
An analysis name can thus be completed by appending two underscore characters ("__") and the name of the selected tool, e.g. 2AS_denovo__spades and 2AS_denovo__unicycler, both of which execute the de novo assembly but with different software.
Analyses that can be executed with multiple tools allow tool selection through a dropdown menu, available in the second step ("Tools") of the Run Analysis wizard.
Available tools for each analysis are listed in the respective analysis Wiki page.
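The naming scheme described above (prefix, one underscore, analysis name, then optionally two underscores and the tool) can be sketched as a small parser. This is an illustrative example only: `AnalysisName`, `parse_analysis_name` and `build_analysis_name` are hypothetical helpers, and the sketch assumes that analysis names themselves contain no underscores, as in the examples on this page.

```python
from typing import NamedTuple, Optional


class AnalysisName(NamedTuple):
    prefix: str            # e.g. "2AS"
    analysis: str          # e.g. "denovo"
    tool: Optional[str]    # e.g. "spades", or None when no tool is given


def parse_analysis_name(full_name: str) -> AnalysisName:
    """Split a name such as '2AS_denovo__spades' into its three parts."""
    # The optional tool suffix is separated by two underscores.
    base, double_sep, tool = full_name.partition("__")
    # The prefix and the analysis name are separated by one underscore.
    prefix, single_sep, analysis = base.partition("_")
    if not single_sep or not analysis or (double_sep and not tool):
        raise ValueError(f"not a valid analysis name: {full_name!r}")
    return AnalysisName(prefix, analysis, tool or None)


def build_analysis_name(prefix: str, analysis: str, tool: Optional[str] = None) -> str:
    """Assemble a full name from its parts (inverse of parse_analysis_name)."""
    name = f"{prefix}_{analysis}"
    return f"{name}__{tool}" if tool else name


print(parse_analysis_name("2AS_denovo__spades"))
print(build_analysis_name("2AS", "denovo", "unicycler"))  # 2AS_denovo__unicycler
```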
NOTE: some analyses and pipelines are not yet available in the Cohesive Information System Demo.
Single Sample Analysis
Prefix | Analysis Name | Description | Tools |
---|---|---|---|
1PP | trimming | removal of low quality nucleotide calls from raw reads | trimmomatic |
1PP | hostdepl | depletion of host sequences: reads are mapped against the selected host genome to remove contaminant sequences | bowtie |
1PP | downsampling | reduction of the number of sequences in genomic regions with excessive and uninformative vertical coverage | bbnorm |
2AS | denovo | de novo assembly: builds genome scaffolds from the pool of contigs | SPAdes, unicycler |
2AS | mapping | sequence mapping with a reference genome | bowtie, ivar, snippy |
2MG | denovo | de novo assembly for metagenomics: the metaSPAdes software builds a de Bruijn graph for all reads with SPAdes, which is then transformed into an assembly graph, finding paths corresponding to genome fragments in a metagenome | metaSPAdes |
3TX | class | taxonomic classification and contamination check of the organisms the reads belong to | kraken, kraken2, confindr |
3TX | species | closest bacterial or viral species identification / identification of the best viral reference | kmerfinder, blast, vdabricate |
4TY | MLST | Multi-Locus Sequence Typing in silico: it uses schemas of 7 conserved genes specific for each bacterium to assign Sequence Type and Clonal Complex | mlst |
4TY | cgMLST | core genome Multi-Locus Sequence Typing, in silico phylogenetic analysis: allele calling against species-specific core genome allele schemas | chewBBACA, mentalist, blastMLST |
4TY | flaA | analysis specific for Campylobacter. In silico identification of flaA locus variant (MLST for flaA) | flaA |
4TY | lineage | lineage assignment for SARS-CoV2 | pangolin |
4TY | lineage | lineage assignment for West Nile Virus | westnile |
4TY | wgMLST | whole genome MLST | chewBBACA |
4AN | genes | functional genome annotation through ORF (Open Reading Frame) search in a genome and identification of possible coded proteins | prokka |
4AN | AMR | prediction of the presence of antibiotic resistance-associated genes | abricate, blast, staramr, filtering |
Multi Sample Analysis
Analysis type | Analysis name / Tool | Description |
---|---|---|
Gene-by-gene based clustering | Grapetree | Gene-by-gene based clustering: builds MST (Minimum Spanning Tree) and NJ (Neighbor Joining) tree nwk files |
Gene-by-gene based clustering | Reportree | Gene-by-gene based clustering |
Pangenome extraction | Panaroo | calculates a presence/absence binary matrix of accessory genes in the sample genomes, starting from gff files from Prokka (genome annotation) |
 | Snippy-core | executes Snippy to identify mutations (SNPs and indels) in sample reads diverging from a reference haploid genome, followed by Snippy-core to build the core.vcf from Snippy's vcf files. core.vcf contains all core mutations among those listed in Snippy's vcf files |
SNP-based clustering | CFSAN | SNP identification through reference-based phylogenetic analysis |
SNP-based clustering | kSNP3 | SNP identification through reference-free phylogenetic analysis. It builds a Maximum Likelihood tree graph |
SNP-based clustering | VCF2MST | fast construction of MST graph from a VCF file with no phylogenetic inferences |
Pipelines
In addition to single analyses, Cohesive also provides automatic pipelines, each of which chains several single analyses. Pipelines make common workflows simpler and faster to run.
The table below lists pipelines and the analyses that are part of them (for more information on the software used in such analyses, please refer to the corresponding Wiki pages).
Pipeline Name | Description | Analyses |
---|---|---|
Covid Emergency | SARS-CoV2 fast assembly and lineage assignment | 2AS_mapping + 4TY_lineage |
Depletion & de novo | host depletion from trimmed reads and de novo assembly | 1PP_hostdepl + 2AS_denovo |
Genome Draft | mapping and genome annotation. Mapping is performed with both bowtie and Snippy. | 2AS_mapping + 4AN_genes |
Enterotoxin S. aureus finder | de novo assembly and blast to identify enterotoxin gene presence | 2AS_denovo + 4AN_AMR |
NgsManager | macro-pipeline which, depending on sample type, executes the pipeline "Raw Reads Processing" followed by "Covid Emergency" (for SARS-CoV2), "WGS on Bacteria" and "Typing on Bacteria" (for bacterial isolates) or "Genome Draft" (for viral isolates or biological samples) | analyses from the "Raw Reads Processing", "Covid Emergency", "WGS on Bacteria", "Typing on Bacteria" and "Genome Draft" pipelines |
Raw Reads Processing | reads quality check, trimming and virus/bacteria classification | 0SQ_rawreads + 1PP_trimming + 3TX_class |
Typing on Bacteria | horizontal and vertical coverage calculation, species calculation, gene annotation, identification of antibiotic resistance- and virulence-associated genes, typing | 2AS_mapping + 3TX_species + 4AN_genes + 4AN_AMR + 4TY_wgMLST + 4TY_cgMLST + 4TY_MLST + 4TY_flaA |
WGS on Bacteria | assembly of bacterial isolate | 2AS_denovo |
WNV - lineage calculation and mapping | lineage calculation and mapping against the calculated lineage reference for West Nile Virus samples | 4TY_lineage + 2AS_mapping |
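The "Analyses" column can be read as a list of analysis names joined by "+". The sketch below splits three of the entries above into steps and checks whether the steps follow the usual ascending execution level. The names `PIPELINES`, `pipeline_steps` and `follows_usual_order` are hypothetical, and treating the listed order as the execution order is an assumption made for this example.

```python
from typing import List

# Three rows copied from the table above (pipeline name -> "Analyses" column).
PIPELINES = {
    "Covid Emergency": "2AS_mapping + 4TY_lineage",
    "Raw Reads Processing": "0SQ_rawreads + 1PP_trimming + 3TX_class",
    "WNV - lineage calculation and mapping": "4TY_lineage + 2AS_mapping",
}


def pipeline_steps(analyses: str) -> List[str]:
    """Split an 'Analyses' cell into individual analysis names."""
    return [step.strip() for step in analyses.split("+")]


def follows_usual_order(analyses: str) -> bool:
    """True if the listed steps have non-decreasing execution levels.

    Assumption: the order in the table reflects the execution order.
    """
    levels = [int(step[0]) for step in pipeline_steps(analyses)]
    return levels == sorted(levels)


for name, analyses in PIPELINES.items():
    print(name, pipeline_steps(analyses), follows_usual_order(analyses))
```

In this sketch the West Nile Virus pipeline returns False, because lineage assignment (level 4) is listed before mapping (level 2). This is consistent with the level in the prefix being the usual execution order rather than a strict one.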