Introduction

Analyses of bacterial or viral isolates available in the Cohesive Information System are organized into categories. Each category is named according to a nomenclature system that describes the analysis type and its execution level within pipelines.

Nomenclature system

Prefixes

Analysis type and execution level are summarized by a short prefix code. The table below lists such prefixes:

Prefix | Analysis type
1PP | preprocessing analyses
2AS | assembly tools
2MG | metagenomics analyses
3TX | taxonomical classification
4TY | in silico typing
4AN | genome annotation

The first element of the prefix is a number indicating the usual execution level. Preprocessing is usually performed before any other analysis and is therefore assigned level 1; taxonomical analyses require files from assembly as input and are therefore assigned level 3, after preprocessing (level 1) and assembly (level 2).

An additional class is represented by the code "0SQ", which identifies Sequence Quality checks performed automatically on all new reads in Cohesive.
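The prefix scheme described above can be captured in a small lookup. The following is a minimal Python sketch for illustration only, not part of Cohesive: the names `PREFIXES` and `execution_level` are hypothetical, and the descriptions are copied from the prefix list above.

```python
# Prefix codes and descriptions copied from the prefix list in this page.
# PREFIXES and execution_level are illustrative names, not Cohesive APIs.
PREFIXES = {
    "0SQ": "sequence quality checks",
    "1PP": "preprocessing analyses",
    "2AS": "assembly tools",
    "2MG": "metagenomics analyses",
    "3TX": "taxonomical classification",
    "4TY": "in silico typing",
    "4AN": "genome annotation",
}


def execution_level(prefix: str) -> int:
    """Return the usual execution level, i.e. the leading digit of the prefix."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return int(prefix[0])
```

With this sketch, `execution_level("3TX")` returns 3 and `execution_level("0SQ")` returns 0, while an unlisted code raises `ValueError`.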

Analysis names

The analysis name follows the prefix and describes the kind of data handling performed by available bioinformatic tools.

For example, the full name of the "trimming" analysis is "1PP_trimming", since it is classified as preprocessing. Similarly, de novo assembly is called "2AS_denovo", and this name stays the same regardless of the bioinformatic tool used to perform it.

The sections below list all analyses and their groups, together with brief descriptions and links to the appropriate Wiki pages.

Suffixes

Many of the available analyses can be executed with more than one software package, also called "bioinformatic tools" or "methods".

An analysis name can thus be completed by appending two underscore characters ("__") and the name of the selected tool; e.g. 2AS_denovo__spades and 2AS_denovo__unicycler both execute de novo assembly, but with different tools.
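As an illustration of the naming rule, a full analysis name such as 2AS_denovo__spades can be split into prefix, analysis name and optional tool. The sketch below is a hypothetical helper written for this page, not a Cohesive function; `AnalysisName`, `parse_analysis_name` and `format_analysis_name` are assumed names.

```python
from typing import NamedTuple, Optional


class AnalysisName(NamedTuple):
    prefix: str          # e.g. "2AS"
    name: str            # e.g. "denovo"
    tool: Optional[str]  # e.g. "spades", or None when no suffix is given


def parse_analysis_name(full_name: str) -> AnalysisName:
    """Split 'PREFIX_name' or 'PREFIX_name__tool' into its parts."""
    # The tool suffix, when present, follows the first double underscore.
    base, sep, tool = full_name.partition("__")
    # Prefix and analysis name are joined by a single underscore.
    prefix, underscore, name = base.partition("_")
    if not underscore or not prefix or not name or (sep and not tool):
        raise ValueError(f"not a valid analysis name: {full_name!r}")
    return AnalysisName(prefix, name, tool or None)


def format_analysis_name(prefix: str, name: str, tool: Optional[str] = None) -> str:
    """Build the full name, appending '__tool' only when a tool is given."""
    base = f"{prefix}_{name}"
    return f"{base}__{tool}" if tool else base
```

For example, `parse_analysis_name("2AS_denovo__unicycler")` yields the prefix "2AS", the name "denovo" and the tool "unicycler", and `format_analysis_name("1PP", "trimming")` gives "1PP_trimming".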

Analyses that can be executed with multiple tools allow tool selection through a dropdown menu in the second step ("Tools") of the Run Analysis wizard.

Available tools for each analysis are listed in the respective analysis Wiki page.

NOTE: some analyses and pipelines are not yet available in the Cohesive Information System Demo.

Single Sample Analysis

Prefix | Analysis Name | Description | Tools
1PP | trimming | removal of low quality nucleotide calls from raw reads | trimmomatic
    | hostdepl | depletion of host sequences: reads are mapped against the selected host genome to remove contaminant sequences | bowtie
    | downsampling | reduction of the number of sequences in genomic regions with excessive and uninformative vertical coverage | bbnorm
2AS | denovo | de novo assembly: builds genome scaffolds from the pool of contigs | SPAdes, unicycler
    | mapping | sequence mapping with a reference genome | bowtie, ivar, snippy
2MG | denovo | de novo assembly for metagenomics: metaSPAdes builds a de Bruijn graph of all reads with SPAdes, transforms it into an assembly graph, and finds paths corresponding to genome fragments in the metagenome | metaSPAdes
3TX | class | taxonomic classification and contamination check of the organisms the reads belong to | kraken, kraken2, confindr
    | species | closest bacterial or viral species identification / identification of the best viral reference | kmerfinder, blast, vdabricate
4TY | MLST | in silico Multi-Locus Sequence Typing: uses schemas of 7 conserved genes, specific for each bacterium, to assign Sequence Type and Clonal Complex | mlst
    | cgMLST | in silico core genome Multi-Locus Sequence Typing, phylogenetic analysis: allele calling against species-specific core genome allele schemas | chewBBACA, mentalist, blastMLST
    | flaA | analysis specific to Campylobacter: in silico identification of the flaA locus variant (MLST for flaA) | flaA
    | lineage | lineage assignment for SARS-CoV2 | pangolin
    |         | lineage assignment for West Nile Virus | westnile
    | wgMLST | whole genome MLST | chewBBACA
4AN | genes | functional genome annotation through ORF (Open Reading Frame) search in a genome and identification of possible coded proteins | prokka
    | AMR | prediction of the presence of antibiotic resistance-associated genes | abricate, blast, staramr, filtering

Multi Sample Analysis

Analysis type | Analysis name / Tool | Description
Gene-by-gene based clustering | Grapetree | gene-by-gene based clustering: builds MST (Minimum Spanning Tree) and NJ (Neighbor Joining) tree nwk files
    | Reportree | gene-by-gene based clustering
Pangenome extraction | Panaroo | calculates a presence/absence binary matrix of accessory genes in the sample genomes, starting from gff files from Prokka (genome annotation)
    | Snippy-core | executes Snippy to identify mutations (SNPs and indels) in sample reads diverging from a reference haploid genome, followed by Snippy-core to build the core.vcf from Snippy's vcf files. core.vcf contains all core mutations among those listed in Snippy's vcf files
SNP-based clustering | CFSAN | SNP identification through reference-based phylogenetic analysis
    | kSNP3 | SNP identification through reference-free phylogenetic analysis. It builds a Maximum Likelihood tree graph
    | VCF2MST | fast construction of an MST graph from a VCF file with no phylogenetic inferences

Pipelines

In addition to single analyses, Cohesive also provides automatic pipelines, which consist of concatenated single analyses. Pipelines make running a workflow simpler and faster, especially when the workflow is a common one.

The table below lists the pipelines and the analyses that are part of them (for more information on the tools used in these analyses, please refer to the corresponding Wiki pages).

Pipeline Name | Description | Analyses
Covid Emergency | SARS-CoV2 fast assembly and lineage assignment | 2AS_mapping + 4TY_lineage
Depletion & de novo | host depletion from trimmed reads and de novo assembly | 1PP_hostdepl + 2AS_denovo
Genome Draft | mapping and genome annotation. Mapping is performed with both bowtie and Snippy. | 2AS_mapping + 4AN_genes
Enterotoxin S. aureus finder | de novo assembly and blast to identify enterotoxin gene presence | 2AS_denovo + 4AN_AMR
NgsManager | macro-pipeline which, depending on sample type, executes the "Raw Reads Processing" pipeline followed by "Covid Emergency" (for SARS-CoV2), "WGS on Bacteria" and "Typing on Bacteria" (for bacterial isolates), or "Genome Draft" (for viral isolates or biological samples) | analyses from the "Raw Reads Processing", "Covid Emergency", "WGS on Bacteria", "Typing on Bacteria" and "Genome Draft" pipelines
Raw Reads Processing | reads quality check, trimming and virus/bacteria classification | 0SQ_rawreads + 1PP_trimming + 3TX_class
Typing on Bacteria | horizontal and vertical coverage calculation, species identification, gene annotation, identification of antibiotic resistance- and virulence-associated genes, typing | 2AS_mapping + 3TX_species + 4AN_genes + 4AN_AMR + 4TY_wgMLST + 4TY_cgMLST + 4TY_MLST + 4TY_flaA
WGS on Bacteria | assembly of bacterial isolate | 2AS_denovo
WNV - lineage calculation and mapping | lineage calculation and mapping against the calculated lineage reference for West Nile Virus samples | 4TY_lineage + 2AS_mapping
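
The "Analyses" entries above can be read as ordered lists of analysis names joined by "+". The following Python sketch, written for illustration only, splits such an entry into its steps and extracts the level digit of each; `PIPELINES` and `pipeline_steps` are hypothetical names, and only a few rows of the table are copied in.

```python
from typing import List, Tuple

# A few rows copied from the pipelines table; PIPELINES is an illustrative
# name, not a Cohesive data structure.
PIPELINES = {
    "Covid Emergency": "2AS_mapping + 4TY_lineage",
    "Depletion & de novo": "1PP_hostdepl + 2AS_denovo",
    "Raw Reads Processing": "0SQ_rawreads + 1PP_trimming + 3TX_class",
    "WNV - lineage calculation and mapping": "4TY_lineage + 2AS_mapping",
}


def pipeline_steps(analyses: str) -> List[Tuple[int, str]]:
    """Split an 'A + B + C' entry into (level, analysis name) pairs, in order."""
    steps = []
    for item in analyses.split("+"):
        name = item.strip()
        # The leading digit of the prefix is the usual execution level.
        steps.append((int(name[0]), name))
    return steps
```

Applied to the "WNV - lineage calculation and mapping" entry, the levels come out as 4 followed by 2, which is consistent with the prefix number marking the usual execution level rather than a strict ordering.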