Introduction

Analyses on bacterial or viral isolates available in the Cohesive Information System are organized into categories. Each category is assigned a name following a specific nomenclature system, which describes the analysis type and its execution level within pipelines.

Nomenclature system

Prefixes

The analysis type and execution level are summarized by a short prefix code. The table below lists the available prefixes:

Prefix	Description
1PP	preprocessing analyses
2AS	assembly tools
2MG	metagenomics analyses
3TX	taxonomical classification
4TY	in silico typing
4AN	genome annotation

The first element of the prefix is a number indicating the usual execution level. For example, preprocessing is usually performed before any other analysis and is therefore assigned level 1; taxonomical analyses require files from assembly as input, so they are assigned level 3, after preprocessing and assembly.

An additional class is represented by the code "0SQ", which identifies Sequence Quality checks performed automatically on all new reads in Cohesive.
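Because the level is the first character of the prefix, it can be read directly off a full analysis name. The following is a minimal Python sketch, assuming analyses are handled as plain strings like the names used on this page; the helper names are hypothetical and not part of Cohesive. The level only reflects the usual order: for instance, the West Nile Virus pipeline listed below runs 4TY_lineage before 2AS_mapping.

```python
# Prefix codes from the table above, plus the automatic "0SQ" quality checks.
PREFIXES = {
    "0SQ": "sequence quality checks (automatic)",
    "1PP": "preprocessing analyses",
    "2AS": "assembly tools",
    "2MG": "metagenomics analyses",
    "3TX": "taxonomical classification",
    "4TY": "in silico typing",
    "4AN": "genome annotation",
}


def execution_level(prefix: str) -> int:
    """Return the usual execution level encoded by the first character of a prefix."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return int(prefix[0])


def order_by_level(full_names: list[str]) -> list[str]:
    """Sort full analysis names by the level of their 3-character prefix.

    sorted() is stable, so names with the same level keep their input order.
    """
    return sorted(full_names, key=lambda name: execution_level(name[:3]))


print(order_by_level(["3TX_class", "1PP_trimming", "2AS_denovo", "0SQ_rawreads"]))
# prints ['0SQ_rawreads', '1PP_trimming', '2AS_denovo', '3TX_class']
```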

Analysis names

The analysis name follows the prefix and describes the kind of data handling performed by available bioinformatic tools.

For example, the full name of the "trimming" analysis is "1PP_trimming", since it is classified as preprocessing. Similarly, de novo assembly is called "2AS_denovo", and this name is kept regardless of the bioinformatic tool used to perform it.

The sections below list all analyses and their groups, together with brief descriptions and links to the appropriate Wiki pages.

Suffixes

Many of the available analyses can be executed with multiple software packages, also called "bioinformatic tools" or "methods".

An analysis name can thus be completed by appending two underscore characters ("__") and the name of the selected tool, e.g. 2AS_denovo__spades and 2AS_denovo__unicycler, both of which perform de novo assembly, but with different software.
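The naming convention (prefix, single underscore, analysis name, then an optional double underscore and tool) can be captured in a few lines. This is a minimal Python sketch, assuming full names always take the form PREFIX_name or PREFIX_name__tool and that the prefix is a digit followed by two uppercase letters; the function names and the regular expression are illustrative, not Cohesive's own code.

```python
import re
from typing import NamedTuple, Optional

# Assumed prefix shape: one digit (execution level) + two uppercase letters, e.g. "2AS".
PREFIX_RE = re.compile(r"^[0-9][A-Z]{2}$")


class AnalysisName(NamedTuple):
    prefix: str
    analysis: str
    tool: Optional[str]  # None when no "__tool" suffix is present


def parse_analysis_name(full_name: str) -> AnalysisName:
    """Split e.g. '2AS_denovo__spades' into prefix, analysis name and tool."""
    # The tool suffix is separated by two underscores; partition splits at the first one.
    base, sep, tool = full_name.partition("__")
    # The prefix is separated from the analysis name by a single underscore.
    prefix, _, analysis = base.partition("_")
    if not PREFIX_RE.match(prefix) or not analysis:
        raise ValueError(f"not a valid analysis name: {full_name!r}")
    if sep and not tool:
        raise ValueError(f"empty tool suffix in: {full_name!r}")
    return AnalysisName(prefix, analysis, tool if sep else None)


def build_analysis_name(prefix: str, analysis: str, tool: Optional[str] = None) -> str:
    """Inverse of parse_analysis_name: join the parts using the convention above."""
    name = f"{prefix}_{analysis}"
    return f"{name}__{tool}" if tool else name


print(parse_analysis_name("2AS_denovo__spades"))
# prints AnalysisName(prefix='2AS', analysis='denovo', tool='spades')
print(build_analysis_name("1PP", "trimming"))
# prints 1PP_trimming
```

Keeping the tool as an optional field mirrors the convention described here: the analysis name stays the same whichever tool performs it, and only the suffix changes.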

Analyses that can be executed with multiple software packages allow tool selection through a dropdown menu in the second step ("Tools") of the Run Analysis wizard.

Available tools for each analysis are listed in the respective analysis Wiki page.

NOTE: some analyses and pipelines are not yet available in the Cohesive Information System Demo.

Single Sample Analysis

Prefix	Analysis Name	Description	Tools
1PP	trimming	removal of low quality nucleotide calls from raw reads	trimmomatic
	hostdepl	depletion of host sequences: reads are mapped against the selected host genome to remove contaminant sequences	bowtie
	downsampling	reduction of the number of sequences in genomic regions with excessive and uninformative vertical coverage	bbnorm
2AS	denovo	de novo assembly: assembles the reads into contigs and scaffolds without a reference genome	SPAdes
			unicycler
	mapping	mapping of reads against a reference genome	bowtie
			ivar
			snippy
2MG	denovo	de novo assembly for metagenomics: the metaSPAdes software builds a de Bruijn graph for all reads with SPAdes, which is then transformed into an assembly graph, finding paths corresponding to genome fragments in a metagenome	metaSPAdes
3TX	class	taxonomic classification and contamination check of the organisms the reads belong to	kraken
			kraken2
			confindr
	species	closest bacterial or viral species identification / identification of the best viral reference	kmerfinder
			blast
			vdabricate
4TY	MLST	Multi-Locus Sequence Typing in silico: it uses schemas of 7 conserved genes specific for each bacterium to assign Sequence Type and Clonal Complex	mlst
	cgMLST	in silico core genome Multi-Locus Sequence Typing for phylogenetic analysis: allele calling against species-specific core genome allele schemas	chewBBACA
			mentalist
			blastMLST
	flaA	analysis specific for Campylobacter. In silico identification of flaA locus variant (MLST for flaA)	flaA
	lineage	lineage assignment for SARS-CoV2	pangolin
	lineage	lineage assignment for West Nile Virus	westnile
	wgMLST	whole genome MLST	chewBBACA
4AN	genes	functional genome annotation through ORF (Open Reading Frame) search in a genome and identification of possible coded proteins	prokka
	AMR	prediction of the presence of antibiotic resistance-associated genes	abricate
			blast
			staramr
			filtering

Multi Sample Analysis

Analysis type	Analysis name / Tool	Description
Gene-by-gene based clustering	Grapetree	Gene-by-gene based clustering: builds MST (Minimum Spanning Tree) and NJ (Neighbor Joining) trees as Newick (.nwk) files
Gene-by-gene based clustering	Reportree	Gene-by-gene based clustering
Pangenome extraction	Panaroo	calculates a presence/absence binary matrix of accessory genes in the sample genomes, starting from the GFF files produced by Prokka (genome annotation)
Pangenome extraction	Snippy-core	executes Snippy to identify mutations (SNPs and indels) in sample reads diverging from a haploid reference genome, followed by Snippy-core to build core.vcf from Snippy's VCF files. core.vcf contains all core mutations among those listed in Snippy's VCF files
SNP-based clustering	CFSAN	SNP identification through reference-based phylogenetic analysis
	kSNP3	SNP identification through reference-free phylogenetic analysis; builds a Maximum Likelihood tree
	VCF2MST	fast construction of MST graph from a VCF file with no phylogenetic inferences

Pipelines

In addition to single analyses, Cohesive also provides automatic pipelines, which chain several single analyses together. Pipelines make running a workflow simpler and faster, especially when the workflow is a commonly used one.

The table below lists the pipelines and the analyses they include (for more information on the software used in these analyses, please refer to the corresponding Wiki pages).

Pipeline Name	Description	Analyses
Covid Emergency	SARS-CoV2 fast assembly and lineage assignment	2AS_mapping + 4TY_lineage
Depletion & de novo	host depletion from trimmed reads and de novo assembly	1PP_hostdepl + 2AS_denovo
Genome Draft	mapping and genome annotation. Mapping is performed with both bowtie and Snippy.	2AS_mapping + 4AN_genes
Enterotoxin S. aureus finder	de novo assembly and blast to identify enterotoxin gene presence	2AS_denovo + 4AN_AMR
NgsManager	macro-pipeline which, depending on sample type, executes the "Raw Reads Processing" pipeline followed by "Covid Emergency" (for SARS-CoV2), "WGS on Bacteria" and "Typing on Bacteria" (for bacterial isolates), or "Genome Draft" (for viral isolates or biological samples)	analyses from the "Raw Reads Processing", "Covid Emergency", "WGS on Bacteria", "Typing on Bacteria" and "Genome Draft" pipelines
Raw Reads Processing	reads quality check, trimming and virus/bacteria classification	0SQ_rawreads + 1PP_trimming + 3TX_class
Typing on Bacteria	horizontal and vertical coverage calculation, species identification, gene annotation, identification of antibiotic resistance- and virulence-associated genes, typing	2AS_mapping + 3TX_species + 4AN_genes + 4AN_AMR + 4TY_wgMLST + 4TY_cgMLST + 4TY_MLST + 4TY_flaA
WGS on Bacteria	assembly of bacterial isolate	2AS_denovo
WNV - lineage calculation and mapping	lineage calculation and mapping against the calculated lineage reference for West Nile Virus samples	4TY_lineage + 2AS_mapping
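The sample-type dispatch performed by NgsManager can be sketched as a lookup from sample type to follow-up pipelines. This is a minimal Python sketch, assuming the sub-pipelines and their analyses are exactly those in the table above; the sample-type labels and function names are hypothetical, and the deduplicated analysis list only describes which analyses are involved, not how Cohesive actually schedules them.

```python
# Sub-pipelines NgsManager can chain, with their analyses as listed in the table above.
PIPELINES = {
    "Raw Reads Processing": ["0SQ_rawreads", "1PP_trimming", "3TX_class"],
    "Covid Emergency": ["2AS_mapping", "4TY_lineage"],
    "WGS on Bacteria": ["2AS_denovo"],
    "Typing on Bacteria": ["2AS_mapping", "3TX_species", "4AN_genes", "4AN_AMR",
                           "4TY_wgMLST", "4TY_cgMLST", "4TY_MLST", "4TY_flaA"],
    "Genome Draft": ["2AS_mapping", "4AN_genes"],
}

# Hypothetical sample-type labels; the values used by Cohesive may differ.
FOLLOW_UP = {
    "sars-cov-2": ["Covid Emergency"],
    "bacterial isolate": ["WGS on Bacteria", "Typing on Bacteria"],
    "viral isolate": ["Genome Draft"],
    "biological sample": ["Genome Draft"],
}


def ngsmanager_plan(sample_type: str) -> list[str]:
    """Return the sub-pipelines chained for a sample type, starting with raw read processing."""
    if sample_type not in FOLLOW_UP:
        raise ValueError(f"unsupported sample type: {sample_type!r}")
    return ["Raw Reads Processing"] + FOLLOW_UP[sample_type]


def ngsmanager_analyses(sample_type: str) -> list[str]:
    """List the distinct analyses involved, in the order they first appear in the plan."""
    involved: list[str] = []
    for pipeline in ngsmanager_plan(sample_type):
        for analysis in PIPELINES[pipeline]:
            if analysis not in involved:
                involved.append(analysis)
    return involved


print(ngsmanager_plan("bacterial isolate"))
# prints ['Raw Reads Processing', 'WGS on Bacteria', 'Typing on Bacteria']
```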

Nomenclature system

Prefixes

Analysis names

Suffixes

Single Sample Analysis
Multi Sample Analysis
Pipelines

Topics

Introduction

Nomenclature system

Prefixes

Analysis names

Suffixes

Single Sample Analysis

Multi Sample Analysis

Pipelines

Contents

Previous

Analyses, Check analyses

Next

Tool Descriptions, Tools for Long Reads

Topics

Introduction

Nomenclature system

Prefixes

Analysis names

Suffixes

Single Sample Analysis

Multi Sample Analysis

Pipelines

Contents

Search within the documentation

Previous

Analyses, Check analyses

Next

Tool Descriptions, Tools for Long Reads