Navigation icon
Topics

4TY_cgMLST

Introduction

4TY_cgMLST performs the "core genome Multi-Locus Sequence Typing" (cgMLST), a bacterial isolate characterization protocol, which allows identification of clones in microbial population.

uml diagram

Run Analysis 4TY_cgMLST

Once the analysis 4TY_cgMLST has been selected from the run analyses interface, the user will be able to select which bioinformatic tool to use. The available tool is:

  • chewBBACA - BSR-Based Allele Calling Algorithm

Tools for cgMLST possess schemas for specific bacteria. Schemas available for chewBBACA are listed in the table below.

Tool Available schemas
chewBBACA Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Staphylococcus aureus, Brucella, Brucella melitensis, Klebsiella pneumoniae.

Note 1: Running 4TY_cgMLST on a microorganism, for which there is no corresponding cgMLST schema will cause the run to fail.

The input selection UI delivers an advanced input selection mode, to allow selection of all types of supported input files at once.

Accepted inputs can be from:

4TY_cgMLST requires input sequences from de novo assembly or mapping; if the latter are provided, the reference genome that has been used for mapping will also be required.

A link to Check analysis will be created after launching the requested analysis. The system will notify the user after a succesful analysis launch and once execution has ended.

Output directory

Please refer to Cohesive's specific Wiki page for information on file download.

The output directory is guida ufficiale di available at the link in the download page or at the link presente in the analysis' summary card, and will have the following structure: results > YEAR > ID > 4TY_cgMLST > DSXXXXXXXX-DTXXXXXX_chewbbaca. At that path there will be 3 directories:

  • meta: ("metadata") contains log and configuration files.
  • result: contains the analysis' output files.
  • qc: ("quality check") it contains 2 directories (meta and result). In this case quality check is performed with Quast.

Output files from allele call with chewBBACA are available with 3 different encoding:

  • IZS encoding: each allele is identified with a progressive numeric ID. ID assignment considers all loci, thus it DOES NOT restart from 1 at each new locus.
  • Pasteur encoding: the code for identified alleles consists of a numeric value. Progression restarts from 1 at each locus. For each execution the analysis restarts from the unmodified, downloaded database. Used schema displays download date.
  • MD5 encoding: each allele is identified with an alphanumeric code of 16 characters (MD5 code), obtained through a "hash" applied to the allele's sequence.
File Description Location
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_new_alleles.txt sequences of newly-identified alleles result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_alleles.tsv allele call with Pasteur encoding in csv format result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_contigsInfo.tsv info about the contig mapped on each locus result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_izsam.csv allele call with IZS encoding result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_md5.csv allele call with md5 encoding result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_pasteur_2021-05-28.csv allele call with Pasteur encoding in tsv format result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_statistics.tsv metrics on loci encoded as EXC, INF, LNF, PLOT, NIPH, ALM, ASM result directory
DSXXXXXXXX-ID_import_chewbbaca_check.csv quality check with info on calledPerc, calledNum, annotated, new, notFound, discarded qc > result directory

For more information on locus encoding and on chewBBACA's output files, please refer to chewBBACA's official guide.