Navigation icon
Topics

4TY_wgMLST

Introduction

4TY_wgMLST performs the "whole genome Multi-Locus Sequence Typing" (wgMLST) in silico for the Campylobacter jejuni and Campylobacter coli bacteria. Differently than cgMLST (core genome MLST), characterization takes into account the whole genome.

uml diagram

Run Analysis 4TY_wgMLST

Once the analysis 4TY_wgMLST has been selected from the run analyses interface, the user will be able to select which bioinformatic tool to use. The available tool is chewBBACA - BSR-Based Allele Calling Algorithm.

Note: Only Campylobacter jejuni and Campylobacter coli wgMLST schemas are available. Running the analysis on a different microorganism wiil cause the run to fail.

The input selection UI delivers an advanced input selection mode, to allow selection of all types of supported input files at once.

Accepted inputs can be from:

The software requires the input sequences from de novo assembly or from mapping and, in case the latter is selected, the reference genome used for mapping.

A link to Check analysis will be created after launching the requested analysis. The system will notify the user after a succesful analysis launch and once execution has ended.

Output directory

Please refer to Cohesive's specific Wiki page for information on file download.

The output directory is guida ufficiale di available at the link in the download page or at the link presente in the analysis' summary card, and will have the following structure: results > YEAR > ID > 4TY_wgMLST > DSXXXXXXXX-DTXXXXXX_chewbbaca. At that path there will be 3 directories:

  • meta: ("metadata") contains log and configuration files.
  • result: contains the analysis' output files.
  • qc: ("quality check") it contains 2 directories (meta and result). In this case quality check is performed with Quast.

Output files from allele call with chewBBACA are available with 3 different encoding:

  • IZS encoding: each allele is identified with a progressive numeric ID. ID assignment considers all loci, thus it DOES NOT restart from 1 at each new locus.
  • Pasteur encoding: the code for identified alleles consists of a numeric value. Progression restarts from 1 at each locus. For each execution the analysis restarts from the unmodified, downloaded database. Used schema displays download date.
  • MD5 encoding: each allele is identified with an alphanumeric code of 16 characters (MD5 code), obtained through a "hash" applied to the allele's sequence.

Please note that the example file "DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_ccoli_curated_210722.csv" mentioned in the table below can have either the name C. coli or C. jejuni as suffix, depending on the analyzed species.

File Description Location
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_new_alleles.txt sequence of newly-identified alleles result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_alleles.tsv allele call with Pasteur encoding in csv format result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_ccoli_curated_210722.csv allele csv table. Rows: samples, columns: loci result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_contigsInfo.tsv info about the contig mapped on each locus result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_crc32.csv allele csv table encoded in CRC32. Rows: samples, columns: loci result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_izsam.csv allele call with IZS encoding result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_md5.csv allele call with md5 encoding result directory
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_statistics.tsv metrics on loci encoded as EXC, INF, LNF, PLOT, NIPH, ALM, ASM result directory
DSXXXXXXXX-DTXXXXXX_ID_import_chewbbaca_check.csv quality check with info on calledPerc, calledNum, annotated, new, notFound, discarded qc > result directory

For more information on locus encoding and on chewBBACA's output files, please refer to chewBBACA's official guide.