4TY_cgMLST
Introduction
4TY_cgMLST performs "core genome Multi-Locus Sequence Typing" (cgMLST), a bacterial isolate characterization protocol that allows identification of clones in a microbial population.
Run Analysis 4TY_cgMLST
Once the analysis 4TY_cgMLST has been selected from the run analyses interface, the user will be able to select which bioinformatic tool to use. The available tool is:
- chewBBACA - BSR-Based Allele Calling Algorithm
cgMLST tools rely on schemas defined for specific bacteria. The schemas available for chewBBACA are listed in the table below.
Tool | Available schemas |
---|---|
chewBBACA | Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Staphylococcus aureus, Brucella, Brucella melitensis, Klebsiella pneumoniae. |
Note 1: Running 4TY_cgMLST on a microorganism for which there is no corresponding cgMLST schema will cause the run to fail.
The input selection UI provides an advanced input selection mode, which allows all supported input file types to be selected at once.
4TY_cgMLST accepts input sequences from de novo assembly or from mapping; if the latter are provided, the reference genome used for mapping is also required.
A link to Check analysis will be created after launching the requested analysis. The system will notify the user after a successful analysis launch and once execution has ended.
Output directory
Please refer to Cohesive's specific Wiki page for information on file download.
The output directory is available at the link in the download page or at the link in the analysis' summary card, and has the following structure: results > YEAR > ID > 4TY_cgMLST > DSXXXXXXXX-DTXXXXXX_chewbbaca. At that path there are 3 directories:
- meta: ("metadata") contains log and configuration files.
- result: contains the analysis' output files.
- qc: ("quality check") contains 2 directories (meta and result). In this case, the quality check is performed with QUAST.
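The directory layout described above can be sketched in a few lines of Python. This is only an illustration of the structure: the function name, its arguments, and the example year are hypothetical, and `DSXXXXXXXX`, `DTXXXXXX` and `ID` are kept as the placeholders used in this page.

```python
from pathlib import PurePosixPath


def chewbbaca_output_dirs(year, sample_id, ds_code, dt_code, root="results"):
    """Build the expected directory paths for one 4TY_cgMLST run.

    All argument names are hypothetical; ds_code and dt_code stand for
    the DSXXXXXXXX and DTXXXXXX placeholders used in this page.
    """
    base = (
        PurePosixPath(root)
        / str(year)
        / str(sample_id)
        / "4TY_cgMLST"
        / f"{ds_code}-{dt_code}_chewbbaca"
    )
    return {
        "base": base,
        "meta": base / "meta",            # log and configuration files
        "result": base / "result",        # analysis output files
        "qc_meta": base / "qc" / "meta",  # quality check metadata
        "qc_result": base / "qc" / "result",
    }


# Example call with an invented year and the page's placeholders.
dirs = chewbbaca_output_dirs(2021, "ID", "DSXXXXXXXX", "DTXXXXXX")
print(dirs["qc_result"])
```

`PurePosixPath` is used so that the sketch only composes path strings and never touches the file system.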
Output files from the allele call with chewBBACA are available in 3 different encodings:
- IZS encoding: each allele is identified with a progressive numeric ID. ID assignment considers all loci, thus it DOES NOT restart from 1 at each new locus.
- Pasteur encoding: each allele is identified with a numeric ID, and numbering restarts from 1 at each locus. Each execution starts from the unmodified, downloaded database. The name of the schema used shows its download date.
- MD5 encoding: each allele is identified with an alphanumeric code of 16 characters (MD5 code), obtained through a "hash" applied to the allele's sequence.
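The difference between the three encodings can be illustrated with a toy example. The sketch below is not the platform's implementation: locus names and sequences are invented, and the numbering order is assumed to follow the order of the input. For the MD5 case it uses Python's `hashlib`, whose `hexdigest()` returns the full 32-character hexadecimal digest; how the platform derives the code it reports is not specified here.

```python
import hashlib

# Toy input: locus name -> distinct allele sequences, in database order.
# Locus names and sequences are invented for illustration.
alleles_by_locus = {
    "locus_A": ["ATGAAATAA", "ATGAAGTAA"],
    "locus_B": ["ATGCCCTAA", "ATGCCGTAA", "ATGCCTTAA"],
}


def pasteur_style_ids(alleles):
    """Number alleles from 1 within each locus (counter restarts per locus)."""
    return {
        locus: {seq: i for i, seq in enumerate(seqs, start=1)}
        for locus, seqs in alleles.items()
    }


def izs_style_ids(alleles):
    """Number alleles with one counter shared by all loci (never restarts)."""
    ids, counter = {}, 0
    for locus, seqs in alleles.items():
        ids[locus] = {}
        for seq in seqs:
            counter += 1
            ids[locus][seq] = counter
    return ids


def md5_id(seq):
    """Identify an allele by the MD5 hash of its sequence."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


pasteur = pasteur_style_ids(alleles_by_locus)
izs = izs_style_ids(alleles_by_locus)

first_b = "ATGCCCTAA"  # first allele of locus_B
print(pasteur["locus_B"][first_b])  # 1: numbering restarted at this locus
print(izs["locus_B"][first_b])      # 3: numbering continued from locus_A
print(md5_id(first_b))              # depends only on the sequence itself
```

A hash-based identifier depends only on the allele sequence, so it does not change with the order in which alleles were added to a database, whereas both numeric schemes depend on that order.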
File | Description | Location |
---|---|---|
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_new_alleles.txt | sequences of newly-identified alleles | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_alleles.tsv | allele call with Pasteur encoding in tsv format | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_contigsInfo.tsv | info about the contig mapped on each locus | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_izsam.csv | allele call with IZS encoding | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_md5.csv | allele call with MD5 encoding | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_pasteur_2021-05-28.csv | allele call with Pasteur encoding in csv format | result directory |
DSXXXXXXXX-DTXXXXXX_ID_chewbbaca_results_statistics.tsv | metrics on loci encoded as EXC, INF, LNF, PLOT, NIPH, ALM, ASM | result directory |
DSXXXXXXXX-ID_import_chewbbaca_check.csv | quality check with info on calledPerc, calledNum, annotated, new, notFound, discarded | qc > result directory |
For more information on locus encoding and on chewBBACA's output files, please refer to chewBBACA's official guide.
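As a rough illustration of how an allele call table could be summarized, the sketch below counts call categories per sample. It assumes a tab-separated layout with the sample name in the first column and one column per locus, where a plain number is an exact match (EXC), `INF-<number>` is a newly inferred allele (INF), and any other value is a classification code such as LNF. This layout, the function name, and the example data are assumptions to be checked against the actual files and chewBBACA's guide.

```python
import csv
import io
from collections import Counter


def summarize_calls(tsv_text):
    """Count call categories per sample in a chewBBACA-style allele table.

    Assumed layout: tab-separated, header row, first column = sample,
    remaining columns = one call per locus.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row (sample column + locus names)
    summary = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        sample, calls = row[0], row[1:]
        counts = Counter()
        for call in calls:
            if call.isdigit():
                counts["EXC"] += 1      # exact match to a known allele
            elif call.startswith("INF-"):
                counts["INF"] += 1      # newly inferred allele
            else:
                counts[call] += 1       # e.g. LNF, NIPH, ALM, ASM
        summary[sample] = counts
    return summary


# Invented example with one sample and four loci.
example = (
    "FILE\tlocus_A\tlocus_B\tlocus_C\tlocus_D\n"
    "sample1\t1\tINF-7\tLNF\t3\n"
)
summary = summarize_calls(example)
print(dict(summary["sample1"]))
```

The same counting could be applied to a file by passing the result of `open(path).read()`, where `path` points to a table with the assumed layout.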