VCF2MST
Introduction
VCF2MST identifies SARS-CoV-2 PANGO lineages and builds a phylogenetic tree graph with the Minimum Spanning Tree (MST) algorithm, starting from VCF files of Single Nucleotide Polymorphisms (SNPs), without alignment-based phylogenomic inference.
The software builds the MST from Hamming distances, which count the positions at which two equal-length sequences (strings) differ, that is, the number of substitutions needed to transform one sequence into the other.
More information on the software is available at VCF2MST's official GitHub page ("VCF2MST - Hamming Distance based Minimum Spanning Tree from Samples vcf using graptree") or at the corresponding research article.
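As an illustration of the idea described above, the following is a minimal sketch of a Hamming-distance MST in Python. It is not VCF2MST's actual implementation: the function names `hamming` and `minimum_spanning_tree`, the sample names, and the representation of each sample as a short string of SNP calls are all assumptions made for this example, and Prim's algorithm is used here simply because it is compact.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: dict[str, str]) -> list[tuple[str, str, int]]:
    """Build an MST over the samples with Prim's algorithm.

    Returns a list of (node_in_tree, new_node, distance) edges.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge joining the tree to a sample outside it.
        # Ties are broken by sample name because tuples compare element-wise.
        d, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, d))
        in_tree.add(v)
    return edges


# Hypothetical SNP profiles: one character per variable site.
profiles = {
    "ref": "ACGTA",
    "s1": "ACGTT",
    "s2": "TCGTT",
    "s3": "ACCTA",
}
mst = minimum_spanning_tree(profiles)
total = sum(d for _, _, d in mst)
for u, v, d in mst:
    print(f"{u} -- {v} (distance {d})")
```

In this toy data set every sample is one substitution away from its nearest neighbour, so the tree has three edges of length 1.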
Run Analysis VCF2MST
The analysis VCF2MST can be selected from the run analyses page.
The input selection UI offers an advanced input selection mode that allows all supported input file types to be selected at once.
Accepted inputs for VCF2MST are from step_2AS_mapping. The input selection interface also includes a field for the reference genome (auto-filled).
A link to Check analysis is created after the requested analysis is launched. The system notifies the user when the analysis has launched successfully and again once execution has ended.
The analysis summary lists the output directory and additional options, such as access to some of the metadata, log and output files and direct graph visualization thanks to GrapeTree's integration in Cohesive.
Output directory
Please refer to Cohesive's specific Wiki page for information on file download.
The output directory is available at the link in the download page or at the link in the analysis' summary card. The results directory is located directly in the root directory. Inside results there are 2 subdirectories:
- meta: ("metadata") contains log and configuration files.
- result: contains the analysis' output files.
The following table lists output files stored in results.
File | Description | Location |
---|---|---|
tree.nwk | MST treefile in nwk (newick) format | results directory |
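Once downloaded, the Newick file can be inspected with a few lines of Python. The sketch below lists the node labels found in a Newick string; the helper name `newick_labels` and the example tree are hypothetical, and the regular expression assumes plain, unquoted labels (quoted labels or labels containing parentheses, commas, colons or semicolons would need a proper Newick parser).

```python
import re


def newick_labels(newick: str) -> list[str]:
    """Return node labels in the order they appear in a Newick string.

    A label is any run of characters that directly follows '(', ',' or ')'
    and does not contain Newick punctuation. Branch lengths follow ':' and
    are therefore skipped.
    """
    pattern = r"[(,)]\s*([^(),:;\s][^(),:;]*)"
    return [label.strip() for label in re.findall(pattern, newick)]


# Hypothetical tree; in practice the string would be read from tree.nwk.
example = "((s1:1,s2:2)ref:0,s3:1);"
print(newick_labels(example))
```

Labels that follow a closing parenthesis belong to internal nodes, which is why they are collected as well as the leaf labels.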