4AN_genes
Introduction
4AN_genes performs functional genome annotation, identifying open reading frames (ORFs) and the proteins they may encode.
Run Analysis 4AN_genes
Once the analysis 4AN_genes has been selected from the run analyses interface, the user will be able to select which bioinformatic tool to use. The only available tool for this analysis is Prokka - Tool to annotate bacterial, archaeal and viral genomes.
The input selection UI provides an advanced selection mode, which allows all supported input file types to be selected at once.
The first required parameter is the kingdom (i.e. virus or bacteria, plus "host", an artificial group which includes possible host organisms). The second parameter is a reference genome.
Accepted inputs can be from:
If output from mapping is provided, the reference genome that has been used for mapping will also be required.
A link to Check analysis will be created after launching the requested analysis. The system will notify the user after a successful analysis launch and once execution has ended.
Output directory
Please refer to Cohesive's specific Wiki page for information on file download.
The output directory is available at the link in the download page or at the link present in the analysis' summary card, and has the following structure: results > YEAR > ID > 4AN_genes > DSXXXXXXXX-DTXXXXXX_prokka. At that path there are 2 directories:
- meta: ("metadata") contains log and configuration files.
- result: contains the analysis' output files.
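The directory layout described above can be sketched in Python as follows. This is only an illustration: the helper name `prokka_output_dirs` and its parameter names are hypothetical, and the values passed in are the masked placeholders used on this page. Only the path components themselves (results, year, ID, 4AN_genes, the `_prokka` suffix, and the meta and result subdirectories) come from the description.

```python
from pathlib import PurePosixPath


def prokka_output_dirs(year: str, analysis_id: str, run_name: str) -> dict:
    """Build the expected output paths for one 4AN_genes run.

    run_name stands for the DSXXXXXXXX-DTXXXXXX part of the directory name
    (hypothetical parameter name; the actual value is assigned by the platform).
    """
    base = (
        PurePosixPath("results")
        / year
        / analysis_id
        / "4AN_genes"
        / f"{run_name}_prokka"
    )
    return {
        "base": base,
        "meta": base / "meta",      # log and configuration files
        "result": base / "result",  # analysis output files
    }


# Placeholder values, kept in the same masked form used on this page.
dirs = prokka_output_dirs("YEAR", "ID", "DSXXXXXXXX-DTXXXXXX")
print(dirs["result"])
```

With real values, the same helper could be used to check that both subdirectories exist before looking for individual output files.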
The following table lists Prokka's output files.
File | Description | Location |
---|---|---|
log_errore_controlli_esami.log | run's warning and error log | main directory |
metadata_samples.tsv | samples' metadata summary table in tsv format | main directory |
results.csv | summary table separated by semicolon (";") containing sample IDs and information | main directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.err | text report file with run's errors | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.faa | amino acid sequences translated from the identified coding genes (faa format - FASTA amino acid) | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.ffn | nucleotide sequences of the identified coding genes (ffn format - FASTA nucleotide) | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.fna | nucleotide sequences of the input contigs (fna format - FASTA nucleic acid) | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.fsa | nucleotide sequences of the input contigs in fsa format, used to create the .sqn file | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.gbk | output file in GenBank format | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.gff | output file in gff format (General Feature Format) | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.log | Prokka's run log | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.sqn | file for submission to GenBank in Sequin format | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.tbl | text file with information on sequence and loci | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.tsv | tsv list of loci and proteins from mapped coding genes | results directory |
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.txt | metrics on identified CDS | results directory |
proteins.faa | protein sequences in faa format | results directory |
For more details on Prokka's output files, please refer to Prokka's official manual.
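As an illustration of how two of the files listed above might be read, the following minimal Python sketch parses a semicolon-separated table such as results.csv and counts the sequences in a FASTA file such as the .faa output. The function names, the column names and the sample contents are all made up for the example, since the actual columns of results.csv are not listed on this page.

```python
import csv
import io


def read_results(handle):
    """Read a semicolon-separated summary table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


def count_fasta_records(handle):
    """Count sequences in a FASTA file: each record starts with a '>' header."""
    return sum(1 for line in handle if line.startswith(">"))


# Made-up stand-ins for real files; column names and contents are assumptions.
example_csv = io.StringIO("sample_id;note\nS1;ok\nS2;ok\n")
example_faa = io.StringIO(
    ">gene_1 hypothetical protein\nMKT\n"
    ">gene_2 hypothetical protein\nMAA\n"
)

rows = read_results(example_csv)
n_proteins = count_fasta_records(example_faa)
print(len(rows), n_proteins)
```

With real files, the in-memory examples would be replaced by file handles, for instance `open(path, newline="")` for the CSV table.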