4AN_genes

Introduction

4AN_genes performs functional genome annotation, identifying putative protein-coding sequences and open reading frames (ORFs).

uml diagram

Run Analysis 4AN_genes

Once the 4AN_genes analysis has been selected in the run analyses interface, the user can choose which bioinformatic tool to use. The only tool available for this analysis is Prokka, a tool for annotating bacterial, archaeal and viral genomes.

The input selection UI provides an advanced input selection mode, which allows all supported input file types to be selected at once.

The first required parameter is the kingdom (i.e. virus or bacteria, plus "host", an artificial group which includes possible host organisms). The second parameter is a reference genome.
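As a rough illustration of how these two parameters could translate into a Prokka call, the sketch below assembles a command line without running it. How the platform actually invokes Prokka is not documented here, so the function name, the mapping from the UI labels to Prokka's `--kingdom` values, and the use of `--proteins` to pass the reference genome are all assumptions. Prokka has no kingdom matching the artificial "host" group, so the sketch rejects it rather than guessing.

```python
from pathlib import Path

# Assumed mapping from the UI kingdom labels to Prokka's --kingdom values.
# "host" is deliberately absent: its handling by the platform is not documented.
KINGDOM_TO_PROKKA = {"virus": "Viruses", "bacteria": "Bacteria"}


def build_prokka_command(kingdom, reference, contigs, outdir, prefix):
    """Return a Prokka argument list (hypothetical helper; nothing is executed)."""
    key = kingdom.strip().lower()
    if key not in KINGDOM_TO_PROKKA:
        raise ValueError(f"no known Prokka kingdom for {kingdom!r}")
    return [
        "prokka",
        "--kingdom", KINGDOM_TO_PROKKA[key],
        "--proteins", str(Path(reference)),  # assumption: reference used as trusted annotation source
        "--outdir", str(Path(outdir)),
        "--prefix", prefix,
        str(Path(contigs)),                  # input sequences go last
    ]
```

The returned list can be inspected or logged before being handed to a process runner, which keeps parameter validation separate from execution.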

Accepted inputs can be from:

If output from mapping is provided, the reference genome that has been used for mapping will also be required.

A link to Check analysis is created after the requested analysis is launched. The system notifies the user after a successful launch and again once execution has ended.

Output directory

Please refer to Cohesive's specific Wiki page for information on file download.

The output directory is available at the link in the download page or at the link present in the analysis' summary card, and has the following structure: results > YEAR > ID > 4AN_genes > DSXXXXXXXX-DTXXXXXX_prokka. At that path there are two directories:

  • meta: ("metadata") contains log and configuration files.
  • result: contains the analysis' output files.
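For users who work on a downloaded copy of the output, the layout above can be walked with a few lines of Python. This is a minimal sketch: the function name and the `root` argument (the local folder holding the `results` tree) are assumptions, and only the directory names given above are taken from this page.

```python
from pathlib import Path


def find_prokka_runs(root, year, sample_id):
    """List Prokka run folders under results/YEAR/ID/4AN_genes (hypothetical helper)."""
    base = Path(root) / "results" / str(year) / str(sample_id) / "4AN_genes"
    if not base.is_dir():
        return []
    runs = sorted(p for p in base.glob("*_prokka") if p.is_dir())
    # Each run folder is expected to hold the two directories described above.
    return [{"run": p, "meta": p / "meta", "result": p / "result"} for p in runs]
```

Each returned entry gives the run folder together with the paths of its `meta` and `result` directories, whether or not those two directories actually exist on disk.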

The following table lists Prokka's output files.

File | Description | Location
---|---|---
log_errore_controlli_esami.log | warning and error log for the run | main directory
metadata_samples.tsv | summary table of the samples' metadata, in tsv format | main directory
results.csv | summary table, separated by semicolons (";"), containing sample IDs and information | main directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.err | text report listing the run's errors | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.faa | amino acid sequences translated from the identified coding genes (faa format, FASTA amino acid) | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.ffn | nucleotide sequences of the identified coding genes (ffn format, FASTA nucleotide) | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.fna | nucleotide sequences of the input contigs (fna format, FASTA nucleic acid) | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.fsa | nucleotide sequences of the input contigs in fsa format, used by tbl2asn to create the .sqn file | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.gbk | output file in GenBank format | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.gff | output file in gff format (General Feature Format) | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.log | Prokka's run log | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.sqn | file for submission to GenBank, in Sequin format | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.tbl | text file with information on sequences and loci | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.tsv | tsv list of loci and proteins from mapped coding genes | results directory
DSXXXXXXXX-DTXXXXXX_ID_prokka_REFID_result.txt | metrics on the identified CDS | results directory
proteins.faa | protein sequences in faa format | results directory

For more details on Prokka's output files, please refer to Prokka's official manual.
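Two of the files listed above are easy to inspect programmatically: the semicolon-separated results.csv and the FASTA-formatted .faa files. The sketch below shows one way to read them with the Python standard library. The function names are illustrative, and it assumes that results.csv starts with a header row, which this page does not state.

```python
import csv


def read_summary(path):
    """Read the semicolon-separated summary table into a list of dicts.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=";"))


def count_fasta_records(path):
    """Count sequences in a FASTA file (e.g. a .faa file) by counting '>' header lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Counting header lines is enough for a quick sanity check, for instance comparing the number of proteins in the .faa file with the CDS count reported in the .txt metrics file; full sequence parsing would need a dedicated FASTA reader.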