Zip and metadata rules

Metadata files and manually created metadata cards must follow certain rules to be validated correctly. This section of the Wiki lists and explains these rules.

Please check the FAQ section in case of doubt.

Metadata

Add data

If metadata are submitted using the "Add data" button, or when metadata cards are modified manually, a wizard guides users through compiling the metadata. In this case only two conditions need to be met to ensure successful metadata validation:

  1. Submit the correct Code (i.e. filename);
  2. Fill in at least the minimum required metadata.

1 - Code

The "Code" corresponds to the name of the fasta or fastq file to which the metadata refer.

  • The code should include the actual file name without extension;
  • The code should not contain blank spaces;
  • For fastq files of paired-end reads, the code should not include the forward or reverse read identifier (e.g. "_R1" or "_1") and it should be unique (the metadata card refers to both reads in a pair).
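
The Code rules above can be sketched as a small helper. This is only an illustration, not the validation Cohesive itself performs: the function name `code_from_filename`, the exact read-identifier pattern ("_R1"/"_R2" or "_1"/"_2" at the end of the name) and the case-insensitive extension match are assumptions.

```python
import re

# Sequence extensions mentioned in this Wiki (fasta with or without .gz).
EXTENSIONS = (".fastq.gz", ".fq.gz", ".fasta.gz", ".fa.gz", ".fasta", ".fa")

# Assumed read identifiers: "_R1"/"_R2" or "_1"/"_2" at the end of the name.
READ_SUFFIX = re.compile(r"_R?[12]$")


def code_from_filename(filename: str) -> str:
    """Return the metadata Code for a sequence file name (hypothetical helper)."""
    if any(ch.isspace() for ch in filename):
        raise ValueError(f"blank spaces are not allowed: {filename!r}")
    lowered = filename.lower()
    for ext in EXTENSIONS:
        if lowered.endswith(ext):
            stem = filename[: -len(ext)]  # file name without extension
            break
    else:
        raise ValueError(f"unrecognised extension: {filename!r}")
    # Only fastq names have the read identifier removed, so that both
    # reads of a pair map to the same Code.
    if ext.startswith((".fastq", ".fq")):
        stem = READ_SUFFIX.sub("", stem)
    return stem


print(code_from_filename("sampleA_R1.fastq.gz"))
print(code_from_filename("sampleA_R2.fastq.gz"))
```

Note that this sketch also strips a trailing "_1" from a single-end fastq name; whether that is desirable depends on how the files were named.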

2 - Minimum required metadata

Mandatory metadata fields may vary depending on sample type; for example, a public sample requires fewer mandatory fields than a sample marked as an official submission.

Minimum required metadata for a generic sample are:

  • The filename or "Code";
  • Sample type (depends on available metadata templates);
  • Species.

The sample type can be selected from a dropdown menu, while the species is selected from the pop-up table of species available in the information system's database.
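
A pre-submission check for the minimum required metadata of a generic sample could look like the following sketch. The field names `code`, `sample_type` and `species` are hypothetical placeholders; the real names are defined by the metadata card or template in use.

```python
# Hypothetical field names standing in for Code, Sample type and Species.
REQUIRED_FIELDS = ("code", "sample_type", "species")


def missing_required(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


record = {"code": "sampleA", "sample_type": "public", "species": ""}
print(missing_required(record))
```

An empty list means the three minimum fields are present; other sample types may require more fields than this sketch checks.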

Add file

Templates for metadata tables are available as .tsv and .csv files and can be downloaded from the Prepare Upload page, or directly from the "Add file" pop-up, as shown in the following video:

CIS-metadata_template.mp4

Each of the file's columns corresponds to a field available in the metadata card.

Depending on the user profile, one or more templates may be available, one for each sample type that the user is allowed to upload to Cohesive.

Note: always use the metadata template corresponding to the sample type being uploaded.

Templates include the header and an example line to demonstrate formatting and expected content of each field.

If the selected template requires metadata about "species", "material", "host", "matrix" or "sampling point", these fields should not be filled in with free text or taxonomy: they require the corresponding metadata code from Cohesive's database.

Tables of metadata codes can be consulted in Main elements > Metadata from the Navigation Menu.

  • Species: Main elements > Metadata > Pathogen Species
  • Material: Main elements > Metadata > Material
  • Host: Main elements > Metadata > Host Species
  • Matrix: Main elements > Metadata > Matrix
  • Sampling point: Main elements > Metadata > Sampling Point

Please refer to the Metadata section of this Wiki for more information on the topic.

Sequence files

Accepted extensions for submitted fasta files are: .fasta, .fa.

Accepted extensions for submitted fastq files are: .fastq.gz, .fq.gz.

Note: compression of fastq files in .gz archives is mandatory. Fasta files are accepted with or without .gz compression.
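
The extension rules can be checked before upload with a sketch like the one below. The function name `sequence_type` is hypothetical, and two details are assumptions rather than documented behaviour: extensions are matched case-insensitively, and compressed fasta files are assumed to be named ".fasta.gz" or ".fa.gz".

```python
def sequence_type(filename: str) -> str:
    """Classify a file as 'fasta' or 'fastq', or raise ValueError."""
    lowered = filename.lower()
    if lowered.endswith((".fastq.gz", ".fq.gz")):
        return "fastq"
    if lowered.endswith((".fasta", ".fa", ".fasta.gz", ".fa.gz")):
        return "fasta"
    if lowered.endswith((".fastq", ".fq")):
        # fastq files must be compressed as .gz
        raise ValueError(f"fastq file is not gzip-compressed: {filename!r}")
    raise ValueError(f"extension not accepted: {filename!r}")


for name in ("sampleA_R1.fastq.gz", "assembly.fa", "reads.fastq"):
    try:
        print(name, "->", sequence_type(name))
    except ValueError as err:
        print(name, "-> rejected:", err)
```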

Zip archive

When uploading multiple sequences as a single compressed file, .zip compression has to be used. When loading the .zip file, the system will automatically check metadata cards, which will be paired to the corresponding samples. For this reason, the name of each fasta or fastq file needs to meet the rules discussed above.

If the zip file contains files whose names cannot be paired with any validated metadata card, the upload will exit with "failure" or "warnings" status.

Upload as a zip file can only be used if the zip file contains files of the same type, because each file type needs to be managed through a different upload procedure (e.g. upload of fastas versus upload of paired-end fastqs versus single-end fastqs).

The zip file may contain folders: the upload system can retrieve compressed files nested up to 10 levels of sub-folders deep in the archive.
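
Before uploading, an archive can be inspected locally with Python's standard `zipfile` module. The sketch below is an illustration under stated assumptions, not the check the upload system runs: `inspect_zip` is a hypothetical helper, the 10-level limit is read as "at most 10 folders above the file", and paired-end and single-end fastqs are not distinguished.

```python
import zipfile
from pathlib import PurePosixPath

MAX_DEPTH = 10  # assumed reading of the "10 levels of sub-folders" limit


def inspect_zip(path):
    """Return (kinds, too_deep) for the files stored in a zip archive.

    kinds    -- set of detected file types: 'fasta', 'fastq' or 'other'
    too_deep -- archive entries nested under more than MAX_DEPTH folders
    """
    kinds = set()
    too_deep = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries carry no sequence data
            parts = PurePosixPath(info.filename).parts
            if len(parts) - 1 > MAX_DEPTH:
                too_deep.append(info.filename)
            name = parts[-1].lower()
            if name.endswith((".fastq.gz", ".fq.gz")):
                kinds.add("fastq")
            elif name.endswith((".fasta", ".fa", ".fasta.gz", ".fa.gz")):
                kinds.add("fasta")
            else:
                kinds.add("other")
    return kinds, too_deep
```

An archive is a reasonable candidate for upload when `kinds` contains a single type other than 'other' and `too_deep` is empty; the file names still have to match validated metadata cards.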