1PP_downsampling

Introduction

Downsampling is the reduction of read depth (vertical coverage) at specific positions or regions of the genome.

Sequencing protocols can cause reads to stack up over a specific region, producing excess data that slows down downstream analyses without adding information. Downsampling limits computation time by discarding read pairs until the desired vertical coverage threshold is reached.
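To judge whether a sample is a candidate for downsampling, the mean vertical coverage can be estimated as total sequenced bases divided by genome size. The sketch below is only illustrative: the function name, the read count, the read length and the approximate 2.9 Mbp genome size used for L. monocytogenes are assumptions, not values taken from the platform. A mean value also does not reveal local read stacking.

```python
def mean_vertical_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate mean vertical coverage as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Hypothetical sample: 2,000,000 reads of 150 bp (both mates counted),
# genome size assumed to be roughly 2.9 Mbp.
coverage = mean_vertical_coverage(n_reads=2_000_000, read_length=150, genome_size=2_900_000)
print(f"estimated mean coverage: {coverage:.1f}X")  # about 103X
```

A sample whose estimate is already below the desired coverage would gain nothing from downsampling.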

[Figure: UML diagram]

Run Analysis 1PP_downsampling

Once the 1PP_downsampling analysis has been selected from the run analyses interface, the user can choose which bioinformatic tool to use. The only tool available for this analysis is bbnorm.

The wizard will require input reads and additional parameters to define the target vertical coverage.

The "kmer length" parameter is specific to the samples used, while the "Target" parameter depends on kmer length and species. As a result, the "Target" parameter has to be determined empirically.

Warning: the "Target" parameter does not correspond to the final vertical coverage.
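One reason the "Target" value differs from the resulting vertical coverage is that BBNorm's depth parameters refer to kmer depth rather than read depth. BBNorm's usage notes give the relation Dr = Dk * R / (R - K + 1) between read depth Dr, kmer depth Dk, read length R and kmer length K. The sketch below applies that relation as a rough starting point only; the function name and the 150 bp read length are assumptions, and real data will deviate, which is why the value still has to be tuned empirically.

```python
def read_depth_from_kmer_depth(kmer_depth: float, read_length: int, k: int) -> float:
    """Convert a kmer depth into an approximate read depth.

    Uses Dr = Dk * R / (R - K + 1); assumes error-free reads of uniform length.
    """
    if read_length < k:
        raise ValueError("read_length must be at least k")
    return kmer_depth * read_length / (read_length - k + 1)


# Assumed read length of 150 bp; kmer length and target as entered in the wizard.
estimate = read_depth_from_kmer_depth(kmer_depth=31, read_length=150, k=30)
print(f"approximate read depth: {estimate:.1f}X")  # about 38X
```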

Two examples for Listeria monocytogenes are listed below.

species L. monocytogenes, kmer length = 30, target = 31 --> vertical coverage = 40X
species L. monocytogenes, kmer length = 30, target = 8 --> vertical coverage = 10X
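For orientation, the first example could map onto a BBNorm call like the one assembled below. This is a sketch under assumptions: the platform's actual invocation is not documented here, the file names are hypothetical, and only BBNorm's in, in2, out, out2, k and target options are set, with everything else left at its default.

```python
import shlex


def bbnorm_command(r1: str, r2: str, out1: str, out2: str, k: int, target: int) -> list[str]:
    """Build a bbnorm.sh argument list for paired-end reads (nothing is executed)."""
    return [
        "bbnorm.sh",
        f"in={r1}",
        f"in2={r2}",
        f"out={out1}",
        f"out2={out2}",
        f"k={k}",            # "kmer length" in the wizard
        f"target={target}",  # "Target" in the wizard (a kmer depth, not the final coverage)
    ]


# Hypothetical file names; parameters from the first L. monocytogenes example.
cmd = bbnorm_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "down_R1.fastq.gz", "down_R2.fastq.gz",
    k=30, target=31,
)
print(shlex.join(cmd))
```

Building the command as a list keeps each option a separate argument, so it can be handed to subprocess.run without shell quoting issues.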

The input selection wizard offers an advanced input selection mode, which allows all supported input file types to be selected at once.

step_1PP_trimming
step_1PP_hostdepl
step_1PP_downsampling

A link to Check analysis is created once the requested analysis has been launched. The system notifies the user after a successful launch and again once execution has ended.

Output directory

Please refer to Cohesive's specific Wiki page for information on file download.

The output directory is available at the link in the download page or at the link present in the analysis' summary card, and has the following structure: results > YEAR > ID > 1PP_downsampling > DSXXXXXXXX-DTXXXXXX_bbnorm. At that path there are two directories:

meta: ("metadata") contains log and configuration files.
result: contains the analysis' output files.

The table below lists files available in the output directory structure, alongside some useful information.

File	Description	Location
DSXXXXXXXX-DTXXXXXX_ID_bbnorm_kXX_tX_R1.fastq.gz	downsampled read 1 (R1)	result directory
DSXXXXXXXX-DTXXXXXX_ID_bbnorm_kXX_tX_R2.fastq.gz	downsampled read 2 (R2)	result directory
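The file names in the table encode the kmer length and target used for the run, so they can be recovered when processing results in bulk. The sketch below assumes that the X placeholders stand for characters other than underscores and that k and t are followed by digits; the regular expression, the function name and the sample file name are hypothetical.

```python
import re

# Assumed pattern for DSXXXXXXXX-DTXXXXXX_ID_bbnorm_kXX_tX_R1.fastq.gz
FILENAME_RE = re.compile(
    r"^(?P<run>DS[^_]+-DT[^_]+)_(?P<sample_id>.+)_bbnorm"
    r"_k(?P<k>\d+)_t(?P<target>\d+)_R(?P<mate>[12])\.fastq\.gz$"
)


def parse_downsampled_name(filename: str) -> dict:
    """Split a downsampled FASTQ file name into its components."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    for key in ("k", "target", "mate"):
        fields[key] = int(fields[key])
    return fields


# Hypothetical file name following the documented pattern.
info = parse_downsampled_name("DS12345678-DT123456_SAMPLE01_bbnorm_k30_t31_R1.fastq.gz")
print(info)
```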
