About the COHESIVE Project
Introduction
The One Health European Joint Programme (OHEJP) project "One Health Structure In Europe" (COHESIVE) aims to improve the efficiency of surveillance, risk analysis and outbreak management through a One Health approach at the Member State level. Achieving this objective requires each Member State to integrate pathogen data from the human, veterinary and food sectors.
The elements under consideration in the OHEJP COHESIVE are:
- whole genome sequencing (WGS) data of pathogens analyzed by the Member State laboratories,
- metadata, such as the minimum epidemiological data (epi-data) associated with each pathogen.
Moreover, additional useful metadata sources will be considered.
At the Member State level, each system will be analyzed in order to understand:
- which WGS data is considered (i.e., results of bioinformatics approaches, pipelines or tools),
- which metadata is considered (i.e., epi-data),
- which interoperability (data import/export) systems are implemented,
- how harmonization can be performed, among other aspects.
An information system, the prototype COHESIVE Information System (CIS), is provided to evaluate the integration of pathogen information at the Member State level.
A participating Member State can populate the CIS with real data, linked existing data, or anonymized or random data during the evaluation exercise. It can test the CIS either as a passive archive of information or as an active system (i.e., one that performs bioinformatics analyses on the integrated data).
CIS System Architecture
The CIS is essentially a database with a web interface, based on the open source CMDBuild project. CMDBuild is an open source web enterprise environment used to configure custom applications and to manage databases of items. It is released under the AGPL license: anyone can download, install and use it freely.
CMDBuild has native mechanisms to model its internal database, design workflows, configure reports and dashboards, build connectors to external systems, georeference items, and administer the system.
The core code is kept separated from the business logic, to ensure maximum extensibility and to allow the use of CMDBuild as a base system to create custom and configurable vertical applications.
CMDBuild is a web-based system with a Java server side, an Ajax web GUI and a web-services-based service-oriented architecture (SOA). It is built exclusively on open source technologies and industry standards, with each open source component selected carefully for its technical features and soundness.
The following software components are integrated or interoperable with CMDBuild:
- PostgreSQL, an advanced and reliable open source database whose object-oriented features are heavily used in database modeling
- Tomcat servlet container to run the Java server code
- JasperReports reporting engine with the iReport editor
- Quartz scheduler system
- OpenLayers and GeoServer for GIS features
- Sencha Ext JS library for the Ajax-based desktop client interface
- OpenLDAP for accessing external authentication systems
Additional features can be found here.
Data Flow
Elements from different sources (i.e., metadata from Member States, WGS analysis results and any other relevant input from Member States) are collected into a database (the COHESIVE Cache) and go through a cleaning step that makes them homogeneous. They then go through a validation step, and only after that are they ready to be transferred to the PostgreSQL database of the CIS.
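The cleaning and validation steps described above can be illustrated with a minimal Python sketch. It is not the CIS implementation: the field names (sample_id, pathogen, sector, collection_date), the allowed sector values and the function names are all assumptions made for illustration only.

```python
from datetime import date

# Hypothetical field names and values; the actual CIS data model may differ.
REQUIRED_FIELDS = ("sample_id", "pathogen", "sector", "collection_date")
ALLOWED_SECTORS = {"human", "veterinary", "food"}


def clean(record: dict) -> dict:
    """Cleaning step: make a raw record homogeneous (trim text, normalise case)."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        cleaned[key.strip().lower()] = value
    if isinstance(cleaned.get("sector"), str):
        cleaned["sector"] = cleaned["sector"].lower()
    return cleaned


def validate(record: dict) -> list:
    """Validation step: return a list of problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing field: {field}")
    sector = record.get("sector")
    if sector and sector not in ALLOWED_SECTORS:
        errors.append(f"unknown sector: {sector}")
    collected = record.get("collection_date")
    if collected:
        try:
            date.fromisoformat(collected)  # expects YYYY-MM-DD
        except (TypeError, ValueError):
            errors.append(f"invalid collection_date: {collected}")
    return errors


def process(cache_records):
    """Clean every cached record, then split into accepted and rejected."""
    accepted, rejected = [], []
    for raw in cache_records:
        record = clean(raw)
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

In this sketch only the accepted records would move on to the CIS database, while rejected records keep their list of problems so that the data provider can correct them.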
ETL Cohesive Information System
Extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system which represents the original data in a different context than the source(s).
Data extraction pulls data from homogeneous or heterogeneous sources (Cache Database); data transformation cleans the data and converts it into a proper storage format/structure (CIS Cache); finally, data loading inserts the data into the final target database (CIS Database).
A properly designed ETL system extracts data from the source systems, enforces data quality and consistency standards, conforms data so that separate sources can be used together, and finally delivers data in a format ready to be presented (CIS).
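The three ETL stages can be sketched end to end in a few lines of Python. This is a hedged illustration, not the CIS code: it uses in-memory SQLite databases from the standard library as stand-ins for the cache and for the PostgreSQL target, and the table names (cache_samples, samples), column names and function names are hypothetical.

```python
import sqlite3


def extract(cache: sqlite3.Connection) -> list:
    """Extract: read raw rows from the (hypothetical) cache table."""
    cursor = cache.execute(
        "SELECT sample_id, pathogen, sector FROM cache_samples"
    )
    return cursor.fetchall()


def transform(rows: list) -> list:
    """Transform: trim text, normalise the sector, drop rows without an id."""
    result = []
    for sample_id, pathogen, sector in rows:
        sample_id = (sample_id or "").strip()
        if not sample_id:
            continue  # a row without an identifier cannot be loaded
        result.append((
            sample_id,
            (pathogen or "").strip(),
            (sector or "").strip().lower(),
        ))
    return result


def load(target: sqlite3.Connection, rows: list) -> None:
    """Load: insert the transformed rows into the target table."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "sample_id TEXT PRIMARY KEY, pathogen TEXT, sector TEXT)"
    )
    with target:  # commits on success, rolls back on error
        target.executemany(
            "INSERT OR REPLACE INTO samples VALUES (?, ?, ?)", rows
        )


def run_etl(cache: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Run extract, transform and load; return the number of rows loaded."""
    rows = transform(extract(cache))
    load(target, rows)
    return len(rows)
```

Keeping the three stages as separate functions mirrors the separation between the cache and the final database: each stage can be tested on its own, and the load stage is the only one that writes to the target.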