BioSemantic

One-line description: 
a framework developed to automate the integration of biological databases

BioSemantic is a framework developed to automate the integration of biological databases. Biological data are spread across many databases that evolve constantly, are hosted by different laboratories, and use different data models.

However, common data types exist across all these databases. For example, some of these databases contain information about a gene, a study or a DNA sequence. These common data types can be annotated with the same bio-ontological concepts.

  1. Specificities of biological data and databases

A large amount of biological data is produced by different laboratories. This amount increases continuously owing to new high-throughput technologies such as Next Generation Sequencing and DNA microarrays.

We have to deal with the multi-level heterogeneity of the available data: heterogeneity of data formats, data types, biological studies, biological analyses, and data quality. The frequent arrival of new high-throughput technologies producing biological data requires the development of new bioinformatics pipelines and new data formats.

Data are distributed across many different biological databases that evolve constantly, are dispersed across different laboratories, and use different data models.

However, common data types exist across all these databases. For example, some of these databases contain information about a gene, a study or a DNA sequence. These common data types can be annotated with the same bio-ontological concepts.

  1. Why develop a BioSemantic framework?
Biologists need:
  • to query dispersed databases in order to increase the size of datasets, and thus to produce more accurate statistical results.
  • to query dispersed databases in order to enable multi-scale studies, and thus to infer new knowledge.
Biologists often know only their own type of data and the type of data they would like to retrieve. They do not know how many databases hold that type of data, where the data are stored in each database schema, or how elements of the data models of different databases correspond to one another.
There are ongoing efforts to describe Web Services with semantic annotations in the genomics context (BioMoby, SSWAP, SADI, BioXSD). These efforts add a semantic level to biological Web Services and simplify their design, but implementing a Web Service remains time-consuming.
  1. The BioSemantic approach
BioSemantic automates the generation of SPARQL queries. These queries are embedded in automatically created Semantic Web Services, which are annotated with SAWSDL elements. The first step is manual: an expert annotates the database schema with terms from existing bio-ontologies. Other developers can then create Semantic Web Services automatically from these annotations, without any prior knowledge of the database schema.
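To make the SAWSDL annotation more concrete, here is a minimal Python sketch of what annotating a service parameter could look like. SAWSDL attaches an ontology concept to a schema element through the sawsdl:modelReference attribute. The parameter name geneName, the helper annotated_element and the concept URI under example.org are assumptions made for illustration only; they are not taken from the BioSemantic sources.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"        # SAWSDL namespace (W3C)
XSD_NS = "http://www.w3.org/2001/XMLSchema"      # XML Schema namespace

# Ask ElementTree to serialize these namespaces with readable prefixes.
ET.register_namespace("sawsdl", SAWSDL_NS)
ET.register_namespace("xs", XSD_NS)


def annotated_element(name, xsd_type, concept_uri):
    """Build an xs:element carrying a sawsdl:modelReference attribute."""
    element = ET.Element(f"{{{XSD_NS}}}element", {"name": name, "type": xsd_type})
    element.set(f"{{{SAWSDL_NS}}}modelReference", concept_uri)
    return element


# Hypothetical service input, annotated with a placeholder ontology term.
gene_input = annotated_element(
    "geneName", "xs:string", "http://example.org/onto#Gene"
)
xml_text = ET.tostring(gene_input, encoding="unicode")
print(xml_text)
```

A client or registry that understands SAWSDL can read the modelReference value to learn that this parameter denotes a gene, without inspecting the underlying database.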
  1. BioSemantic framework overview
The framework is divided into two main parts:
 
Creation and annotation of the RDF view
We use the D2RQ platform to create a D2RQ mapping file, to which we automatically add semantic metadata about the relational schema. The last step is the manual annotation of elements of the database schema with bio-ontological terms. After these steps, the mapping file is called an RDF view and is stored in a repository.
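The idea of a repository of annotated views can be illustrated with a small Python sketch. It is not the D2RQ mapping format and not the BioSemantic API: the RdfView class, the databases_with helper, the database names, the "table.column" identifiers and the term URIs are all hypothetical. It only shows how annotations let a tool find which databases hold a given type of data.

```python
from dataclasses import dataclass, field


@dataclass
class RdfView:
    """Hypothetical stand-in for one annotated mapping file."""
    database: str
    # "table.column" -> ontology term URI, entered manually by an expert
    annotations: dict = field(default_factory=dict)

    def annotate(self, schema_element, term_uri):
        self.annotations[schema_element] = term_uri

    def elements_for(self, term_uri):
        """Schema elements of this database annotated with the given term."""
        return sorted(e for e, t in self.annotations.items() if t == term_uri)


def databases_with(repository, term_uri):
    """Names of the databases whose view has at least one matching element."""
    return [view.database for view in repository if view.elements_for(term_uri)]


GENE = "http://example.org/onto#Gene"    # placeholder term URIs
STUDY = "http://example.org/onto#Study"

db_a = RdfView("db_a")
db_a.annotate("gene.name", GENE)
db_a.annotate("experiment.id", STUDY)

db_b = RdfView("db_b")
db_b.annotate("locus.symbol", GENE)      # different schema, same concept

repository = [db_a, db_b]
print(databases_with(repository, GENE))
print(databases_with(repository, STUDY))
```

Two databases with unrelated column names become comparable as soon as both columns carry the same ontology term, which is what makes the later automatic steps possible.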
 
Automatic Semantic Web Service creation
A bioinformatician selects an input and an output ontological term. Our API then parses all the mapping files in our repository and tries to find a shortest path linking the input to the output. Once a shortest path is found, it is used to create a SPARQL query. The query is integrated into a Web Service backbone, and the Web Service is annotated with SAWSDL. Finally, we register the Semantic Web Service in a Web Service registry, such as BioCatalogue.
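The path search and query generation described above can be sketched as follows. This is an illustration under stated assumptions, not the BioSemantic implementation: the text does not say which shortest-path algorithm is used, so a breadth-first search stands in for it, and the edge list, the property names, the v: prefix and the functions shortest_path and to_sparql are hypothetical.

```python
from collections import deque

# Hypothetical RDF view reduced to a graph: (subject class, property, object class).
EDGES = [
    ("Study", "involvesGene", "Gene"),
    ("Gene", "hasSequence", "Sequence"),
    ("Study", "hasSample", "Sample"),
    ("Sample", "hasGenotype", "Genotype"),
    ("Genotype", "ofGene", "Gene"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search; edges may be traversed in either direction."""
    adjacency = {}
    for subj, prop, obj in edges:
        adjacency.setdefault(subj, []).append((obj, (subj, prop, obj)))
        adjacency.setdefault(obj, []).append((subj, (subj, prop, obj)))
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, triple in adjacency.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, triple)
                queue.append(neighbour)
    if goal not in came_from:
        return None  # input and output terms are not connected
    path, node = [], goal
    while came_from[node] is not None:
        node, triple = came_from[node]
        path.append(triple)
    path.reverse()
    return path


def to_sparql(path, start, goal, prefix="http://example.org/view#"):
    """Turn each edge of the path into one SPARQL triple pattern."""
    def var(name):
        return "?" + name.lower()

    lines = [f"PREFIX v: <{prefix}>",
             f"SELECT {var(start)} {var(goal)} WHERE {{"]
    for subj, prop, obj in path:
        lines.append(f"  {var(subj)} v:{prop} {var(obj)} .")
    lines.append("}")
    return "\n".join(lines)


path = shortest_path(EDGES, "Study", "Sequence")
query = to_sparql(path, "Study", "Sequence")
print(query)
```

Each triple keeps its original subject-to-object direction even when the search walks it backwards, so the generated patterns stay valid for the underlying view. In a real service the input variable would typically be bound to the value supplied by the caller.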
  1. Download

BioSemantic API: contains all sources for automatic SPARQL query creation

BioSemantic webapp: contains the BioSemantic API and a Web application allowing automatic deployment of Semantic Web Services

BioSemantic is now being improved in collaboration with the Institute of Computational Biology (IBC).
Referent(s): 
Manuel Ruiz
Partner(s): 
CIRAD, IRD, INRA, IBC