Notice
The Colib’read project is now officially finished, but our efforts continue: algorithm and tool development is still ongoing.
EXECUTIVE SUMMARY
A few years ago, genomics underwent a profound change with the advent of High Throughput Sequencing (HTS), also known as Next Generation Sequencing (NGS). These technologies generate a new type of data in huge volumes, and crucial computational developments are needed to take full advantage of them. Our project proposes an original way of extracting information from such data. Usually, a generic assembly (pretreatment) is applied to the data and then, in a second step, the information of interest is extracted. Our aim is to avoid this protocol, which leads to a significant loss of information or generates chimeric results because of the heuristics used in the assembly. Instead, we will develop a set of innovative methods for extracting information of biological interest from HTS data that bypass any costly and often inaccurate assembly phase. Importantly, the developed methods will not require the availability of a reference genome, which considerably broadens their spectrum of applications. In short, for each biological question, our general approach will consist in 1) defining a model for the searched elements; 2) detecting in one or several HTS datasets those elements that fit the model; 3) outputting those together with a score and their genomic neighborhood. From a computational viewpoint, our proposal relies on a formal model based on the de Bruijn graph structure to develop algorithms able to handle huge amounts of data. Among other outcomes, Colib’read will deliver algorithms based on the de Bruijn graph and tools validated by biologists.
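As an illustration only, the three-step approach above can be sketched on a toy de Bruijn graph. The sketch below is not the project's implementation. It assumes a very simple model (an isolated single-nucleotide variant appears as a "bubble": two non-branching paths of k nodes that leave one k-mer and rejoin at another), ignores reverse complements and sequencing errors, and uses the minimum k-mer count along each path as a stand-in score. The function names `count_kmers`, `successors`, and `find_snp_bubbles` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads; the k-mers are the graph nodes."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successors(kmer, counts):
    """Nodes reachable from kmer: k-mers overlapping it by k-1 characters."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in counts]


def find_snp_bubbles(counts, k):
    """Step 1 (model): a bubble of two k-node paths that reconverge.
    Step 2 (detection): scan every node with exactly two successors.
    Step 3 (output): each path's sequence with a score (min k-mer count).
    """
    bubbles = []
    for start in counts:
        branches = successors(start, counts)
        if len(branches) != 2:
            continue
        paths = []
        for node in branches:
            path = [node]
            # Follow the non-branching path until it holds k nodes.
            while len(path) < k:
                nxt = successors(path[-1], counts)
                if len(nxt) != 1:
                    break
                path.append(nxt[0])
            paths.append(path)
        if any(len(p) != k for p in paths):
            continue
        # Both paths must rejoin at the same single node.
        ends = [successors(p[-1], counts) for p in paths]
        if len(ends[0]) != 1 or ends[0] != ends[1]:
            continue
        bubble = []
        for p in paths:
            # The start k-mer serves as the left neighborhood of the variant.
            sequence = start + "".join(n[-1] for n in p)
            score = min(counts[n] for n in p)
            bubble.append((sequence, score))
        bubbles.append(bubble)
    return bubbles
```

On real data, such a sketch would also need to handle both strands, filter low-count k-mers, and extend the reported neighborhood on both sides of the variant.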
This project is at the interface between (i) fundamental computational questions, (ii) algorithmic developments, including the design of ad hoc indexes and parallelization, and (iii) biological applications for validation. Finally, (iv) it also includes broad public and educational dissemination.
TEAMS
MEMBERS
- Alexan Andrieux
- Guillaume Blin
- Lilia Brinza
- Bastien Cazaux
- Annie Chateau
- Rayan Chikhi
- Liviu Ciortuz
- Thomas Derrien
- Christophe Hitte
- Fabien Jourdan
- Alice Julien Laferriere
- Dominique Lavenier
- Thierry Lecroq
- Fabrice Legeai
- Claire Lemaitre
- Alban Mancheron
- Vincent Miele
- David Parsons
- Nicolas Philippe
- Pierre Peterlongo
- Eric Rivals
- Guillaume Rizk
- Gustavo Sacomoto
- Marie-France Sagot
- Erwan Scaon
- Raluca Uricaru
- Martin Wannagat