Colib’read Project

EXECUTIVE SUMMARY

A few years ago, genomics underwent a profound change with the advent of High Throughput Sequencing (HTS), also known as Next Generation Sequencing (NGS). These technologies generate a new type of data in huge volumes, and crucial computational developments are needed to take full advantage of them. Our project proposes an original way of extracting information from such data. Usually, a generic assembly (pretreatment) is applied to the data, and any information of interest is extracted in a second step. Our aim is to avoid this protocol, which leads to a significant loss of information or generates chimeric results because of the heuristics used in the assembly. Instead, we will develop a set of innovative methods that extract information of biological interest from HTS data while bypassing any costly and often inaccurate assembly phase. Importantly, the developed methods will not require the availability of a reference genome, which considerably broadens their spectrum of applications. In short, for each biological question, our general approach will consist of 1) defining a model for the searched elements; 2) detecting in one or several HTS datasets those elements that fit the model; 3) outputting those elements together with a score and their genomic neighborhood. From a computational viewpoint, our proposal relies on a formal model based on the de Bruijn graph structure to develop algorithms able to handle huge amounts of data. Among other results, Colib’read will deliver algorithms based on the de Bruijn graph, and tools validated by biologists.
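
As a rough illustration of the de Bruijn graph idea mentioned above, the following minimal Python sketch builds a graph from two toy reads and reports the nodes where the paths diverge. It is not one of the project's algorithms: the function names (build_de_bruijn, branching_nodes), the reads, and the choice k = 4 are all assumptions made for this example, and tools working on real HTS data would need far more memory-efficient indexes.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Hypothetical helper for illustration: every k-mer of every read
    contributes one edge from its prefix to its suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return, in sorted order, the nodes that have more than one successor."""
    return sorted(node for node, succs in graph.items() if len(succs) > 1)


# Two toy reads that differ at a single position (T versus A).
reads = ["ACGTTGCA", "ACGTAGCA"]
graph = build_de_bruijn(reads, k=4)
branches = branching_nodes(graph)
print(branches)
```

In this toy case the single-nucleotide difference between the two reads shows up as a branch in the graph, which hints at how a variation can be located directly in the reads, without a prior assembly and without a reference genome.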

This project is at the interface between (i) fundamental computational questions, (ii) algorithmic developments, including the design of ad hoc indexes and parallelization, and (iii) biological applications for validation. Finally, (iv) it also includes broad public and educational dissemination.

TEAMS

MEMBERS

  • Alexan Andrieux
  • Guillaume Blin
  • Lilia Brinza
  • Bastien Cazaux
  • Annie Chateau
  • Rayan Chikhi
  • Liviu Ciortuz
  • Thomas Derrien
  • Christophe Hitte
  • Fabien Jourdan
  • Alice Julien Laferriere
  • Dominique Lavenier
  • Thierry Lecroq
  • Fabrice Legeai
  • Claire Lemaitre
  • Alban Mancheron
  • Vincent Miele
  • David Parsons
  • Nicolas Philippe
  • Pierre Peterlongo
  • Eric Rivals
  • Guillaume Rizk
  • Gustavo Sacomoto
  • Marie-France Sagot
  • Erwan Scaon
  • Raluca Uricaru
  • Martin Wannagat