[page edited on June 18, 2020] - GitHub webpage
DiscoSnp major update: DiscoSnp becomes DiscoSnp++
Major modifications have been made to DiscoSnp over the last few weeks. The improvements are the following:
- Thanks to recoding with the GATB library:
  - Even faster execution, and parallelization of the kissnp2 module
  - Improved progress messages
  - A single file for storing the graph. This file (.h5) can be used by any GATB tool.
- DiscoSnp is no longer limited to isolated SNP detection:
  - Up to P (parameter) close SNPs may be found within a single bubble
  - Insertions and deletions of length less than or equal to D (parameter) are also detected
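To make the P and D parameters concrete, here is a minimal Python sketch of how the two paths of a bubble could be classified. It is an illustration only, not the kissnp2 implementation: the functions `classify_bubble` and `is_single_indel` are hypothetical, and the sketch only compares the two path sequences, ignoring how bubbles are actually found in the de Bruijn graph.

```python
from typing import Optional, Tuple


def is_single_indel(longer: str, shorter: str) -> bool:
    """True if removing one contiguous block from `longer` yields `shorter`."""
    d = len(longer) - len(shorter)
    return any(longer[:i] + longer[i + d:] == shorter for i in range(len(shorter) + 1))


def classify_bubble(upper: str, lower: str, max_snps: int, max_indel: int) -> Optional[Tuple[str, int]]:
    """Hypothetical classification of the two paths of a bubble.

    - equal-length paths differing at 1..max_snps positions (parameter P): SNP bubble
    - paths differing by one contiguous insertion/deletion of length <= max_indel
      (parameter D): indel bubble
    Anything else is rejected (None).
    """
    if len(upper) == len(lower):
        n_diff = sum(1 for a, b in zip(upper, lower) if a != b)
        return ("snp", n_diff) if 1 <= n_diff <= max_snps else None
    longer, shorter = (upper, lower) if len(upper) > len(lower) else (lower, upper)
    indel_len = len(longer) - len(shorter)
    if indel_len <= max_indel and is_single_indel(longer, shorter):
        return ("indel", indel_len)
    return None


if __name__ == "__main__":
    print(classify_bubble("ACGTACGT", "ACGAACGT", max_snps=3, max_indel=5))    # ('snp', 1)
    print(classify_bubble("ACGTTTACGT", "ACGTACGT", max_snps=3, max_indel=5))  # ('indel', 2)
```

Treating equal-length paths as SNP bubbles and different-length paths as a single indel is a simplifying assumption of this sketch.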
Given these modifications, and since DiscoSnp is no longer limited to SNP prediction, we also decided to change its name.
Thus DiscoSnp becomes DiscoSnp++.
Please post feedback and comments on the Biostar forum.
A tutorial is available in these slides: demo_discosnp
The discoSnp++ software is designed to discover Single Nucleotide Polymorphisms (SNPs) and insertions/deletions (indels) from raw read set(s) obtained with Next Generation Sequencers (NGS).
Note that the number of input read sets is not constrained: it can be one, two, or more. Note also that no other data, such as a reference genome or annotations, are needed.
The software is composed of two modules. The first module, kissnp2, detects SNPs from the read sets. The second module, kissreads2, enhances the kissnp2 results by computing, for each variant found and for each read set, i/ its mean read coverage and ii/ the (Phred) quality of the reads generating the polymorphism.
A VCF file is also created, with or without the use of a reference genome.
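The two values computed by kissreads2 can be illustrated with the following hypothetical Python sketch. It assumes that the reads of each read set have already been placed on one allele path of a bubble (the `Alignment` record and the function names are made up for this example) and that quality strings use the common Phred+33 encoding. It reports the mean per-position coverage of the path and the mean Phred quality of the read bases covering the variant position; kissreads2 itself may compute these values differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alignment:
    start: int    # hypothetical: 0-based offset of the read on the allele path
    quality: str  # Phred+33 quality string, one character per read base


def phred(qchar: str) -> int:
    """Decode one Phred+33 quality character."""
    return ord(qchar) - 33


def summarize_read_set(path_len: int, variant_pos: int,
                       alignments: List[Alignment]) -> Tuple[float, float]:
    """Return (mean coverage over the path, mean Phred quality at the variant)."""
    coverage = [0] * path_len
    variant_quals = []
    for aln in alignments:
        for offset, qchar in enumerate(aln.quality):
            pos = aln.start + offset
            if 0 <= pos < path_len:  # ignore read bases overhanging the path
                coverage[pos] += 1
                if pos == variant_pos:
                    variant_quals.append(phred(qchar))
    mean_cov = sum(coverage) / path_len
    mean_qual = sum(variant_quals) / len(variant_quals) if variant_quals else 0.0
    return mean_cov, mean_qual


def summarize_read_sets(path_len: int, variant_pos: int,
                        per_set: Dict[str, List[Alignment]]) -> Dict[str, Tuple[float, float]]:
    """Apply the summary independently to each read set."""
    return {name: summarize_read_set(path_len, variant_pos, alns)
            for name, alns in per_set.items()}


if __name__ == "__main__":
    reads = {"set1": [Alignment(0, "IIIIII"), Alignment(4, "######")], "set2": []}
    print(summarize_read_sets(10, 5, reads))  # {'set1': (1.2, 21.0), 'set2': (0.0, 0.0)}
```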
Input (what about read pairs?)
discoSnp takes raw NGS datasets as input (FASTA or FASTQ, gzipped or not). No reference genome is required. Read pairs can be provided; however, the pairing information is not used in this framework. The detected SNPs are output within the contig they belong to, and the contig length does not depend on pairing information. Note that two read files corresponding to paired reads should belong to the same file of files (see the documentation).
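As an illustration of the file-of-files input, the hypothetical Python helper below writes one file of files per read set (listing, for instance, both files of a read pair) and a top-level file listing those per-set files. It assumes one path per line and that a line of the top-level file may point to another file of files; the helper names and file names are made up, and the exact format expected by discoSnp++ should be checked in its documentation.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_fof(fof_path: Path, entries: List[str]) -> Path:
    """Write a file of files: one path per line."""
    fof_path.write_text("".join(f"{entry}\n" for entry in entries))
    return fof_path


def build_read_sets(out_dir: Path, samples: Dict[str, List[str]]) -> Path:
    """Write one file of files per read set, plus a top-level file listing them.

    Both files of a read pair go into the same per-set file of files, so that
    they are handled as a single read set.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    per_set = [str(write_fof(out_dir / f"{name}.fof", files))
               for name, files in samples.items()]
    return write_fof(out_dir / "read_sets.fof", per_set)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = build_read_sets(Path(tmp), {
            "sample1": ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"],  # paired reads
            "sample2": ["sample2.fasta"],                               # single-end reads
        })
        print(top.read_text())
```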
Here are a few slides about discoSnp: colloque_GE_2013_discoSnp
Paper & Citation
Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre. (2014). Reference-free detection of isolated SNPs. Nucleic Acids Research. doi:10.1093/nar/gku1187
- Paper discoSnp_NAR_2014
- Additional file discoSnp_NAR_add_file_2014
C. Riou, C. Lemaitre, and P. Peterlongo, “VCF_creator: Mapping and VCF Creation features in DiscoSnp++”. Poster at Jobim 2015
For remarks and questions, please use the Biostar forum.
GitHub page: https://github.com/GATB/DiscoSnp
- discoSnp can be used on the GenOuest Galaxy server
  - A GenOuest account is needed
- discoSnp can be integrated into your Galaxy instance using the GenOuest Toolshed (section Symbiose or Next generation Sequencing), either:
  - directly via the toolshed (without authentication), by downloading the latest version of the source code (in zip, tar.gz or tar.bz2 format)
  - or by adding the GenOuest toolshed to the “tool_sheds_conf.xml” file of your Galaxy configuration and installing discoSnp from the Admin panel (Search and browse tool sheds)
Debian and Ubuntu packages:
discoSnp is also available as a Debian and Ubuntu package.
NAR Paper datasets
We believe that the datasets that were used for testing discoSnp may be useful for testing similar tools. All simulated datasets presented in the NAR paper are available from this web site.
E. coli datasets
- Simulated genomes: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/simulated_genomes/coli_genomes.zip
- Simulated reads: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/simulated_reads/coli_reads.zip
- Reference SNP sets (formatted as the discoSnp output): http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/reference_snp_bubbles/reference_snp_coli.zip
- DiscoSnp predictions: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/discoSnp_results/coli_res_discoSnp.zip
- VCF files used for generating SNPs:
- Reference SNP set (formatted as the discoSnp output)
- Simulated read sets (2x4GB)
- DiscoSnp predictions