discoSnp++

[page edited on June 18, 2020] - GitHub webpage

Important!

DiscoSnp major update: DiscoSnp becomes DiscoSnp++

Major modifications were made to DiscoSnp over the last few weeks. The improvements are the following:

Thanks to the recoding using the GATB library:
- Even faster execution, and parallelization of the kissnp2 module.
- An improved progress message.
- A single file for storing the graph. This file (.h5) may be used in any GATB tool.
DiscoSnp is no longer limited to isolated SNP detection:
- Up to P (parameter) close SNPs may be found within a single bubble.
- Insertions and deletions of length at most D (parameter) are also detected.
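
To make the roles of the P and D parameters concrete, here is a minimal Python sketch that classifies the two paths of a bubble. It only illustrates what the two parameters bound; it is not the kissnp2 algorithm, and it ignores the graph, the k-mer size and how close the SNPs are to each other. The function names, the `max_snps` / `max_indel` arguments (standing in for P and D) and the returned labels are all hypothetical.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_bubble(upper: str, lower: str, max_snps: int, max_indel: int) -> str:
    """Toy classification of two bubble paths (hypothetical helper).

    max_snps plays the role of P, max_indel the role of D.
    """
    if len(upper) == len(lower):
        # Same length: count substitutions between the two paths.
        mismatches = sum(x != y for x, y in zip(upper, lower))
        if 1 <= mismatches <= max_snps:
            return f"snp:{mismatches}"
        return "rejected"

    # Different lengths: accept only one contiguous insertion/deletion.
    short, long_ = sorted((upper, lower), key=len)
    gap = len(long_) - len(short)
    prefix = common_prefix_length(short, long_)
    suffix = common_prefix_length(short[::-1], long_[::-1])
    if gap <= max_indel and prefix + suffix >= len(short):
        return f"indel:{gap}"
    return "rejected"
```

With `max_snps=1` only isolated SNPs are accepted, which corresponds to the behaviour of the original DiscoSnp as described above; raising the two bounds admits close SNPs and longer indels.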

Given these modifications, and since DiscoSnp is no longer limited to SNP prediction, we also decided to change its name.
Thus DiscoSnp becomes DiscoSnp++.

Please post feedback and comments on the biostar forum.

Tutorial

A tutorial is available in these slides: demo_discosnp

Description

The discoSnp++ software is designed for discovering Single Nucleotide Polymorphisms (SNPs) and insertions/deletions (indels) from raw set(s) of reads obtained with Next Generation Sequencers (NGS).
Note that the number of input read sets is not constrained: it can be one, two, or more. Note also that no other data, such as a reference genome or annotations, are needed.
The software is composed of two modules. The first module, kissnp2, detects SNPs from read sets. The second module, kissreads2, enhances the kissnp2 results by computing, per read set and for each variant found, (i) its mean read coverage and (ii) the (Phred) quality of the reads generating the polymorphism.
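
The two per-variant statistics mentioned above can be illustrated with a short Python sketch. It is not the kissreads2 implementation: it assumes the read mapping has already been done, that the per-position depth along the variant and the FASTQ quality characters at the polymorphic position are given as inputs, and that qualities use the Phred+33 encoding. The function names are hypothetical.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) FASTQ encoding


def mean_phred(quality_chars: str) -> float:
    """Mean Phred score of a non-empty string of FASTQ quality characters."""
    return float(mean(ord(c) - PHRED_OFFSET for c in quality_chars))


def summarize_variant(per_position_depth: list[int], quality_chars: str) -> tuple[float, float]:
    """Return (mean read coverage, mean Phred quality) for one variant in one read set.

    per_position_depth: number of reads covering each position of the variant.
    quality_chars: quality characters of the read bases at the polymorphic position.
    """
    return float(mean(per_position_depth)), mean_phred(quality_chars)
```

Calling `summarize_variant` once per read set gives one (coverage, quality) pair per read set for a given variant. Both inputs must be non-empty, otherwise `statistics.mean` raises `StatisticsError`.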

A VCF file is also created, with or without the use of a reference genome.

Input (what about read pairs?)

discoSnp takes raw NGS datasets as input (FASTA or FASTQ, gzipped or not). No reference genome is required. Read pairs can be given; however, the pairing information is not used in this framework. The detected SNPs are output in the contig they belong to, and the contig length does not depend on pairing information. Note that two read files corresponding to paired reads should belong to the same file of files (see the documentation).
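
As an illustration of grouping paired read files, the following Python sketch writes both files of a pair into one file of files. The one-path-per-line layout is an assumption made here for illustration, and the function names and file names are hypothetical; the documentation remains the reference for the exact format.

```python
from pathlib import Path


def file_of_files_text(read_files: list[str]) -> str:
    """Render a file of files: one read-file path per line (assumed layout)."""
    return "".join(f"{path}\n" for path in read_files)


def write_file_of_files(destination: str, read_files: list[str]) -> Path:
    """Write the file of files to disk and return its path."""
    target = Path(destination)
    target.write_text(file_of_files_text(read_files))
    return target


if __name__ == "__main__":
    # Hypothetical file names: both files of the pair go into the same file of files.
    write_file_of_files("sample1_fof.txt", ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"])
```

Keeping the text rendering separate from the disk write makes the layout easy to adjust if the documented format differs from the assumption above.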

Short presentation:

Here are a few slides about discoSnp: colloque_GE_2013_discoSnp

Paper & Citation

Uricaru, Raluca; Rizk, Guillaume; Lacroix, Vincent; Quillery, Elsa; Plantard, Olivier; Chikhi, Rayan; Lemaitre, Claire; Peterlongo, Pierre. (2014). Reference-free detection of isolated SNPs. Nucleic Acids Research. doi:10.1093/nar/gku1187

Paper discoSnp_NAR_2014
Additional file discoSnp_NAR_add_file_2014

C. Riou, C. Lemaitre, and P. Peterlongo, “VCF_creator: Mapping and VCF Creation features in DiscoSnp++”. Poster at Jobim 2015

Forum

For remarks and questions, please use the biostar forum.

Download

github page: https://github.com/GATB/DiscoSnp

Galaxy

discoSnp can be used on the GenOuest Galaxy server
- A GenOuest account is needed
discoSnp can be integrated into your Galaxy instance using the GenOuest Toolshed (section Symbiose or Next generation Sequencing)
- directly via the toolshed (without authentication) by downloading the latest version of the source code (in zip, tar.gz or tar.bz2 format)
- by adding the GenOuest toolshed to the “tool_sheds_conf.xml” file in your Galaxy configuration and installing discoSnp from the Admin panel (Search and browse tool sheds)

Debian and Ubuntu packages:

discoSnp is also available through Debian and Ubuntu

NAR Paper datasets

We believe that the datasets that were used for testing discoSnp may be useful for testing similar tools. All simulated datasets presented in the NAR paper are available from this web site.

n Coli datasets

Simulated genomes: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/simulated_genomes/coli_genomes.zip
Simulated reads: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/simulated_reads/coli_reads.zip
Reference SNP sets (formatted as the discoSnp output): http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/reference_snp_bubbles/reference_snp_coli.zip
DiscoSnp predictions: http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/coli/discoSnp_results/coli_res_discoSnp.zip

Human dataset

VCF files used for generating SNPs:
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/HG00096.20101123.genotypes_all_notcommon.vcf
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/HG00100.20101123.genotypes_all_notcommon.vcf
Reference SNP set (formatted as the discoSnp output)
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/reference_isloated_snps_human_chr1_96_100.fa
Simulated read sets (2x4GB)
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/individualHG00096_reads.fasta.gz
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/individualHG00100_reads.fasta.gz
DiscoSnp predictions
- http://www.irisa.fr/symbiose/people/ppeterlongo/discoSnp_data/human/res_disco_1.2.1_b0_k_31_c_4_coherent.fa