Compareads

Compareads is no longer maintained. Please use the newer Commet software, which does more than Compareads and does it better.

[PAGE UPDATED on July 29, 2014]

Description:

Compareads is a tool designed to extract similar reads between potentially huge metagenomic datasets (i.e., hundreds of millions of reads per dataset).

Download:

The last Compareads release may be downloaded here: compareads-2.1.1

Please read and accept the License before using or distributing the software.
Feedback and comments are warmly welcomed!

Documents:

Citation:

Nicolas Maillet, Claire Lemaitre, Rayan Chikhi, Dominique Lavenier, Pierre Peterlongo. Compareads: comparing huge metagenomic experiments. BMC Bioinformatics 2012 13(Suppl 19):S10
RECOMB Comparative Genomics 2012, Oct 2012, Niterói, Brazil. compareads_recomb_cg.pdf

Versions:

  • [29/07/2014] compareads-2.1.1
    • Bug fix: reading read files (thanks to Joshua David)
  • [28/05/2014] compareads-2.1
    • Clean-up of the archive
    • Fixes a few bugs (fastq format of extract_reads)
    • Add the -i option to bvop: prints information about a bv file (number of non-filtered reads, …)
  • [30/04/2014] compareads2.0.2
    • Fixed a bug with respect to the zlib version
  • [18/04/2014] Compareads2
    • Add bit vector representation to reduce the disk space used to store the results
    • Add result manipulation (logical operations between bit vectors)
    • Compareads2 has been rewritten in C++
  • [04/11/2013] Compareads1.3.1
    • Fixing OSX Mavericks issues
  • [29/01/2013] Compareads1.3
    • Compareads can now read gz files
    • Print a warning at the end of the process when input files have a significantly different number of reads
    • Both -a files (or -b files) can now be in different formats
    • Add a minimum size of reads to use (-r option)
    • Add Shannon index to remove low complexity reads (-e option)
    • Add a new line in log explaining how many reads are indexed, searched and shared
    • Modified the way Compareads uses the -m option (max number of reads to use).
      Previously, it used the first M reads of a dataset. For example, -a A1.fasta -a A2.fasta -b B1.fasta -b B2.fasta -m 10 used the first 10 reads of A1.
      If A1 contained fewer than 10 reads, it took additional reads from A2 to reach the required 10. Likewise for B1 and B2.
      Now it uses the first M reads of EACH file, meaning that -a A1.fasta -a A2.fasta -b B1.fasta -b B2.fasta -m 10 will index 20 reads (10 from A1 and 10 from A2) and search 20 reads too.
      However, if A1 contains fewer than 10 reads, still only 10 reads from A2 are used; the missing reads from A1 are not made up with reads from A2. Likewise for B1 and B2.
  • Fix:
    • Better behavior with \n lines inside fasta/fastq files
    • Empty output files are now properly handled with the -s option
    • Fix crash due to spaces in file or directory names
    • Fix -l and -o options being ignored when the -a option was used after them
    • Removed times from log files. The last log file contained information about the first run but the times of the last pass, which was confusing.
    • Updated help and guide
  • [06/02/2013] Compareads1.2.2
    • Fixed a minor bug when Compareads is not run from the folder containing the input files
  • [31/01/2013] Compareads1.2.1
  • Fix:
    • Different behavior of bash ‘readlink’ function on BSD and Linux. Used pwd/basename/dirname instead.
    • Fixed a possible overwriting of log files when multiple instances of Compareads ran in the same folder
    • Add example in help and guide
  • [29/01/2013] Compareads1.2
  • New features:
    • Added a max number of reads to index and search (in case of 2 files with very different numbers of reads)
    • Support multiple files for a single sample (Illumina paired reads)
    • Log file and output files are in distinct folders defined by the user, but log file names and output file names can no longer be modified by the user
    • Generating _in_ files (with shared reads) and optionally (-n) _NOTin_ files (with NOT shared reads)
    • Add index/query/total time in log file
  • Fix:
    • Empty last lines were not handled properly (infinite loop in fasta, bad sequence counting in fastq)
    • Possible bad last bit for reverse complement
    • Correction of an extremely rare bug when the last sequence matches the exact KMERMAX size
    • Correction of default values for k and t both in comparead.c and Compareads1.1.sh
    • Fix a minor bug where the normalized result was missing its leading 0 when needed (it was .45 instead of 0.45)
    • Updated help and guide
  • [21/12/2012] Compareads1.1.2
    • Correction of a glitch on the last intersection (previous versions have more false positives)
  • [18/09/2012] Compareads1.1
    • Increased speed
    • Removed a few bugs
    • Added 4 output options for log files and output files (needed by Galaxy platform)
  • [19/06/2012] Compareads1.0
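
The change to the -m option described in the Compareads1.3 entry above can be illustrated with a small sketch. This is not Compareads code: the function names and the in-memory lists of reads are hypothetical, chosen only to contrast the two selection rules for the files of one sample (for example, the two -a files).

```python
from typing import List


def select_reads_before_1_3(files: List[List[str]], m: int) -> List[str]:
    """Rule before 1.3 (as described in the changelog): use the first m reads
    of the sample, completing from later files when earlier ones run short."""
    selected: List[str] = []
    for reads in files:
        missing = m - len(selected)
        if missing <= 0:
            break
        selected.extend(reads[:missing])
    return selected


def select_reads_since_1_3(files: List[List[str]], m: int) -> List[str]:
    """Rule since 1.3 (as described in the changelog): use at most m reads
    from EACH file, never compensating a short file with another one."""
    selected: List[str] = []
    for reads in files:
        selected.extend(reads[:m])
    return selected


if __name__ == "__main__":
    # Hypothetical sample: A1 holds only 4 reads, A2 holds 25.
    a1 = [f"A1_read{i}" for i in range(4)]
    a2 = [f"A2_read{i}" for i in range(25)]
    print(len(select_reads_before_1_3([a1, a2], 10)))  # 4 from A1 + 6 from A2
    print(len(select_reads_since_1_3([a1, a2], 10)))   # 4 from A1 + 10 from A2
```

With two files of at least 10 reads each and -m 10, the second rule selects 20 reads per sample, which matches the "index 20 reads" example in the changelog.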