Hey, I tried the latest release of Bowtie, which now supports color space.
I downloaded the dataset from the link below and got the results shown further down in slightly under 4 hours.
(I study sequence patterns in nucleosomal DNA.)
ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX000/SRX000425/
(both files)
/Desktop/bowtie-0.12.0-beta1$ ./bowtie -C -q -a -m 1 c_elegans_ws200_c c_elegans_425.fastq c_elegans_425_positions.txt | awk -v OFS='\t' '{if($2 == "-") {$4 += (length($5)-1)} ; print $0}'
# reads processed: 107422570
# reads with at least one reported alignment: 35370418 (32.93%)
# reads that failed to align: 66794981 (62.18%)
# reads with alignments suppressed due to -m: 5257171 (4.89%)
Reported 35370418 alignments to 1 output stream(s)
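For anyone wondering what the awk step in the command above does: as far as I understand it, Bowtie's default output reports the 0-based offset of the leftmost aligned base, so for reads on the "-" strand the awk adds (read length - 1) to column 4 to point at the other end of the alignment instead. Here is a minimal sketch on two made-up records. The function name shift_minus_strand and the sample lines are only for illustration, not real output from this run.

```shell
# Hypothetical helper wrapping the same awk program used in the command above.
shift_minus_strand() {
    awk -v OFS='\t' '{if($2 == "-") {$4 += (length($5)-1)} ; print $0}'
}

# Two made-up, tab-separated records (assumed column order:
# read name, strand, reference, 0-based offset, sequence).
printf 'r1\t+\tchrI\t100\tACGTACGT\nr2\t-\tchrI\t200\tACGTACGT\n' | shift_minus_strand
# r1 is printed unchanged; r2's offset becomes 200 + (8 - 1) = 207.
```

Note that awk splits on any whitespace by default, so this assumes the read names contain no spaces.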
I think the read quality from SOLiD is not as good as from Solexa (i.e. more mismatches). I used the default threshold of 2 mismatches.
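As a sanity check on the summary counts above, the three categories should add up to the total number of reads, and the percentages can be recomputed from the raw counts. A quick sketch, with the numbers copied from the output above (the pct helper is just something made up for this example):

```shell
total=107422570
aligned=35370418
failed=66794981
suppressed=5257171

# The three categories should account for every processed read.
echo "sum of categories: $((aligned + failed + suppressed)) (total: $total)"

# Hypothetical helper: print n as a percentage of t with two decimals.
pct() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

pct "$aligned" "$total"      # reads with at least one reported alignment
pct "$failed" "$total"       # reads that failed to align
pct "$suppressed" "$total"   # reads suppressed due to -m
```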
I'm going to try comparing with BFAST someday, once I figure out how to make proper masks for C. elegans. Bowtie is nice for those of us who have no idea how these programs work and are not computer science majors. The manual is very easy to understand, and you can simply download the indices for commonly used genomes from the website.
-Clayton