Bowtie, an ultrafast, memory-efficient, open source short read aligner

SillyPoint replied

01-20-2010, 11:25 AM
I'd just logged on here to post exactly the question acnoll poses above: "is it possible to have pairs with only one end mapping to the genome be included in the alignment file?"

The implication there, which I believe after reading the manual and running Bowtie 0.12.1, is that only read pairs where both ends align, and fall within the -I/-X constraints, will be output. True?

The alternative for now is to specify the -a option to get all the mapped output, and post-process that to find what you're interested in, be that the best pair (for some definition of "best"), or reads where only one end matches.
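A minimal Python sketch of that post-processing for the one-end case. It assumes each mate file has been aligned separately as unpaired reads, so that each run writes Bowtie's default tab-separated output with the read name in the first column. The function names and the "/1"/"/2" mate-suffix convention are assumptions for illustration, not Bowtie features.

```python
from typing import Iterable, Set, Tuple


def aligned_read_names(lines: Iterable[str]) -> Set[str]:
    """Collect read names (column 1) from tab-separated Bowtie default output.

    A trailing "/1" or "/2" mate suffix is stripped so both mates share a key
    (an assumption about how the reads are named).
    """
    names: Set[str] = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t", 1)[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        names.add(name)  # a set absorbs the repeated hits that -a produces
    return names


def one_end_only(mate1_lines: Iterable[str],
                 mate2_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names aligned only as mate 1, names aligned only as mate 2)."""
    m1 = aligned_read_names(mate1_lines)
    m2 = aligned_read_names(mate2_lines)
    return m1 - m2, m2 - m1


if __name__ == "__main__":
    mate1 = ["r1/1\t+\tchr1\t100\tACGT\tIIII\t0\t", "r2/1\t-\tchr2\t5\tACGT\tIIII\t0\t"]
    mate2 = ["r2/2\t+\tchr2\t300\tACGT\tIIII\t0\t", "r3/2\t+\tchr3\t7\tACGT\tIIII\t0\t"]
    print(one_end_only(mate1, mate2))  # ({'r1'}, {'r3'})
```

Here r2 aligned as both mates, so only r1 (mate 1 only) and r3 (mate 2 only) are reported. Choosing the "best" pair among the -a hits would need an extra scoring rule on top of this.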

To have the option to do that directly in Bowtie would be nice.

--TS
acnoll replied

01-18-2010, 02:18 PM
Option for output of pairs where only one end aligns

With bowtie's current set of options, is it possible to have pairs with only one end mapping to the genome included in the alignment file (e.g. a SAM file)? I am interested in identifying short intra-read indels by anchoring one end of a mate pair.
xuying replied

01-13-2010, 07:04 AM
Hi Ben:
It seems I can't find a suitable place to put my csfastq file.
Here are some lines from the csfastq file generated by the "solid2fastq" program in bfast. Does this look OK to use? Should I remove the leading primer letter and the first color to get a true base there?

@2292_469_84
T210002310010221002200330303002200201120221.2111.2.
+
8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"
@2292_469_216
T000111101020011320222113222200220200120202.2222.2.
+
/6=>=::>>=;==>;;6=;;9<6:8<(3:-<;/9:852=-7/"2(6)")"
@2292_469_274
T300101122322222232222222210222222222022220.2222.2.
+
,=#$$#@%#'#>$,&(;$*$*=)*'&6%,%##*,+#,4),#)",5'#","
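To illustrate what the leading primer letter and the first color in records like these represent, here is a minimal Python sketch of standard SOLiD di-base decoding. It assumes the usual encoding A=0, C=1, G=2, T=3, under which each color is the XOR of the two adjacent base codes; it is an illustration only, not Bowtie's own code (Bowtie aligns in colorspace directly). The function name is hypothetical.

```python
BASES = "ACGT"  # assumed encoding: A=0, C=1, G=2, T=3


def decode_colorspace(read: str) -> str:
    """Decode a primer-prefixed colorspace read (e.g. "T2103...") to bases.

    The leading letter is the primer base; the first color encodes the
    transition from that primer base into the first base of the read, so
    each decoded base is (previous base code) XOR (color).
    Decoding stops at the first missing color ("."), because every later
    base depends on it.
    """
    prev = BASES.index(read[0])
    out = []
    for color in read[1:]:
        if color not in "0123":
            break  # missing or invalid call: the rest cannot be decoded
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)


print(decode_colorspace("T2103"))  # CAAT
print(decode_colorspace("T21.3"))  # CA
```

Because a single wrong color changes every base decoded after it, aligners generally compare reads to the reference in colorspace rather than decoding first.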
xuying replied

01-12-2010, 09:12 PM
Oh, yes, sorry. I was just confused by the CIGAR notation.
Ben Langmead replied

01-12-2010, 04:14 PM
Originally posted by xuying

There are millions of lines in the SAM and pileup files, so a uniform "48M" in the SAM file and a uniform "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)

Why? Given that M = "match or mismatch", when would you expect something other than 48M?

Ben
xuying replied

01-12-2010, 03:09 PM
Hi Ben:
I will put the csfastq file (or maybe part of it) somewhere later, because it's huge.
And I am using bowtie 0.12.1 (but the colorspace index was built with 0.12-beta).
There are millions of lines in the SAM and pileup files, so a uniform "48M" in the SAM file and a uniform "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)
Ben Langmead replied

01-12-2010, 06:19 AM
Hi xuying,

Thank you for the detailed report:

Originally posted by xuying

I tried SAMtools on "base space" SAM files generated by aligning "color space" reads with Bowtie. Why do I always get "A" in the 3rd column of the pileup file? It seems some kind of error exists.

chr1 1185 g A 0 0 60 1 . !
chr1 1190 c A 0 0 60 1 . !
chr1 1191 t A 0 0 60 1 . !
chr1 1222 c A 0 0 60 2 .. !!
chr1 1231 t A 0 0 60 2 .. !!
chr1 1232 t A 0 0 60 2 .. !!
chr1 1509 c A 0 0 60 1 . !
chr1 1511 t A 0 0 60 1 . !
chr1 1512 t A 0 0 60 1 .$ !
chr1 1850 G A 0 0 60 3 ... !!!
chr1 2134 C A 0 0 60 1 . !

Is there any problem with calling SNPs from a "base space" SAM file produced by aligning "color space" reads? Or is the coverage too low? I just want to test whether bowtie -C and samtools work for calling SNPs from color space reads (I converted the csfasta and qual files into a csfastq file with "solid2fastq" from bfast).

If I understand correctly, that is very low coverage (~2), and the qualities are also low. Can you send me the fastq file you're using? Also, are you using 0.12.1? Note that versions before 0.12.1 had an issue whereby Bowtie would fail to trim the first color from csfasta reads.
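The depth and base qualities can be read straight off the pileup lines quoted above. A minimal Python sketch, assuming the 10-column consensus format that `samtools pileup -c` wrote at the time (chromosome, position, reference base, consensus base, consensus quality, SNP quality, maximum mapping quality, depth, read bases, base qualities) and Phred+33 base qualities; the type and function names are hypothetical.

```python
from typing import List, NamedTuple


class PileupSite(NamedTuple):
    chrom: str
    pos: int
    ref: str
    consensus: str
    consensus_qual: int
    snp_qual: int
    max_mapq: int
    depth: int
    read_bases: str
    base_quals: List[int]


def parse_consensus_pileup(line: str, phred_offset: int = 33) -> PileupSite:
    """Split one whitespace-separated 10-column pileup line into fields."""
    f = line.split()
    return PileupSite(
        chrom=f[0], pos=int(f[1]), ref=f[2], consensus=f[3],
        consensus_qual=int(f[4]), snp_qual=int(f[5]), max_mapq=int(f[6]),
        depth=int(f[7]), read_bases=f[8],
        # "!" is the lowest Phred+33 character and decodes to quality 0
        base_quals=[ord(c) - phred_offset for c in f[9]],
    )


site = parse_consensus_pileup("chr1 1850 G A 0 0 60 3 ... !!!")
print(site.depth, site.consensus_qual, site.base_quals)  # 3 0 [0, 0, 0]
```

Under these assumptions every quoted site has a depth of 1 to 3 and base qualities of 0, which matches the low-coverage, low-quality reading above.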

Originally posted by xuying

Some content from the SAM file:

4_1246_1108 67 chr16 20648174 255 48M = 20650124 1998 CCTCTGGGTTTGTAGATTTGCCACTCTTAAGAGGCAAGGATTGACAGG OOQSQKOJJQECHAE?-=D82.893'!/3!!0=2!!9F?)!!!=?'00 XA:i:0 MD:Z:48 NM:i:0 CM:i:8
4_1246_1108 131 chr16 20650125 255 48M = 20648173 -2000 AGTAAGTGGTCATCTATAAAGCAAAGACTGCCTGTGAAATAAATGGGA KEFJQTVSTVUSGIOSE2GJ!!@OOSRNPJI@/ATF("/4:;@ACHG+ XA:i:1 MD:Z:48 NM:i:0 CM:i:3
4_1253_1656 179 chr17 66720558 255 48M = 66723289 2779 GACATGCTAAGGAAAGAGTGAAAATGGAGTCATATTAAAATGTTAAGT !&!!!!!:@N"'WSNNKHIMKORHADMQTTPOULFJMXUUZRRYXYIG XA:i:0 MD:Z:48 NM:i:0 CM:i:7
4_1253_1656 115 chr17 66723290 255 48M = 66720557 -2781 TAAAGAAATCTCCAGGCCCAAATGGTTTTACTTGTCAATTCTACCAAA !!!!/8NRJCGPULHBBPI@BLVNDAO[QNDK\NLTWVUVNNPNIGTL XA:i:0 MD:Z:48 NM:i:0 CM:i:3
4_1254_1557 67 chr1 40359166 255 48M = 40361009 1891 TACTGGACAACACAGTTCTAGTATGTAAGCTTTGAGAGAGCAGGGATT K??CGR>;JFHNL>@OA@GCF94<BOF::;84=@C!!NML4/I;9&(A XA:i:0 MD:Z:48 NM:i:0 CM:i:3
4_1254_1557 131 chr1 40361010 255 48M = 40359165 -1893 CCTTTTTCTTGAATAATCTATTTCTTAGTATGTCTTAATTTACTAATA YTVXX[Y\^^^VJPZYMN[YRNLPVNCKUWZUJLSD?;>IIA:FM!!! XA:i:0 MD:Z:48 NM:i:0 CM:i:2

All "48M" alignments? Mismatches should be reported, since I used "-C -q -n 2 -l 25 --snpfrac 0.001" for the bowtie mapping. Can you help me identify my problem? Thanks a lot!

In CIGAR, "M" means "either match or mismatch" (see the SAM paper), so that output is correct.
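Because "M" does not distinguish matches from mismatches, mismatch positions in SAM output are carried by optional tags: per the SAM specification, MD records the mismatching reference bases and NM the edit distance. A minimal Python sketch of reading an MD value (the function name is hypothetical); under it, MD:Z:48 with NM:i:0 in the records above means no base-space mismatches in those 48 positions.

```python
import re
from typing import List, Tuple


def md_mismatches(md: str) -> List[Tuple[int, str]]:
    """Return (0-based offset along the aligned reference, reference base)
    for each mismatch in an MD tag value such as "12A7T29".

    Runs of matching bases are numbers; a single letter is a mismatched
    reference base; "^" followed by letters is a deletion, which consumes
    reference positions but is not a mismatch.
    """
    out: List[Tuple[int, str]] = []
    pos = 0
    for matches, deleted, base in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if matches:
            pos += int(matches)
        elif deleted:
            pos += len(deleted)
        else:
            out.append((pos, base))
            pos += 1
    return out


print(md_mismatches("48"))       # []
print(md_mismatches("12A7T29"))  # [(12, 'A'), (20, 'T')]
```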

Thanks,
Ben
xuying replied

01-12-2010, 12:42 AM
Hi Ben:
I tried SAMtools on "base space" SAM files generated by aligning "color space" reads with Bowtie. Why do I always get "A" in the 3rd column of the pileup file? It seems some kind of error exists.

chr1 1185 g A 0 0 60 1 . !
chr1 1190 c A 0 0 60 1 . !
chr1 1191 t A 0 0 60 1 . !
chr1 1222 c A 0 0 60 2 .. !!
chr1 1231 t A 0 0 60 2 .. !!
chr1 1232 t A 0 0 60 2 .. !!
chr1 1509 c A 0 0 60 1 . !
chr1 1511 t A 0 0 60 1 . !
chr1 1512 t A 0 0 60 1 .$ !
chr1 1850 G A 0 0 60 3 ... !!!
chr1 2134 C A 0 0 60 1 . !

Is there any problem with calling SNPs from a "base space" SAM file produced by aligning "color space" reads? Or is the coverage too low? I just want to test whether bowtie -C and samtools work for calling SNPs from color space reads (I converted the csfasta and qual files into a csfastq file with "solid2fastq" from bfast).

Some content from the SAM file:

4_1246_1108 67 chr16 20648174 255 48M = 20650124 1998 CCTCTGGGTTTGTAGATTTGCCACTCTTAAGAGGCAAGGATTGACAGG OOQSQKOJJQECHAE?-=D82.893'!/3!!0=2!!9F?)!!!=?'00 XA:i:0 MD:Z:48 NM:i:0 CM:i:8
4_1246_1108 131 chr16 20650125 255 48M = 20648173 -2000 AGTAAGTGGTCATCTATAAAGCAAAGACTGCCTGTGAAATAAATGGGA KEFJQTVSTVUSGIOSE2GJ!!@OOSRNPJI@/ATF("/4:;@ACHG+ XA:i:1 MD:Z:48 NM:i:0 CM:i:3
4_1253_1656 179 chr17 66720558 255 48M = 66723289 2779 GACATGCTAAGGAAAGAGTGAAAATGGAGTCATATTAAAATGTTAAGT !&!!!!!:@N"'WSNNKHIMKORHADMQTTPOULFJMXUUZRRYXYIG XA:i:0 MD:Z:48 NM:i:0 CM:i:7
4_1253_1656 115 chr17 66723290 255 48M = 66720557 -2781 TAAAGAAATCTCCAGGCCCAAATGGTTTTACTTGTCAATTCTACCAAA !!!!/8NRJCGPULHBBPI@BLVNDAO[QNDK\NLTWVUVNNPNIGTL XA:i:0 MD:Z:48 NM:i:0 CM:i:3
4_1254_1557 67 chr1 40359166 255 48M = 40361009 1891 TACTGGACAACACAGTTCTAGTATGTAAGCTTTGAGAGAGCAGGGATT K??CGR>;JFHNL>@OA@GCF94<BOF::;84=@C!!NML4/I;9&(A XA:i:0 MD:Z:48 NM:i:0 CM:i:3
4_1254_1557 131 chr1 40361010 255 48M = 40359165 -1893 CCTTTTTCTTGAATAATCTATTTCTTAGTATGTCTTAATTTACTAATA YTVXX[Y\^^^VJPZYMN[YRNLPVNCKUWZUJLSD?;>IIA:FM!!! XA:i:0 MD:Z:48 NM:i:0 CM:i:2

All "48M" alignments? Mismatches should be reported, since I used "-C -q -n 2 -l 25 --snpfrac 0.001" for the bowtie mapping. Can you help me identify my problem? Thanks a lot!

Last edited by xuying; 01-12-2010, 03:04 AM.
Ben Langmead replied

01-07-2010, 05:44 AM
Yes, it should be usable by tools (like samtools) that call SNPs from .sam files.

Thanks
Ben
xuying replied

01-06-2010, 08:59 PM
Hi Ben Langmead.
Can the resulting .SAM file in "base space", produced by mapping "color space" reads, be used for SNP calling (samtools) or with other tools designed for Solexa data? Thanks!

Ben Langmead replied

12-07-2009, 09:13 AM

Originally posted by Xi Wang

Code:

Read1 16      chr1    7947971 255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:50 NM:i:0
Read1 16      chr12   48275260        255     50M     *       0       0       ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:12A7T29    NM:i:2

Hi Xi,

In -n mode, the "stratum" referred to by --strata is the number of mismatches in the seed. The seed length is set with -l. In your case, the seed doesn't extend to those mismatches.
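To see why the seed misses them: the chr12 hit has flag 16 (reverse strand) and MD:Z:12A7T29, i.e. mismatches at reference offsets 12 and 20 counting left to right. Since SAM reports reverse-strand reads on the forward reference strand, the read's 5' end sits at the right end of the alignment. A minimal Python sketch that converts those offsets to 5'-based ones and counts seed mismatches, assuming an ungapped alignment and Bowtie's default seed length of 28 (Xi's command did not set -l); the helper names are hypothetical.

```python
from typing import Iterable, List


def five_prime_offsets(ref_offsets: Iterable[int], read_len: int,
                       reverse: bool) -> List[int]:
    """Convert left-to-right reference offsets of mismatches into 0-based
    offsets from the read's 5' end (assumes no insertions or deletions)."""
    if reverse:
        return sorted(read_len - 1 - o for o in ref_offsets)
    return sorted(ref_offsets)


def seed_mismatch_count(ref_offsets: Iterable[int], read_len: int,
                        reverse: bool, seed_len: int = 28) -> int:
    """Number of mismatches falling in the first seed_len bases from 5'."""
    return sum(1 for o in five_prime_offsets(ref_offsets, read_len, reverse)
               if o < seed_len)


print(five_prime_offsets([12, 20], 50, reverse=True))   # [29, 37]
print(seed_mismatch_count([12, 20], 50, reverse=True))  # 0
```

Both mismatches land at 5' offsets 29 and 37, past a 28-base seed, so under these assumptions the chr12 hit has zero seed mismatches, the same stratum as the perfect chr1 hit.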

Thanks,
Ben


bioinfosm replied

12-07-2009, 09:05 AM
I think that has to do with the seed length. Within your seed length, both hits are equally good!

Originally posted by Xi Wang

Hi,

I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

Code:

Read1 16 chr1 7947971 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:50 NM:i:0
Read1 16 chr12 48275260 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:12A7T29 NM:i:2

The result shows two hits for this read: one aligns perfectly to chr1 (where the sequence comes from), and the other aligns to chr12 with 2 mismatches. However, I expected the options "-a --best --strata" to make bowtie report only the best hit (namely the hit to chr1). Why do I get this weird result?
Thanks in advance.
--
Xi
Xi Wang replied

12-05-2009, 01:06 AM
Hi,

I am confused by the bowtie options again. I used the options "-a --best --strata", but got a result as below:

Code:

Read1 16 chr1 7947971 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:50 NM:i:0
Read1 16 chr12 48275260 255 50M * 0 0 ATTAAGGTCACCGTTGCAGGCCTGGCTGGAAAAGACCCAGTACAGTGTAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:12A7T29 NM:i:2

The result shows two hits for this read: one aligns perfectly to chr1 (where the sequence comes from), and the other aligns to chr12 with 2 mismatches. However, I expected the options "-a --best --strata" to make bowtie report only the best hit (namely the hit to chr1). Why do I get this weird result?
Thanks in advance.
--
Xi
Ben Langmead replied

12-01-2009, 05:47 AM
Originally posted by bioinfosm

When I limited my reference sequence to the BLAT hit region, I got the hit with 3 mismatches, but only after I increased the -e option to -e 80. Why didn't I get this hit previously, when I used -a -e 90 to report all hits?

And why do I have to use -n 3, when the seed length is 28 by default and there are no more than 2 mismatches in 28 bp?

Code:

$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A

Hi bioinfosm,

Try using the --maxbts or -y options to increase the amount of search effort Bowtie puts in. Note that -n 2 and -n 3 modes are not fully sensitive by default, to avoid excessive backtracking (see the manual section on Maq-like alignment).

That alignment does have 3 mismatches in the seed (at 0-based offsets 10, 18 and 27 from the 5' end).
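The last column of Bowtie's default output lists each mismatch as offset:reference>read, and the Bowtie manual describes the offset as 0-based from the read's 5' end, so the seed count can be checked directly. A minimal Python sketch, assuming the default seed length of 28 (-l); the helper names are hypothetical.

```python
from typing import List


def mismatch_offsets(descriptor: str) -> List[int]:
    """Offsets from a Bowtie mismatch field such as "10:G>T,18:C>G,27:T>A"
    (0-based from the read's 5' end). An empty field means no mismatches."""
    if not descriptor:
        return []
    return [int(token.split(":", 1)[0]) for token in descriptor.split(",")]


def seed_mismatches(descriptor: str, seed_len: int = 28) -> List[int]:
    """Mismatch offsets that fall inside the seed (offsets 0..seed_len-1)."""
    return [o for o in mismatch_offsets(descriptor) if o < seed_len]


print(seed_mismatches("10:G>T,18:C>G,27:T>A"))  # [10, 18, 27]
```

Offset 27 is the last position of a 28-base seed, which is why this hit needs -n 3 rather than -n 2.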

Hope that helps,
Ben

bioinfosm replied

11-30-2009, 02:46 PM

When I limited my reference sequence to the BLAT hit region, I got the hit with 3 mismatches, but only after I increased the -e option to -e 80. Why didn't I get this hit previously, when I used -a -e 90 to report all hits?

And why do I have to use -n 3, when the seed length is 28 by default and there are no more than 2 mismatches in 28 bp?

Code:

$ /home/m049157/build/bowtie-0.10.0/bowtie --best -p 4 -t -n 3 -e 80 -a www w ww
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:00:00
Reported 1 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
$ cat ww
HWI-E4:1:87:1633:1127#0/1       -       Zv7_scaffold910 5660144 AGTCTGCTTTTCCATATAAAACTGAGAAGAAGAGACTGCAGCCTTGAACAAACTTGGGAAGTCTTAACTTACACG     %%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B       0       10:G>T,18:C>G,27:T>A
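The -e limit can be checked against this output too. The Bowtie manual describes -e as the maximum total of quality values at all mismatched read positions (default 70), with qualities rounded Maq-style to the nearest 10 and capped at 30; in default output the quality string of a "-" alignment is printed reversed. A minimal Python sketch under those assumptions, plus Phred+33 qualities (Bowtie's default when no quality option is given); the helper names are hypothetical.

```python
def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and cap it at 30."""
    return min(30, (q + 5) // 10 * 10)


def mismatch_quality_total(descriptor: str, quals: str, strand: str,
                           phred_offset: int = 33,
                           round_quals: bool = True) -> int:
    """Sum qualities at the mismatches listed in a Bowtie mismatch field.

    Offsets are 0-based from the read's 5' end; for "-" alignments the
    printed quality string is reversed, so offset o maps to index len-1-o.
    """
    total = 0
    n = len(quals)
    for token in descriptor.split(","):
        offset = int(token.split(":", 1)[0])
        idx = n - 1 - offset if strand == "-" else offset
        q = ord(quals[idx]) - phred_offset
        total += maq_round(q) if round_quals else q
    return total


QUALS = "%%%%%%3=A;/-(8990(8<:9)<6:@,.4<A?A;28@24B/+<?B@4=BA><?@BBBBA@?70>@@=?@?724B"
DESC = "10:G>T,18:C>G,27:T>A"
print(mismatch_quality_total(DESC, QUALS, "-"))                     # 80
print(mismatch_quality_total(DESC, QUALS, "-", round_quals=False))  # 81
```

The raw qualities at the three mismatches are 29, 33 and 19, which round to 30 + 30 + 20 = 80: within -e 80 but above the default of 70.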
