
  • Xi Wang
    replied
    Originally posted by Ben Langmead View Post
    Hi Xi,

    to consider qualities, use -n/-l/-e.
    Thanks, Ben.
    I am still wondering whether the seed region is defined only for counting the mismatches or not. If I want to just use the quality score criterion, and set -l equal to 0, does it work?

    Best wishes,
    Xi



  • Layla
    replied
    comparable parameters with maq

    Hi Ben,

    Excellent work with Bowtie; I am looking forward to cutting down data processing time. I am working on a project in which I have used maq, but for subsequent paired-end MeDIP-seq of 45 bases I want to use Bowtie with parameters as close to maq's as possible.

    Using maq, I eliminate reads with a maq quality < 10 (the same read mapped to >1 location and hence ambiguous) and output them to another file.
    I also keep only reads with flags 18 and 130 (correctly paired reads).
    Using an ad hoc script, I keep only one hit if the same read is mapped to the same start and stop location multiple times (PCR bias).

    I'd like to apply the same criteria using bowtie. Could you advise me? To begin with, the default in bowtie is good: 2 mismatches in a 28-base seed region, with -e 70.

    thank you

    Layla
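The position-based duplicate filter described in this post could be sketched roughly as follows. This is only an illustration of the idea, not Layla's actual script: the record layout (dicts with `ref`, `strand`, `start`, `end` keys) is a hypothetical stand-in for whatever the alignment output is parsed into.

```python
def drop_duplicate_positions(hits):
    """Keep the first hit seen at each (ref, strand, start, end) location.

    `hits` is an iterable of dicts; the key names are assumptions made
    for this sketch, not a bowtie or maq output format.
    """
    seen = set()
    kept = []
    for hit in hits:
        key = (hit["ref"], hit["strand"], hit["start"], hit["end"])
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


hits = [
    {"name": "r1", "ref": "chr1", "strand": "+", "start": 100, "end": 145},
    {"name": "r2", "ref": "chr1", "strand": "+", "start": 100, "end": 145},
    {"name": "r3", "ref": "chr1", "strand": "-", "start": 100, "end": 145},
]
unique_hits = drop_duplicate_positions(hits)
print([h["name"] for h in unique_hits])
```

Here r2 is dropped because it shares reference, strand, start and end with r1, while r3 is kept because it lies on the opposite strand.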



  • Ben Langmead
    replied
    Originally posted by lindseyjane View Post
    I cannot see an option for reporting an alignment for a read when its mate does not map? Is this possible?
    Your best bet is to run Bowtie in paired-end mode while using --un to dump unaligned reads to files. Then run again in unpaired mode using the unaligned reads as input.

    Let me know if that doesn't solve your problem.

    Thanks,
    Ben
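The two-pass approach in this reply could be scripted along the following lines. This is a sketch under assumptions: the index name and file names are hypothetical, and the `_1`/`_2` naming of the paired `--un` output files reflects my reading of the bowtie manual, so check it against your bowtie version before relying on it.

```python
import os


def rescue_commands(index, mate1, mate2, unaligned="unaligned.fq",
                    paired_out="paired.map", single_out="singles.map"):
    """Build the two bowtie command lines as argument lists.

    Pass 1 aligns in paired-end mode and dumps unaligned reads with --un.
    Pass 2 realigns those leftovers as unpaired reads.
    """
    root, ext = os.path.splitext(unaligned)
    # Assumption: in paired mode, --un writes <root>_1<ext> and <root>_2<ext>.
    leftovers = f"{root}_1{ext},{root}_2{ext}"
    paired = ["bowtie", "--un", unaligned, index,
              "-1", mate1, "-2", mate2, paired_out]
    single = ["bowtie", index, leftovers, single_out]
    return paired, single


for cmd in rescue_commands("my_index", "mate1.fq", "mate2.fq"):
    print(" ".join(cmd))
```

The lists can be passed to `subprocess.run` once the file names and any further alignment options match your own setup.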



  • Ben Langmead
    replied
    Hi Xi,

    Originally posted by Xi Wang View Post
    I noticed that there are two parameters related to this issue. First, -n/--seedmms <int> sets the maximum number of mismatches in the seed, meaning that a hit with more mismatches than this cutoff will not be reported by Bowtie. Second, -e/--maqerr <int> sets the maximum sum of quality scores allowed at the mismatched bases (is that right?). However, I don't know whether the two criteria are the same or complementary.
    They're complementary. If either limit is exceeded, the alignment is invalid.

    Originally posted by Xi Wang View Post
    Further, both measures of mismatches are counted in the seed region. Even though users can specify the seed length, I am wondering where the seed is located: at the leftmost end of a query (read) or in a random region of the query.
    From the leftmost end of the read. -e applies to the entire alignment, not just the seed, exactly as in Maq.

    Originally posted by Xi Wang View Post
    Besides, there is another parameter, -v <int>, which handles end-to-end mismatches but does not consider the quality scores. Is it possible to make it consider the quality scores?
    No; to consider qualities, use -n/-l/-e.

    Thanks,
    Ben
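The way the two limits combine, as described in this reply, can be modelled with a few lines of Python. This is a simplified sketch and not Bowtie's implementation: it assumes forward-strand reads with positions counted from the left end, takes the defaults mentioned earlier in the thread (2 mismatches, 28-base seed, -e 70), and ignores any rounding of quality values that the real tool may apply.

```python
def is_valid_alignment(mismatch_positions, qualities,
                       seed_len=28, max_seed_mms=2, max_qual_sum=70):
    """Model of the -n/-l/-e policy.

    mismatch_positions: 0-based offsets from the left end of the read.
    qualities: per-base quality values for the whole read.
    """
    # -n/-l: count only mismatches that fall inside the seed.
    seed_mms = sum(1 for p in mismatch_positions if p < seed_len)
    # -e: sum qualities over ALL mismatched positions, not just the seed.
    qual_sum = sum(qualities[p] for p in mismatch_positions)
    # If either limit is exceeded, the alignment is invalid.
    return seed_mms <= max_seed_mms and qual_sum <= max_qual_sum


quals = [30] * 36
print(is_valid_alignment([3, 30], quals))      # one seed mismatch, sum 60
print(is_valid_alignment([3, 10, 30], quals))  # two seed mismatches, sum 90
```

The second call fails on the -e limit even though the seed limit is satisfied, which is the sense in which the two criteria are complementary.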



  • lindseyjane
    replied
    Question regarding bwt paired end alignment

    I am currently trying to align paired-end Illumina reads using bowtie and I want to compare the results to those from maq.

    I cannot see an option for reporting an alignment for a read when its mate does not map? Is this possible?

    The maq software still reports alignments for a read even if its mate does not map and I wanted to do the same thing with bowtie. A lot of pairs end up unaligned (significantly more than with maq) if this is not possible.

    If anyone knows how to do this I would really appreciate it, thanks.



  • Xi Wang
    replied
    Hi Ben,

    I am confused how Bowtie deals with the quality scores when counting mismatches.

    I noticed that there are two parameters related to this issue. First, -n/--seedmms <int> sets the maximum number of mismatches in the seed, meaning that a hit with more mismatches than this cutoff will not be reported by Bowtie. Second, -e/--maqerr <int> sets the maximum sum of quality scores allowed at the mismatched bases (is that right?). However, I don't know whether the two criteria are the same or complementary.

    Further, both measures of mismatches are counted in the seed region. Even though users can specify the seed length, I am wondering where the seed is located: at the leftmost end of a query (read) or in a random region of the query.

    Besides, there is another parameter, -v <int>, which handles end-to-end mismatches but does not consider the quality scores. Is it possible to make it consider the quality scores?

    Best regards!
    Xi



  • axiom7
    replied
    Originally posted by axiom7 View Post
    Hi Ben,

    I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

    Thanks.
    Susan
    Sorry, I meant using -e 210 to simulate - not -3 210.

    Susan



  • Ben Langmead
    replied
    Originally posted by amaer View Post
    What is the update on Bowtie doing gapped alignments?
    Hi amaer,

    Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

    Thanks,
    Ben



  • amaer
    replied
    Hi Ben,

    What is the update on Bowtie doing gapped alignments?

    Thanks!



  • Ben Langmead
    replied
    Hi Susan,

    Originally posted by axiom7 View Post
    I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.
    I'm glad! As I say, if the Polonator can (eventually) be made to give you a longer stretch of unambiguous bases before the NNNNN gap, then you can bump -l up accordingly and performance should improve quite a bit.

    Thanks,
    Ben



  • Ben Langmead
    replied
    Originally posted by para_seq View Post
    The reads mapped to each copy have an approximately 3 to 1 ratio of + to - orientations.
    Hi para_seq,

    The bias you see may or may not be due to alignment. Bowtie does have options that seek to remove strand bias, e.g. the --best option. If you still see the bias using --best, then the bias is probably inherent in your reads.

    Hope that helps,
    Ben



  • axiom7
    replied
    Hi Ben,

    I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

    Thanks.
    Susan



  • para_seq
    replied
    Hi, Ben,

    I have a question about using Bowtie to map Illumina transcriptome reads to a prokaryote genome. There are two copies of the identical gene, encoded on the '+' and '-' strands. What I don't understand is that both copies have a large number of unique RNA-seq reads mapped to them (value = 0 in column 7 of the bowtie output) in both '+' and '-' orientations. The reads mapped to each copy have an approximately 3 to 1 ratio of + to - orientations.

    Did I do anything wrong?
    Please help me to clarify my understanding. Thank you.



  • axiom7
    replied
    Ben and sparks,

    Thanks for all the input. I will be working on this today and will respond back to you.

    Susan



  • Ben Langmead
    replied
    Hi Susan,

    Originally posted by axiom7 View Post
    I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

    bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

    This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.
    Yes, I agree that this should work. And I agree that, because of the NNs, you are effectively asking for alignments with 1 additional mismatch.
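The preprocessing step Susan describes (done in Perl in her case) might look roughly like this in Python. It is a sketch based on one reading of her description: the 26-base raw read is treated as two 13-base tags, and "NN" is inserted after base 7 of each tag to give two 15-mers. The function name and the example sequence are hypothetical.

```python
def pad_polonator_read(raw, gap="NN", gap_after=7):
    """Split a 26-base raw read into two mates and insert the gap filler.

    Assumes the raw read is two 13-base tags back to back, each with an
    unsequenced gap after its 7th base (raw bases 7|8 and 20|21).
    """
    if len(raw) != 26:
        raise ValueError("expected a 26-base raw read")
    half = len(raw) // 2
    tags = (raw[:half], raw[half:])
    return tuple(tag[:gap_after] + gap + tag[gap_after:] for tag in tags)


mate1, mate2 = pad_polonator_read("ACGTACGTACGTATTTTGGGGCCCCA")
print(mate1)
print(mate2)
```

Each mate then carries two Ns, which is why -v 3 leaves room for one further mismatch per mate.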

    Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
    Thank you.
    Susan
    The answer to whether you're asking bowtie to do something it was not designed to do is "yes". But it is definitely still possible to use Bowtie. My suggestion would be to use, for instance, -n 1 -l X -e Y, where -l X is set so that the "seed" falls just short of the string of Ns, and -e Y is set according to the number of Ns + the number of mismatches you would like to allow beyond the Ns. (Your input is fasta, so every mismatch incurs a quality penalty of 30. So for 6 Ns + 1 mismatch, -e 210 is appropriate.) Here is an example where I align a read of the format you describe to the human genome:

    Code:
    sycamore:~/research/bowtie $ cat tmp.fa
    >r
    CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT
    sycamore:~/research/bowtie $ ./bowtie --best -n 1 -l 13 -e 210 -f /fs/szasmg/langmead/ebwts/h_sapiens_asm tmp.fa
    r	+	gi|89161187|ref|NC_000010.9|NC_000010	135373946	CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N
    # reads processed: 1
    # reads with at least one reported alignment: 1 (100.00%)
    # reads that failed to align: 0 (0.00%)
    Reported 1 alignments to 1 output stream(s)
    That set of parameters is designed to effectively allow 1 mismatch beyond the mismatches forced by the Ns, as you can see in the above alignment.

    It's worth noting that if you can (eventually) get the Polonator to give you an anchor of, say, 20bp instead of 13bp, bowtie run in this mode will be substantially faster.

    I hope that's helpful; if it's still unclear, please feel free to email me.

    Thanks,
    Ben
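The arithmetic behind the -l and -e choices in this reply can be written out as a small helper. This is an illustrative sketch, not part of Bowtie: the function name is hypothetical, and the penalty of 30 per mismatch is the figure Ben gives for FASTA input.

```python
FASTA_MISMATCH_PENALTY = 30  # per the reply: each mismatch costs 30 for FASTA input


def suggest_l_and_e(read, extra_mismatches=1, penalty=FASTA_MISMATCH_PENALTY):
    """Derive -l and -e values for a read containing a run of Ns."""
    first_n = read.find("N")
    if first_n == -1:
        raise ValueError("read contains no N")
    # -l: the seed should stop just short of the first N.
    seed_len = first_n
    # -e: budget for every N plus the extra mismatches to tolerate.
    max_qual_sum = (read.count("N") + extra_mismatches) * penalty
    return seed_len, max_qual_sum


seed_len, max_qual_sum = suggest_l_and_e("CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT")
print(f"-n 1 -l {seed_len} -e {max_qual_sum}")
```

For the example read this reproduces the -l 13 -e 210 settings used in the command above: 13 unambiguous bases before the gap, and (6 + 1) x 30 = 210.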
