  • Bowtie maxbts

    Hi,

    I understand that Bowtie offers the 'maxbts' and 'pairtries' parameters to control the trade-off between processing time and comprehensive search.

    The following scenario is confusing me: if I map both ends of an ambiguous pair to hg19 independently, setting maxbts to 10 billion, I get back 12k hits for end #1 and 14k hits for end #2.

    However, if I map the pair as a pair, using 'tryhard' to give a high 'pairtries', I end up getting back pairings where one of the ends maps to a location that wasn't picked up in the original independent search.

    My question is: what would cause the independent search to miss matches that the paired search picks up? I presumed that memory and maxbts are the only limiting factors?

    Thanks,
    Bio.X2Y

    Inputs:

    File: dodgy_input_1
    @HWI-EAS283:2:1:810:1386#0/1
    GGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCTG
    +HWI-EAS283:2:1:810:1386#0/1
    ]bZ``aaabbbbb`bX`_babbaa`abb`aaaaaa``aaa

    File: dodgy_input_2
    @HWI-EAS283:2:1:810:1386#0/2
    CCCGGGTTCAACCAATTCTCCTGCCTCAGCCTCCTGAGTA
    +HWI-EAS283:2:1:810:1386#0/2
    a_a_Zbababaa`bbbbababb`babbb]_aaaabbabab

    Independent Search:

    time /Volumes/Thymine/bowtie-0.12.5/bowtie \
    -t \
    -n 2 \
    -m 100000 \
    -a \
    -p 4 \
    --maxbts 1000000000 \
    --best \
    --solexa1.3-quals \
    --chunkmbs 128 \
    hg19 \
    dodgy_input_1,dodgy_input_2 \
    --un dodgy_pair_independent_unmatched.bowtie \
    --max dodgy_pair_independent_ambiguous.bowtie \
    dodgy_pair_independent_matched.bowtie \
    &> dodgy_pair_independent_console.out

    Paired Search:

    time /Volumes/Thymine/bowtie-0.12.5/bowtie \
    -t \
    -n 2 \
    -m 1000 \
    -a \
    -p 4 \
    --best \
    --tryhard \
    --solexa1.3-quals \
    --chunkmbs 128 \
    hg19 \
    -1 dodgy_input_1 \
    -2 dodgy_input_2 \
    --un dodgy_pair_pe_unmatched.bowtie \
    --max dodgy_pair_pe_ambiguous.bowtie \
    dodgy_pair_pe_matched.bowtie \
    &> dodgy_pair_pe_console.out

    Results:

    For the independent search, I get 12,637 matches to end #1 and 13,946 matches to end #2.

    For the paired search, I get 919 pairs, although I expected 393, since that is how many combinations of hits from the independent results form pairs that satisfy -I and -X.
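For readers who want to reproduce a count like the expected 393, here is a minimal Python sketch of how independent hits could be combined into candidate pairs. It rests on several assumptions that are not stated in the post: that the output files use Bowtie's default tab-separated layout (read name, strand, reference, 0-based offset, sequence, qualities, ...), that mates are in forward/reverse orientation, and that the insert-size bounds are 0 and 250 (replace these with whatever -I and -X were in effect). The names `Hit`, `parse_hit` and `concordant_pairs` are hypothetical helpers, not part of Bowtie.

```python
from collections import namedtuple

# One alignment: read name, strand ("+" or "-"), reference name,
# 0-based leftmost offset, and read length.
Hit = namedtuple("Hit", "read strand chrom pos length")


def parse_hit(line):
    """Parse one output line, assuming Bowtie's default tab-separated layout."""
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], fields[1], fields[2], int(fields[3]), len(fields[4]))


def concordant_pairs(hits1, hits2, min_ins=0, max_ins=250):
    """Return (hit1, hit2) combinations on the same reference and opposite
    strands, with the forward mate upstream and the fragment length
    (forward start to reverse end) within [min_ins, max_ins]."""
    by_chrom_strand = {}
    for h in hits2:
        by_chrom_strand.setdefault((h.chrom, h.strand), []).append(h)
    pairs = []
    for h1 in hits1:
        opposite = "-" if h1.strand == "+" else "+"
        for h2 in by_chrom_strand.get((h1.chrom, opposite), []):
            fwd, rev = (h1, h2) if h1.strand == "+" else (h2, h1)
            fragment = rev.pos + rev.length - fwd.pos
            if fwd.pos <= rev.pos and min_ins <= fragment <= max_ins:
                pairs.append((h1, h2))
    return pairs


# The two alignments quoted in this post, rebuilt as tab-separated lines.
line1 = "\t".join([
    "HWI-EAS283:2:1:810:1386#0/1", "+", "chr11", "107533126",
    "GGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCTG",
    ">C;AABBBCCCCCAC9A@CBCCBBABCCABBBBBBAABBB", "344", "",
])
line2 = "\t".join([
    "HWI-EAS283:2:1:810:1386#0/2", "-", "chr11", "107533223",
    "TACTCAGGAGGCTGAGGCAGGAGAATTGGTTGAACCCGGG",
    "CBCBCCBBBB@>CCCBCACCBCBCCCCABBCBCBC;@B@B", "344", "11:C>G,13:C>T",
])

pairs = concordant_pairs([parse_hit(line1)], [parse_hit(line2)])
print(len(pairs))  # → 1 (fragment length 107533223 + 40 - 107533126 = 137)
```

Feeding the parsed end #1 and end #2 hits from the independent run into `concordant_pairs` gives one way to arrive at an expected pair count, under the orientation and insert-size assumptions above.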

    Example of result pair that has a mapping for end #2 that did not appear in the independent search:
    HWI-EAS283:2:1:810:1386#0/1 + chr11 107533126 GGGAGTTCGAGACCAGCCTGACCAACATGGAGAAACCCTG >C;AABBBCCCCCAC9A@CBCCBBABCCABBBBBBAABBB 344
    HWI-EAS283:2:1:810:1386#0/2 - chr11 107533223 TACTCAGGAGGCTGAGGCAGGAGAATTGGTTGAACCCGGG CBCBCCBBBB@>CCCBCACCBCBCCCCABBCBCBC;@B@B 344 11:C>G,13:C>T
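One way to list every paired-run alignment that is absent from the independent run is a set difference keyed on read name, strand, reference and offset. The sketch below assumes the output files are in Bowtie's default tab-separated layout; `hit_key` and `missing_from_independent` are hypothetical helper names, and the two short lines in the example are made-up toy records, not real output.

```python
def hit_key(line):
    """Key an alignment by (read name, strand, reference, 0-based offset),
    assuming Bowtie's default tab-separated output layout."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], fields[1], fields[2], int(fields[3]))


def missing_from_independent(paired_lines, independent_lines):
    """Return keys reported in the paired run but not in the independent run,
    in order of first appearance and without duplicates."""
    seen = {hit_key(line) for line in independent_lines}
    reported = set()
    missing = []
    for line in paired_lines:
        key = hit_key(line)
        if key not in seen and key not in reported:
            reported.add(key)
            missing.append(key)
    return missing


# Toy records (made up for illustration): the same read at two offsets.
independent = ["readA/2\t-\tchr11\t500\tACGT\tIIII\t0\t\n"]
paired = [
    "readA/2\t-\tchr11\t500\tACGT\tIIII\t0\t\n",
    "readA/2\t-\tchr11\t900\tACGT\tIIII\t0\t\n",
]
print(missing_from_independent(paired, independent))

# With the files from the commands above, usage might look like:
#   with open("dodgy_pair_pe_matched.bowtie") as pe, \
#        open("dodgy_pair_independent_matched.bowtie") as ind:
#       extra = missing_from_independent(pe, ind)
```

The key ignores the sequence and quality columns, so an alignment counts as "the same" if it places the same read on the same strand at the same offset.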

  • #2
    Hi,

    Sorry for bumping this, but I'd still be very interested if anyone has any thoughts on this?

    To summarise, I'm finding that if I use Bowtie to map both ends of my paired-end reads independently, it fails to find some matches that it does find when I map them as a pair.

    I can't find anything in the manual that would explain this behaviour, and I'd like to get a feel for why it's happening.

    Thanks for your time,
    Bio


    • #3
      Bio,

      I'm a little confused by the question, but I think the answer is that when you have a pair of mates and one mate lies in a repeat (repeated more times than your -m threshold, say) and the other lies in unique sequence, Bowtie can resolve the "true" location of the repetitive mate. In that way, a mate whose alignments would otherwise be suppressed by -m in unpaired mode might be reported in paired mode.

      Hope that helps,
      Ben
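The effect Ben describes can be illustrated with a toy model. This is only a sketch of the explanation in this reply, not Bowtie's actual algorithm: loci are plain integer positions on a single reference, strand and read length are ignored, and `report_unpaired`, `report_paired` and the 250-base window are hypothetical choices made for illustration.

```python
def report_unpaired(loci, m):
    """Apply the -m rule to a single read: if it has more than m
    alignments, all of them are suppressed."""
    return list(loci) if len(loci) <= m else []


def report_paired(loci1, loci2, m, max_ins=250):
    """Pair up the loci first, then apply the -m rule to the number of
    pairs rather than to each mate on its own."""
    pairs = [(a, b) for a in loci1 for b in loci2 if 0 <= b - a <= max_ins]
    return pairs if len(pairs) <= m else []


repeat_mate = [1000, 50000, 90000, 130000, 170000]  # five candidate loci
unique_mate = [1100]                                # one candidate locus
m = 3

print(report_unpaired(repeat_mate, m))             # → []
print(report_paired(repeat_mate, unique_mate, m))  # → [(1000, 1100)]
```

In this model the repetitive mate is reported nowhere when mapped alone, yet one of its loci is reported once the unique mate anchors it.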


      • #4
        Hi Ben,

        Thanks for the reply, but I think what I'm seeing is something slightly different.

        To summarise the above, I independently mapped both ends of a single pair using "-m 100000", and got 12,637 matches to end #1 and 13,946 matches to end #2. Since I used tryhard, it implies that neither read has more than 100,000 matches (because if it did, it would be suppressed by -m). It also implies that I should be seeing all the matches.

        When I mapped them as a pair, therefore, I didn't expect to see any hits for either end that did not appear in the original search.

        Does this mean that even when I use "-m" and "tryhard", I might still be missing out on a lot of matches?

        Thanks again,
        Bio


        • #5
          Originally posted by Bio.X2Y:
          "When I mapped them as a pair, therefore, I didn't expect to see any hits for either end that did not appear in the original search."
          This is the part I would dispute. You do expect to "recover" some alignments that were suppressed in the original search.

          Hope that helps,
          Ben
