  • calatian
    Junior Member
    • Mar 2014
    • 5

    Introduction and request for BWA information

    Hello,

    I am a new student of Bioinformatics from Seattle, and so far it's a fascinating field. I am starting to work on a school project and am still a little lost, since this is my first contact with the field and its tools.

    As part of the project, I would like to test BWA with a genome (it does not have to be as long as the human one; something smaller and easier to work with would be great) and reads of different lengths and error rates. The goal of the test would be to see how accurate BWA is when aligning reads of different lengths and error rates, and how its performance degrades as the reads get longer.

    I have the Windows versions of BWA and SAMtools from Codeplex, as recommended in a different thread.

    My question is, where can I find data to test BWA as mentioned above? How could I test different lengths/error rates? Any quick, general instructions on how to start would be greatly appreciated.

    Thanks again, it's a pleasure to be here.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    The best way to test how accurate an aligner is (in fact, I would say, the only way) is with synthetic data. The E. coli reference would be nice for this: just download that file and rename it to ecoli.fa (alternatively, you could use human chromosome 21).

    If you download BBMap, you can generate random reads like this:

    randomreads.sh -Xmx1g ref=ecoli.fa build=1 out=reads.fq maxq=10 minq=10 len=100 reads=100000

    That will generate 100000 reads of 100bp length, all of quality 10 (meaning a 10% chance of error per base; quality 20 is 1%, quality 30 is 0.1%, etc). They will be randomly distributed around the E. coli genome, and every read will have a header indicating its genomic origin. You can also add insertions and deletions with other flags, like "delrate=0.5 maxdellen=20 maxdels=3", which would put deletions in 50% of the reads, of length 1 to 20, and up to 3 deletions per read. Specifically, that gives a 50% chance of 1+ deletions, a 25% chance of 2+ deletions, and a 12.5% chance of 3 deletions.
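To pick minq/maxq values for a target error rate, the standard Phred relation p = 10^(-Q/10) can be sketched as below. This is only an illustration: the function names are made up for this example, and it assumes the simulated per-base error rate follows the quality score exactly as described above.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Per-base error probability for Phred quality q: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality for a per-base error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def expected_errors(read_len: int, q: float) -> float:
    """Expected number of erroneous bases in a read of uniform quality q."""
    return read_len * phred_to_error_prob(q)


if __name__ == "__main__":
    # Print the error probability and expected errors per 100 bp read
    # for the three quality values mentioned in the post.
    for q in (10, 20, 30):
        print(f"Q{q}: p={phred_to_error_prob(q):.4f}, "
              f"expected errors in 100 bp: {expected_errors(100, q):.2f}")
```

So a 100bp read at quality 10 carries about 10 errors on average, which is a harsh test for any aligner; quality 20 or 30 is closer to typical short-read data.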

    After you map with an aligner, you will get a sam file. You can evaluate it like this:

    gradesam.sh in=mapped.sam reads=100000

    This will give you the true positive, false positive, and false negative mapping rates, both strict (requiring both read ends to map back to the exact origin) and loose (requiring at least 1 end to map back to within 20bp of the origin), as well as the rate of ambiguous mappings.
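The strict and loose criteria can be illustrated with a small sketch. This is not how gradesam.sh is implemented; it is a simplified model that assumes each read is reduced to a (start, stop) interval on a single reference sequence, counts an unmapped read as a false negative, and ignores strand and ambiguous mappings. All names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, stop) of a read on the reference


def is_correct(truth: Interval, mapped: Interval,
               strict: bool, slack: int = 20) -> bool:
    """Strict: both ends match exactly. Loose: at least one end within slack bp."""
    start_off = abs(truth[0] - mapped[0])
    stop_off = abs(truth[1] - mapped[1])
    if strict:
        return start_off == 0 and stop_off == 0
    return start_off <= slack or stop_off <= slack


def grade(pairs: List[Tuple[Interval, Optional[Interval]]],
          strict: bool) -> Dict[str, float]:
    """Return TP/FP/FN rates; mapped=None means the read was left unmapped."""
    counts = {"true_positive": 0, "false_positive": 0, "false_negative": 0}
    for truth, mapped in pairs:
        if mapped is None:
            counts["false_negative"] += 1
        elif is_correct(truth, mapped, strict):
            counts["true_positive"] += 1
        else:
            counts["false_positive"] += 1
    total = len(pairs) or 1  # avoid dividing by zero on empty input
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    example = [
        ((100, 199), (100, 199)),    # exact hit
        ((500, 599), (510, 609)),    # 10 bp off: loose hit, strict miss
        ((900, 999), (4000, 4099)),  # wrong location
        ((1500, 1599), None),        # unmapped
    ]
    print("strict:", grade(example, strict=True))
    print("loose: ", grade(example, strict=False))
```

The gap between the strict and loose true positive rates gives a rough idea of how often the aligner finds the right neighborhood but places the read ends slightly off, which tends to happen more with indels.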

    P.S. If you want to do everything in Windows, the shell scripts won't work. You have to have Java installed, and run the programs like this:

    java -Xmx1g -cp path/to/bbmap/current align2.RandomReads3 ref=ecoli.fa build=1 out=reads.fq maxq=10 minq=10 len=100 reads=100000

    and

    java -Xmx1g -cp path/to/bbmap/current align2.GradeSamFile in=mapped.sam reads=100000

    BBMap also runs in Windows. You can run it like this:

    java -Xmx1g -cp path/to/bbmap/current align2.BBMap ref=ecoli.fa in=reads.fq out=mapped.sam
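To compare several read lengths and error rates, the same commands can be generated in a loop. The sketch below only builds and prints the command lines; it uses just the flags shown in this post, keeps path/to/bbmap/current as a placeholder, and the output file names are made up for illustration. The mapping step with your aligner goes between each pair of commands and is left out because its invocation depends on the aligner and build you use.

```python
from itertools import product
from typing import List, Tuple

BBMAP_CP = "path/to/bbmap/current"  # placeholder path, as in the post


def randomreads_cmd(length: int, quality: int,
                    reads: int = 100000, ref: str = "ecoli.fa") -> List[str]:
    """Command line that generates reads of one length and one fixed quality."""
    out = f"reads_len{length}_q{quality}.fq"  # hypothetical naming scheme
    return ["java", "-Xmx1g", "-cp", BBMAP_CP, "align2.RandomReads3",
            f"ref={ref}", "build=1", f"out={out}",
            f"maxq={quality}", f"minq={quality}",
            f"len={length}", f"reads={reads}"]


def gradesam_cmd(length: int, quality: int, reads: int = 100000) -> List[str]:
    """Command line that grades the SAM file produced for one condition."""
    sam = f"mapped_len{length}_q{quality}.sam"  # hypothetical naming scheme
    return ["java", "-Xmx1g", "-cp", BBMAP_CP, "align2.GradeSamFile",
            f"in={sam}", f"reads={reads}"]


def plan(lengths: List[int],
         qualities: List[int]) -> List[Tuple[List[str], List[str]]]:
    """One (generate, grade) command pair per length/quality combination."""
    return [(randomreads_cmd(n, q), gradesam_cmd(n, q))
            for n, q in product(lengths, qualities)]


if __name__ == "__main__":
    for generate, grade in plan([50, 100, 150], [10, 20, 30]):
        print(" ".join(generate))
        print("# ... map the reads with your aligner here ...")
        print(" ".join(grade))
```

Keeping the read count the same in every condition makes the resulting rates directly comparable across lengths and qualities.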
    Last edited by Brian Bushnell; 03-19-2014, 10:21 PM.


    • calatian
      Junior Member
      • Mar 2014
      • 5

      #3
      Originally posted by Brian Bushnell
      The best way to test how accurate an aligner is (in fact, I would say, the only way) is with synthetic data. The E. coli reference would be nice for this: just download that file and rename it to ecoli.fa (alternatively, you could use human chromosome 21).
      Brian,

      Thank you so much for such a thorough and clear response. This is exactly the kind of direction I needed (and much more than I expected). I will try it out right away. Thanks again!


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Originally posted by calatian
        Brian,

        Thank you so much for such a thorough and clear response. This is exactly the kind of direction I needed (and much more than I expected). I will try it out right away. Thanks again!
        You're welcome
