  • GAII low number of mapped reads

    Hi everyone,

    I tried a rather ambitious experiment in which I tried barcoding several samples of human DNA using a homemade barcodes, target selecting for a few genes by microarray followed by sequencing on the illumina GAII. I used 100bp paired end reads with an index cycle. I could parse my barcodes just fine but when I tried mapping my reads, I got a very low number that mapped back to the human genome (60%) and only 25% to my targeted region. I tried using both ELAND and BWA default settings for paired end reads (actually I added the -q15 in BWA). Is there anything I can do to "salvage" this experiment? Are there different parameters in BWA and Illumina that I could try or is my read quality just that bad. What is odd is that when I look at the quality score of my reads, I don't think they are that bad so I'm confused as to why so few would map back. Any help would be greatly appreciated!!

    Cheers,
    Ali

  • #2
    Have you done any QC on your data to see if there are obvious biases or quality problems?

    Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.
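A quick way to gauge adapter read-through before trimming is to count the reads that contain the adapter, or that end in the first few bases of it. The following is a minimal sketch, not a replacement for a proper trimming tool. It assumes the adapter begins with `AGATCGGAAGAGC` (the start shared by common Illumina adapters; substitute the sequence from your own library prep), and the function names and the `min_overlap` cutoff are illustrative choices.

```python
# Assumption: common leading bases of Illumina adapters; replace with the
# adapter actually used in your library prep.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(sequences, adapter=ADAPTER_PREFIX, min_overlap=8):
    """Return the fraction of reads that contain the adapter, or that end
    in a prefix of it at least min_overlap bases long."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq:
            hits += 1
            continue
        # Look for a partial adapter hanging off the 3' end of the read.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                hits += 1
                break
    return hits / total if total else 0.0


# Usage (hypothetical file name):
# with open("reads_1.fastq") as handle:
#     print(adapter_fraction(read_fastq_sequences(handle)))
```

If a sizeable fraction of reads is flagged, trimming the adapter before mapping is worth trying; a fraction near zero suggests looking elsewhere.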


    • #3
      Originally posted by simonandrews:
      Have you done any QC on your data to see if there are obvious biases or quality problems?

      Have you trimmed adapters off your reads? At 100bp you might be getting a reasonable portion of your library reading through into adapter, and this will mess up your ability to map your reads.

      I've looked with FastQC, and my quality scores do begin to drop off toward the middle of the read. Trimming by quality score in BWA does help, but I still have a lot of reads that don't map. My guess is that I have a library prep issue?


      • #4
        If you have decent-quality reads and they're failing to map, that's going to be due to one of:
        1. Your library is contaminated with DNA from a different source (E. coli etc.)
        2. Your library is partially contaminated with adapters or some part of your vector
        3. Your sequences come from repetitive sequence, which doesn't allow them to map uniquely


        You say you're getting 60% of your reads mapping, so the library isn't a complete disaster; it's just a case of figuring out where the rest went.

        If you have contamination from another DNA source, you could try to screen for it. We routinely put all of our libraries through a screen to see if they contain what they should.

        If you have partial contamination with adapter or improperly removed barcodes, then you should see this in your FastQC reports. Such biases would show up in either the per-base sequence content plot or the Kmer plots. Any non-insert sequence still in your library would mess up your mapping efficiency.
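To make the per-base sequence content check concrete, here is a rough sketch of the underlying calculation: tally the base composition at each read position and flag positions dominated by one base, which is what a leftover barcode or adapter would tend to produce. This is not FastQC's own code; the function names and the 0.6 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def per_base_content(sequences):
    """Return one {base: fraction} dict per read position."""
    counts = []
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            if pos == len(counts):
                counts.append(Counter())
            counts[pos][base] += 1
    content = []
    for position_counts in counts:
        total = sum(position_counts.values())
        content.append({b: n / total for b, n in position_counts.items()})
    return content


def skewed_positions(content, threshold=0.6):
    """Return positions where a single base exceeds the threshold.
    The threshold is an arbitrary illustrative cutoff."""
    return [pos for pos, fractions in enumerate(content)
            if max(fractions.values()) > threshold]
```

A run of skewed positions at the start of the reads would point toward barcodes that were not fully removed, while skew toward the 3' end is more consistent with adapter read-through.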

        If your sequences aren't mapping uniquely, but could map well in many places, then you should be able to alter your mapping parameters to see this. I don't use BWA personally, but I'm sure there will be an option to return a hit even if a sequence could have mapped in many places with high identity. This won't necessarily help your downstream analysis, but it will at least let you know why your sequences wouldn't map.
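Once the aligner has written SAM output, one way to see how much of the loss is down to repeats is to split the primary alignments into unmapped, repeat, and unique. The sketch below assumes the SAM file comes from BWA's short-read pipeline, which adds an `XT:A:R` tag to reads placed in repeats; treating a mapping quality of 0 as "repeat" when that tag is absent is a further assumption, and the function names are illustrative.

```python
from collections import Counter


def classify_sam_line(line, min_mapq=1):
    """Classify one SAM alignment line as unmapped, repeat, or unique."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & 0x4:  # SAM flag bit: segment unmapped
        return "unmapped"
    tags = fields[11:]
    # Assumption: XT:A:R (BWA repeat tag) or MAPQ below min_mapq means
    # the read could be placed in more than one location.
    if "XT:A:R" in tags or mapq < min_mapq:
        return "repeat"
    return "unique"


def summarize_sam(lines):
    """Count reads per class, skipping headers and non-primary records."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x900:  # secondary or supplementary alignment
            continue
        counts[classify_sam_line(line)] += 1
    return counts


# Usage (hypothetical file name):
# with open("aligned.sam") as handle:
#     print(summarize_sam(handle))
```

A large "repeat" count relative to "unmapped" would suggest the missing reads are mostly a uniqueness problem rather than contamination.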

        If all else fails, what we've done before is to remove from our library all of the sequences which we were able to map successfully and then do an assembly of whatever is left (we used Velvet). This has worked well for us on a couple of occasions to identify sources of contamination which we'd been unable to identify in any other way.
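The first half of that approach, pulling out the reads that failed to map so they can be handed to an assembler, can be sketched as follows. This assumes SAM input and FASTA output; the function name and the `/1` and `/2` name suffixes are illustrative choices, and the assembly step itself is not shown.

```python
# Translation table for reverse-complementing reads stored on the
# reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads_as_fasta(sam_lines):
    """Yield FASTA records for every unmapped primary read in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Keep only unmapped reads; skip secondary/supplementary records.
        if not flag & 0x4 or flag & 0x900:
            continue
        name, seq = fields[0], fields[9]
        if flag & 0x10:  # sequence is stored reverse-complemented
            seq = seq.translate(_COMPLEMENT)[::-1]
        if flag & 0x40:  # first read of a pair
            name += "/1"
        elif flag & 0x80:  # second read of a pair
            name += "/2"
        yield f">{name}\n{seq}\n"


# Usage (hypothetical file names):
# with open("aligned.sam") as sam, open("unmapped.fa", "w") as out:
#     out.writelines(unmapped_reads_as_fasta(sam))
```

The resulting FASTA file can then be given to the assembler, and the longest contigs searched against a sequence database to see what the unmapped material actually is.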
