  • large BAM, but very small mpileup file

    hi all,
    I applied samtools mpileup to 7 exome-seq samples (human), whose BAM files were generated using BWA. Since I used a for loop to process the samples, the outputs should be similar. However, for one of the samples, the mpileup file contains only ~1000 lines, with a few lines for each chromosome. The other samples' mpileup files look fine, with many more lines.

    I used samtools flagstat to check the BAM file and found that very few reads (only 34) were properly paired. I wonder if this is the reason for the mpileup problem. More importantly, does that indicate something went wrong in the library preparation for the sequencing experiment?

    95045628 + 0 in total (QC-passed reads + QC-failed reads)
    31755322 + 0 duplicates
    95045628 + 0 mapped (100.00%:nan%)
    95045628 + 0 paired in sequencing
    47522831 + 0 read1
    47522797 + 0 read2
    34 + 0 properly paired (0.00%:nan%)
    ..

    In contrast, most reads are properly paired in the other samples, e.g.:
    120529538 + 0 in total (QC-passed reads + QC-failed reads)
    27401894 + 0 duplicates
    120529538 + 0 mapped (100.00%:nan%)
    120529538 + 0 paired in sequencing
    60469618 + 0 read1
    60059920 + 0 read2
    119251912 + 0 properly paired (98.94%:nan%)



    (I removed unmapped reads from all BAM files, so do not be surprised that the mapping rate is 100%.)
    Last edited by mrfox; 10-16-2012, 09:23 PM.
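
A minimal sketch of how the two flagstat reports above could be compared automatically, so that a sample with almost no properly paired reads is flagged before it reaches mpileup. The function names `parse_flagstat` and `properly_paired_fraction` are hypothetical, and the parser only assumes lines shaped like the ones pasted in the post (`<passed> + <failed> <category> [(percentages)]`); other samtools versions may print additional categories.

```python
import re

def parse_flagstat(text):
    """Return {category: QC-passed count} from `samtools flagstat` text output."""
    counts = {}
    for line in text.splitlines():
        # Expected shape: "<passed> + <failed> <category> [(...)]"
        m = re.match(r"\s*(\d+) \+ (\d+) (.+?)(?: \(.*\))?\s*$", line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts

def properly_paired_fraction(counts):
    """Fraction of paired reads that are flagged as properly paired."""
    paired = counts.get("paired in sequencing", 0)
    return counts.get("properly paired", 0) / paired if paired else 0.0

# Numbers copied from the bad sample in the post.
bad = """95045628 + 0 in total (QC-passed reads + QC-failed reads)
95045628 + 0 paired in sequencing
34 + 0 properly paired (0.00%:nan%)"""
print(properly_paired_fraction(parse_flagstat(bad)))
```

Running this on every sample inside the same for loop, and stopping when the fraction is far below that of the other samples, would have exposed the problem sample early.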

  • #2
    Originally posted by mrfox
    hi all,

    I used samtools flagstat to check the BAM file and found that very few reads (only 34) were properly paired. I wonder if this is the reason for the mpileup problem.
    Possibly. But even if it is not the cause of the mpileup problem, the lack of pairing is indicative of a more basic problem that needs to be solved first.

    More importantly, does that indicate something went wrong in the library preparation for the sequencing experiment?
    Likely. You really should dig deeper into the data so that you can tell the lab prep people what went wrong. My gut feeling is that you have just a handful of different fragments that were amplified and are thus suffering from a lack of complexity. But it could also be that many of the fragments were degraded to the point where they map but do not pair. Or, similar to the first idea, perhaps you just sequenced highly repetitive areas; these can be mapped but pairing would be questionable. Or ... well, dig in and let us know!


    • #3
      Thanks for the hints, Westerman. I loaded two BAM files into IGV: the upper track is a good sample, G, in which the majority of reads were properly paired, and the lower track is the bad sample, B. The alignments are colored by pairing orientation. The region is a segment of chrM.



      What I observe is that 1) the coverage of the two samples is similar, but 2) almost no reads are properly paired in the bad sample. In fact, if I hover over an individual read, I most likely see "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks odd.

      So how should we interpret this observation? Why were the reads not paired? Is the problem in library preparation?


      • #4
        Originally posted by mrfox

        What I observe is that 1) the coverage of the two samples is similar, but 2) almost no reads are properly paired in the bad sample. In fact, if I hover over an individual read, I most likely see "insert size = 0" and "Pair orientation=R1R2" or F1F2/F2F1, which looks odd.

        So how should we interpret this observation? Why were the reads not paired? Is the problem in library preparation?
        Is it possible that when making the .bam, you accidentally used read 1 twice, instead of read 1 and read 2? That would explain the insert sizes of 0, and both reads in the same direction.
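
The signature described here (mate at the same position, same strand) can also be checked directly on alignment records instead of by eye in IGV. Below is a rough sketch that reads SAM text lines, such as the output of `samtools view`, and counts paired reads whose mate is reported on the same reference, at the same start position, and on the same strand. The function names and the idea of calling such reads "self-mated" are assumptions made for illustration; the field positions and FLAG bits follow the SAM format.

```python
def is_self_mated(sam_line):
    """True if a mapped, paired read has its mate at the same place and strand."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, pos = fields[2], int(fields[3])
    rnext, pnext = fields[6], int(fields[7])
    paired = bool(flag & 0x1)
    # 0x4: read unmapped, 0x8: mate unmapped
    both_mapped = not (flag & 0x4) and not (flag & 0x8)
    same_ref = rnext == "=" or rnext == rname
    # 0x10: read on reverse strand, 0x20: mate on reverse strand
    same_strand = bool(flag & 0x10) == bool(flag & 0x20)
    return paired and both_mapped and same_ref and pnext == pos and same_strand

def self_mated_fraction(sam_lines):
    """Fraction of alignment records (header lines skipped) that are self-mated."""
    total = hits = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        total += 1
        hits += is_self_mated(line)
    return hits / total if total else 0.0
```

A fraction close to 1 on a sample of records would be consistent with the same FASTQ file having been given to the aligner as both read 1 and read 2; a normal paired-end library should give a value close to 0.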


        • #5
          I'll agree with swbarnes: probably your analysis was wrong. Alternatively, the two files are the same, e.g., R1 was copied to R2 or vice versa. Another possibility is that you have an R1 from one sample and an R2 from another.
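
Both possibilities can be tested on the input FASTQ files before any alignment is redone. The sketch below compares the first records of two files and reports how many read names match and how many sequences are identical. It assumes plain four-line FASTQ records (optionally gzip-compressed) with read names that end in `/1` and `/2` or carry the mate number in a separate comment field; `read_fastq` and `compare_mates` are hypothetical helper names.

```python
import gzip

def read_fastq(path, limit):
    """Yield (name, sequence) for up to `limit` four-line FASTQ records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for _ in range(limit):
            header = handle.readline()
            if not header:
                return
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            # Drop the leading '@', any comment field, and a trailing /1 or /2.
            name = header[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, seq

def compare_mates(r1_path, r2_path, limit=10000):
    """Count matching names and identical sequences in the first records."""
    total = same_name = same_seq = 0
    pairs = zip(read_fastq(r1_path, limit), read_fastq(r2_path, limit))
    for (name1, seq1), (name2, seq2) in pairs:
        total += 1
        same_name += name1 == name2
        same_seq += seq1 == seq2
    return {"records": total, "same_name": same_name, "same_sequence": same_seq}
```

If `same_sequence` is close to `records`, R2 is most likely a copy of R1. If `same_name` is close to zero, the two files probably come from different samples or runs.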


          • #6
            I also realized this problem: I went back to check the BAM files created half a year ago and found that R2 had indeed been replaced by R1 by mistake. I should have checked everything from the very beginning. Now the problem is solved. Thank you all for your help!
