  • Picard MarkDuplicates - High number of unmatched pairs :(

    Hi everyone

    I've run into a predicament lately and I'm hoping to get some advice. We've done paired-end Illumina whole-genome sequencing on a human sample.

    I have 4 lanes of data for the sample and 2 fastq files per lane (reads1 fastq and reads2 fastq), which I split into 8-9 files of 10 million reads each. When splitting the files, I made sure to split every 10 million x 4 lines, and my script compares the first line of each split reads1 fastq file with its corresponding split reads2 fastq file. I then align each fastq file with BWA (trimming reads with the parameter -q 30) to create a .sai file, and then generate alignments/BAM files for each pair of split reads1/reads2 files. Finally, I sort and index the small BAM files and merge them into one large BAM file, on which I'm trying to run MarkDuplicates.
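The script described above only compares the first line of each chunk, so a misalignment between R1 and R2 partway through a chunk would go unnoticed. Below is a minimal sketch that checks every record pair while splitting. It assumes uncompressed four-line FASTQ records and read names that differ only by an optional /1 or /2 suffix (or a comment after whitespace); the function names are hypothetical, and chunks are kept in memory purely for illustration.

```python
from itertools import islice, zip_longest
from typing import Iterator, List, TextIO, Tuple


def read_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def read_name(header: str) -> str:
    """Read name without '@', any comment after whitespace, or a /1 or /2 suffix."""
    name = header.rstrip("\n")[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(r1: TextIO, r2: TextIO, reads_per_chunk: int) -> List[Tuple[List[str], List[str]]]:
    """Split R1/R2 in lockstep, checking that EVERY record pair shares a read name.

    Hypothetical helper: a real script would write each chunk to its own pair
    of files instead of keeping the lines in memory.
    """
    chunks: List[Tuple[List[str], List[str]]] = []
    cur1: List[str] = []
    cur2: List[str] = []
    for rec1, rec2 in zip_longest(read_records(r1), read_records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_name(rec1[0]) != read_name(rec2[0]):
            raise ValueError(f"mate mismatch: {rec1[0].strip()} vs {rec2[0].strip()}")
        cur1.extend(rec1)
        cur2.extend(rec2)
        if len(cur1) == 4 * reads_per_chunk:  # chunk boundary every N reads
            chunks.append((cur1, cur2))
            cur1, cur2 = [], []
    if cur1:  # last, possibly smaller, chunk
        chunks.append((cur1, cur2))
    return chunks
```

Because both files are consumed in the same loop, a chunk boundary can never fall at different reads in R1 and R2.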

    One thing I've noticed is that MarkDuplicates reports a ridiculously high number of unmatched pairs. I ran MarkDuplicates on the smaller BAM files too, and the same is true. For example, for one of the smaller BAM files:

    INFO 2012-07-19 13:20:48 MarkDuplicates Read 37393628 records. 28252673 pairs never matched.
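As a rough illustration of the idea behind an "unmatched pair": a record flagged as paired (SAM FLAG bit 0x1) whose mate record never turns up in the same file. The sketch below is a simplified model, not Picard's implementation, and the exact accounting behind the MarkDuplicates log line may differ. It assumes plain SAM text lines (for example from `samtools view`); `count_unmatched` is a hypothetical name.

```python
from typing import Dict, Iterable, Set

PAIRED, READ1, READ2 = 0x1, 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def count_unmatched(sam_lines: Iterable[str]) -> int:
    """Count paired read names for which only one mate's primary record was seen."""
    seen: Dict[str, Set[int]] = {}  # read name -> mate ends observed (1 and/or 2)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # only primary records of reads sequenced as pairs are considered
        if not (flag & PAIRED) or (flag & (SECONDARY | SUPPLEMENTARY)):
            continue
        end = 1 if flag & READ1 else 2 if flag & READ2 else 0
        seen.setdefault(name, set()).add(end)
    return sum(1 for ends in seen.values() if ends != {1, 2})
```

If the reads were not aligned as pairs at all, bit 0x1 is never set and this count is zero, which is why the FLAG values are worth checking alongside the log.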

    Now I'm relatively new to this whole NGS world of data analysis, but I can't imagine having such a high number of unmatched pairs is a good thing.

    Does anyone have any advice, or has encountered a similar problem? I'm wondering if I did something wrong with splitting and trimming/aligning split fastq files?

    I should note that this DNA was extracted from FFPE tissue so it will be of lower quality than the DNA you guys are used to working with. But I want to make sure this is not a technical error on my part before blaming DNA quality.

    Thanks!

  • #2
    Did you let BWA align them as paired-reads?
    Or did you align each of the mates separately?
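For context on this question: with the BWA aln workflow, each mate is searched separately with `bwa aln`, but the two .sai files must then be combined with `bwa sampe` so that mates are paired and the SAM paired-end FLAG bits and mate fields are filled in. Running `bwa samse` on each mate instead produces single-end records. A minimal sketch, assuming the BWA 0.5/0.6-style command syntax, a `bwa` binary on PATH, and an indexed reference; file and function names are hypothetical.

```python
import subprocess
from typing import List, Tuple

Step = Tuple[List[str], str]  # (argv, file that receives the command's stdout)


def bwa_paired_commands(ref: str, fq1: str, fq2: str, out_sam: str, q: int = 30) -> List[Step]:
    """Build the aln/aln/sampe steps for one pair of (split) FASTQ files."""
    sai1, sai2 = fq1 + ".sai", fq2 + ".sai"
    return [
        (["bwa", "aln", "-q", str(q), ref, fq1], sai1),  # search mate 1
        (["bwa", "aln", "-q", str(q), ref, fq2], sai2),  # search mate 2
        # sampe takes BOTH .sai files and BOTH FASTQs and pairs the mates
        (["bwa", "sampe", ref, sai1, sai2, fq1, fq2], out_sam),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, redirecting its stdout to the given file."""
    for argv, out_path in steps:
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Usage: `run_steps(bwa_paired_commands("ref.fa", "chunk01_1.fq", "chunk01_2.fq", "chunk01.sam"))`, repeated for each split pair.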

    • #3
      I have the same problem. Did you find a solution?
      Will I get the same result if I align one big file as when I split it, align the split files, and merge them?
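For the sort and merge steps, the order should come out the same: a k-way merge of coordinate-sorted chunks yields the same order as sorting everything at once. The sketch below uses hypothetical (reference index, position, read name) tuples as stand-ins for BAM records, so ties are broken by name. It says nothing about the alignment step itself; per-chunk alignments may differ slightly if the aligner estimates parameters (such as the insert-size distribution in bwa sampe) from the reads it sees in each run.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical minimal record: (reference index, 1-based position, read name)
Record = Tuple[int, int, str]


def sort_then_merge(chunks: Iterable[List[Record]]) -> List[Record]:
    """Coordinate-sort each chunk independently, then k-way merge the sorted chunks."""
    sorted_chunks = [sorted(chunk) for chunk in chunks]
    return list(heapq.merge(*sorted_chunks))
```

`heapq.merge` only ever compares the current head of each chunk, which is what makes merging many sorted files cheap compared with re-sorting the combined data.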

      • #4
        I have the same problem, too.
        It is a problem with Picard MarkDuplicates, though, because when I run samtools flagstat the pairs appear properly matched. Did you find a solution?
        Thanks!
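One possible reason the two tools can disagree: a flagstat-style summary looks only at each record's own FLAG bits, whereas duplicate marking has to locate both mates' records. The sketch below tallies a few flagstat-like categories from FLAG values alone. It is a simplification (it skips secondary and supplementary records, and samtools' exact counting rules may differ), and `flagstat_like` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def flagstat_like(flags: Iterable[int]) -> Counter:
    """Tally flagstat-style categories from SAM FLAG values, one record at a time."""
    counts: Counter = Counter()
    for flag in flags:
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        counts["total"] += 1
        if not (flag & 0x1):  # not sequenced as a pair
            continue
        counts["paired in sequencing"] += 1
        mapped = not (flag & 0x4)
        mate_mapped = not (flag & 0x8)
        if mapped and (flag & 0x2):
            counts["properly paired"] += 1
        if mapped and mate_mapped:
            counts["with itself and mate mapped"] += 1
        elif mapped:
            counts["singletons"] += 1
    return counts
```

Note that none of these counts requires the mate's record to actually be present in the file; they are read off each record's flags independently.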
