  • tdoniger
    Member
    • Nov 2010
    • 13

    Picard MarkDuplicates - whole bam file marked as duplicates

    Hi,

    I aligned an RNA-Seq data set with the RUM pipeline. I then ran Picard's MarkDuplicates, and it appears to have tagged every single read in the BAM file as a duplicate.
    I used the following parameters for MarkDuplicates:
    java -jar /private/software/packages/picard-tools-1.84/MarkDuplicates.jar I=RUM-sorted.bam O=RUM-sorted-dups_marked.bam METRICS_FILE=dups_metrics AS=true VALIDATION_STRINGENCY=SILENT

    (I had sorted the BAM file with samtools, but MarkDuplicates did not recognize it as sorted, which is why I used AS=true.)

    The dups_metrics file, however, lists the percent_duplication at 43%.

    Any ideas?

    Thanks!
    Tirza
    --
    Tirza Doniger, Ph.D.
    Bioinformatics Unit
    The Mina and Everard Faculty of Life Sciences
    Bar Ilan University
  • whataBamBam
    Member
    • May 2013
    • 27

    #2
    Hey tdoniger

    Did you ever find the solution to your problem? I have the same problem, and my metrics file reports only one percent duplication!

    I'm currently trying 3 different solutions to the problem and I'm waiting for the batches to finish:

    - use samtools rmdup instead
    - add VALIDATION_STRINGENCY=LENIENT to my command string for MarkDuplicates
    - don't bother to mark duplicates (I only did this because GATK requires it; my other pipeline will run happily without this step) and try to trick GATK into accepting my file by adding an @PG line to my header to say I ran MarkDuplicates, i.e. use samtools reheader to take the header from the file in which MarkDuplicates marked every read as a duplicate and put it onto the file I would have used as input. (Yeah, I know this is probably not recommended. I just thought I'd see what happened.)

    My command was

    java -Xmx2G -jar MarkDuplicates.jar INPUT=infile.sorted.bam OUTPUT=outfile.sorted.dedupe.bam METRICS_FILE=myMetricsFile

    I'd also previously sorted the BAM file using samtools.


    • tdoniger
      Member
      • Nov 2010
      • 13

      #3
      I thought that because every line contained PG:Z:MarkDuplicates, every read had been marked as a duplicate. This is not the case: that tag only records that the read was processed by MarkDuplicates. It is the FLAG value in the second column that indicates whether a read is a duplicate or not.

      Try: samtools flagstat library_no_dups.bam
      You can look up which flags represent duplicates here: http://picard.sourceforge.net/explain-flags.html
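
To make the difference between the PG:Z:MarkDuplicates tag and the duplicate flag concrete, here is a minimal Python sketch. It assumes SAM text lines as input (for example from samtools view) and relies on the SAM specification's duplicate bit, 0x400 (decimal 1024), in the FLAG column. The function names and the example records are made up for illustration; they are not taken from the files discussed in this thread.

```python
DUPLICATE_FLAG = 0x400  # 1024: "PCR or optical duplicate" bit of the SAM FLAG field


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Return (total, tagged, duplicates) for an iterable of SAM text lines.

    total      - number of alignment records (header lines are skipped)
    tagged     - records carrying the optional field PG:Z:MarkDuplicates
    duplicates - records whose FLAG (second column) has the 0x400 bit set
    """
    total = tagged = duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional tags start at the 12th column of a SAM record.
        if "PG:Z:MarkDuplicates" in fields[11:]:
            tagged += 1
        if is_duplicate(int(fields[1])):
            duplicates += 1
    return total, tagged, duplicates


# Hypothetical records: all three carry the PG tag, only read2 is a duplicate.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tPG:Z:MarkDuplicates",
    "read2\t1024\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tPG:Z:MarkDuplicates",
    "read3\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\tPG:Z:MarkDuplicates",
]
print(count_duplicates(example))  # (3, 3, 1)
```

In this toy input every record has the PG tag, yet only one has the duplicate bit set, which is the same pattern that made the whole file look "marked" at first glance.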
      --
      Tirza Doniger, Ph.D.
      Bioinformatics Unit
      The Mina and Everard Faculty of Life Sciences
      Bar Ilan University


      • whataBamBam
        Member
        • May 2013
        • 27

        #4
        I solved this and now I can't remember how. Really slack of me not to come back and post the solution, but tomorrow I'll check my pipelines and have a look.


        • tdoniger
          Member
          • Nov 2010
          • 13

          #5
          Hi,

          Thanks! But I was trying to explain that I managed to solve it. I had thought that every line containing "PG:Z:MarkDuplicates" was a duplicate, but this was not the case. The duplicates are marked by the flag in the second column of the same file.

          Best,
          Tirza
          --
          Tirza Doniger, Ph.D.
          Bioinformatics Unit
          The Mina and Everard Faculty of Life Sciences
          Bar Ilan University


          • TonyBrooks
            Senior Member
            • Jun 2009
            • 303

            #6
            Going off piste here, but 43% duplication is pretty high.
            How much PCR did you do?


            • tdoniger
              Member
              • Nov 2010
              • 13

              #7
              Quite a bit. We had very little starting material.
              --
              Tirza Doniger, Ph.D.
              Bioinformatics Unit
              The Mina and Everard Faculty of Life Sciences
              Bar Ilan University
