  • jgSoton
    Member
    • Sep 2011
    • 12

    picard mark duplicates

    Hi,

    I'm using Picard to mark duplicates. This has worked fine for me previously on Agilent 50Mb exomes using hg18. I've now updated to the Agilent 51Mb version 4 and hg19, which also works fine.

    However, when I try to run the Agilent 50Mb exomes aligned with novoalign on hg19, I get the output below. What does the "Unknown Library" line mean (I usually only get one "Library" line), and why do I not get a histogram?

    thanks for any help,
    Jane

    ## net.sf.picard.metrics.StringHeader
    # net.sf.picard.sam.MarkDuplicates INPUT=S1_sorted.bam OUTPUT=S1_novoalign.bam METRICS_FILE=S1_metrics.out TMP_DIR=tmp2 VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    ## net.sf.picard.metrics.StringHeader
    # Started on: Tue Oct 09 19:07:47 GMT 2012

    ## METRICS CLASS net.sf.picard.sam.DuplicationMetrics
    LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
    Unknown Library 560890 13587410 3485484 118573 1075343 0 0.081817 81249772
    Library 663607 14277689 3709621 149188 1179918 0 0.08587 81556300
  • fjrossello
    Member
    • Sep 2011
    • 30

    #2
    Hi Guys,

    Same issue with me when running a bam file resulting from merging 3 different samples.
    Any ideas?
    Thanks in advance.

    Cheers,

    Fernando


    • fjrossello
      Member
      • Sep 2011
      • 30

      #3
      Hi Jane,

      As you could tell from my previous post, I had the same issue. I found a putative solution to your problem.
      I understand you have aligned your files using novoalign and I do not know if it creates the same problem as bowtie.
      I aligned my files using bowtie1, which I thought correctly added read metadata such as library, platform and sample information. The header looks OK if you check the RGs using samtools view -H yourbam.file. However, if you check read by read for the RG:Z: tag in your bam, e.g. with samtools view yourbam.file | less, you will not find it.
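
To make the read-by-read check above less manual, here is a minimal Python sketch. It assumes the SAM text has already been captured (for example from samtools view -h yourbam.file); the records in SAM_TEXT and the function name reads_per_library are made up for illustration. It counts reads per library and files any read without a usable RG:Z: tag under "Unknown Library", which, as far as I understand, is how MarkDuplicates ends up printing that extra row.

```python
from collections import Counter

# Hypothetical SAM text; in practice this would come from `samtools view -h`.
SAM_TEXT = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:rg1\tSM:S1\tLB:lib1\tPL:ILLUMINA",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:rg1",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
])


def reads_per_library(sam_text):
    """Count reads per library, resolving each read's RG:Z: tag via the @RG header."""
    rg_to_lib = {}
    counts = Counter()
    for line in sam_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                # Header tags look like ID:rg1, LB:lib1, ...
                tags = dict(f.split(":", 1) for f in fields[1:])
                rg_to_lib[tags["ID"]] = tags.get("LB")
            continue
        # Optional per-read tags start at the 12th column of a SAM record.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        # No tag, an undeclared ID, or a missing LB all fall back to the same label.
        counts[rg_to_lib.get(rg) or "Unknown Library"] += 1
    return counts


print(reads_per_library(SAM_TEXT))
```

If every read lands under a real library name, the header and the per-read tags agree; any count under "Unknown Library" points at reads that lost their read group.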

      I solved this by replacing/adding the read metadata using Picard's AddOrReplaceReadGroups (http://picard.sourceforge.net/comman...laceReadGroups).
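
For reference, a hedged sketch of how such a call could be assembled from Python. The RGID, RGLB, RGPL, RGPU and RGSM options are the tool's read group arguments; the jar location, file names and all values below are placeholders, and the helper name add_or_replace_rg_cmd is hypothetical. The sketch only builds and prints the command, it does not run Picard.

```python
def add_or_replace_rg_cmd(in_bam, out_bam, rgid, rglb, rgpl, rgpu, rgsm,
                          jar="AddOrReplaceReadGroups.jar"):
    """Build a Picard AddOrReplaceReadGroups command line as an argument list.

    The jar path is an assumption; adjust it to wherever Picard is installed.
    """
    return [
        "java", "-jar", jar,
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"RGID={rgid}",   # read group ID
        f"RGLB={rglb}",   # library, used by MarkDuplicates to group reads
        f"RGPL={rgpl}",   # platform
        f"RGPU={rgpu}",   # platform unit
        f"RGSM={rgsm}",   # sample name
    ]


# Placeholder values for illustration only.
cmd = add_or_replace_rg_cmd("S1_sorted.bam", "S1_rg.bam",
                            rgid="S1", rglb="lib1", rgpl="illumina",
                            rgpu="unit1", rgsm="S1")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, assuming java and the jar are available on the machine.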

      Please let me know if you need more help and if this solves your problem.

      Cheers,

      Fernando
      Last edited by fjrossello; 11-06-2012, 12:37 AM. Reason: typo/added info


      • jgSoton
        Member
        • Sep 2011
        • 12

        #4
        Thanks Fernando,

        I just managed to work around this problem myself yesterday. It does seem to be a result of merging bam files in samtools and not keeping the read group info the same for all reads.

        I have used the -r option in samtools merge (without specifying a text file with the read groups in). This seems to give me my metrics output from Picard, but I'm not sure what I'm doing to the read groups! Since I am not using GATK it doesn't matter to me too much; samtools mpileup still seems to give the correct sampleID in the *.vcf file.
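
According to the samtools documentation, -r attaches an RG tag to each alignment with a value inferred from the input file names. Whether the merged header also gains matching @RG lines may depend on the samtools version, so it seems worth checking. A minimal Python sketch of that check follows; the SAM records and the function name undeclared_read_groups are invented for illustration, and the input is assumed to be text from samtools view -h on the merged file.

```python
# Hypothetical merged SAM text: the header declares only one of the two
# read group IDs that appear on the reads.
MERGED_SAM = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:S1_a\tSM:S1\tLB:lib1",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:S1_a",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:S1_b",
])


def undeclared_read_groups(sam_text):
    """Return RG IDs used on reads that have no matching @RG header line."""
    declared = set()
    used = set()
    for line in sam_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if fields[0] == "@RG":
            declared.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
        elif not line.startswith("@"):
            # Optional tags start at the 12th column of an alignment record.
            used.update(f[5:] for f in fields[11:] if f.startswith("RG:Z:"))
    return used - declared


print(undeclared_read_groups(MERGED_SAM))
```

An empty set means every tag on the reads is backed by a header entry; anything else lists read groups that downstream tools cannot resolve to a library or sample.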

        I think the picard option of add/replace readgroups would be a better solution. Thanks for your response.

        Jane
