  • Picard fix mate information failed with bwa mem result

    Hello,

    I used the newest version of bwa mem (V0.7.12-r1039) to align paired-end sequencing data. The alignment itself is fine. However, I passed the read group information on the command line, and bwa records the full command line in the CL tag of the @PG header line. As a result, tags that belong on the @RG line also show up on the @PG line, which breaks the SAM specification. When I ran Picard to fix mate information, it complained about this. Does anybody have a way to work around this? Thanks!

  • #2
    Can you show the relevant part of the header (ideally with a couple lines on either side)?


    • #3
      Thanks!

      The line that causes the problem is:

      @PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem -t 10 -R @RG ID:Sample1 CN:SequencingCenter DS:Project1 DT:2015-02-09 PL:ILLUMINA -M human_g1k_v37_decoy.fasta read1.fastq.gz read2.fastq.gz

      All the items after "CL:" are separated by spaces, except the items between "@RG" and "PL:ILLUMINA", which are separated by tabs because they are the read group information.


      • #4
        Ah, you must be running the command from within a script where you've explicitly input tabs rather than giving a string with "\t" tab designations. Try the following directly from the command line:

        Code:
        bwa mem -t 10 -R "@RG\tID:Sample1\tSM:Sample1\tCN:SequencingCenter\tDS:Project1\tDT:2015-02-09\tPL:ILLUMINA" -M human_g1k_v37_decoy.fasta read1.fastq.gz read2.fastq.gz
        That's likely to work. Also, I added the SM tag, which picard requires. I ran into an issue like this once when I was sending commands through multiple layers of programs on a cluster. It turned out that the "\t" indicators were getting expanded to actual tabs at some point, so I had to escape things ("\\t").
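
To make the quoting point concrete, here is a minimal POSIX sh sketch of a wrapper layer that keeps "\t" as two literal characters and checks that no real tab has crept in before the string is handed to bwa. The helper name has_real_tab is made up for illustration, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: keep "\t" as backslash + t so that bwa, not an intermediate
# script layer, turns it into a tab. Read group values are taken from
# the example in this thread.

# Single quotes stop the shell from touching the backslashes.
rg='@RG\tID:Sample1\tSM:Sample1\tPL:ILLUMINA'

# A real tab character, used only for the check below.
tab=$(printf '\t')

# Hypothetical helper: return 0 (true) if the argument contains a real tab.
has_real_tab() {
    case "$1" in
        *"$tab"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_real_tab "$rg"; then
    echo "read group string contains real tabs; check the quoting" >&2
else
    # The %s argument is printed verbatim. Some echo implementations would
    # expand "\t" here, which is exactly the problem being avoided.
    printf 'bwa mem -t 10 -R "%s" -M human_g1k_v37_decoy.fasta read1.fastq.gz read2.fastq.gz\n' "$rg"
fi
```

If the check fails in a real pipeline, the usual suspects are double-quoted strings passed through echo, or a scheduler wrapper that re-evaluates the command.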


        • #5
          There's a bug in bwa here: in main() in main.c it ought to sanitise argv[i] (say, by changing tabs and newlines to spaces) before adding it to the @PG header. (And similarly argv[0] really.)

          As dpryan notes, bwa will parse \t escape sequences so this would provide a workaround as \t will appear in the @PG line rather than actual tab characters. Depending on how many layers of scripting are involved, it may or may not be easy to get \t sequences to bwa without a previous layer turning them into tabs first!
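
For illustration only, the kind of sanitising described above can be sketched in POSIX sh. This is not bwa's code (the real fix would be in C in main.c); the function name sanitize_cl and the shortened command line are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: replace tabs and newlines with spaces before a command line is
# written into a tab-delimited header field such as the CL tag of @PG.

# Hypothetical helper: print the argument with tabs/newlines turned into spaces.
sanitize_cl() {
    printf '%s' "$1" | tr '\t\n' '  '
}

# A command line that contains real tabs, as in the broken header above.
tab=$(printf '\t')
cl="bwa mem -R @RG${tab}ID:Sample1${tab}PL:ILLUMINA ref.fa"

# The tabs in the format string are the legitimate field separators;
# the sanitised CL value no longer contains any.
printf '@PG\tID:bwa\tPN:bwa\tCL:%s\n' "$(sanitize_cl "$cl")"
```

With this approach the recorded command line is no longer byte-for-byte what was typed, but the header stays parseable.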


          • #6
            Indeed, BWA will also allow incorrect header tags like "asd:f".
            Last edited by dpryan; 03-05-2015, 05:50 AM. Reason: Not enough coffee


            • #7
              Thanks, dpryan and jmarshall

              I have two layers of scripting. I will see whether I can get a literal "\t" through both of them to bwa. It would be best if the SAM format had some way to escape tabs in the command line.

              I tried setting "VALIDATION_STRINGENCY" to "LENIENT" in Picard, and it runs through with a warning. I hope the downstream tools will also accept this.
              Last edited by yinshe; 03-05-2015, 05:57 AM.
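
If downstream tools turn out not to accept the header, another possible workaround is to repair it after alignment. The sketch below is a POSIX sh/awk filter that turns the tabs inside the CL value of @PG lines into spaces. It assumes CL is the last tag on the line (true of the line shown in post #3, but not guaranteed by the SAM format), and the function name fix_pg_cl is made up for the example.

```shell
#!/bin/sh
# Sketch: read SAM header text on stdin and write it to stdout with tabs
# inside the CL value of @PG lines replaced by spaces.
# Assumption: CL is the last tag on the @PG line, so everything after
# "CL:" belongs to the recorded command line.
fix_pg_cl() {
    awk '
        /^@PG/ {
            i = index($0, "\tCL:")            # position of the tab before CL:
            if (i > 0) {
                head = substr($0, 1, i + 3)   # up to and including "CL:"
                tail = substr($0, i + 4)      # the recorded command line
                gsub(/\t/, " ", tail)
                $0 = head tail
            }
        }
        { print }
    '
}
```

One way to use it, with hypothetical file names: dump the header with samtools view -H aln.bam, pipe it through fix_pg_cl into header.sam, and then apply it with samtools reheader header.sam aln.bam > fixed.bam.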
