
  • #16
    How to patch BWA

    Ok, I figured it out! Here is how I did it...

    Go into the SVN checkout of bio-bwa/trunk/bwa, then run this:

    Code:
    patch bwape.c BWA_read_group_patch.diff
    then:

    Code:
    make
    then test it:

    Code:
    ./bwa sampe
    
    Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>
    
    Options: -a INT   maximum insert size [500]
             -o INT   maximum occurrences for one end [100000]
             -n INT   maximum hits to output for paired reads [3]
             -N INT   maximum hits to output for discordant pairs [10]
             -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
             -f FILE sam file to output results to [stdout]
    
             -P       preload index into memory (for base-space reads only)
             -s       disable Smith-Waterman for the unmapped mate
             -A       disable insert size estimate (force -s)
    
             -i       read group identifier (ID)
             -m       read group sample (SM), required if ID is given
             -l       read group library (LB)
             -p       read group platform (PL)
    Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
           2. For reads shorter than 30bp, applying a smaller -o is recommended to
              to get a sensible speed at the cost of pairing accuracy.
    The -i, -m, -l and -p options are the ticket!

    Comment


    • #17
      Originally posted by wjeck View Post
      Follow up question: Is there a way to edit the information in the @RG tag after the files have been merged in BAM format? I'd like to add and subtract information from these lines downstream, and I can't figure out an elegant way to get into them without writing out an entire SAM file and translating it back to BAM.
      Were you ever able to figure this out (for an already merged BAM file)?

      Comment


      • #18
        The way to do this now is to use the Picard command line tool, in the latest picard version.

        Comment


        • #19
          Usage:   bwa sampe [options] <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>

          Options: -a INT   maximum insert size [500]
                   -o INT   maximum occurrences for one end [100000]
                   -n INT   maximum hits to output for paired reads [3]
                   -N INT   maximum hits to output for discordant pairs [10]
                   -c FLOAT prior of chimeric rate (lower bound) [1.0e-05]
                   -f FILE  sam file to output results to [stdout]
                   -r STR   read group header line such as `@RG\tID:foo\tSM:bar'[null]
                   -P       preload index into memory (for base-space reads only)
                   -s       disable Smith-Waterman for the unmapped mate
                   -A       disable insert size estimate (force -s)

          Notes: 1. For SOLiD reads, <in1.fq> corresponds R3 reads and <in2.fq> to F3.
                 2. For reads shorter than 30bp, applying a smaller -o is recommended to
                    to get a sensible speed at the cost of pairing accuracy.
          This is the bwa sampe usage in version 0.5.9-r16.
          With the -r option followed by a string of that form, you can insert the read group directly during mapping.

          For editing RG lines, use AddOrReplaceReadGroups in Picard.
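
A minimal sketch of both routes mentioned above, printed as a dry run rather than executed. The file names (ref.fa, in1.sai, in.bam and so on) and the read group values are hypothetical placeholders, and the Picard line assumes an older release that ships one jar per tool; newer releases may be invoked differently.

```shell
# Hypothetical read group values; replace with your own.
rg_id="foo"
rg_sm="bar"

# Keep "\t" as a literal backslash + t; bwa is expected to expand it itself.
rg_line="@RG\\tID:${rg_id}\\tSM:${rg_sm}"

# Route 1: set the read group during mapping with bwa sampe -r.
bwa_cmd="bwa sampe -r '${rg_line}' ref.fa in1.sai in2.sai in1.fq in2.fq > aln.sam"

# Route 2: replace the read group of an existing BAM with Picard.
# RGLB, RGPL and RGPU values below are placeholders as well.
picard_cmd="java -jar AddOrReplaceReadGroups.jar INPUT=in.bam OUTPUT=out.bam RGID=${rg_id} RGSM=${rg_sm} RGLB=lib1 RGPL=illumina RGPU=unit1"

# Dry run: show the commands instead of running them.
printf '%s\n' "$bwa_cmd" "$picard_cmd"
```

Copy a printed line into the terminal once the placeholders match your data.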

          Comment


          • #20
            I can attest that both of these tools work well. The Picard tool in particular is vastly quicker than my previous workaround.

            Comment


            • #21
              Originally posted by lh3 View Post
              You may try "samtools merge", using options -r and -h. You write your @RG header lines in a file provided to -h; -r will add an RG:Z: tag to each alignment, based on the file names.

              EDIT: for an example:

              http://sourceforge.net/apps/mediawik...rged_alignment

              I posted this on another thread, but

              -r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

              gave me the error "malformated @RG line".

              Can you please help?

              Comment


              • #22
                Originally posted by jyli View Post
                I posted this on another thread, but

                -r STR read group header line such as `@RG\tID:foo\tSM:bar'[null]

                gave me the error "malformated @RG line".

                Can you please help?
                It's probably the "\t". I've had trouble with that before.

                Probably best to use the Picard tool for it.

                Comment


                • #23
                  `@RG\tID:foo\tSM:bar'
                  Maybe it is the ` (backtick) quote. Try copying and pasting this string:

                  '@RG\tID:foo\tSM:bar'

                  Let me know.
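
The quoting can be checked in the shell without running bwa at all. A small sketch, assuming a POSIX shell: inside single quotes the string reaches the program unchanged, with a literal backslash followed by t, whereas an opening backtick would start a command substitution instead.

```shell
# Single quotes: the shell passes the text through untouched,
# so the program receives a literal backslash followed by 't'.
rg='@RG\tID:foo\tSM:bar'
printf '%s\n' "$rg"

# Length of the string: each "\t" still counts as two characters here.
rg_len=${#rg}
echo "$rg_len"        # prints 19

# printf '%b' shows the line once the escapes are expanded:
# it splits into three tab-separated fields.
rg_fields=$(printf '%b\n' "$rg" | awk -F '\t' '{ print NF }')
echo "$rg_fields"     # prints 3
```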

                  Comment


                  • #24
                    Originally posted by freeseek View Post
                    @Michael.James.Clark the following two lines of bash code:
                    Code:
                    echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
                    samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -uS - | samtools rmdup - - | samtools rmdup -s - aln.bam
                    should add to the bam file the read group information in the same way samtools merge adds the read group information to the two bam files as described by javijevi. The idea is to unpack the bam file, add the read group header, add the read group information to every read, repack the file, and remove duplicates. Again, remove duplicates only if the coverage is not too deep.

                    Hi, I got the error below. How can I resolve it? I added read groups to the BAM files and used samtools to merge them, but when I run the SomaticIndelDetector from GATK it gives me this error.
                    Here are the commands that I used to add the read groups and merge the BAM files:
                    -rh rgmt.txt - genome_110506_SN13.bam genome_110506_SN132.bam genome_110506_SN132_A.bam > newmut.bam
                    And here is the GATK command I used for the SomaticIndelDetector:
                    elendin@elendin-HP-Pavilion-dv6700-Notebook-PC:~/analysis of rnaseq bamfiles$ java -jar GenomeAnalysisTK.jar -R VitisVinifera.fasta -T SomaticIndelDetector -o indels.vcf -verbose indels.txt -I:normal wt.bam -I:tumor newmut.bam
                    And here is the error:
                    MESSAGE: SAM/BAM file SAMFileReader{/home/elendin/analysis of rnaseq bamfiles/newmut.bam} is malformed: Read HWI-ST132_0461:3:2201:1211:140854#GTCCTA is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://www.broadinstitute.org/gsa/wi...laceReadGroups to fix this problem
                    ##### ERROR ------------------------------------------------------------------------------------------

                    Please help me.
                    Thanks a lot.
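
The GATK message covers two different cases: a read with no RG tag at all, or a read whose RG value has no matching @RG line in the header. A rough sketch for telling them apart is below. The helper name check_rg is made up for illustration; it reads SAM text on stdin, so it would typically be fed from `samtools view -h`.

```shell
# check_rg (hypothetical helper): reads SAM text on stdin and prints, for each
# RG value found on the reads, how many reads carry it and whether the
# header defines it. Reads without any RG tag are counted as "(none)".
check_rg() {
  awk -F '\t' '
    $1 == "@RG" {                    # header line: remember its ID field
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ID:/) hdr[substr($i, 4)] = 1
      next
    }
    /^@/ { next }                    # skip the other header lines
    {
      rg = "(none)"
      for (i = 12; i <= NF; i++)     # optional tags start at column 12
        if ($i ~ /^RG:Z:/) rg = substr($i, 6)
      seen[rg]++
    }
    END {
      for (rg in seen)
        printf "%s\t%d reads\t%s\n", rg, seen[rg], ((rg in hdr) ? "defined" : "NOT in header")
    }'
}

# Usage (needs samtools, not run here):
#   samtools view -h newmut.bam | check_rg
```

Any line ending in "NOT in header" points at the reads GATK is complaining about.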

                    Comment


                    • #25
                      Add read groups to bam files using bwa-0.6.2

                      Hi everybody,


                      I tried to add read groups to my BAM files, without success.

                      I'm using bwa (bwasw, 454 reads) and I have tried the merge command, but it doesn't work.

                      I have read the other post http://seqanswers.com/forums/showthread.php?t=4180
                      and http://sites.duke.edu/rainbowblog/20...p-information/

                      But I still couldn't add the read groups.

                      Can someone please help me?
                      Thanks in advance

                      Cris
                      Last edited by cfrias; 02-15-2013, 01:24 PM. Reason: I tried the solution!!!

                      Comment


                      • #26
                        I think I switched to using Picard tools AddOrReplaceReadGroups function.

                        Try looking here:

                        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                        Haven't had to do this in a while, though, since I started putting read groups in at the beginning, during alignment. I believe BWA now does that with proper use of the alignment option. The best way to solve this problem is to make sure it doesn't happen in the first place.

                        Comment


                        • #27
                          OK, it's Friday night... but I tried the solution
                          with Brugger's script!!
                          THANK YOU VERY MUCH!!!

                          Comment


                          • #28
                            I wonder if the easiest thing would be for software that insists on read groups to instead provide a parameter to ignore them. This read groups thing has turned out to be more of a hassle than a benefit. Software should be more robust.

                            Comment


                            • #29
                              Thank you very much, Wjeck and Richard!!!

                              Yes, Richard, I think the same.
                              I always read the manuals, but the useful information is in this forum, because people
                              have similar problems.

                              I only want to add the read groups so that I can use GATK to improve the alignments from bwa bwasw.
                              I need to remove duplicates, and I have a pool of reads.
                              For that I need to set the read groups, in order to avoid removing similar reads from different individuals.
                              (A duplicate read is a read that has the same mapping coordinates and the same CIGAR string,
                              isn't it?)

                              Cris

                              Comment


                              • #30
                                Hi everyone,

                                I'm having problems with samples merged with samtools.

                                I get this error when I start to run our pipeline:

                                MESSAGE: SAM/BAM file SAMFileReader{/csc/aaltonen/cg3/projects/Kaposi/Kapo93+94+95/09-1107/Kapo_93-95_09-1107_merged_s_99.nodup.bam} is malformed: Read ILLUMINA-8C38E9_0112:3:84:1445:15512#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://gatkforums.broadinstitute.org...lacereadgroups to fix this problem

                                I ran samtools with samtools merge -h rg.txt -r

                                Information in the rg.txt should be correct:

                                @RG ID:Kapo93+94+95_1_09-1107_091207_HWI-EAS418_9_s_7 DS:091207_HWI-EAS418_9 SM:09-1107
                                @RG ID:Kapo93+94+95_1_09-1107_100618_ILLUMINA-8C38E9_0112_s_3 DS:100618_ILLUMINA-8C38E9_0112 SM:09-1107
                                @RG ID:Kapo93+94+95_1_09-1107_110103_HWUSI-EAS1785_0223_s_2 DS:110103_HWUSI-EAS1785_0223 SM:09-1107
                                @RG ID:Kapo93+94+95_1_09-1107_110616_SN588_0054_AC00T5ABXX_s_4 DS:110616_SN588_0054_AC00T5ABXX SM:09-1107
                                @RG ID:Kapo93+94+95_1_09-1107_111028_SN653_0108_BB0418ABXX_s_3 DS:111028_SN653_0108_BB0418ABXX SM:09-1107
                                @RG ID:Kapo93+94+95_1_09-1107_120216_SN670_0098_AC04HRABXX_s_1 DS:120216_SN670_0098_AC04HRABXX SM:09-1107
                                @RG ID:Kapo93-95-1_09-1107_110714_SN588_0055_AB0A5UABXX_s_1 DS:110714_SN588_0055_AB0A5UABXX SM:09-1107

                                Any help here would be appreciated greatly!
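
One thing that may be worth checking: as far as I know, `samtools merge -r` takes each read's RG value from the name of the input file it came from, so the IDs in the file given to -h would have to match the input file names without the .bam suffix. The fields also need to be separated by real tabs. A sketch that builds such a header from the file names is below. The helper name make_rg_header and the lane file names are hypothetical, and the ID rule is an assumption that should be verified against your samtools version.

```shell
# make_rg_header SAMPLE FILE.bam... (hypothetical helper)
# Prints one tab-separated @RG line per input BAM. The ID is the file name
# without directory and ".bam" suffix, which is the value merge -r is
# assumed to attach to the reads of that file.
make_rg_header() {
  sample=$1
  shift
  for bam in "$@"; do
    id=$(basename "$bam" .bam)
    printf '@RG\tID:%s\tSM:%s\n' "$id" "$sample"
  done
}

# Usage (needs samtools, not run here):
#   make_rg_header 09-1107 lane1.bam lane2.bam > rg.txt
#   samtools merge -rh rg.txt merged.bam lane1.bam lane2.bam
```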

                                Comment
