  • dmacmillan
    Member
    • Jan 2012
    • 49

    BWA sampe to bam?

    As everyone who uses bwa knows, the sampe command writes its output in SAM format. What I want to do is convert that SAM output to a BAM file through a pipe. It seems easy to implement, but I keep getting an error from samtools.

    cat file.sam | samtools view -Sb

    That does not work!
  • kopi-o
    Senior Member
    • Feb 2008
    • 319

    #2
    Look at the samtools manual page: http://samtools.sourceforge.net/samtools.shtml

    You are looking for samtools view -bS or samtools view -bt


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      What you want is something like:

      bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;


      • xied75
        Senior Member
        • Feb 2012
        • 129

        #4
        Dear all,

        Just wondering: SAM is far bigger than BAM, and it seems few people ever open a SAM file to read it. If BWA output BAM directly, it would save a lot of effort, and disk I/O would be faster thanks to the smaller file size. Does this make sense, or have I missed something?

        Best,

        dong


        • arvid
          Senior Member
          • Jul 2011
          • 156

          #5
          Theoretically, when your server is more CPU-limited than I/O-limited and you only need to read the whole file sequentially, SAM will be faster than BAM (due to the compression overhead in BAM). I found that this is never the case for our applications, so I pipe aligners directly into a samtools chain (with the -m option to samtools sort to fit most alignments in memory, so that no temporary files are written to disk) and get a sorted BAM on disk directly.
          Last edited by arvid; 04-22-2012, 11:14 PM.


          • dmacmillan
            Member
            • Jan 2012
            • 49

            #6
            Originally posted by swbarnes2:
            What you want is something like:

            bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;
            I understand what you are doing here, but what is with the '-;' at the end (ignoring the single quotations)?


            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Originally posted by dmacmillan:
              I understand what you are doing here, but what is with the '-;' at the end (ignoring the single quotations)?
              The '-' means "the thing that's being piped", that is, samtools reads its input from standard input instead of a named file. At least, that's how I understand it. The ';' just ends the shell command. That command works, I use it all the time just like I wrote it there, and so would this:

              Code:
              bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSh - > out.bam;
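
The '-' convention can be tried without bwa or samtools, since many Unix tools treat a lone '-' as "read standard input". The sketch below uses cat purely as a stand-in (an assumption for illustration, not part of the pipeline above); the sample text and function names are made up.

```shell
# Stand-in demo: cat, like samtools view, reads standard input when the
# file argument is a lone '-'. The sample text is made up for illustration.
read_from_pipe() {
    printf 'read1\nread2\n' | cat -
}

# Mix a regular file with piped data: '-' marks where stdin is read.
# (header_file is a hypothetical temporary file.)
header_then_pipe() {
    header_file=$(mktemp)
    printf '@HD\n' > "$header_file"
    printf 'read1\n' | cat "$header_file" -
    rm -f "$header_file"
}

read_from_pipe
header_then_pipe
```

This also suggests why the command in the first post fails: without a '-' (or a file name), samtools view of that era has no input argument to read from.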


              • sdriscoll
                I like code
                • Sep 2009
                • 436

                #8
                I don't know if it's necessary for BWA output or not, but I like to use the -F option on output from bowtie to keep unaligned reads from making their way into the BAM file. Also, the -h option isn't necessary in this example: the BAM header gets created appropriately. In fact, I don't think samtools will allow you to create a BAM file from a SAM file unless the SAM file already has the correct header information. I've only needed the -h option when viewing BAM files; by default the header is left off when viewing a BAM file as SAM via samtools.

                So what I always use is this:

                Code:
                bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bS -F 0x04 - > out.bam
                sometimes followed by this:

                Code:
                samtools sort out.bam out-sorted
                Bowtie doesn't sort its output by position, and I don't remember whether BWA does either. If you use the BAM file for any downstream analysis, you usually need it sorted by chromosome and position.
                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                Salk Institute for Biological Studies, La Jolla, CA, USA */
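
For reference, 0x04 is the "read unmapped" bit of the SAM FLAG field, and -F drops every record that has any of the given bits set. A minimal sketch of that bit test in plain shell arithmetic follows; the function name is hypothetical and the FLAG values are made-up examples.

```shell
# Return success (0) if the SAM FLAG value has the 0x4 "read unmapped" bit set.
# `samtools view -F 0x04` skips exactly those records.
is_unmapped() {
    [ $(( $1 & 0x4 )) -ne 0 ]
}

# Example FLAG values:
#   99 = paired + proper pair + mate reverse + first in pair (mapped)
#   77 = paired + unmapped + mate unmapped + first in pair
for flag in 99 77; do
    if is_unmapped "$flag"; then
        echo "$flag: unmapped, dropped by -F 0x04"
    else
        echo "$flag: mapped, kept"
    fi
done
```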


                • dmacmillan
                  Member
                  • Jan 2012
                  • 49

                  #9
                  Interesting tips, I will try both, thanks!


                  • arvid
                    Senior Member
                    • Jul 2011
                    • 156

                    #10
                    To reduce the I/O load (and total CPU time as well) even further, this is my favourite:

                    Code:
                    bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSu -F 0x04 - | samtools sort -m 4294967296 - out.sorted 
                    samtools index out.sorted.bam
                    Set -m as high as you can afford; in my hands samtools sort needs RAM up to 2x the value specified there in bytes (I set this to 16 GB when running on a server, which is enough for most BAMs to be sorted without writing temporary files to disk). -u removes the compression/decompression overhead in the pipe between view and sort.
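
The -m value in that command is plain bytes: 4294967296 is 4 GiB, and a 16 GB setting would be 17179869184 if taken as 16 GiB. A small helper for the arithmetic (the function name is hypothetical):

```shell
# Convert a whole number of GiB to bytes for `samtools sort -m`.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 4     # 4294967296, the value used in the pipeline above
gib_to_bytes 16    # 17179869184
```

Newer samtools releases (1.x) accept a K, M, or G suffix for -m, apply the limit per thread, and take the output file through -o rather than an output prefix, so it is worth checking the `samtools sort` usage message of the installed version before copying the command.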


                    • swbarnes2
                      Senior Member
                      • May 2008
                      • 910

                      #11
                      Piping into samtools sort works? I was afraid that would get ugly.

                      How can I ask the server I'm on how much memory I can devote to sort?


                      • nilshomer
                        Nils Homer
                        • Nov 2008
                        • 1283

                        #12
                        Use the "-m" option in samtools sort instead.
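
On the question of how much memory the server can spare: on Linux, one place to look is the MemAvailable line (MemFree on older kernels) of /proc/meminfo, which is reported in kB. The sketch below assumes Linux and budgets half of the reported amount, following the earlier observation that sort can need up to 2x the -m value. The function name and the halving policy are illustrative choices, not samtools features.

```shell
# Print a suggested byte value for `samtools sort -m`: half of the memory
# listed in a meminfo-style file (default /proc/meminfo, Linux only).
# Prefers MemAvailable and falls back to MemFree on older kernels.
suggest_sort_mem() {
    meminfo=${1:-/proc/meminfo}
    kb=$(awk '/^MemAvailable:/ {print $2; exit}' "$meminfo")
    if [ -z "$kb" ]; then
        kb=$(awk '/^MemFree:/ {print $2; exit}' "$meminfo")
    fi
    [ -n "$kb" ] || return 1
    # kB -> bytes, then halve to leave headroom for sort's overhead.
    echo $(( kb * 1024 / 2 ))
}

# Only query the live system where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    suggest_sort_mem
fi
```

On a shared server, other users' jobs and any scheduler limits also count, so the printed number is an upper bound to adjust downward rather than a safe default.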
