  • BWA sampe to bam?

    As everyone who uses bwa knows, the sampe command writes its output in SAM format. What I want to do is convert that SAM output to a BAM file in some sort of pipe. It seems easy to implement, but I keep getting an error from samtools.

    cat file.sam | samtools view -Sb

    That does not work!

  • #2
    Look at the samtools manual page: http://samtools.sourceforge.net/samtools.shtml

    You are looking for samtools view -bS or samtools view -bt



    • #3
      What you want is something like:

      bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;



      • #4
        Dear all,

        Just wondering: SAM is far bigger than BAM, and it seems that few people ever open a SAM file to read it. If BWA output BAM directly, it would save a lot of effort, and disk I/O would be faster because of the smaller file size. Does this make sense, or have I forgotten something?

        Best,

        dong



        • #5
          Originally posted by xied75
          Dear all,

          Just wondering: SAM is far bigger than BAM, and it seems that few people ever open a SAM file to read it. If BWA output BAM directly, it would save a lot of effort, and disk I/O would be faster because of the smaller file size. Does this make sense, or have I forgotten something?

          Best,

          dong

          Theoretically, when your server is more CPU-limited than I/O-limited and you only need to read the whole file sequentially, SAM will be faster than BAM (because of the compression overhead in BAM). I have found that this is never the case for our applications, so we pipe aligners directly into a samtools chain (using the -m option of samtools sort to fit most alignments in memory, which avoids writing temporary files to disk) and get a sorted BAM on disk directly.
          Last edited by arvid; 04-22-2012, 11:14 PM.



          • #6
            Originally posted by swbarnes2
            What you want is something like:

            bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;

            I understand what you are doing here, but what is the '-;' at the end for (ignoring the single quotes)?



            • #7
              Originally posted by dmacmillan
              I understand what you are doing here, but what is with the '-;' at the end (ignoring the single quotations)?

              The '-' means "the thing that's being piped", i.e. standard input; the ';' just ends the shell command. At least, that's how I understand it. That command works; I use it all the time exactly as I wrote it there. So would this:

              Code:
              bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSh - > out.bam;
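
The '-' convention described in this post is not specific to samtools. As a minimal sketch, the same behaviour can be seen with cat, which also treats '-' as "read standard input"; the read names below are made-up placeholders, and the trailing ';' is shown only to illustrate that it merely terminates the command.

```shell
# '-' as a file operand conventionally means "read from standard input".
printf 'read1\nread2\n' | cat -    # explicit stdin placeholder
printf 'read1\nread2\n' | cat      # same output: cat defaults to stdin

# A trailing ';' only ends the command; it is not passed to the program.
# $(( ... )) strips any padding that some wc implementations emit.
count=$(( $(printf 'read1\nread2\n' | cat - | wc -l) ));
echo "$count"
```

Whether a given tool needs the explicit '-' depends on the tool and its version, so checking its manual page is the safest route.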



              • #8
                I don't know whether it's necessary for BWA output, but I like to use the -F option on output from bowtie to keep unaligned reads out of the BAM file. Also, the -h option isn't necessary in this example: the BAM header gets created appropriately. In fact, I don't think samtools will let you create a BAM file from a SAM file unless the SAM file already has the correct header information. I've only needed the -h option when viewing BAM files, because by default samtools leaves the header off when it prints a BAM file as SAM.

                So what I always use is this:

                Code:
                bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bS -F 0x04 - > out.bam
                Sometimes followed by this:

                Code:
                samtools sort out.bam out-sorted
                Bowtie doesn't sort its output by coordinate, and I don't remember whether BWA does either. If you use the BAM file for any downstream analysis, you usually need it sorted by chromosome and position.
                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                Salk Institute for Biological Studies, La Jolla, CA, USA */
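
To make the -F 0x04 filter in the post above concrete: in the SAM FLAG field, bit 0x4 marks a read as unmapped, and -F drops any record that has one of the given bits set. The sketch below reproduces only that bit test in plain shell arithmetic; is_unmapped is a hypothetical helper name and the flag values are illustrative examples, not output from a real alignment.

```shell
# Return success (0) when the SAM FLAG value has the 0x4 "unmapped" bit set.
# is_unmapped is a hypothetical helper, not a samtools command.
is_unmapped() {
    [ $(( $1 & 0x4 )) -ne 0 ]
}

# 4, 77 and 141 have bit 0x4 set; 99 (paired, properly mapped) does not.
for flag in 4 77 99 141; do
    if is_unmapped "$flag"; then
        echo "$flag: would be dropped by -F 0x04"
    else
        echo "$flag: would be kept by -F 0x04"
    fi
done
```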



                • #9
                  Interesting tips, I will try both, thanks!



                  • #10
                    To reduce the I/O load (and total CPU time as well) even further, this is my favourite:

                    Code:
                    bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSu -F 0x04 - | samtools sort -m 4294967296 - out.sorted
                    samtools index out.sorted.bam
                    Set -m as high as you can afford. The value is given in bytes (4294967296 is 4 GiB), and in my hands samtools sort needs up to twice that much RAM. I set it to 16 GB when running on a server, which is enough for most BAMs to be sorted without writing temporary files to disk. The -u option removes the compression/decompression overhead in the pipe between view and sort.
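
Because the -m value in the command above is written out in bytes, it is easy to mistype. A small sketch of the arithmetic, assuming binary gigabytes (GiB); gib_to_bytes is a hypothetical helper introduced here for illustration.

```shell
# Convert a whole number of GiB into bytes for use as a -m value.
# gib_to_bytes is a hypothetical helper, not part of samtools.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 4     # 4294967296, the value used in the command above
gib_to_bytes 16    # 17179869184
```

Given the observation in the post that peak usage can reach about twice the -m value, a machine would need roughly 8 GiB free to run with the 4 GiB setting.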



                    • #11
                      Piping into samtools sort works? I was afraid that would get ugly.

                      How can I ask the server I'm on how much memory I can devote to sort?



                      • #12
                        Use the "-m" option in samtools sort instead.
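
On the earlier question of how to ask the server how much memory is free for sorting, one possible approach on Linux is to read the MemAvailable line of /proc/meminfo and halve it, following the "up to 2x" observation made earlier in the thread. This is only a sketch under those assumptions: suggest_sort_mem is a hypothetical helper, the sample figures are invented, and shared servers or job schedulers may impose their own limits.

```shell
# Print a candidate -m value in bytes: half of MemAvailable.
# Reads /proc/meminfo by default; a file in the same format can be
# passed as the first argument. Hypothetical helper, not part of samtools.
suggest_sort_mem() {
    meminfo=${1:-/proc/meminfo}
    while read -r key value unit; do
        if [ "$key" = "MemAvailable:" ]; then
            # meminfo reports kB (1024-byte units); halve for headroom
            echo $(( value * 1024 / 2 ))
            return 0
        fi
    done < "$meminfo"
    echo "MemAvailable not found in $meminfo" >&2
    return 1
}

# Demonstration with an invented sample instead of the live system.
sample=$(mktemp)
printf 'MemTotal:       32000000 kB\nMemAvailable:   16000000 kB\n' > "$sample"
suggest_sort_mem "$sample"    # prints 8192000000
rm -f "$sample"
```

Calling suggest_sort_mem with no argument reads the live /proc/meminfo, and the result can then be passed to the -m option mentioned above.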
