  • BWA sampe to bam?

    So as everyone who uses bwa knows, the sampe function outputs a file in sam format. What I want to do is somehow convert that sam file to a bam file in some sort of pipe? It seems easy to implement, but I keep getting an error from samtools.

    cat file.sam | samtools view -Sb

    that does not work!

  • #2
    Look at the samtools manual page: http://samtools.sourceforge.net/samtools.shtml

    You are looking for samtools view -bS or samtools view -bt

    • #3
      What you want is something like:

      bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;

      • #4
        Dear all,

        Just wondering: SAM is far bigger than BAM, and it seems few people actually open a SAM file and read it. If BWA output BAM directly, it would save a lot of effort, and disk I/O would be faster due to the smaller file size. Does this make sense, or have I forgotten something?

        Best,

        dong

        • #5
          Originally posted by xied75
          Dear all,

          Just wondering: SAM is far bigger than BAM, and it seems few people actually open a SAM file and read it. If BWA output BAM directly, it would save a lot of effort, and disk I/O would be faster due to the smaller file size. Does this make sense, or have I forgotten something?

          Best,

          dong
          Theoretically, when your server is more CPU-limited than I/O-limited and you only need to read the whole file sequentially, SAM will be faster than BAM (due to the compression overhead in BAM). I have found that this is never the case for our applications, so we pipe aligners directly into a samtools chain (with the -m option to samtools sort set so that most alignments fit in memory, which avoids writing temporary files to disk) to get a sorted BAM on disk directly.
          Last edited by arvid; 04-22-2012, 11:14 PM.

          • #6
            Originally posted by swbarnes2
            What you want is something like:

            bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSho out.bam -;
            I understand what you are doing here, but what is with the '-;' at the end (ignoring the single quotations)?

            • #7
              Originally posted by dmacmillan
              I understand what you are doing here, but what is with the '-;' at the end (ignoring the single quotations)?
              The '-' means "the thing that's being piped", i.e. standard input; the ';' just ends the command. At least, that's how I understand it. That command works; I use it all the time just as I wrote it there. So would this:

              Code:
              bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSh - > out.bam;
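
The '-' convention can be tried without bwa or samtools. A minimal sketch, assuming a POSIX shell and the standard cat utility, which also treats a '-' operand as standard input; the strings and variable names are placeholders, not part of the original posts:

```shell
# '-' as a file operand tells cat to read standard input, i.e. whatever
# the previous command in the pipe wrote.
piped=$(printf 'hello from the pipe\n' | cat -)
echo "$piped"

# A trailing ';' only terminates the command, so both forms behave the same.
with_semicolon=$(printf 'x\n' | cat -;)
without_semicolon=$(printf 'x\n' | cat -)
echo "$with_semicolon $without_semicolon"
```

samtools view uses the same idea: the '-' sits where the input file name would otherwise go.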

              • #8
                I don't know whether it's necessary for BWA output, but I like to use the -F option on bowtie output to keep unaligned reads out of the BAM file. Also, the -h option isn't necessary in this example: the BAM header gets created appropriately. In fact, I don't think samtools will let you create a BAM file from a SAM file that doesn't already have the correct header information. I've only needed -h when viewing BAM files, since by default the header is left off when viewing a BAM file as SAM via samtools.

                So what I always use is this:

                Code:
                bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bS -F 0x04 - > out.bam
                sometimes followed by this:

                Code:
                samtools sort out.bam out-sorted
                Bowtie doesn't properly sort its output and I don't remember if BWA does either. If you use the BAM file for any downstream analysis you usually need it to be sorted by chromosome and position.
                /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                Salk Institute for Biological Studies, La Jolla, CA, USA */
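
To make the -F 0x04 filter above more concrete: in the SAM format, bit 0x4 of the FLAG column marks a read as unmapped, and -F skips records that have any of the given bits set. A small sketch of that bit test in plain shell arithmetic; the function name is_unmapped and the sample flag values are illustrative assumptions, not taken from the thread:

```shell
# Succeeds (exit status 0) when the unmapped bit (0x4) is set in a SAM FLAG.
is_unmapped() {
    # $1: decimal FLAG value from column 2 of a SAM record
    [ $(( $1 & 0x4 )) -ne 0 ]
}

# 77 = 1 + 4 + 8 + 64 (paired, unmapped, mate unmapped, first in pair)
# 99 = 1 + 2 + 32 + 64 (paired, properly paired, mate reverse, first in pair)
for flag in 77 99 4 0; do
    if is_unmapped "$flag"; then
        echo "flag $flag: unmapped, would be skipped by -F 0x04"
    else
        echo "flag $flag: kept"
    fi
done
```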

                • #9
                  Interesting tips, I will try both, thanks!

                  • #10
                    To reduce the I/O load (and total CPU time as well) even further, this is my favourite:

                    Code:
                    bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq | samtools view -bSu -F 0x04 - | samtools sort -m 4294967296 - out.sorted 
                    samtools index out.sorted.bam
                    Set -m as high as you can afford; in my hands samtools sort needs RAM up to 2x the value specified there in bytes (I set this to 16 GB when running on a server, which is enough for most BAMs to be sorted without writing temporary files to disk). -u removes the compression/decompression overhead in the pipe between view and sort.
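
The -m value in the command above is a plain byte count: 4294967296 is 4 GiB. A short sketch for working out such values with shell arithmetic; the helper name gib_to_bytes is hypothetical, and reading "16 GB" as 16 GiB is an assumption:

```shell
# Convert a whole number of GiB to bytes for a byte-valued -m option.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 4     # prints 4294967296, the value used above
gib_to_bytes 16    # prints 17179869184
```

The commands in this thread use the syntax samtools had at the time. Newer samtools releases (1.x) accept a K, M, or G suffix for -m, treat it as a per-thread limit, and take the output file through -o rather than as a trailing prefix, so check the manual for the installed version before copying the byte value.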

                    • #11
                      Piping into samtools sort works? I was afraid that would get ugly.

                      How can I ask the server I'm on how much memory I can devote to sort?

                      • #12
                        Use the "-m" option in samtools sort instead.
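
On the question of how much memory the server has to spare, one possible approach on Linux is to read /proc/meminfo. This is a sketch under that assumption (it will not work on systems without /proc); the function name available_bytes is hypothetical, and the optional file argument exists only so the function can be tried on a saved copy:

```shell
# Print an estimate of available memory in bytes.
# /proc/meminfo reports values in kB; MemAvailable is the kernel's own
# estimate, and older kernels that lack it fall back to MemFree here.
available_bytes() {
    awk '
        /^MemAvailable:/ { avail = $2 }
        /^MemFree:/      { free = $2 }
        END { printf "%.0f\n", (avail ? avail : free) * 1024 }
    ' "${1:-/proc/meminfo}"
}

# Only query the live system when /proc/meminfo is actually readable.
if [ -r /proc/meminfo ]; then
    available_bytes
fi
```

Given the earlier observation in this thread that samtools sort can use up to twice the -m value, roughly half of the reported figure would be a cautious ceiling on a machine nobody else is using.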
