  • genelab
    Member
    • Nov 2009
    • 27

    Find unmapped reads in a SAM/BAM file

    Hi guys

    Could anyone tell me how to extract the unmapped reads from a SAM/BAM file and convert the result to FASTQ format using samtools?

    Thx!
  • Adrian_H
    Member
    • Feb 2010
    • 10

    #2
    samtools view -f 4 yourbamfile.bam will give you the unmapped reads.

    Then pull out the first column of read names (cut -f1; SAM output is tab-delimited, which is the default delimiter for cut) and extract those reads from your original FASTQ files, or write an awk script to reformat the read ID, sequence, and quality scores into FASTQ.

    Note that, depending on the alignment program you are using, unmapped reads may or may not be reported in the results. Also, some programs trim the /1 or /2 off the read ID if you are working with paired ends. Finally, keep in mind that if you use this to extract reads with other flags, the sequence in the BAM file may be only the part that aligned, and could be the reverse complement of the input. (This shouldn't be an issue for unmapped reads.)
    Last edited by Adrian_H; 07-01-2010, 07:52 AM. Reason: changed filename
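
The awk reformatting step mentioned above could look roughly like the sketch below. It assumes SAM text on stdin with the standard layout (column 1 read name, column 10 sequence, column 11 qualities); the function name `sam_to_fastq` is made up for illustration.

```shell
# Convert SAM records on stdin to FASTQ on stdout.
# Assumes tab-separated SAM with the 11 mandatory columns.
sam_to_fastq() {
  awk -F'\t' '
    /^@/ { next }                                  # skip SAM header lines, if present
    { printf "@%s\n%s\n+\n%s\n", $1, $10, $11 }    # name, sequence, separator, qualities
  '
}

# Usage (assumes samtools is on PATH):
#   samtools view -f 4 yourbamfile.bam | sam_to_fastq > unmapped.fastq
```

Recent samtools releases also ship a `samtools fastq` subcommand that accepts the same `-f`/`-F` flag filters, which may make a hand-written converter unnecessary.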

    • Adamo
      Member
      • Jun 2010
      • 28

      #3
      If you've used bwasw, then the command line suggested by Adrian won't work.
      You'll have to write your own Perl or awk script that compares your BAM output with your FASTQ file and rewrites the FASTQ, omitting the aligned reads.
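
A minimal sketch of such a filter, assuming a four-line-per-record FASTQ and a plain text file of aligned read names. The function name `fastq_drop_ids` and the file names in the usage comment are hypothetical. A trailing /1 or /2 is stripped on both sides before comparing, so for paired-end data a name match drops both mates.

```shell
# Print the FASTQ records from file $2 whose read ID is NOT listed in file $1.
# $1: one aligned read name per line
# $2: original FASTQ, assumed to use exactly four lines per record
fastq_drop_ids() {
  awk '
    FILENAME == ARGV[1] {            # first file: remember aligned names
      n = $1
      sub(/\/[12]$/, "", n)          # ignore a trailing /1 or /2
      drop[n] = 1
      next
    }
    FNR % 4 == 1 {                   # FASTQ header line of a new record
      id = substr($1, 2)             # strip leading "@"; text after a space is ignored
      sub(/\/[12]$/, "", id)
      keep = !(id in drop)
    }
    keep                             # print all four lines of kept records
  ' "$1" "$2"
}

# Usage (assumes samtools is on PATH; file names are placeholders):
#   samtools view -F 4 aln.bam | cut -f1 | sort -u > aligned_ids.txt
#   fastq_drop_ids aligned_ids.txt reads.fastq > unaligned.fastq
```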

      • bhootnaath
        Junior Member
        • Jul 2009
        • 5

        #4
        You can use bam2fastq.

        • csquared
          Member
          • May 2008
          • 67

          #5
          +1 on the BAM2FASTQ. Great tool...of course I'm biased as it came from my group but it is well documented and fast. Let us know if you have any questions or problems.
          HudsonAlpha Institute for Biotechnology
          http://www.hudsonalpha.org/gsl

          • byb121
            Member
            • Aug 2009
            • 18

            #6
            Hi,

            I used the bam2fastq tool to extract unmapped reads. It's really fast and well documented, but I had difficulty working out the cause of this warning message:

            Code:
            $ ./bam2fastq -o s_%#_extracted_reads.txt -f --no-aligned --unaligned --no-filter alignments.bam 
            [bam_header_read] EOF marker is absent.
            This looks like paired data from lane 1.
            Output will be in s_1_1_extracted_reads.txt and s_1_2_extracted_reads.txt
            55130926 sequences in the BAM file
            8238703 sequences exported
            WARNING: 5947209 reads could not be matched to a mate and were not exported

            The FASTQ files contain 1145747 reads each, which means those 5947209 unmapped reads were discarded. I would really like to have them included in the result. Could you help me out here?

            PS: The reads are paired-end, ranging from 25 to 78 bases long after quality trimming.
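
The counts in the log are self-consistent: 8238703 - 5947209 = 2291494 = 2 x 1145747. One possible explanation, which the log itself does not confirm, is that the reads without a mate are unmapped reads whose partner did align and was therefore left out. The sketch below is a way to check that guess by counting unmapped reads by mate status, using the SAM flag bits 0x1 (paired), 0x4 (read unmapped), and 0x8 (mate unmapped). The function name is made up for illustration.

```shell
# Count unmapped reads in SAM text on stdin, grouped by the state of their mate.
count_unmapped_by_mate() {
  awk -F'\t' '
    /^@/ { next }                                  # skip header lines
    {
      flag = $2
      if (int(flag / 4) % 2 == 0) next             # bit 0x4 clear: read is mapped
      if (flag % 2 == 0)            unpaired++     # bit 0x1 clear: not paired
      else if (int(flag / 8) % 2)   pair_unmapped++  # bit 0x8 set: mate unmapped too
      else                          mate_mapped++  # mate aligned somewhere
    }
    END {
      printf "pair_unmapped=%d mate_mapped=%d unpaired=%d\n",
             pair_unmapped, mate_mapped, unpaired
    }
  '
}

# Usage (assumes samtools is on PATH):
#   samtools view -f 4 alignments.bam | count_unmapped_by_mate
```

If `mate_mapped` comes out close to the number in the warning, extracting with `samtools view -f 4` and converting each read individually would keep those reads.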


            Originally posted by csquared:
            +1 on the BAM2FASTQ. Great tool...of course I'm biased as it came from my group but it is well documented and fast. Let us know if you have any questions or problems.

            • vishal.rossi
              Member
              • Apr 2013
              • 25

              #7
              samtools view -bh -f 0x4 -o output.file input
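
The `-f` option takes a bit mask, and samtools accepts it in decimal (4) or hexadecimal (0x4); both select records with the "read unmapped" bit set. A small sketch of the same bit test in plain shell arithmetic (the function name is hypothetical):

```shell
# Succeed (exit status 0) if the given SAM flag has the unmapped bit (0x4) set.
has_unmapped_bit() {
  [ $(( $1 & 0x4 )) -ne 0 ]
}

# Usage:
#   has_unmapped_bit 77 && echo "unmapped"
```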

              • JonB
                Member
                • Jan 2010
                • 83

                #8
                What about reads mapping to the reverse strand? Should they be reverse complemented before converting to fastq?

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Originally posted by JonB:
                  What about reads mapping to the reverse strand? Should they be reverse complemented before converting to fastq?
                  No, sequences and qualities are always the same as the source fastq, regardless of mapped strand.
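
This holds for unmapped reads and for FASTQ written by converters that restore the original orientation (`samtools fastq` does). In the raw SAM/BAM record itself, the SAM specification stores SEQ reverse-complemented and QUAL reversed whenever the reverse-strand bit (0x10) is set, so a hand-written converter has to undo that. A minimal sketch, assuming SAM text on stdin; the function name is made up, and secondary or supplementary alignments are not filtered out here.

```shell
# Convert SAM records on stdin to FASTQ, restoring the original read orientation.
sam_to_fastq_original_strand() {
  awk -F'\t' '
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)   # leave other characters unchanged
      }
      return out
    }
    function reverse(s,    i, out) {
      out = ""
      for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
      return out
    }
    BEGIN {
      split("A C G T N a c g t n", from, " ")
      split("T G C A N t g c a n", to, " ")
      for (i = 1; i <= 10; i++) comp[from[i]] = to[i]
    }
    /^@/ { next }                               # skip header lines
    {
      seq = $10; qual = $11
      if (int($2 / 16) % 2 == 1) {              # bit 0x10: mapped to reverse strand
        seq = revcomp(seq)
        qual = reverse(qual)
      }
      printf "@%s\n%s\n+\n%s\n", $1, seq, qual
    }
  '
}

# Usage (assumes samtools is on PATH; input name is a placeholder):
#   samtools view input.bam | sam_to_fastq_original_strand > reads.fastq
```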

                  • JonB
                    Member
                    • Jan 2010
                    • 83

                    #10
                    Good to know, thanks!
