  • SAM flag field and removing unmapped reads from BFAST output

    Hi there,

    I'm using BFAST to align Solexa reads to a very small portion of a genome (~3kb), and have been considering the best way to remove unmapped reads from the output since these unnecessarily bulk up the output .sam file. I know that samtools can filter an incoming .sam file using the -F command. However, I've read some documentation on the SAM flag format and must admit I find it pretty confusing. Within the flag field I know there are fields for both "the mate is unmapped" and "the query sequence itself is unmapped", but for non-paired-end Solexa reads can either of these be used for removing unmapped reads? Furthermore, what would be the integer or string used in the -F command?

    Alternatively, there is the option in samtools view to filter by map quality (MAPQ). Would setting map quality filter to e.g. 1 remove all unmapped reads without affecting the filtered alignment from BFAST postprocess?

    Alternatively again, dbamfilter within the DNAA package has the capacity to remove unmapped reads, but if samtools can do the job I'd like to minimise the number of apps employed.

    What are thoughts on the best strategy?
    Aiden

  • #2
    Hi Aiden,

    I hope I'm not stating the obvious here, but are you familiar with Picard? It is a set of Java-based command-line tools for manipulating SAM files, and one of them may help you: ViewSam.jar. It prints a SAM or BAM file to the screen, and you can set an option to report all reads, just the aligned reads, or just the unaligned reads.
    Take a look: http://picard.sourceforge.net/

    Cheers,
    Wil

    • #3
      Originally posted by aiden
      > Hi there,
      >
      > I'm using BFAST to align Solexa reads to a very small portion of a genome
      > (~3kb), and have been considering the best way to remove unmapped reads from
      > the output since these unnecessarily bulk up the output .sam file. I know that
      > samtools can filter an incoming .sam file using the -F command. However, I've
      > read some documentation on the SAM flag format and must admit I find it pretty
      > confusing. Within the flag field I know there are fields for both "the mate is
      > unmapped" and "the query sequence itself is unmapped", but for non-paired-end
      > Solexa reads can either of these be used for removing unmapped reads?

      Look at the SAM/BAM spec, section 2.2.2 (Notes):

      Code:
      1. Flag 0x02, 0x08, 0x20, 0x40 and 0x80 are only meaningful when flag 0x01 is present.
      Assuming you are using Fragment data, you want to filter using the 0x0004 flag.
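
To make the note above concrete, here is a minimal Python sketch of how the relevant FLAG bits could be tested. The bit values (0x1, 0x4, 0x8) come from the SAM spec; the function names are just made up for illustration and are not part of samtools or any library.

```python
# Bit values from the SAM FLAG field.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # the read itself is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped (only meaningful if PAIRED is set)


def is_unmapped(flag: int) -> bool:
    """True if the read itself is flagged as unmapped (bit 0x4)."""
    return bool(flag & UNMAPPED)


def mate_is_unmapped(flag: int) -> bool:
    """True only for paired reads whose mate is unmapped.

    Bit 0x8 is ignored unless bit 0x1 is also set, following the spec note.
    """
    return bool(flag & PAIRED) and bool(flag & MATE_UNMAPPED)


# A few example FLAG values: 0 and 16 are mapped fragment reads,
# 4 is an unmapped fragment read, 73 = 0x1 + 0x8 + 0x40 is a mapped
# first-in-pair read whose mate is unmapped.
for flag in (0, 4, 16, 73):
    print(flag, is_unmapped(flag), mate_is_unmapped(flag))
```

For fragment (non-paired) data only the 0x4 test matters, which is why 4 is the value to pass to samtools.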

      > Furthermore, what would be the integer or string used in the -F command?

      Code:
      $ samtools view -F 4 ./foo.bam # display mapped reads only
      $ samtools view -f 4 ./foo.bam # display unmapped reads only
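
If it helps to see what those two options are doing, below is a rough Python sketch of the same logic applied to SAM text lines: -f keeps records that have all of the given bits set, -F keeps records that have none of them set. The function names and the three sample records are hypothetical, and header lines are simply passed through here, which is an assumption of this sketch rather than a description of samtools' default output.

```python
def keep(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic the flag test: all 'require' bits set, no 'exclude' bits set."""
    return (flag & require) == require and (flag & exclude) == 0


def filter_sam(lines, require=0, exclude=0):
    """Yield header lines unchanged plus records that pass the flag filter."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep(flag, require, exclude):
            yield line


# Hypothetical SAM content: one header line and three records.
sam = [
    "@SQ\tSN:ref\tLN:3000",
    "read1\t0\tref\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tref\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

mapped = list(filter_sam(sam, exclude=4))    # like -F 4
unmapped = list(filter_sam(sam, require=4))  # like -f 4
```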

      > Alternatively, there is the option in samtools view to filter by map quality
      > (MAPQ). Would setting map quality filter to e.g. 1 remove all unmapped reads
      > without affecting the filtered alignment from BFAST postprocess?

      Go for the samtools flag filter as suggested above.
      You need to run the postprocessing step first to be able to filter your reads anyway.
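
One reason to prefer the flag over a MAPQ cutoff, sketched in Python under some assumptions: in general a mapped read can carry MAPQ 0, so a threshold of 1 may drop more than the unmapped reads. The records below are hypothetical (name, FLAG, MAPQ) tuples, and whether BFAST actually reports MAPQ 0 for any mapped reads is not something this thread establishes. The current SAM spec also describes bit 0x4 as the only reliable indicator that a read is unmapped.

```python
# Hypothetical (name, FLAG, MAPQ) records, for illustration only.
records = [
    ("read1", 0, 60),   # mapped, high mapping quality
    ("read2", 4, 0),    # unmapped
    ("read3", 16, 0),   # mapped on the reverse strand, but MAPQ 0
]

# Keep records with MAPQ >= 1 (a quality threshold of 1).
by_mapq = [name for name, flag, mapq in records if mapq >= 1]

# Keep records whose 0x4 bit is not set (the flag-based filter).
by_flag = [name for name, flag, mapq in records if not flag & 0x4]

print(by_mapq)
print(by_flag)
```

In this made-up case the MAPQ cutoff loses read3 even though it is mapped, while the flag filter keeps it.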

      > Alternatively again, dbamfilter within the DNAA package has the capacity to
      > remove unmapped reads, but if samtools can do the job I'd like to minimise the number of apps employed.

      samtools can do the job. But give DNAA a try too.
      -drd


      • #4
        Thanks for the very helpful replies, much appreciated.
