  • Exclude certain tags from BAM file

    Wishing you all a nice May Day.

    I want to exclude certain reference tags (genes) from a BAM file and write the result to a new BAM file. I tried the following SAMtools command:

    Code:
    foreach line (`cat tags2remain.txt `)
      samtools view 0H.ptMapped.bam | grep "$line" >>! 0H.tags2RemainMapped.bam
    end
    I got the following output:

    Code:
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "0H.tags2RemainMapped.bam"
    Am I doing this correctly? Any suggestions would be appreciated. Or should I use Picard instead? If so, which Picard tool should I use?

    Thank you very much for the help.

  • #2
    Tags is the wrong word (that would be things like MD, or RG - information about each read). You are asking about filtering SAM/BAM files by read name.

    It appears the foreach loop appends the grep output from every pass to one file. That output is headerless SAM text, not BAM, so samtools cannot read the result as a BAM file.

    How many read names are you dealing with? A single grep using a regular expression to describe all the read names at once could work (if possible given limits on command line lengths).

    I'm not aware of any existing tool that tackles this particular need.

    Personally I'd write a simple Python script to load all the names in memory (as a Python set for efficiency), then loop over the reads and filter them. Something similar would be easy in Perl or Ruby if you prefer, or indeed in Java using Picard.
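A minimal sketch of the approach described in #2, assuming the reads arrive as SAM text on standard input (for example from `samtools view -h`) and the names sit one per line in a text file. The function names `load_names` and `filter_by_read_name`, and the script name used below, are illustrative and not part of any existing tool.

```python
import sys


def load_names(path):
    """Read one name per line into a set for fast membership tests."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_by_read_name(sam_lines, names, keep=True):
    """Yield header lines, plus reads whose QNAME is in `names`.

    With keep=False the test is inverted and listed reads are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        qname = line.split("\t", 1)[0]  # column 1 of a SAM record
        if (qname in names) == keep:
            yield line


# Runs only when called with exactly one argument: the names file.
if __name__ == "__main__" and len(sys.argv) == 2:
    wanted = load_names(sys.argv[1])
    sys.stdout.writelines(filter_by_read_name(sys.stdin, wanted))
```

Saved as a hypothetical `filter_reads.py`, it could sit in a pipeline such as `samtools view -h in.bam | python filter_reads.py names.txt | samtools view -b -o out.bam -` (older samtools releases also need `-S` on the last step, as in #5).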


    • #3
      bamtools can filter on sequence tags:

      Code:
      $ bamtools filter -h
      Description: filters BAM file(s).
      
      Usage: bamtools filter [-in <filename> -in <filename> ...] [-out <filename> | [-forceCompression]] [-region <REGION>] [ [-script <filename] | [filterOptions] ]
      
      ...
        -tag <TAG:VALUE>                  keep reads with this
      ...
      HTH

      d


      • #4
        Thanks guys for your input.

        It is a bit strange that I didn't receive an email alert from the forum about the replies.

        Maybe I should rephrase my question. Put this way: I want to filter out the reads mapped to certain genes (what I called tags), say ribosomal protein genes, from a .bam file of mapped reads.

        I'm thinking the @SQ header lines could be the way to do it; however, I'm not sure this is the right approach.

        Thanks for your time and help.


        • #5
          Why wouldn't this work:

          Code:
          samtools view -h unfiltered.bam | awk or grep or whatever | samtools view -bS - > filtered.bam
          You will likely lose your headers this way unless the awk or grep or whatever step passes the header lines (those starting with @) through.
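As a hedged illustration of a header-safe filter step for the pipeline in #5, the sketch below drops reads aligned to a set of reference names while passing every header line through. It assumes SAM text on standard input and a text file with one reference name per line; the function name `drop_references` is made up for this example.

```python
import sys


def drop_references(sam_lines, excluded):
    """Yield SAM lines, skipping reads aligned to an excluded reference."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep all header lines so the output still converts to BAM
            continue
        rname = line.split("\t")[2]  # column 3 of a SAM record is the reference name
        if rname not in excluded:
            yield line  # unmapped reads (RNAME "*") are kept unless "*" is listed


# Runs only when called with exactly one argument: the exclusion list file.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        excluded = {name.strip() for name in handle if name.strip()}
    sys.stdout.writelines(drop_references(sys.stdin, excluded))
```

The @SQ lines of the excluded references are left in the header on purpose; only the alignment records are removed.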


          • #6
            Thanks swbarnes2 for your suggestion.
