  • Filtering bam files by index

    Dear all,

    Is there any way to filter BAM files by index?
    I've created a txt file in R with the indexes of the sequences I need, and now I have to extract these sequences and write them to a new BAM file.

    Does anybody know if this is possible? Preferably in R or samtools.

    If this is not possible, is there any way to create a pileup file with Rsamtools, similar to samtools mpileup?

    Thanks for any response

  • #2
    Are you aware that 'samtools view' can be called with region information to get a sub-file (SAM or BAM format) containing just the reads mapped in that region? That sounds like what you are asking for.
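For comparison, here is a minimal sketch of the same region extraction done from Python with pysam. It assumes pysam is installed and the input BAM is coordinate-sorted and indexed (a .bai file next to it). `parse_region` and `write_region` are hypothetical helper names. The main pitfall it shows is coordinates: samtools regions such as `plasmid:1-50` are 1-based and inclusive, while pysam's `fetch()` takes 0-based, half-open start/stop values.

```python
def parse_region(region):
    """Convert a samtools-style region ("contig:first-last", 1-based, inclusive)
    into pysam-style (contig, start, stop), 0-based and half-open."""
    contig, sep, span = region.rpartition(":")
    if not sep:
        # No ":" at all: the whole contig, e.g. "plasmid"
        return region, None, None
    first, _, last = span.partition("-")
    start = int(first) - 1                  # 1-based -> 0-based
    stop = int(last) if last else None      # inclusive end == half-open stop
    return contig, start, stop


def write_region(in_bam, out_bam, region):
    """Copy reads overlapping `region` into a new BAM (requires a .bai index)."""
    import pysam  # imported lazily so parse_region() works without pysam

    contig, start, stop = parse_region(region)
    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        # fetch() returns only reads whose alignment overlaps the region
        for read in src.fetch(contig, start, stop):
            dst.write(read)
```

As the follow-up below makes clear, region filtering keeps or drops reads by position, not by name or sequence, so it cannot remove specific reads from a single 50 bp target.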



    • #3
      Thank you for your reply; unfortunately, this is not what we are looking for.

      Maybe we should explain more clearly what kind of data we have and what we want to accomplish.

      We sequenced a short plasmid fragment (50 bp) on a GAx2. We used 100K copies of that plasmid to generate a library and got approx. 1M reads, which we interpret as each plasmid copy yielding about 10 clusters on the flow cell that were detected by the sequencer. When we analyzed the reads that mapped to our 50 bp reference sequence, we found ~190,000 different read variants, of which ~170,000 occur 10 times or fewer. We want to remove those low-frequency reads from the BAM file, since we think they are PCR artifacts, and then create an mpileup file for later use with VarScan to detect reliable variants of that plasmid.
      So far we have written an R script that builds an index of those low-frequency reads. How can we write a new BAM file without those reads using Rsamtools?

      Thank you
      Stephan
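The "index of low-frequency reads" described above can be sketched in Python as well. This is a minimal sketch, not the poster's R script: it assumes a "variant" means an identical read sequence, uses the poster's cutoff of 10 occurrences, and relies on pysam for the BAM reading. `low_frequency_names` and `low_frequency_names_from_bam` are hypothetical helper names.

```python
from collections import Counter


def low_frequency_names(records, max_count=10):
    """Return names of reads whose exact sequence occurs at most `max_count` times.

    `records` is a list of (read_name, sequence) pairs; it is iterated twice,
    once to count sequences and once to collect the matching names.
    """
    counts = Counter(seq for _, seq in records)
    return {name for name, seq in records if counts[seq] <= max_count}


def low_frequency_names_from_bam(bam_path, max_count=10):
    """Collect (name, sequence) pairs for mapped reads, then apply the cutoff."""
    import pysam  # imported lazily so low_frequency_names() works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        records = [
            (read.query_name, read.query_sequence)
            for read in bam.fetch(until_eof=True)  # stream all reads, no index needed
            if not read.is_unmapped and read.query_sequence is not None
        ]
    return low_frequency_names(records, max_count)
```

With paired-end data both mates share one name, so a name-based filter built from this set would drop the whole pair.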



      • #4
        So essentially you have generated a list of about 170,000 read names you want to remove from the BAM file?

        In python I'd create a set object of the unwanted read names, then iterate over the SAM/BAM file in one pass filtering using this set. Python sets are much faster than the typically used list data structure because they use a hash for membership testing, rather than scanning the entire list. You could easily do this with pysam (the Python samtools API wrapper).

        I presume the same basic approach would work just as well in R, but I cannot give you any specific advice there.
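The set-based, single-pass approach suggested above might look roughly like this with pysam. It is a minimal sketch under some assumptions: the unwanted read names sit in a plain text file, one name per line with no quotes or header (e.g. written from R with `writeLines()`), and pysam is installed. `load_names`, `drop_unwanted` and `filter_bam` are hypothetical helper names.

```python
def load_names(lines):
    """Build a set of read names from text lines, one name per line."""
    return {line.strip() for line in lines if line.strip()}


def drop_unwanted(reads, unwanted, name=lambda read: read.query_name):
    """Yield reads whose name is not in `unwanted`; each check is a hash lookup."""
    for read in reads:
        if name(read) not in unwanted:
            yield read


def filter_bam(in_bam, out_bam, names_txt):
    """Write every read of `in_bam` except those listed in `names_txt` to `out_bam`."""
    import pysam  # imported lazily so the helpers above work without pysam

    with open(names_txt) as fh:
        unwanted = load_names(fh)
    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        # until_eof=True streams the whole file in one pass; no index is needed
        for read in drop_unwanted(src.fetch(until_eof=True), unwanted):
            dst.write(read)
```

Because reads are written in their original order, a coordinate-sorted input gives a coordinate-sorted output, which can then be indexed and passed to samtools mpileup for VarScan.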



        • #5
          Thank you for your response; we'll try Python.
