Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • vladimirp
    Junior Member
    • May 2014
    • 7

    Consensus sequence for a subset of sequences in BAM

    Hi everyone,

    I have a BAM file containing aligned reads to a reference. I need to calculate a consensus sequence but not from all the reads in the BAM file but only from a subset, and I know in advance which sequences I am interested in.

    Is there a tool that would allow me to do this task?

    I think I can extract the required sequences from the BAM file and map them to a reference myself. I would like to avoid that (re-mapping part) because the reads in the original BAM file were already aligned based on specific criteria and I want to keep that alignment.

    Thanks.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Just use samtools mpileup:

    Code:
    samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    Comment

    • vladimirp
      Junior Member
      • May 2014
      • 7

      #3
      Doesn't this command generates a consensus sequence based on all reads in the BAM file? That's not what I need.

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        The example command does, yes, but you don't have to use the exact example. You can either prefilter things or pipe (presumably, I've never tried with mpileup) in a filtered set or look at just one region or whatever else you want with that command as the basis.

        Comment

        • vladimirp
          Junior Member
          • May 2014
          • 7

          #5
          That's exactly my problem - I cannot figure out how to pre-filter BAM to extract the reads by their Id's.

          Comment

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            I assume you don't actually want to select reads based on their names, but something else. So just say what criterion you want to use.

            Comment

            • vladimirp
              Junior Member
              • May 2014
              • 7

              #7
              I actually do want to select reads based on their names. It happened that the original software encodes important information in the sequence title and among all the reads in the BAM file I only want a subset based on read Id's. I do not know if that's possible. Perhaps, not.

              There is a somewhat related question about samtools/vcf, etc. Let's say I have a BAM file. Is it possible to get a list of all nucleotides (from all reads) in a specific position? Something like:

              reference position 10
              read001 A
              read002 A
              read003 T
              ...

              Can it be done? Thanks.

              Comment

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                There's generally no great way to extract reads by name other than sorting by name and then either grepping the output or passing it through a script (pysam would work nicely). Whether or not to presort would depend on the length of the list. Alternatively, if you're basing things off of the tile and XY coordinate (i.e., these are illumina reads), then just use something like pysam and the parse the qname.

                Regarding the list of calls at a given location, that's similar to the output of mpileup, except the reads aren't labeled. If you really want the reads, you can access the underlying code in C (the samtools C API) and python (via pysam) and should then have access to this.

                I guess a better question would be what you're actually trying to achieve. There's likely a better way to go about things in general.

                Comment

                • vladimirp
                  Junior Member
                  • May 2014
                  • 7

                  #9
                  I am doing some bioinformatics analysis where I need to look at different subsets of reads. It is a very specific task. Currently, the sequencer aligns all the reads to a reference and put this info into a BAM file. For my task, I would like to be able (1) to extract a subset of reads from the BAM file, and (2) get read bases at a specific position. The volume of the data is relatively low so efficiency is not a big deal.

                  Grepping BAM/SAM file for the step 1 actually works. Thanks.

                  For the step 2, I do not really need to know from what read exactly a given base comes from. Thanks to your hint, I found that mpileup does almost what I want, for example: ref XX X XXXXX ,.,-1c,...,,,,.,,...,,.,,,. I will try to figure out what the symbols in the encoding mean.

                  Lastly, is it possible to export reads aligned to the reference from BAM/SAM to a FASTA file? (there are no indels in the reads).
                  Last edited by vladimirp; 05-08-2014, 03:45 PM.

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #10
                    I guess it depends on what exactly you mean by "aligned to the reference". Lines in a SAM/BAM file generally are of alignments, so it's unclear what you want. Do you perhaps want something like samtools tview?

                    Comment

                    • vladimirp
                      Junior Member
                      • May 2014
                      • 7

                      #11
                      There are many scripts that just export sequences in FASTA format but the placement of the read relative to the reference is lost. For example,

                      >ref
                      agtgtgagtg
                      >read1
                      gagtg

                      while I want:
                      >ref
                      agtgtgagtg
                      >read1
                      -----gagtg

                      Something like samtools tview would be perfect but can I dump this type of alignment to a text file?

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        If nothing else, I think you can script IGV, which will work if an image (rather than ascii) is sufficient for your needs. Otherwise, I don't personally know of anything off-hand.

                        Comment

                        • vladimirp
                          Junior Member
                          • May 2014
                          • 7

                          #13
                          I need any parseable text file. Thanks for all the help anyways!

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            New Genomics Tools and Methods Shared at AGBT 2025
                            by seqadmin


                            This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                            The Headliner
                            The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                            03-03-2025, 01:39 PM
                          • seqadmin
                            Investigating the Gut Microbiome Through Diet and Spatial Biology
                            by seqadmin




                            The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                            02-24-2025, 06:31 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 03-20-2025, 05:03 AM
                          0 responses
                          17 views
                          0 reactions
                          Last Post seqadmin  
                          Started by seqadmin, 03-19-2025, 07:27 AM
                          0 responses
                          18 views
                          0 reactions
                          Last Post seqadmin  
                          Started by seqadmin, 03-18-2025, 12:50 PM
                          0 responses
                          19 views
                          0 reactions
                          Last Post seqadmin  
                          Started by seqadmin, 03-03-2025, 01:15 PM
                          0 responses
                          186 views
                          0 reactions
                          Last Post seqadmin  
                          Working...