  • Compensating for multi-mapper count in mpileup?

    I use mpileup files to call variants for one subsection of the genome at a time, using a set of custom Python/awk scripts. This works well for uniquely mapping reads, but if I have reads that map to the genome more than once, this information gets lost during mpileup generation.

    I have written some scripts to count the 'mappability' of a given read, since my mapper of choice, bowtie, does not report this in the NH or IH tags, but I am struggling a bit to incorporate this 'correction factor' into mpileups.
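For anyone following along, here is a minimal sketch of the kind of 'mappability' counting described above. It is not the poster's actual script. It assumes single-end reads, that all alignments of a read are reported with the same QNAME, and that no NH tag is present yet. The function name `add_nh_tags` is made up for illustration.

```python
from collections import Counter


def _is_mapped_record(line):
    """True for a SAM alignment line whose FLAG does not have the unmapped bit (0x4) set."""
    if not line or line.startswith("@"):
        return False
    flag = int(line.split("\t")[1])
    return not (flag & 0x4)


def add_nh_tags(sam_lines):
    """Append NH:i:<n> to every mapped record, where n is the number of
    mapped records that share its read name (hypothetical helper)."""
    lines = [ln.rstrip("\n") for ln in sam_lines]

    # First pass: count reported alignments per read name (QNAME, column 1).
    hits = Counter(ln.split("\t")[0] for ln in lines if _is_mapped_record(ln))

    # Second pass: tag mapped records, leave headers and unmapped reads untouched.
    tagged = []
    for ln in lines:
        if _is_mapped_record(ln):
            ln = ln + "\tNH:i:" + str(hits[ln.split("\t")[0]])
        tagged.append(ln)
    return tagged
```

The count only reflects what the aligner was asked to report, so it is meaningful only when the mapper was run in a report-all-alignments mode.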

    From what I can make out in the manuals, samtools might not be able to do this. Am I missing something, or is there perhaps a different pileup-like format I could use to extract 'normalised' coverage and variants?

  • #2
    Depending on your goal, you could simply place multi-mapping reads at a random location, which would accomplish the normalization you are looking for.


    • #3
      Well, essentially, if a read maps to 2 loci, and I'm generating a pileup for one of these, I want the bases of this read to only count half towards the total coverage in the pileup.

      Does that make sense?

      I'm working with miRNAs, so reads that map to multiple identical loci need to be accounted for in some way (i.e. by dividing by the number of identical loci).
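To make the "count half" idea concrete, here is a rough sketch of fractional coverage. It assumes ungapped alignments given as `(read_name, chrom, start, length)` tuples with 0-based starts, and that every locus a read maps to appears in the input. The tuple layout and the name `weighted_coverage` are illustrative only and not part of samtools or any pileup format.

```python
from collections import Counter, defaultdict


def weighted_coverage(alignments):
    """Per-base coverage where each alignment contributes 1 / (number of loci
    its read maps to). Returns a dict keyed by (chrom, position)."""
    alignments = list(alignments)

    # Number of reported loci per read name.
    n_loci = Counter(name for name, _chrom, _start, _length in alignments)

    coverage = defaultdict(float)
    for name, chrom, start, length in alignments:
        weight = 1.0 / n_loci[name]  # a read with 2 loci counts 0.5 at each
        for pos in range(start, start + length):
            coverage[(chrom, pos)] += weight
    return dict(coverage)
```

With this weighting, a read that maps to two loci adds 0.5 to each base it covers at either locus, while a uniquely mapping read still adds 1.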


      • #4
        Right... that's what random assignment does, on average. A multi-mapping read is printed as only a single line in the SAM file, with one of its possible mapping locations chosen at random. Many aligners support this as an output option (I think bowtie does).


        • #5
          I see what you're saying. I was working with those settings beforehand, but discussions with colleagues made me rather uncertain whether the bowtie settings reported random positions, or rather just "the first occurrence in the genome".

          Probably why I ended up down the path of going for reporting of all mapping positions in the first place.

          Thanks.


          • #6


            Looks like there may be some problem in Bowtie's random placement, so I guess you should not rely on it. But since you are postprocessing multimapped reads anyway, YOU could pick one at random and discard the others.
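A minimal sketch of that post-processing step, assuming each alignment is a tuple whose first element is the read name and that all alignments of a read are present in the input. The function name `pick_one_per_read` and the `seed` parameter are hypothetical, added here so the choice is reproducible.

```python
import random
from collections import defaultdict


def pick_one_per_read(alignments, seed=0):
    """Keep one randomly chosen alignment per read name and drop the rest."""
    rng = random.Random(seed)  # fixed seed so reruns give the same picks

    # Group alignments by read name, preserving first-seen order of reads.
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln[0]].append(aln)

    # Uniform choice among the loci reported for each read.
    return [rng.choice(hits) for hits in by_read.values()]
```

Across many reads this gives each of n identical loci about 1/n of the multi-mapping reads, which is the "on average" normalization discussed earlier in the thread. For any single read the pick is all-or-nothing.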


            • #7
              Thanks for that link. I'll give it a go with bwa tomorrow and see how that works out.


              • #8
                Originally posted by Brian.R:
                Thanks for that link. I'll give it a go with bwa tomorrow and see how that works out.

                Ok. Though I would feel remiss if I did not point you toward BBMap.

                It supports random assignment for multi-mapping reads with the "ambig=random" flag.


                • #9
                  Heh, cool, thanks for the suggestion. I'll compare the three of them once I have some time, next week or so.
