Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup consensus calls A instead of a gap

    Hello,

    I'm using samtools with the following command to call the consensus of my mapping:

    samtools mpileup -uf Ref SEQ | bcftools view -cg - | vcfutils.pl vcf2fq > SEQ.fq

    The problem is that I have sequences that have 1 and some have 2 copies of a short repeat - the reference has 2. If I use the reference with 2 copies I end up with 2 copies in the consensus since it sometimes maps the reads to the first and sometimes to the second copy of the repeat. So I exchanged one copy in the reference with N's., but end up with A's instead of the gap in the consensus.

    Example:

    Ref AANNNTATTATTA
    Read1 AA---TATTATTA
    Read2 AA---TATTATTA

    it calls AAAAATATTATTA instead of AATATTATTA.

    I also get the following error if I use the reference with N's instead of the second copy (that I don't get if I use the reference with 2 copies):

    Unknown command "vcf2fq".


    I've had the same problem with the previous pileup function!

    Any idea how to change that?

    Cheers,
    Stefan

  • #2
    I wonder if this is an issue with treating N in the reference as A, or down to the fact that the reads have an A before the gap? Do you have some other examples with different bases used instead of the N's?

    Comment


    • #3
      Hi,

      Replacing "N" by "-" seams to help. It looks like it's really reading N's as A's.

      Thanks!

      Stefan

      Comment


      • #4
        So if you replaced the "N" with "C" or "T" or "G" do you get those letters instead of "A"? That would pretty much confirm the nature of this bug / design limitation, and give a good clue about where in the code to start looking.

        Reminds me of this tweet from Vince Buffalo,
        ‏@vsbuffalo: Any time you only handle A, T, C, or G, the members of IUPAC kill an innocent kitten. We've all done it. #bioinfo
        Last edited by maubp; 12-01-2012, 06:16 AM. Reason: formatting

        Comment


        • #5
          Hi,

          Yes, I checked and it calls the reference in case the position is not covered by any read.

          I should probably report that as a bug.

          Thanks!

          Stefan

          Comment


          • #6
            It seams they changed it a little bit since the pileup function, which also interprets "-" as "A", but didn't fully resolve the problem.

            Cheers,
            Stefan

            Comment


            • #7
              It seams to be a problem only when positions are "covered" by gaps in the read. Since it's calling N in the consensus if the position is not covered by a single read.

              Stefan

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Genetic Variation in Immunogenetics and Antibody Diversity
                by seqadmin



                The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                Yesterday, 07:24 PM
              • seqadmin
                Choosing Between NGS and qPCR
                by seqadmin



                Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                10-18-2024, 07:11 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 11-01-2024, 06:09 AM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 10-30-2024, 05:31 AM
              0 responses
              21 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 10-24-2024, 06:58 AM
              0 responses
              25 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 10-23-2024, 08:43 AM
              0 responses
              57 views
              0 likes
              Last Post seqadmin  
              Working...
              X