  • kirstyn
    Junior Member
    • May 2012
    • 8

    Mpileup consensus Ns

    Hi

    I have been generating consensus sequences from a BWA-aligned BAM file using the following command:
    samtools mpileup -Euf Reference.fasta input.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    Viewed in Tablet, the BAM file shows decent coverage across the genome, and my reads have been quality filtered. My issue is that the consensus sequence has Ns even at positions where I can clearly see reads mapped to the reference. At first I thought this was due to a default read depth cutoff, but even accounting for this the numbers don't add up!

    Does anyone know why mpileup is calling an N at certain positions?
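To compare the Ns against the alignment position by position, it can help to list exactly where they fall. The following is a minimal sketch, assuming the consensus has already been converted to FASTA (for example with seqtk); `n_positions` is a hypothetical helper name, not part of samtools or vcfutils.pl. It counts lowercase `n` as well as `N`.

```shell
# Hypothetical helper: print "<sequence name><TAB><1-based position>" for every
# N or n in a FASTA file (read from the given files, or from stdin if none).
n_positions() {
  awk '/^>/ { name = substr($1, 2); pos = 0; next }   # new record: reset the counter
       {
         for (i = 1; i <= length($0); i++) {
           pos++                                      # position within the record
           c = substr($0, i, 1)
           if (c == "N" || c == "n") print name "\t" pos
         }
       }' "$@"
}
```

Each reported position can then be checked against the BAM file, for instance with `samtools depth -r` on that coordinate, to see whether the N coincides with low depth or with a well-covered site.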
  • auratus
    Junior Member
    • Jan 2013
    • 1

    #2
    16S and 23S consensus replaced with N

    I have the same issue, in particular with 16S and 23S in bacteria. I have plenty of coverage in this region, but I get "N" in my consensus sequence.
    I used this:

    samtools mpileup -u foo.bam | bcftools view -cg - | vcfutils.pl vcf2fq > foo.fq

    Any answer?

    • syfo
      Just a member
      • Nov 2012
      • 103

      #3
      Same here.
      I get Ns in the consensus FASTA although the region seems to be well covered. I guess it is down to some quality filtering, but I have not managed to find the right parameters.
      Does anyone have a suggestion, please?

      Command line:
      Code:
      samtools mpileup -u infile.bam | bcftools view -cg - | vcfutils.pl vcf2fq | seqtk fq2fa - > outfile.fa

      • scami
        Member
        • Sep 2010
        • 55

        #4
        Maybe indexing...

        Hi Guys
        The Ns in the consensus may be due to a missing index file. Try the following:

        Code:
        samtools index alignmentFile.bam
        and then try the mpileup command again.

        Hope it helps

        • syfo
          Just a member
          • Nov 2012
          • 103

          #5
          Hi, thanks for your help.

          Well, in my case I specifically do not want samtools to use the reference sequence for the calling; I am looking for the consensus from the reads only.

          Also, most of the bases are called correctly, but a few positions are missing and end up as Ns although base information seems to be available in the alignments. I suspect base quality is the issue, but I have not managed to find a decent set of parameters for the command line.

          Even with something like this
          samtools mpileup -RAB -d10000000 -Q0 -q0 -u | bcftools view -g -p1 - | vcfutils.pl vcf2fq -d0 -Q0 | seqtk fq2fa -

          some bases are discarded.

          I believe vcfutils.pl is involved, because if I replace the "AF1=0" tag with "AF1=1" in the VCF (after the bcftools view command), the Ns are replaced by proper bases.

          Still, even small and "obvious" deletions in the reference sequence are not handled the way I want because the consensus includes "N" where it should be uninterrupted, but that is another issue I guess.
          Note: the bam files are already sorted and indexed.
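The AF1 substitution described above can be sketched as a small filter placed between the `bcftools view` and `vcfutils.pl vcf2fq` steps. This is a diagnostic sketch only, assuming tab-separated VCF with the INFO column in field 8; `force_af1` is a hypothetical helper name. It overrides the caller's allele frequency estimate, so it shows whether that tag drives the Ns but does not explain why the caller reported AF1=0.

```shell
# Hypothetical helper: rewrite the exact INFO tag AF1=0 to AF1=1 on VCF records.
# Header lines and other AF1 values (e.g. AF1=0.5) are passed through unchanged.
force_af1() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                 # keep header lines as they are
       {
         n = split($8, info, ";")           # INFO column, tags separated by ";"
         out = ""
         for (i = 1; i <= n; i++) {
           if (info[i] == "AF1=0") info[i] = "AF1=1"
           out = out (i > 1 ? ";" : "") info[i]
         }
         $8 = out
         print
       }'
}
```

In the pipeline from this thread it would sit as `... | bcftools view -g -p1 - | force_af1 | vcfutils.pl vcf2fq ...`; comparing the consensus with and without the filter shows which positions depend on the AF1 tag.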
