  • vcf to consensus call reference instead of N

    Hi all

    I have a few genomes sequenced using Illumina. I have used samtools and vcfutils to make a consensus for each. All pretty standard stuff. However, using vcfutils to make the consensus gives me far too many N nucleotides to be happy with.

    Is there a way to create this consensus so that, wherever it currently gives an N nucleotide, it inserts the reference nucleotide in its place?

    Your help would be much appreciated. Any ideas?

  • #2
    Have you tried samtools mpileup?


    • #3
      Sorry, that's what I've used in the process. I pretty much used this exact command:

      samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
      bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

      Is there anything I can change there to get it to call the reference sequence rather than Ns?
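
One way to get the behaviour asked for here, without changing the calling commands, is a post-processing step that copies the reference base into every N of the consensus. The following is a minimal Python sketch, not a samtools or vcfutils feature. It assumes the consensus and the reference are plain sequence strings of the same length in the same coordinates (no indels applied to the consensus); the function name `fill_n_with_reference` is made up for illustration.

```python
def fill_n_with_reference(consensus, reference):
    """Replace every N/n in the consensus with the reference base at that position.

    Assumption: both sequences have the same length and share coordinates.
    """
    if len(consensus) != len(reference):
        raise ValueError("consensus and reference must have the same length")
    return "".join(
        ref_base if cons_base in "Nn" else cons_base
        for cons_base, ref_base in zip(consensus, reference)
    )


# Toy example: three masked positions are taken from the reference.
print(fill_n_with_reference("ACNNTn", "ACGTTA"))
```

Keep in mind that after this step a filled position can no longer be told apart from one that was genuinely called as matching the reference.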


      • #4
        Normally you get those Ns when you don't specify the reference, or when you specify a different version of the reference file, on the command line.

        One thing I would check is whether mpileup on its own, without piping its output to bcftools, reports the reference base at each position rather than Ns.
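
The check suggested here can be sketched in Python. It assumes the text pileup has already been produced (for example by running mpileup with `-f ref.fa` but without `-u`, so the output is plain text) and that it follows the usual tab-separated layout with the reference base in the third column. The function name `count_n_reference` is hypothetical.

```python
def count_n_reference(pileup_lines):
    """Return (total_positions, positions_whose_reference_base_is_N).

    Assumption: each line is tab-separated text pileup with the columns
    contig, position, reference base, depth, read bases, qualities.
    """
    total = 0
    n_ref = 0
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        total += 1
        if fields[2].upper() == "N":
            n_ref += 1
    return total, n_ref


# Made-up pileup lines for illustration only.
example = [
    "chr1\t1\tA\t3\t...\tIII\n",
    "chr1\t2\tN\t2\t.,\tII\n",
    "chr1\t3\tg\t1\t.\tI\n",
]
print(count_n_reference(example))
```

On a real file the lines could be passed in with `open("aln1.pileup")`, where the file name is a placeholder. If nearly every position is counted as N, the reference was probably not picked up by mpileup.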


        • #5
          Thanks for the response, I'll test the mpileup tonight.

          I'm pretty sure I have the reference sorted. I've used snpEff to analyse the SNPs in the genomes, and for each SNP it has had the correct reference sequence.

          Could it be a problem of coverage? As in, the sequencing reads might not cover this area?
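
The coverage hypothesis can be tested by listing the positions that no read covers and comparing them with where the Ns fall. Below is a minimal Python sketch, assuming per-position depth is available as tab-separated lines of contig, 1-based position, and depth (the layout `samtools depth` prints). Positions missing from the input are treated as uncovered. The function name and the `min_depth` parameter are hypothetical.

```python
def uncovered_positions(depth_lines, contig, contig_length, min_depth=1):
    """Return the 1-based positions on `contig` with depth below `min_depth`.

    Assumption: each line is "contig<TAB>position<TAB>depth"; positions that
    do not appear in the input at all count as uncovered.
    """
    covered = set()
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != contig:
            continue  # skip malformed lines and other contigs
        if int(fields[2]) >= min_depth:
            covered.add(int(fields[1]))
    return [pos for pos in range(1, contig_length + 1) if pos not in covered]


# Made-up depth lines for a 5 bp contig; position 3 and 5 are absent.
depth_example = ["chr1\t1\t5\n", "chr1\t2\t0\n", "chr1\t4\t2\n"]
print(uncovered_positions(depth_example, "chr1", 5))
```

If the uncovered positions line up with the Ns in the consensus, missing coverage rather than the reference file is the likelier explanation.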


          • #6
            Could be coverage, but if that's the case then it shouldn't be making a variant call at that site with a certain degree of confidence, right?
