Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • mpileup question...

    Hello-

    Here's a tricky one...I've been beating my head for several days trying to figure out what's going on. Here's the project: we're looking at a nuclear gene panel of 198 genes, enriched using RainDance (so there's no risk capturing homologous genes outside the gene set...RainDance reported that their primer sets are all unique). We ran one sample per HiSeq lane.

    That part was all easy. The alignment, which I expected to be straightforward has become a challenge. Aligning against the full human genome a lot of coverage is lost do regions of homology in the genome outside the gene set. To get around this problem and preserve genomic positions, I masked (replaced nucleotide with N) at nonamplified positions in the genome. The result is a genome full of N's except for the exons of the 198 genes of interest. The alignment (using BWA) works fine and looks great. However, variant detection has been an issue. I tried a variety of different settings, but ultimately settled on what's listed on the mpileup page:

    1. samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
    2. bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    The only modification is I set -D to something like 1M (I have very deep coverage and didn't want to rule anything out because of high coverage).

    After running vcfutils there are zero variants detected. When I look at the raw bcf file there are >400000 variants. When I look at them, the reference allele at every position is 'N'. So, I expect some reference positions to be 'N' since since I masked the sequences under the primers, but these should be minimal to be sure. I double checked sequences as well by looking at the masked reference genome I built and even where the reference is not N, bcftools is reporting that it is.

    Any ideas about what could be happening to make bcftools think every position is N? Any suggestions would be greatly appreciated...thanks.

  • #2
    The simplest explanation is that there is a naming problem. Double-check that it's the exact same file that you aligned with, that the bwa index and the samtools index were really built from the exact same file, and that the names in the .bam file match that reference. That causes the all-N thing to happen, it's not realizing that the reference fasta you gave it matches what the .bam files says reads were aligned to.

    Comment


    • #3
      I had this problem a few months back. As swbarnes said it may be a naming problem with the headers.

      I seem to recall also having to generate a fasta index with the samtools faidx command in the same directory as well, which may have solved the problem.

      Comment


      • #4
        solved...

        Originally posted by colindaven View Post
        I had this problem a few months back. As swbarnes said it may be a naming problem with the headers.

        I seem to recall also having to generate a fasta index with the samtools faidx command in the same directory as well, which may have solved the problem.
        Headers were fine. The problem was that when I rewrote the masked genome out, I put each chromosome on one line, and that was causing a seg fault in samtools faidx. Once I broke it into multiple lines and removed blank lines it worked fine. Thanks for the help

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Recent Advances in Sequencing Analysis Tools
          by seqadmin


          The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
          05-06-2024, 07:48 AM
        • seqadmin
          Essential Discoveries and Tools in Epitranscriptomics
          by seqadmin




          The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
          04-22-2024, 07:01 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 05-10-2024, 06:35 AM
        0 responses
        20 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 05-09-2024, 02:46 PM
        0 responses
        26 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 05-07-2024, 06:57 AM
        0 responses
        21 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 05-06-2024, 07:17 AM
        0 responses
        21 views
        0 likes
        Last Post seqadmin  
        Working...
        X