Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mapping to de novo scaffolds: good BAMs but empty VCFs. Why?

    I am unable to recover variants that I know exist when mapping to de novo assembled scaffolds, even though the BAMs (when checked in a genome browser (Artemis)) look great.

    I am mapping using bwa-0.6.1 and SAMtools for variants (indels and SNPs) to a non-standard reference, some assembled scaffolds I made myself in Velvet + SSPACE.

    Using standard methods, the VCFs end-up empty:

    Code:
    samtools mpileup -Euf scaffolds.fasta myfile_sorted.bam | bcftools view -bvcg -> myfile.bcf
    
    bcftools view myfile.bcf | vcfutils.pl varFilter > myfile.vcf
    Bypassing BCF/vcfultils and going right to a VCF finds SNPs, but the reference is always "N" (even though I know these regions have actual sequence, not NNNNN connectors).

    Example:

    Code:
    ##fileformat=VCFv4.1
    ##samtoolsVersion=0.1.18 (r982:295)
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	
    scaf.1	74	.	N	A,X	0	.	DP=111;I16=0,0,98,13,0,0,4267,165509,0,0,6133,352697,0,0,1284,21070;VDB=0.0808	PL	255,255,0,255,255,255
    scaf.1	76	.	N	C,X	0	.	DP=119;I16=0,0,105,14,0,0,4564,176946,0,0,6582,378738,0,0,1252,19002;VDB=0.0808	PL	255,255,0,255,255,255
    scaf.1	77	.	N	A,X	0	.	DP=129;I16=0,0,115,14,0,0,4881,186413,0,0,7182,414738,0,0,1242,18140;VDB=0.0801	PL	255,255,0,255,255,255
    scaf.1	78	.	N	A,X	0	.	DP=129;I16=0,0,115,13,0,0,4844,184614,0,0,7153,413897,0,0,1242,17464;VDB=0.0801	PL	255,255,0,255,255,255
    scaf.1	79	.	N	A,X	0	.	DP=138;I16=0,0,124,13,0,0,5152,195276,0,0,7662,443538,0,0,1246,16986;VDB=0.0801	PL	255,255,0,255,255,255
    scaf.1	80	.	N	A,X	0	.	DP=144;I16=0,0,131,13,0,0,5409,205033,0,0,8113,471497,0,0,1260,16742;VDB=0.0801	PL	255,255,0,255,255,255
    scaf.1	81	.	N	A,X	0	.	DP=143;I16=0,0,136,7,0,0,5411,206159,0,0,8239,484451,0,0,1236,16254;VDB=0.0801	PL	255,255,0,255,255,255
    scaf.1	82	.	N	G,T,X	0	.	DP=144;I16=0,0,136,8,0,0,5495,210839,0,0,8299,488051,0,0,1271,16481;VDB=0.0808	PL	255,255,0,255,255,255,255,255,255,255
    scaf.1	83	.	N	A,X	0	.	DP=146;I16=0,0,138,8,0,0,5587,214799,0,0,8419,495251,0,0,1307,16845;VDB=0.0808	PL	255,255,0,255,255,255
    scaf.1	84	.	N	A,X	0	.	DP=145;I16=0,0,137,8,0,0,5586,216040,0,0,8359,491651,0,0,1347,17347;VDB=0.0808	PL	255,255,0,255,255,255
    scaf.1	85	.	N	G,X	0	.	DP=151;I16=0,0,141,10,0,0,5817,225869,0,0,8688,510492,0,0,1390,18038;VDB=0.0808	PL	255,255,0,255,255,255
    scaf.1	86	.	N	G,X	0	.
    Looking at the BAM files with the scaffold reference in a browser and visualizing SNPs, I see good coverage and the SNPs are pretty obvious. Searching the no-BCF VCF files (example above) for positions in the reference where the sequence is good and I can see SNPs, the position is called as a SNP and the alternative allele is correct, but the reference is still N and Quality is 0.

    Using the same pipeline with a standard reference recovers SNPs in the VCF just as expected.
    Last edited by Genomics101; 11-27-2012, 10:30 AM. Reason: Clarity

  • #2
    Two quick things to check:

    1) remake the fasta index on your reference fasta with samtools faidx. Does it complete with no errors?

    2) double check that the names in your reference fasta match the names in your .bam. Maybe there are weird spaces or symbols that are being truncated in the .bam?

    Comment


    • #3
      @swbarnes2

      Thanks very much. faidx completes without errors. but I don't think my pipeline was making the .fai files as it should have been (I think they were empty). Making them separately this way makes nice big indices and I'll re-run the analyses with these. Any idea why samtools wasn't making the .fai properly? Thanks again!

      Later that same day....

      It worked! I still don't know why SAMtools was't making the .fais as it normally does automatically when I make the mpileup, but happy to have things working.
      Last edited by Genomics101; 11-27-2012, 12:19 PM.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Recent Advances in Sequencing Analysis Tools
        by seqadmin


        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
        Today, 07:48 AM
      • seqadmin
        Essential Discoveries and Tools in Epitranscriptomics
        by seqadmin




        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
        04-22-2024, 07:01 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Today, 07:17 AM
      0 responses
      11 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-02-2024, 08:06 AM
      0 responses
      19 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-30-2024, 12:17 PM
      0 responses
      20 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-29-2024, 10:49 AM
      0 responses
      28 views
      0 likes
      Last Post seqadmin  
      Working...
      X