Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • R, vcfr, extrcat.gt, bcftools filter

    Hello,
    I have been unable to extract depth 'DP' info from my vcf files (generated with latest bcftools variant calling (SNPs)) after being converted to vcfr for R analyses.

    bcftools may break the vcf formatting. I cannot extract depth data in R packages, for e.g., all I get is 'NA' for each depth (DP). I can see the DP value in the vcf, but R packages cannot see it.

    VCF format after bcftools (DP values are clearly present):
    Chr10_nm_RagTag 187 . A T 4.82181 LowQual DP=38;VDB=0.407747;SGB=-0.0966945;RPB=0.0621765;MQB=0.984496;MQSB=0.618783;BQB=0.810528;MQ0F=0.157895;AC=2;AN=26;DP4=7,2,3,3;MQ=9 GT:PL:AD:QS ./.:0,0,0:0,0:0,0 0/0:0,3,4:1,0:4,0 ./.:0,0,0:0,0:0,0 ./.:0,0,0:0,0:0,0 ./.:0,0,0:0,0:0,0 ./.:0,0,0:0,0:0,0 ./.:0,

    but the often used function "extract.gt" misses the DP values and only returns items highlighted in green.

    > dp <- extract.gt(vcf, element = "DP", as.numeric=TRUE)
    > dp[1:4,1:6]
    MZ195912 MZ195913 MZ195914 MZ195915 MZ195916 MZ195930
    Chr10_nm_RagTag_187 NA NA NA NA NA NA
    Chr10_nm_RagTag_188 NA NA NA NA NA NA
    Chr10_nm_RagTag_194 NA NA NA NA NA NA
    Chr10_nm_RagTag_195 NA NA NA NA NA NA

    I think this is because bcftools filtering placed DP info in the wrong field and when data are converted to the required R format, the info is lost. If I do a head on the converted data I see no DP info:

    head(vcf)
    [1] "***** Object of class 'vcfR' *****"
    [1] "***** Meta section *****"
    [1] "##fileformat=VCFv4.2"
    [1] "##FILTER=<ID=PASS,Description="All filters passed">"
    [1] "##bcftoolsVersion=1.12+htslib-1.12"
    [1] "##bcftoolsCommand=mpileup -Ou -I -B -C 50 -a QS -a AD --threads 32 - [Truncated]"
    [1] "##reference=file:///home/bsimison/Projects/Neotoma/96_LC-genomes/Nbr [Truncated]"
    [1] "##contig=<ID=Chr10_nm_RagTag,length=45123732>"
    [1] "First 6 rows."
    [1]
    [1] "***** Fixed section *****"
    CHROM POS ID REF ALT QUAL FILTER
    [1,] "Chr10_nm_RagTag" "187" NA "A" "T" "4.82181" "LowQual"
    [2,] "Chr10_nm_RagTag" "188" NA "G" "A" "42.0559" "LowQual"
    [3,] "Chr10_nm_RagTag" "194" NA "G" "C" "3.74867" "LowQual"
    [4,] "Chr10_nm_RagTag" "195" NA "A" "C" "5.61943" "LowQual"
    [5,] "Chr10_nm_RagTag" "488" NA "G" "C" "32.0871" "PASS"
    [6,] "Chr10_nm_RagTag" "518" NA "T" "C" "157.534" "PASS"
    [1]
    [1] "***** Genotype section *****"
    FORMAT MZ195912 MZ195913
    [1,] "GT:PL:AD:QS" "./.:0,0,0:0,0:0,0" "0/0:0,3,4:1,0:4,0"
    [2,] "GT:PL:AD:QS" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [3,] "GT:PL:AD:QS" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [4,] "GT:PL:AD:QS" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [5,] "GT:PL:AD:QS" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [6,] "GT:PL:AD:QS" "1/1:29,3,0:0,1:0,29" "./.:0,0,0:0,0:0,0"
    MZ195914 MZ195915 MZ195916
    [1,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [2,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [3,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [4,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [5,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [6,] "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0" "./.:0,0,0:0,0:0,0"
    [1] "First 6 columns only."
    [1]
    [1] "Unique GT formats:"
    [1] "GT:PL:AD:QS"
    [1]

    you can see that only "GT:PL:AD:QS" are captured, but not DP.
    Anybody know what's going on here?
    Last edited by wbsimey; 08-19-2021, 09:11 AM.

  • #2
    Brian Knaus, on his vcfR Git site, helped me figure this out. Turns out if you want to include Hardy Weinberg in your population variant calling with bcftools, you must include some annotations with the -a option for bcftools mpileup. Any annotations not listed with -a will not be included in the 'gt' slot; they will be in the 'info' slot so "extract.gt" will ignore any annotations not in the 'gt' slot. Alternatively, you can try 'extract.info', but that did not pull DP stats for each individual for me.

    The bcftools commands that solved the above issue was:

    "bcftools mpileup -Ou -I -B -C 50 -a QS -a AD -a DP --threads $THREADS -f $REF $bam_files"
    call_cmd="bcftools call -mv -Oz --threads $THREADS -o $OUTFILE"

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Strategies for Sequencing Challenging Samples
      by seqadmin


      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
      03-22-2024, 06:39 AM
    • seqadmin
      Techniques and Challenges in Conservation Genomics
      by seqadmin



      The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

      Avian Conservation
      Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
      03-08-2024, 10:41 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, 03-27-2024, 06:37 PM
    0 responses
    13 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 03-27-2024, 06:07 PM
    0 responses
    11 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 03-22-2024, 10:03 AM
    0 responses
    53 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 03-21-2024, 07:32 AM
    0 responses
    69 views
    0 likes
    Last Post seqadmin  
    Working...
    X