Hello guys,
I can't see any SNP IDs in my data; the ID column just contains a dot (.).
I am now trying to separate the SNPs from the indels. samtools was used for the variant calling.
How can I do this?
Thanks
chr1 8686 . T C 38.7 MfGtMis;AltSup AC1=12;AF1=1;DP4=0,0,1,5;DP=6;FQ=-28.6;MQ=16;MfGt=1/1;MinDP=0;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:0,0,0:0:0:5 1/1:40,9,0:3:0:13 1/1:0,0,0:0:0:5 1/1:0,0,0:0:0:5 1/1:34,9,0:3:0:13 1/1:0,0,0:0:0:5
chr1 10802 . T C,A 999 MfGtMis AC1=12;AF1=1;DP4=0,0,5,17;DP=284;FQ=-38.1;MQ=33;MfGt=1/1;MinDP=2;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:91,15,0,91,15,91:5:0:31 1/1:131,18,0,131,18,131:6:0:34 1/1:53,6,0,53,6,53:2:0:22 1/1:44,6,0,44,6,44:2:0:22 1/1:67,9,0,67,9,67:3:0:25 1/1:70,21,12,55,0,52:4:0:25
chr1 10815 . A G 999 MfGtMis AC1=12;AF1=1;DP4=0,0,26,11;DP=315;FQ=-42.4;MQ=38;MfGt=1/1;MinDP=3;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:109,18,0:6:0:38 1/1:188,39,0:13:0:59 1/1:120,15,0:5:0:35 1/1:69,9,0:3:0:29 1/1:43,9,0:3:0:29 1/1:89,21,0:7:0:41
chr1 10836 . C A 999 MfGtMis AC1=11;AF1=0.9836;DP4=2,0,32,5;DP=313;FQ=-28.4;MQ=33;MfGt=1/1;MinDP=1;NeqMfGt=0;PV4=1,4.1e-10,1,1 GT:PL:DP:SP:GQ 1/1:49,12,0:4:0:21 1/1:15,3,0:1:0:12 1/1:90,23,0:11:0:32 1/1:83,0,8:7:0:3 1/1:130,39,0:13:0:48 1/1:56,9,0:3:0:18
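One way to split records like the ones above into SNPs and everything else is to look only at the REF and ALT columns: a record is treated as a SNP when REF and every ALT allele are a single base. The following is a minimal sketch, not a definitive method. It assumes a tab-delimited, uncompressed VCF; the function name `vcf_by_type` and the mode names `snp` / `nonsnp` are made up for this example, and anything that is not a plain SNP (indels, multi-base substitutions, symbolic alleles) ends up in `nonsnp`.

```shell
# vcf_by_type MODE FILE
#   MODE is "snp" or "nonsnp" (hypothetical names used only in this sketch).
#   Header lines (starting with '#') are always passed through.
vcf_by_type() {
  awk -F'\t' -v want="$1" '
    /^#/ { print; next }
    {
      # A record counts as a SNP only if REF is one base ...
      type = ($4 ~ /^[ACGTNacgtn]$/) ? "snp" : "nonsnp"
      # ... and every comma-separated ALT allele is one base too.
      n = split($5, alt, ",")
      for (i = 1; i <= n; i++)
        if (alt[i] !~ /^[ACGTNacgtn]$/) type = "nonsnp"
      if (type == want) print
    }' "$2"
}
```

Usage would look like `vcf_by_type snp calls.vcf > snps.vcf` and `vcf_by_type nonsnp calls.vcf > other.vcf`, where `calls.vcf` is a placeholder file name.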
-
Hi,
vcf-isec is exactly what you are looking for: intersections, complements etc. on VCF and TAB delimited files.
-
Originally posted by dawe:
Nevermind, I've realized that I can feed GATK Unified Genotyper with known SNPs and filter them out in a second step
d
I could not find out how to perform this second step.
I am trying to filter out some known SNPs from dbSNP135. While I have both my variant file and the dbSNP file (both .vcf), I can't find a way to exclude the latter from the former.
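For the case where both files are plain VCFs, the exclusion can be sketched as an anti-join on chromosome, position, REF and ALT using awk. This is only a sketch under assumptions: both files are uncompressed and tab-delimited, chromosome names match between them ("chr1" vs "1" would not), and alleles are compared as exact strings, so a multi-allelic ALT such as `C,A` will not match a known `C`. The function name `exclude_known` is hypothetical.

```shell
# exclude_known KNOWN.vcf MINE.vcf
#   Prints MINE.vcf without the records whose CHROM, POS, REF and ALT
#   also appear in KNOWN.vcf. Header lines of MINE.vcf are kept.
exclude_known() {
  awk -F'\t' '
    # First file: remember every known variant, print nothing.
    FILENAME == ARGV[1] {
      if ($0 !~ /^#/) known[$1, $2, $4, $5] = 1
      next
    }
    # Second file: keep headers and variants that were not seen above.
    /^#/ || !(($1, $2, $4, $5) in known)
  ' "$1" "$2"
}
```

A call would look like `exclude_known dbsnp.vcf my_variants.vcf > novel.vcf` (placeholder file names). The known set is held in memory, so a full dbSNP release may need a machine with plenty of RAM.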
-
@bioinfosm: Just use the -D parameter with a dbSNP .rod file. See the GATK wiki for more about how that works (it's part of their default variant calling flow).
Originally posted by dawe:
Nevermind, I've realized that I can feed GATK Unified Genotyper with known SNPs and filter them out in a second step
d
Use the vcftools --exclude parameter and feed it a list of all the SNP IDs you used to mark your VCF when you ran it through the Unified Genotyper.
Since you used GATK, you can make that list from the SNP .rod pretty easily:
Code: awk '{print $5}' dbsnp_130.rod > dbsnp_130_snpIDs.txt
Code: vcftools --vcf <in.vcf> --exclude dbsnp_130_snpIDs.txt --recode --out <out.prefix>
Not bad, and doesn't take too long to run.
Of course, you can pretty easily grep out the lines that have an ID to accomplish the same thing almost instantaneously, but still.
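The grep-style shortcut mentioned above can be sketched with awk as follows. It assumes the VCF has already been annotated with dbSNP IDs (for example via the -D option described above), so that known variants carry an rs ID in column 3 and unannotated ones have a dot. The function name `keep_unannotated` is made up for this example.

```shell
# keep_unannotated FILE
#   Keeps header lines and records whose ID column (column 3) is ".",
#   i.e. variants that did not receive a dbSNP ID during annotation.
keep_unannotated() {
  awk -F'\t' '/^#/ || $3 == "."' "$1"
}
```

For example, `keep_unannotated annotated.vcf > novel.vcf`, with `annotated.vcf` standing in for your own file.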
-
@dawe,
It would be great if you could share the GATK functionality to do so. It is better to use a standard, available tool than to reinvent one.
-
I have a script you can mod to remove SNPs that are in dbSNP. Just holla' if you need/want it. Just tweak it to work on a VCF or BED file.
-
Originally posted by bioinfosm:
that sounds like a useful thing to do. but looking at the GATK Unified genotyper, it seems more of a multiple samples tool than one to exclude known dbSNP variants, etc.
Did I miss something here?
I could use it to identify variants *and* to filter out known ones.
d
-
That sounds like a useful thing to do, but looking at the GATK Unified Genotyper, it seems more of a multi-sample tool than one to exclude known dbSNP variants, etc.
Did I miss something here?
-
Nevermind, I've realized that I can feed GATK Unified Genotyper with known SNPs and filter them out in a second step
d
-
Ideas to filter known SNPs
Hi all, I have a VCF file which contains a raw list of mutations/SNPs for my study. I would like to exclude known SNPs from dbSNP131/hg19 (which I also have in VCF format).
I was thinking about BEDTools, something like
Code: intersectBed -a MyList.vcf -b hg19.snp131.vcf.gz -v > specific.vcf
Does anybody have an idea or processing pipeline to deal with this? I was looking at vcftools but couldn't find anything helpful.
d