Hello all,
I can't see any SNP IDs in my data: the ID column contains only a dot (.).
I am now trying to separate the SNPs from the indels. The variants were called with samtools.
How can I do this?
Thanks
chr1 8686 . T C 38.7 MfGtMis;AltSup AC1=12;AF1=1;DP4=0,0,1,5;DP=6;FQ=-28.6;MQ=16;MfGt=1/1;MinDP=0;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:0,0,0:0:0:5 1/1:40,9,0:3:0:13 1/1:0,0,0:0:0:5 1/1:0,0,0:0:0:5 1/1:34,9,0:3:0:13 1/1:0,0,0:0:0:5
chr1 10802 . T C,A 999 MfGtMis AC1=12;AF1=1;DP4=0,0,5,17;DP=284;FQ=-38.1;MQ=33;MfGt=1/1;MinDP=2;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:91,15,0,91,15,91:5:0:31 1/1:131,18,0,131,18,131:6:0:34 1/1:53,6,0,53,6,53:2:0:22 1/1:44,6,0,44,6,44:2:0:22 1/1:67,9,0,67,9,67:3:0:25 1/1:70,21,12,55,0,52:4:0:25
chr1 10815 . A G 999 MfGtMis AC1=12;AF1=1;DP4=0,0,26,11;DP=315;FQ=-42.4;MQ=38;MfGt=1/1;MinDP=3;NeqMfGt=0 GT:PL:DP:SP:GQ 1/1:109,18,0:6:0:38 1/1:188,39,0:13:0:59 1/1:120,15,0:5:0:35 1/1:69,9,0:3:0:29 1/1:43,9,0:3:0:29 1/1:89,21,0:7:0:41
chr1 10836 . C A 999 MfGtMis AC1=11;AF1=0.9836;DP4=2,0,32,5;DP=313;FQ=-28.4;MQ=33;MfGt=1/1;MinDP=1;NeqMfGt=0;PV4=1,4.1e-10,1,1 GT:PL:DP:SP:GQ 1/1:49,12,0:4:0:21 1/1:15,3,0:1:0:12 1/1:90,23,0:11:0:32 1/1:83,0,8:7:0:3 1/1:130,39,0:13:0:48 1/1:56,9,0:3:0:18
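Since the calls come from samtools, the ID column stays '.' unless the file is later annotated against dbSNP, but SNPs and indels can still be told apart from the REF and ALT columns. Below is a minimal Python sketch, assuming an uncompressed, tab-separated VCF. `classify`, `split_vcf` and `split_vcf_file` are hypothetical helper names, and the INDEL check assumes the samtools/bcftools habit of flagging indel records with `INDEL` in the INFO column.

```python
# Sketch: split VCF lines into SNPs and indels by comparing REF/ALT allele lengths.
# Assumes an uncompressed, tab-separated VCF; header lines go to both outputs.


def classify(ref, alt_field, info):
    """Return 'snp', 'indel' or 'other' for one VCF record."""
    # Assumption: samtools/bcftools flag indel records with INDEL in INFO.
    if "INDEL" in info.split(";"):
        return "indel"
    # Ignore missing ('.') and spanning-deletion ('*') alleles.
    alts = [a for a in alt_field.split(",") if a not in (".", "*")]
    if not alts:
        return "other"
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"  # e.g. T -> C, or multi-allelic T -> C,A
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions such as AC -> GT


def split_vcf(lines):
    """Return (snp_lines, indel_lines); header lines are copied to both lists."""
    snps, indels = [], []
    for line in lines:
        if line.startswith("#"):
            snps.append(line)
            indels.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        kind = classify(fields[3], fields[4], fields[7])
        if kind == "snp":
            snps.append(line)
        elif kind == "indel":
            indels.append(line)
    return snps, indels


def split_vcf_file(in_path, snp_path, indel_path):
    """File wrapper around split_vcf (hypothetical helper)."""
    with open(in_path) as handle:
        snps, indels = split_vcf(handle)
    with open(snp_path, "w") as out:
        out.writelines(snps)
    with open(indel_path, "w") as out:
        out.writelines(indels)
```

For example, `split_vcf_file("calls.vcf", "snps.vcf", "indels.vcf")` writes the two subsets. If you prefer an existing tool, vcftools can drop indels with `--remove-indels` (add `--recode` to write a VCF), and recent bcftools releases can select by type with `bcftools view -v snps`.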
Ideas to filter known SNPs
Hi,
vcf-isec (part of VCFtools) is exactly what you are looking for: it computes intersections, complements, etc. of VCF and tab-delimited files.
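The complement idea (keep what is in your call set but not in the known set) can be sketched in Python: keep each record whose (CHROM, POS, REF, ALT) alleles are not all present in the known-SNP VCF. This is an illustration under assumptions, not vcf-isec's actual logic: `open_vcf`, `norm_chrom`, `allele_keys`, `load_known` and `complement` are hypothetical names, the 'chr' prefix is stripped so that 'chr1' and '1' match (UCSC-style vs. NCBI-style contig names), and the whole known set is held in memory.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def norm_chrom(chrom):
    """Assumption: treat 'chr1' and '1' as the same contig."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def allele_keys(fields):
    """One (chrom, pos, ref, alt) key per ALT allele of a record."""
    chrom, pos, ref = norm_chrom(fields[0]), int(fields[1]), fields[3].upper()
    return {(chrom, pos, ref, alt.upper()) for alt in fields[4].split(",")}


def load_known(lines):
    """Collect allele keys from every data line of the known-variant VCF."""
    known = set()
    for line in lines:
        if not line.startswith("#"):
            known |= allele_keys(line.rstrip("\n").split("\t"))
    return known


def complement(lines, known):
    """Yield headers plus records with at least one allele not in `known`."""
    for line in lines:
        if line.startswith("#"):
            yield line
        elif not allele_keys(line.rstrip("\n").split("\t")) <= known:
            yield line
```

A record counts as known only when all of its ALT alleles are in the known set, so a multi-allelic site with one new allele is kept. Exact matching can also miss indels that the two files represent differently. For a whole dbSNP release the in-memory set gets large; vcf-isec and `bcftools isec -C` work on bgzip-compressed, tabix-indexed files instead.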
Originally posted by dawe: Never mind, I've realized that I can feed GATK Unified Genotyper with known SNPs and filter them out in a second step.
I could not find out how to perform this second step.
I am trying to filter out known SNPs from dbSNP135. I have both my variant file and the dbSNP file (both VCF), but I can't find a way to exclude the latter from the former.
@bioinfosm: Just use the -D parameter with a dbSNP .rod file. See the GATK wiki for more about how that works (it's part of their default variant calling flow).
Originally posted by dawe: Never mind, I've realized that I can feed GATK Unified Genotyper with known SNPs and filter them out in a second step.
Use the vcftools --exclude option and feed it a list of all the SNP IDs that were used to mark up your VCF when you ran it through the Unified Genotyper.
Since you used GATK, you can make that list from the SNP .rod pretty easily:
Code: awk '{print $5}' dbsnp_130.rod > dbsnp_130_snpIDs.txt
Code: vcftools --exclude dbsnp_130_snpIDs.txt --vcf <in.vcf> --recode --out <out.prefix>
Not bad, and it doesn't take too long to run.
Of course, since the IDs are already in the ID column, you can just as easily grep out the lines that have an ID and get the same result almost instantaneously, but still.
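Both ID-based routes can be sketched in Python, assuming a whitespace-delimited UCSC-style dbSNP .rod table whose fifth column holds the rs IDs (the column the awk command prints). `load_rod_ids`, `record_ids`, `exclude_ids` and `keep_novel` are hypothetical helper names; `exclude_ids` only roughly mirrors what `vcftools --exclude` does, and `keep_novel` is the grep shortcut, which works only if the Unified Genotyper (or another annotator) actually filled in the ID column.

```python
def load_rod_ids(lines):
    """Like awk '{print $5}', but skipping '#' lines and short rows."""
    ids = set()
    for line in lines:
        cols = line.split()
        if len(cols) >= 5 and not line.startswith("#"):
            ids.add(cols[4])
    return ids


def record_ids(line):
    """IDs from VCF column 3; the spec allows several, separated by ';'."""
    id_field = line.split("\t")[2]
    return set() if id_field == "." else set(id_field.split(";"))


def exclude_ids(lines, ids):
    """Drop records carrying any ID in `ids`; headers pass through."""
    for line in lines:
        if line.startswith("#") or not (record_ids(line) & ids):
            yield line


def keep_novel(lines):
    """The grep shortcut: keep only records whose ID column is '.'."""
    for line in lines:
        if line.startswith("#") or not record_ids(line):
            yield line
```

For example, `exclude_ids(open("calls.vcf"), load_rod_ids(open("dbsnp_130.rod")))` streams the filtered records. With un-annotated samtools output, where every ID is '.', `keep_novel` would keep everything, so annotation has to happen first.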
@dawe,
It would be great if you could share the GATK functionality for doing this. Better to use a standard, available tool than to reinvent the wheel.
I have a script you can modify to remove SNPs that are in dbSNP. Just holla if you need/want it. You'd just have to tweak it to work on a VCF or BED file.
Originally posted by bioinfosm: That sounds like a useful thing to do, but looking at the GATK Unified Genotyper, it seems to be more of a multi-sample tool than one for excluding known dbSNP variants, etc.
Did I miss something here?
I could use it to identify variants *and* to filter out known ones.
d
That sounds like a useful thing to do, but looking at the GATK Unified Genotyper, it seems to be more of a multi-sample tool than one for excluding known dbSNP variants, etc.
Did I miss something here?
Never mind, I've realized that I can feed the GATK Unified Genotyper with known SNPs and filter them out in a second step.
d
Hi all, I have a VCF file that contains a raw list of mutations/SNPs for my study. I would like to exclude known SNPs from dbSNP131/hg19 (which I also have in VCF format).
I was thinking about BEDTools, something like
Code: intersectBed -a MyList.vcf -b hg19.snp131.vcf.gz -v > specific.vcf
Does anybody have an idea or a processing pipeline for this? I looked at vcftools but couldn't find anything helpful.
d
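The `intersectBed -v` idea amounts to an interval subtraction, which can be sketched in Python: each VCF record becomes a 0-based, half-open interval [POS-1, POS-1+len(REF)) and is dropped if it overlaps any known variant. This only approximates what BEDTools does; `record_interval`, `data_fields`, `IntervalIndex` and `subtract` are hypothetical names, contig names must match exactly in both files, and the known intervals are held in memory. Unlike allele matching, a call is removed even when its ALT differs from the dbSNP allele at that position.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def record_interval(fields):
    """VCF POS is 1-based; return a 0-based, half-open (chrom, start, end)."""
    start = int(fields[1]) - 1
    return fields[0], start, start + len(fields[3])


def data_fields(lines):
    """Split every non-header VCF line into its tab-separated columns."""
    for line in lines:
        if not line.startswith("#"):
            yield line.rstrip("\n").split("\t")


class IntervalIndex:
    """Per-chromosome sorted starts plus a running maximum of ends."""

    def __init__(self, intervals):
        by_chrom = defaultdict(list)
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
        self._starts, self._max_end = {}, {}
        for chrom, ivs in by_chrom.items():
            ivs.sort()
            self._starts[chrom] = [s for s, _ in ivs]
            self._max_end[chrom] = list(accumulate((e for _, e in ivs), max))

    def overlaps(self, chrom, start, end):
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Intervals [0, idx) begin before `end`; one overlaps if its end > start.
        idx = bisect_left(starts, end)
        return idx > 0 and self._max_end[chrom][idx - 1] > start


def subtract(query_lines, known_lines):
    """Yield headers and query records that overlap no known record."""
    index = IntervalIndex(record_interval(f) for f in data_fields(known_lines))
    for line in query_lines:
        if line.startswith("#"):
            yield line
        elif not index.overlaps(*record_interval(line.rstrip("\n").split("\t"))):
            yield line
```

For example, `subtract(open("MyList.vcf"), gzip.open("hg19.snp131.vcf.gz", "rt"))` (after `import gzip`) streams the novel records. Keeping a running maximum of interval ends lets each lookup use a single binary search even when known intervals overlap one another.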