**laura** · 12-05-2012, 02:29 PM

As far as chrY goes:

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.chrY.phase1_samtools_si.20101123.snps.low_coverage.genotypes.vcf.gz

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.chrY.genome_strip_hq.20101123.svs.low_coverage.genotypes.vcf.gz

We provide all our variation data in VCF format, which serves our needs quite well. If you have a better idea for your own needs, you should be able to get all the info you need from these files to do the conversion.

Look at http://www.1000genomes.org/faq/how-d...your-vcf-files for streaming if you want to avoid downloading the entire data set

**rama** · 12-05-2012, 03:18 PM

vcf file of specific sample from 1000Genome data

Hi,

Can anyone help me access the VCF file of a specific sample from the 1000 Genomes data? I found the consensus file at (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release) but couldn't find the individual samples.

I am trying to compare the variants found in our sequencing against 1000 Genomes. If anyone has done a similar analysis, please let me know; I would like to discuss it with you offline.

Thanks in advance
Rama

**laura** · 12-05-2012, 11:10 PM

You should be able to get this info from our VCF files using a combination of tabix and vcftools vcf-subset, as described in our FAQ:

http://www.1000genomes.org/faq/how-do-i-get-slice-your-vcf-files

**rama** · 12-06-2012, 10:35 AM

Laura,

Thanks much for your reply. I am guessing this is the example for getting the VCF of a single sample:

tabix -h ftp://ftp-trace.ncbi.nih.gov/1000gen...804/ALL.2of4in... 17:1471000-1472000 | perl /nfs/1000g-work/G1K/work/bin/vcftools/perl/vcf-subset -c HG00098 | bgzip -c > /tmp/HG00098.20100804.genotypes.vcf.gz

**laura** · 12-06-2012, 11:17 AM

That is correct
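
For readers without vcftools at hand, the column-selection step of the pipeline quoted above can be approximated in plain awk. This is only a minimal sketch, not vcf-subset itself: `subset_sample` is a hypothetical helper name, it assumes a tab-delimited, uncompressed VCF on stdin, and it does nothing beyond dropping the other sample columns.

```shell
#!/bin/sh
# Hypothetical helper (not part of vcftools): keep the 9 fixed VCF columns
# plus the single sample column named in $1.
subset_sample() {
    awk -F '\t' -v OFS='\t' -v sample="$1" '
        /^##/ { print; next }                  # meta lines pass through unchanged
        /^#CHROM/ {                            # column-name line: locate the sample
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            if (!col) {
                print "sample not found: " sample > "/dev/stderr"
                exit 1
            }
        }
        col { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }
    '
}
```

Because it reads stdin and writes stdout, it could sit in the same place as the vcf-subset step, between `tabix -h` and `bgzip -c`.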

**rama** · 12-06-2012, 05:37 PM

Laura,

What should I specify if I don't have a particular region to look at and want to get all genome-wide variants?

Thanks so much for your kind help.

**laura** · 12-06-2012, 11:27 PM

You can give tabix a whole chromosome, but be aware that tabix cannot cope with lost network connectivity. When streaming large data volumes this can cause lost data, which means you may need to download the whole file.

1000 Genomes - Sample Subset Of A Vcf File

http://www.biostars.org/p/50752/

**gsgs** · 12-07-2012, 07:46 AM

there is another paper with 113 pages, "supplemental information"

http://www.nature.com/nature/journal/v491/n7422/extref/nature11632-s1.pdf

with a reference:

...
38. Garrison, E. K. vcflib - a simple C++ library for parsing and manipulating VCF files, <https://github.com/ekg/vcflib> (2012).

pointing back to :

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

which is 19 pages

**rama** · 12-07-2012, 12:36 PM

Laura,

I tried downloading both the vcf.gz and tbi files, but it did not work and the error is difficult to interpret. Can you see what I am doing wrong here?

./tabix -h /Volumes/Macintosh\ HD\ 3/1000Genome/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851

[tabix] the index file exists. Please use '-f' to overwrite.
Broken VCF header, no column names?
at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 177
Vcf::throw('Vcf4_1=HASH(0x7fa9d982f8d8)', 'Broken VCF header, no column names?') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 869
VcfReader::_read_column_names('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 604
VcfReader::parse_header('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 122
main::vcf_subset('HASH(0x7fa9d98288f0)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 12

many thanks for your kind help

**laura** · 12-07-2012, 12:49 PM

You need to give tabix some sort of chromosome name, otherwise it doesn't know what to fetch.

If you just want to filter the whole file, you will need to use zcat.

That being said, you downloaded the sites file, which contains no genotype info and no columns with individual genotypes to filter.
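
A quick way to tell a sites file from a genotypes file before trying to subset it is to count the columns that follow FORMAT on the `#CHROM` header line. The following is a minimal sketch under the assumption of a tab-delimited VCF on stdin; `count_samples` is a hypothetical helper name, not a vcftools or tabix command.

```shell
#!/bin/sh
# Hypothetical helper: print the number of per-sample genotype columns,
# i.e. the columns after the 9 fixed ones (CHROM..FORMAT) on the #CHROM line.
count_samples() {
    awk -F '\t' '
        /^#CHROM/ {
            n = NF - 9
            if (n < 0) n = 0      # a sites-only file has no FORMAT column either
            print n
            found = 1
            exit                  # no need to read the records themselves
        }
        !/^#/ { exit }            # reached data without seeing a #CHROM line
        END {
            if (!found) {
                print "no #CHROM header line found" > "/dev/stderr"
                exit 1
            }
        }
    '
}
```

Fed from something like `gzip -dc file.vcf.gz | count_samples` (the file name is a placeholder), a sites-only file should report 0, while a genotypes file should report one column per individual.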

**rama** · 12-12-2012, 03:58 PM

Hi Laura,

I am still having trouble with extracting the variant calls of a specific sample.

As you pointed out earlier, I had downloaded the sites file, which has no genotype columns. I have now downloaded ALL.2of4intersection.20100804.genotypes.vcf.gz and its tbi file from the FTP site (release/20100804).

I used the following command to subset the VCF file:

tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 1 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr1

Strangely, the output file has all the genotype columns. I have been following the directions given on the 1000 Genomes site, except that I don't give a range for the chromosome because I want to get all variants. So I tried giving the coordinates (see below), and the resulting file contains only the header.

Here is the command I used:
tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:1-243199373 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr2

So now I really don't know what I am doing wrong in trying to subset the VCF file. I really appreciate your kind help so far and would be very grateful if you could help me solve this.

Thank you so much.
Rama

**laura** · 12-13-2012, 01:17 AM

Hello Rama

Unfortunately, I cannot recreate your issue.

Using your command, I get a VCF file which contains genotypes for NA10851 only.
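
When a region query returns nothing but the header, one thing that may be worth checking is which chromosome names the file actually contains and how many records each one has. The following is a minimal awk sketch, assuming an uncompressed VCF on stdin; `records_per_chrom` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count data records per value of the CHROM column.
records_per_chrom() {
    awk -F '\t' '
        /^#/ { next }                          # skip meta and column-name lines
        { n[$1]++ }                            # tally records by chromosome name
        END { for (c in n) print c "\t" n[c] }
    ' | sort                                   # awk array order is unspecified
}
```

If the name used in the region (for example `2`) does not appear in this listing, the query has nothing to match. For an indexed file, `tabix -l` lists the sequence names known to the index without reading the records.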

**rama** · 12-13-2012, 08:42 AM

Thanks Laura, for trying it out.

**papori** · 12-19-2012, 12:53 AM

Hi,
Sorry if this has already been asked; I couldn't find it.

I am trying to figure out whether I can search by read length.
I am looking for reads of length 101.
Is there a way to know this information before downloading?
I looked in the sequence.index but didn't find it there.

Thanks in advance,
Pap

Thanks,

**southan** · 02-25-2013, 06:44 PM

I'm going to download BAM files from the project.
I see two links:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/
and
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/

Some files overlap between the two links.

Which one should I use?

Thanks,
