**laura** · 02-15-2013, 03:13 AM

The best way to get a consensus sequence based on the official 1000 Genomes Project SNP set is to use the VCF file together with vcftools.

Our genotypes can be found in our release directories. We always distribute one file that contains all the sites, plus one file per chromosome that contains the genotypes for all the individuals.

The data referred to in our recent paper is available here:

Index of /vol1/ftp/phase1/analysis_results/integrated_call_sets

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/

(See here for the paper http://www.1000genomes.org/announcem...mes-2012-10-31 which is freely available)

You can get a specific piece of one of our files as described in this FAQ entry:

http://www.1000genomes.org/faq/how-do-i-get-sub-section-vcf-file
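One common way to pull a region out of a remote, bgzipped and tabix-indexed VCF is `tabix -h <file> <region>`, where `-h` keeps the header lines. Below is a minimal Python sketch of that call, assuming `tabix` is installed and on the PATH. The function names are hypothetical, and the URL in the usage comment is a placeholder, not a real release file.

```python
import subprocess
from typing import List


def tabix_command(vcf_url: str, region: str) -> List[str]:
    """Build the tabix call; -h keeps the VCF header in the output."""
    return ["tabix", "-h", vcf_url, region]


def fetch_region(vcf_url: str, region: str, out_vcf: str) -> None:
    """Write the requested region (e.g. '2:39967768-39967768') to out_vcf.

    Assumes the .tbi index sits next to the VCF, which tabix needs.
    """
    with open(out_vcf, "wb") as out:
        subprocess.run(tabix_command(vcf_url, region), stdout=out, check=True)


# Usage (placeholder URL, replace with a real per-chromosome genotype file):
# fetch_region("ftp://example.org/path/to/genotypes.chr2.vcf.gz",
#              "2:39967768-39967768", "subset.vcf")
```

The subset is plain VCF, so it must be compressed with bgzip and indexed with tabix again before tools that expect an indexed file can read it.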

You can generate a consensus using the vcftools Perl script vcf-consensus:

VCFtools: Perl tools and API

http://vcftools.sourceforge.net/perl_module.html#vcf-consensus
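The documented usage of vcf-consensus is to pipe the reference FASTA in on standard input and read the consensus from standard output, i.e. `cat ref.fa | vcf-consensus file.vcf.gz > out.fa`. Below is a minimal Python sketch of the same call, assuming vcf-consensus is on the PATH and the VCF is bgzipped and tabix-indexed. The function names and file names are hypothetical, and the `-s` sample option should be checked against your installed vcftools version.

```python
import subprocess
from typing import List, Optional


def consensus_command(vcf_gz: str, sample: Optional[str] = None) -> List[str]:
    """Build the vcf-consensus call, optionally restricted to one sample."""
    cmd = ["vcf-consensus"]
    if sample is not None:
        cmd += ["-s", sample]  # apply only this individual's variants
    cmd.append(vcf_gz)
    return cmd


def write_consensus(ref_fasta: str, vcf_gz: str, out_fasta: str,
                    sample: Optional[str] = None) -> None:
    """Equivalent of: cat ref.fa | vcf-consensus [-s SAMPLE] in.vcf.gz > out.fa"""
    with open(ref_fasta, "rb") as ref, open(out_fasta, "wb") as out:
        subprocess.run(consensus_command(vcf_gz, sample),
                       stdin=ref, stdout=out, check=True)


# Usage (hypothetical file and sample names):
# write_consensus("chr20.fa", "chr20.genotypes.vcf.gz", "HG00096.chr20.fa",
#                 sample="HG00096")
```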

Please note:

First, this will reflect SNPs and indels, but of course it misses large copy number changes (I don't know whether the script would handle our deletions).

Second, the samtools mpileup approach does work, but it does not give you the consortium view of an individual, only the SNPs/indels predicted by samtools, which is a different thing.

**slengyel** · 01-13-2014, 03:23 PM

vcf to fasta

"""

Originally posted by kriikku:

See here for one way to get a .vcf file with SNPs and indels from the .bam file, or a consensus sequence:

Multisample SNP Calling

http://samtools.sourceforge.net/mpileup.shtml

The consensus sequence generated by this method has the problem that it only applies the SNPs to the reference sequence, but not the indels.
The .vcf file is better since it includes both SNPs and indels.

The .vcf file can be converted to a .fasta sequence using this tool:

https://www.broadinstitute.org/gatk/...Reference.html

However, note that this tool will only take into account indels of length up to 2 bases (as of January 2013). You may want to write your own script to insert all the indels (including the longer ones) from the .vcf into the .fasta.

This method should get the whole sequence from the .bam file; however, I don't know how to extract individual chromosomes from it.

"""

The link you posted for .vcf to FASTA conversion was not found. Can you repost it with the proper link?

**gringer** · 01-14-2014, 12:35 AM

I feel like a broken record.... About a year ago I modified vcftools (and posted patches to sourceforge) to work with a 'variant-only' VCF file coupled with a reference sequence, and was able to generate a consensus sequence that includes both INDELs (of any length specified in the VCF file) as well as SNPs. Code can be found at this post:

Illumina - mapping genome on reference - how to extract assembly? - SEQanswers

http://seqanswers.com/forums/showthread.php?p=122628#post122628
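To illustrate the general idea of applying both SNPs and indels of any length from a variant-only VCF to a reference sequence, here is a minimal Python sketch. It is not the patched vcftools code from the linked post. It assumes a single chromosome held in memory, uses only the first ALT allele of each record, ignores genotypes (every listed variant is applied), and skips symbolic alleles such as `<DEL>`. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[int, str, str]  # (1-based POS, REF, ALT)


def parse_vcf_lines(lines: Iterable[str], chrom: str) -> List[Variant]:
    """Collect (pos, ref, alt) for one chromosome from VCF text lines."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        alt = fields[4].split(",")[0]  # assumption: first ALT allele only
        if alt == "." or alt.startswith("<"):
            continue  # no alternate allele, or a symbolic allele
        variants.append((int(fields[1]), fields[3], alt))
    return variants


def apply_variants(reference: str, variants: Iterable[Variant]) -> str:
    """Return the reference with each REF allele replaced by its ALT allele."""
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            continue  # overlaps a variant already applied; keep the first
        if reference[start:start + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # unchanged bases before it
        pieces.append(alt)                      # SNP, insertion or deletion
        cursor = start + len(ref)               # skip the replaced bases
    pieces.append(reference[cursor:])
    return "".join(pieces)


vcf_text = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t2\t.\tC\tT\t.\tPASS\t.",      # SNP
    "1\t4\t.\tT\tTGGG\t.\tPASS\t.",   # 3-base insertion
    "1\t7\t.\tGTA\tG\t.\tPASS\t.",    # 2-base deletion
]
print(apply_variants("ACGTACGTAC", parse_vcf_lines(vcf_text, "1")))
# prints ATGTGGGACGC
```

Because VCF indel records include the base before the event, replacing REF with ALT handles SNPs, insertions and deletions with the same code path.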
