New Resources for 1000 Genomes
This topic is closed.
This is a sticky topic.
-
Could anyone please help me? How can I BLAST a FASTA file containing more than 3000 DNA sequences generated from a fungal community?
-
Mokhtar, you would be better off creating a new thread for your question; this isn't really related to the 1000 Genomes Project.
If you let people know what your sequences are (DNA, cDNA, protein?) and what species you are working with, they will probably be able to offer better advice.
-
Could anyone please help me? How can I BLAST a FASTA file containing more than 3000 sequences?
-
@papori I had the same issue. You can download the "sequence.index" file from the FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/). In Excel, I ended up making a new column where I divided BASE_COUNT by READ_COUNT. You can then filter for the read length you are looking for.
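The same BASE_COUNT / READ_COUNT filter can probably be done on the command line instead of Excel. The following is only a rough sketch: it assumes sequence.index is tab-delimited and that its first line is a header containing the exact column names BASE_COUNT and READ_COUNT, and the function name filter_by_read_length is made up for illustration. The ratio is the mean read length of a run, so runs with mixed read lengths may not match exactly.

```shell
# Print the header plus every row of a tab-delimited index file whose
# mean read length (BASE_COUNT / READ_COUNT) equals the requested value.
# Assumption: line 1 is a header naming the BASE_COUNT and READ_COUNT columns.
filter_by_read_length() {
    index_file=$1
    wanted=$2
    awk -F '\t' -v wanted="$wanted" '
        NR == 1 {
            # Locate the two columns by name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "BASE_COUNT") base = i
                if ($i == "READ_COUNT") reads = i
            }
            if (!base || !reads) {
                print "BASE_COUNT or READ_COUNT column not found" | "cat 1>&2"
                exit 1
            }
            print
            next
        }
        # Skip rows with a missing or zero read count to avoid dividing by zero.
        $reads + 0 > 0 && $base / $reads == wanted
    ' "$index_file"
}

# Example use (output file name is arbitrary):
#   filter_by_read_length sequence.index 101 > runs_101bp.tsv
```

Looking the columns up by name means the sketch does not depend on the column order in the file.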
-
These two data sets represent our most recent set of alignments and the frozen alignments used for the phase 1 analysis effort.
There will be overlapping individuals between the two sets, but no BAM files should be the same, as an extended version of GRCh37 is being used for the post-phase 1 mapping.
See http://www.1000genomes.org/faq/which...bly-do-you-use
You should be able to tell these files apart by the YYYYMMDD in their names, as this points to the sequence index they were based on.
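To sort a directory listing by that date, something like the sketch below might help. It assumes the YYYYMMDD stamp is an 8-digit field set off by dots or underscores inside the file name; the helper name extract_index_date and the example file name are hypothetical, not taken from the FTP site.

```shell
# Print the 8-digit date field (YYYYMMDD) embedded in a file name,
# or nothing if the name has no such field.
# Assumption: the date is delimited by "." or "_" on both sides.
extract_index_date() {
    printf '%s\n' "$1" | sed -nE 's/.*[._]([0-9]{8})[._].*/\1/p'
}

# Example use with a made-up file name:
#   extract_index_date "SAMPLE.mapped.20101123.bam"
```

Files from the two directories can then be grouped by the printed date to see which sequence index each one was built from.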
-
I'm going to download BAM files from the project.
I see two links:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/
and
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/
There are some overlapping files between the two links.
Which one should I use?
Thanks,
-
Hi,
Sorry if this has already been asked; I couldn't find it.
I am trying to figure out whether I can search by read length.
I am looking for reads of length 101.
Is there a way to know this information before downloading?
I looked in the sequence.index file but didn't find this.
Thanks in advance,
Pap
Thanks,
-
Hello Rama,
Unfortunately I cannot recreate your issue.
Using your command I get a VCF file which contains genotypes for NA10851 only.
-
Hi Laura,
I am still having trouble extracting the variant calls for a specific sample.
As you pointed out earlier, I had downloaded the sites file, which has no genotype columns, so I have now downloaded ALL.2of4intersection.20100804.genotypes.vcf.gz and its tbi file from the FTP site (release/20100804).
I used the following command to subset the VCF file:
tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 1 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr1
But strangely, the output file has all genotype columns. I have been following the directions given on the 1000 Genomes website, except that I don't give a range for the chromosome, as I want to get all variants. So I tried giving the coordinates (see below), and the resulting file contains only the header.
Here is the command I used:
tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:1-243199373 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr2
So now I really don't know what I am doing wrong in trying to subset the VCF file. I really appreciate your kind help so far and would be very grateful if you could help me solve this.
Thank you so much.
Rama
-
You need to give tabix some sort of chromosome name; otherwise it doesn't know what to fetch.
If you just want to filter the whole file, you will need to use zcat.
That being said, you downloaded the sites file, which contains no genotype information and no columns with individual genotypes to filter.
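As a rough illustration of both points, the sketch below first lists the sample columns of a compressed VCF (a sites-only file prints nothing) and then streams the whole file while keeping a single sample column. It is an assumption-laden stand-in written with gzip -dc (equivalent to zcat) and awk, not vcf-subset itself: the function names list_vcf_samples and subset_vcf_sample are invented here, and unlike vcf-subset it only cuts columns, so INFO values such as allele counts are left untouched.

```shell
# List the per-sample genotype columns named on the #CHROM header line.
# Prints nothing when the VCF is a sites-only file (no FORMAT/sample columns).
list_vcf_samples() {
    gzip -dc "$1" | awk -F '\t' '
        /^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }
    '
}

# Stream the whole file and keep the 9 fixed VCF columns plus one sample.
# Plain column selection only; no fields are recalculated.
subset_vcf_sample() {
    gzip -dc "$1" | awk -F '\t' -v sample="$2" '
        BEGIN { OFS = "\t" }
        /^##/ { print; next }                # pass meta-information lines through
        /^#CHROM/ {
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            if (!col) { print "sample not found: " sample | "cat 1>&2"; exit 1 }
        }
        { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }
    '
}

# Example use (file name as it appears later in this thread):
#   list_vcf_samples ALL.2of4intersection.20100804.genotypes.vcf.gz | grep -c .
#   subset_vcf_sample ALL.2of4intersection.20100804.genotypes.vcf.gz NA10851 > NA10851.vcf
```

Running the listing step first is a cheap way to confirm the download really is a genotypes file before attempting any subsetting.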
-
Laura,
I tried downloading both the vcf.gz and tbi files, but it did not work and the error is difficult to interpret. Can you see what I am doing wrong here?
./tabix -h /Volumes/Macintosh\ HD\ 3/1000Genome/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851
[tabix] the index file exists. Please use '-f' to overwrite.
Broken VCF header, no column names?
at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 177
Vcf::throw('Vcf4_1=HASH(0x7fa9d982f8d8)', 'Broken VCF header, no column names?') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 869
VcfReader::_read_column_names('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 604
VcfReader::parse_header('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 122
main::vcf_subset('HASH(0x7fa9d98288f0)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 12
Many thanks for your kind help.
-
There is another paper with 113 pages, the "supplemental information",
with a reference:
...
38. Garrison, E. K. vcflib: a simple C++ library for parsing and manipulating VCF files, <https://github.com/ekg/vcflib> (2012).
pointing back to:
which is 19 pages
Last edited by gsgs; 12-07-2012, 08:15 AM.
-
You can give tabix a whole chromosome, but be aware that tabix cannot cope with lost network connectivity, so streaming large data volumes can result in lost data, which means you may need to download the whole file.