
  • laura
    replied
    Please start a new thread; this is not the right place for this question.



  • Mokhtar
    replied
    Please, can anyone help me? How can I BLAST one FASTA file containing more than 3000 DNA sequences generated from a fungal community?



  • laura
    replied
    Mokhtar, you would be better off creating a new thread for your question; this isn't really related to the 1000 Genomes Project.

    If you let people know what your sequences are (DNA, cDNA, protein?) and what species you are working in, they will probably be able to offer better advice.



  • Mokhtar
    replied
    Please, can anyone help me? How can I BLAST one FASTA file containing more than 3000 sequences?



  • jgibbons1
    replied
    @papori I had the same issue. You can download the "sequence.index" file from the ftp site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/). In Excel, I ended up making a new column where I divided BASE_COUNT by READ_COUNT. You can then filter the read length you are looking for.
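
The same BASE_COUNT / READ_COUNT filter can be sketched in Python rather than Excel. This is only a minimal sketch: it assumes the sequence.index file is tab-delimited with a header row naming the READ_COUNT and BASE_COUNT columns, and the function name, the FASTQ_FILE column used for display, and the two sample rows are made up for illustration.

```python
import csv
import io


def runs_with_read_length(lines, target_length):
    """Yield rows whose mean read length (BASE_COUNT / READ_COUNT) equals target_length."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        try:
            reads = int(row["READ_COUNT"])
            bases = int(row["BASE_COUNT"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric counts
        # Integer comparison avoids floating-point division:
        # bases / reads == target_length  <=>  bases == reads * target_length
        if reads > 0 and bases == reads * target_length:
            yield row


# Hypothetical two-row excerpt; a real sequence.index has many more columns.
sample = io.StringIO(
    "FASTQ_FILE\tREAD_COUNT\tBASE_COUNT\n"
    "run_a.fastq.gz\t1000\t101000\n"
    "run_b.fastq.gz\t2000\t152000\n"
)
matches = [row["FASTQ_FILE"] for row in runs_with_read_length(sample, 101)]
print(matches)  # ['run_a.fastq.gz']
```

For a downloaded file, passing `open("sequence.index", newline="")` in place of the `StringIO` object should work the same way, provided the header assumption holds. Note that this ratio is a mean read length per file, so it only identifies runs whose reads are all 101 bases when the reads are of uniform length.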



  • laura
    replied
    These two data sets represent our most recent set of alignments and the frozen alignments used for the phase1 analysis effort.

    There will be overlapping individuals between the two sets, but no BAM files should be the same, as an extended version of GRCh37 is being used for the post-phase1 mapping.

    See http://www.1000genomes.org/faq/which...bly-do-you-use

    You should be able to tell the difference between these files by the YYYYMMDD in their names, as this points to the sequence index they were based on.



  • southan
    replied
    I'm going to download BAM files from the Project.
    I see two links:
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/
    and
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/

    There are some overlapping files between the two links.

    I would like to know which one I should use?

    Thanks,



  • papori
    replied
    Hi,
    Sorry if this has already been asked; I couldn't find it.

    I am trying to figure out whether I can search by read length.
    I am looking for reads of length 101.
    Is there a way to know this information before downloading?
    I looked in the sequence.index file but I didn't find it.

    Thanks in advance,
    Pap

    Thanks,



  • rama
    replied
    Thanks Laura, for trying it out.



  • laura
    replied
    Hello Rama

    Unfortunately, I cannot recreate your issue.

    Using your command, I get a VCF file which contains genotypes for NA10851 only.



  • rama
    replied
    Hi Laura,

    I am still having trouble extracting the variant calls for a specific sample.

    As you pointed out earlier, I had downloaded the sites file, which has no genotype columns, so I have now downloaded ALL.2of4intersection.20100804.genotypes.vcf.gz and its tbi file from the FTP site (release/20100804).

    and I used the following command to subset the vcf file

    tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 1 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr1

    but strangely, the output file still has all the genotype columns. I have been following the directions given on the 1000 Genomes site, except that I don't give a range for the chromosome because I want to get all variants. So I tried giving the coordinates (see below), and the result file contains only the header.

    Here is the command I used:
    tabix -fh /Volumes/Macintosh_HD_3/1000Genome/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:1-243199373 | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851 > NA10851/NA10851_chr2

    So now I really don't know what I am doing wrong in trying to subset the VCF file. I really appreciate your kind help so far and would be very grateful if you could help me solve this.

    thank you so much.
    Rama



  • laura
    replied
    You need to give tabix some sort of chromosome name; otherwise it doesn't know what to fetch.

    If you just want to filter the whole file, you will need to use zcat.

    That being said, you downloaded the sites file, which contains no genotype info and no columns with individual genotypes to filter.
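
To illustrate both points (streaming the whole file instead of asking tabix for a region, and why a sites file cannot be subset by sample), here is a rough Python sketch. It is not how vcf-subset works internally: it only keeps the fixed VCF columns plus one sample's column, the function names are hypothetical, and the demo lines and genotypes are invented.

```python
import gzip

FIXED_COLUMNS = 9  # CHROM through FORMAT come before the per-sample genotype columns


def subset_vcf_lines(lines, sample):
    """Yield VCF lines keeping only the fixed columns plus one sample's genotypes."""
    keep = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[FIXED_COLUMNS:]
            if not samples:
                # A sites-only file stops at INFO, so there is nothing to select.
                raise ValueError("sites-only VCF: no genotype columns to subset")
            if sample not in samples:
                raise ValueError(f"sample {sample!r} not found in header")
            keep = FIXED_COLUMNS + samples.index(sample)
        elif keep is None:
            raise ValueError("data line seen before the #CHROM header line")
        yield "\t".join(fields[:FIXED_COLUMNS] + [fields[keep]])


def subset_vcf_file(path, sample):
    """Stream a gzip-compressed VCF from disk, similar to piping zcat into a filter."""
    with gzip.open(path, "rt") as handle:
        yield from subset_vcf_lines(handle, sample)


# Invented three-line VCF for demonstration only.
demo = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA10851\tNA12878",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
]
result = list(subset_vcf_lines(demo, "NA10851"))
print(result[-1])
```

Run on a header that ends at INFO, as in a sites file, `subset_vcf_lines` raises a `ValueError` immediately, which is the situation described above.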



  • rama
    replied
    Laura,

    I tried downloading both the vcf.gz and tbi files, but it did not work, and the error is difficult to interpret. Can you see what I am doing wrong here?

    ./tabix -h /Volumes/Macintosh\ HD\ 3/1000Genome/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz | perl ~/othertools/vcftools_0.1.10/perl/vcf-subset -c NA10851

    [tabix] the index file exists. Please use '-f' to overwrite.
    Broken VCF header, no column names?
    at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 177
    Vcf::throw('Vcf4_1=HASH(0x7fa9d982f8d8)', 'Broken VCF header, no column names?') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 869
    VcfReader::_read_column_names('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl//Vcf.pm line 604
    VcfReader::parse_header('Vcf4_1=HASH(0x7fa9d982f8d8)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 122
    main::vcf_subset('HASH(0x7fa9d98288f0)') called at /Users/molpathuser1/othertools/vcftools_0.1.10/perl/vcf-subset line 12

    many thanks for your kind help



  • gsgs
    replied
    there is another paper with 113 pages, "supplemental information"


    with a reference:

    ...
    38. Garrison, E. K. vcflib - a simple C++ library for parsing and manipulating VCF files, <https://github.com/ekg/vcflib> (2012).

    pointing back to :

    1000genomes.org is your first and best source for all of the information you’re looking for. From general topics to more of what you would expect to find here, 1000genomes.org has it all. We hope you find what you are searching for!


    which is 19 pages
    Last edited by gsgs; 12-07-2012, 08:15 AM.



  • laura
    replied
    You can give tabix a whole chromosome, but be aware that tabix cannot cope with lost network connectivity, so streaming large data volumes can result in lost data, which means you may need to download the whole file.

