  • 1000 Genomes VCF Files

    Hi all,
    I am conducting whole-genome analysis on 1000 Genomes data and I need some help understanding the VCF file.

    First, my VCF file only contains 1/0 and 1/1 genotypes. Why am I not seeing 0/0? Also, if a genomic location (Chr:location) is not in the VCF, does that mean no variation was detected at that location (which would be the same as 0/0)?
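One quick way to see what the file actually contains is to tally the GT values for the sample. The minimal Python sketch below assumes an uncompressed, tab-separated VCF in which GT is one of the FORMAT keys; the helper names and toy records are illustrative only, not real HG00096 calls. If the file holds variant sites only, the tally will show het and hom-alt entries but no hom-ref entries at all, which matches the symptom described here.

```python
from collections import Counter


def genotype_class(gt: str) -> str:
    """Classify a VCF GT value such as '0/1', '1|1' or './.'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "no call"
    if all(a == "0" for a in alleles):
        return "hom-ref"
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"


def count_genotypes(vcf_lines, sample_index=0):
    """Tally genotype classes for one sample across VCF data lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")                   # FORMAT column, e.g. GT:PL
        values = fields[9 + sample_index].split(":")  # this sample's column
        counts[genotype_class(dict(zip(keys, values))["GT"])] += 1
    return counts


# Toy records (invented): a variants-only file never lists 0/0 sites,
# so "hom-ref" simply does not show up in the tally.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096",
    "1\t1000\t.\tA\tG\t50\t.\tDP=12\tGT:PL\t0/1:40,0,45",
    "1\t2000\t.\tC\tT\t60\t.\tDP=9\tGT:PL\t1/1:70,20,0",
]
print(count_genotypes(toy_vcf))
```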

    Second, if I need to find out the genotype at a particular location, how does one do that? We have some variants which are not in dbSNP; how can I view them in a VCF file?
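Looking up a single position amounts to finding the record whose CHROM and POS match. A variant that is not in dbSNP still appears as an ordinary record, usually with "." in the ID column, so a position lookup finds it the same way. The Python sketch below is a hypothetical helper using a plain linear scan over toy data; for whole-genome files, a bgzipped, tabix-indexed VCF queried by region (for example tabix file.vcf.gz 1:1000-1000) is the more practical route.

```python
def genotype_at(vcf_lines, chrom, pos, sample_index=0):
    """Return the GT value at (chrom, pos), or None if the VCF has no record there.

    None means "no record in this file", which is not the same as a 0/0 call.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and int(fields[1]) == pos:
            keys = fields[8].split(":")
            values = fields[9 + sample_index].split(":")
            return dict(zip(keys, values)).get("GT")
    return None


# Invented example record; chromosome names must follow the file's own
# convention (e.g. "1" versus "chr1").
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096",
    "1\t1000\t.\tA\tG\t50\t.\tDP=12\tGT\t0/1",
]
print(genotype_at(toy_vcf, "1", 1000))  # 0/1
print(genotype_at(toy_vcf, "1", 1500))  # None: no record, genotype unknown
```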

    I am using SAMtools and vcfutils, all with default settings.

    Thanks,
    Ashwin

  • #2
    First, there being no variation at a particular site is not the same as an individual being 0/0. The site might not be variable at all, or the data you have may not contain evidence of variation, so be very careful about the assumptions you make about the absence of data.
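To make the distinction concrete: a missing record can only be read as a likely reference call when there is enough read coverage at that position. This minimal Python sketch assumes you have a per-site read depth from some separate coverage step; the function name and the min_depth value of 10 are arbitrary illustrations, not a samtools default or a recommended cutoff.

```python
def site_status(vcf_gt, read_depth, min_depth=10):
    """Decide what can be said about one sample at one site.

    vcf_gt:     GT string if the site appears in the VCF, else None.
    read_depth: reads covering the site (hypothetical input from a
                separate coverage calculation).
    min_depth:  hypothetical threshold chosen for illustration only.
    """
    if vcf_gt is not None:
        return vcf_gt  # the caller reported a genotype here
    if read_depth >= min_depth:
        # Well covered and no variant reported: still an inference,
        # since filtering may also have removed a real variant.
        return "probably 0/0"
    return "unknown"  # too little data to say anything


print(site_status("1/1", 30))
print(site_status(None, 25))
print(site_status(None, 2))
```

Keeping "unknown" separate from "probably 0/0" is the whole point: a site with two reads and no call tells you almost nothing.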

    While it would seem unlikely for an individual to have only homozygous non-reference or heterozygous genotypes, it isn't totally impossible, especially if you are only considering a small region.

    Can you give us your command lines and tell us which individual you are considering? That will make it a lot easier for us to explain what is going on.



    • #3
      Hi there,
      I am using the following workflow:

      Sample ID: HG00096

      First, I index the BAM file using samtools index.
      Second, I sort the BAM file: samtools sort eg2.bam eg2.sorted
      Third, I generate a BCF file: samtools mpileup -uf reference_genome_file eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

      The reference human genome file is the one obtained from 1000 Genomes, Human_37_*.fasta.

      Lastly, I convert the BCF file to VCF: bcftools view *.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > *.vcf
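Two details of this workflow are worth checking. First, samtools index expects a coordinate-sorted BAM, so the sort step normally comes before indexing. Second, in the legacy bcftools view used here, the -v flag restricts output to potential variant sites, which by itself would explain why no 0/0 genotypes appear; varFilter then removes further records. The Python sketch below only assembles the commands quoted in the post, in sort-then-index order, without running anything; the function name and the reference.fasta placeholder (the real file name is elided as Human_37_*.fasta) are hypothetical.

```python
import shlex


def pipeline_commands(bam, ref, prefix):
    """Return the workflow steps as shell strings, sorting before indexing.

    Uses the legacy (samtools 0.1.x era) syntax exactly as quoted in the post;
    nothing is executed here.
    """
    sorted_prefix = prefix + ".sorted"
    sorted_bam = sorted_prefix + ".bam"
    raw_bcf = prefix + ".raw.bcf"
    return [
        # old-style sort: the second argument is an output prefix
        f"samtools sort {shlex.quote(bam)} {shlex.quote(sorted_prefix)}",
        # index only after the BAM is coordinate-sorted
        f"samtools index {shlex.quote(sorted_bam)}",
        # -v in this bcftools view keeps potential variant sites only
        f"samtools mpileup -uf {shlex.quote(ref)} {shlex.quote(sorted_bam)}"
        f" | bcftools view -bvcg - > {shlex.quote(raw_bcf)}",
        f"bcftools view {shlex.quote(raw_bcf)}"
        f" | perl /usr/share/samtools/vcfutils.pl varFilter -D100"
        f" > {shlex.quote(prefix + '.vcf')}",
    ]


for cmd in pipeline_commands("eg2.bam", "reference.fasta", "eg2"):
    print(cmd)
```

To keep reference-homozygous sites in the output, the -v flag would have to be dropped, at the cost of a much larger file.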

      I am interested in genotypes for dbSNP variants with rs numbers, and for other variants for which I have GRCh37 genomic coordinates.

      When I follow the above workflow, I am not getting any 0/0 genotypes, and some of the genomic locations do not show up at all. This is why my first question was whether the absence of a variant call should be considered a 0/0 genotype.

      Thanks,
      Ashwin



      • #4
        Is there a particular reason you are generating the snps yourself rather than using our release data?

        ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/

        If there are no homozygous reference genotypes across the whole genome, I would run your process again to make sure you didn't make an error. Then I would report the problem to the samtools-help mailing list to see if they know what is going on: if there is a problem, it is either with the way you ran the caller or with the results the caller produced, not with the VCF file.



        • #5
          You might want to consider retitling this thread or asking a new question about the best use of samtools mpileup, as this is not my area of expertise.



          • #6
            The reason I am calling SNPs myself is that I am trying to set up an analysis pipeline using the 1000 Genomes data. I want to make sure we have all the parameters correct, but the 0/0 issue remains unresolved.



            • #7
              Ask a question about samtools and SNP calling rather than about 1000 Genomes, and I suspect you will get a lot more help.

              I would recommend looking at the genotypes we do have for the individual you are working with, as that will be a good indicator of how accurate your calls are.
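A simple way to act on this suggestion is to measure genotype concordance at the sites both call sets share. This Python sketch assumes you have already loaded your own calls and the release genotypes for the same individual into dictionaries keyed by (chromosome, position); the helper names and toy values are made up. Release genotypes may be phased (1|0), so phase is ignored when comparing.

```python
def normalize(gt):
    """Make '1|0', '0/1' and '1/0' compare equal by sorting the alleles."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def concordance(own_calls, release_calls):
    """Return (fraction of matching genotypes, number of shared sites).

    Both arguments map (chrom, pos) -> GT string. Sites missing from either
    set are left out rather than being treated as 0/0.
    """
    shared = own_calls.keys() & release_calls.keys()
    if not shared:
        return None, 0
    matches = sum(
        normalize(own_calls[site]) == normalize(release_calls[site])
        for site in shared
    )
    return matches / len(shared), len(shared)


# Toy data (invented): two shared sites, one agreeing and one not.
own = {("1", 1000): "0/1", ("1", 2000): "1/1", ("1", 3000): "0/1"}
release = {("1", 1000): "1|0", ("1", 2000): "0|1", ("1", 4000): "1|1"}
print(concordance(own, release))
```

Restricting the comparison to shared sites avoids the trap discussed earlier in the thread of reading a missing record as 0/0.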



              • #8
                Already posted, titled "Samtools mpileup". I will also check the genotypes as you suggested.



                • #9
                  I'm looking for a link to the genotype VCF of the latest 1000 Genomes release and the corresponding reference. I can see one at the link below, but it seems to be from the older release (629 individuals and VCF 4.0). Where can I find the same file for the newer release?

                  ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz

                  Thank you,
                  Teja

