  • 1000 Genomes VCF Files

    Hi all,
    I am conducting whole-genome analysis on 1000 Genomes data, and I need some help understanding the VCF file.

    First, my VCF file only contains 1/0 and 1/1 genotypes. Why am I not seeing 0/0? Also, if a genomic location (Chr:location) is not in the VCF, does that mean no variation was detected at that location (which would be the same as 0/0)?

    Second, if I need to find the genotype at a particular location, how do I do that? We have some variants that are not in dbSNP; how can I view them in a VCF file?

    I am using SAMtools and vcfutils, all with standard algorithm settings.

    Thanks,
    Ashwin

  • #2
    First, the absence of a variant at a particular site is not the same as an individual being 0/0. The site might not be variable at all, or the data you have may not contain evidence of variation there, so be very careful about the assumptions you make from the absence of data.

    While it would seem unlikely that one individual is only ever homozygous non-reference or heterozygous, it isn't totally impossible, especially if you are only considering a small region.

    Can you give us your command lines and tell us which individual you are considering? That will make it a lot easier for us to explain what is going on.
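The distinction drawn in #2, between "no record at this site" and "genotype 0/0", can be made explicit when looking up a single position (which is also what the second question in #1 asks). This is a minimal sketch, assuming an uncompressed, plain-text VCF scanned line by line; the function name `genotype_at` and the toy records are hypothetical, made up for illustration. For a whole-genome file, a tabix-indexed lookup would avoid scanning everything.

```python
from typing import Iterable, Optional


def genotype_at(vcf_lines: Iterable[str], chrom: str, pos: int, sample: str) -> Optional[str]:
    """Return the GT value for `sample` at chrom:pos, or None if no record exists there.

    None means "this VCF says nothing about the site", NOT "homozygous reference".
    Only the first record at the position is used.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns follow the fixed columns CHROM..FORMAT (indices 0-8).
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("data line found before the #CHROM header")
        if fields[0] == chrom and int(fields[1]) == pos:
            keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP
            values = fields[sample_col].split(":")
            return dict(zip(keys, values)).get("GT")
    return None


# Hypothetical toy records for illustration only (not real 1000 Genomes data).
toy_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\n"
    "1\t1000\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1/0:12\n"
    "1\t2000\t.\tC\tT\t40\tPASS\t.\tGT:DP\t1/1:9\n"
).splitlines()

print(genotype_at(toy_vcf, "1", 1000, "HG00096"))  # 1/0
print(genotype_at(toy_vcf, "1", 1500, "HG00096"))  # None: no record, not 0/0
```

Returning `None` rather than "0/0" for a missing record keeps the caller honest: deciding that an absent site is homozygous reference requires separate evidence (for example, coverage at that position).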



    • #3
      Hi there,
      I am using the following workflow:

      Sample ID: HG00096

      First, I index the BAM file using samtools index.
      Second, I sort the BAM file: samtools sort eg2.bam eg2.sorted
      Third, I generate a BCF file: samtools mpileup -uf reference_genome_file eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

      The reference human genome file is the one obtained from 1000 Genomes (Human_37_*.fasta).

      Lastly, I convert my BCF file to VCF: bcftools view *.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > *.vcf

      I am interested in the genotypes at dbSNP rs-numbered SNPs and at other variants for which I have GRCh37 (build 37) genomic coordinates.

      When I follow the above workflow, I do not get any 0/0 genotypes, and some of the genomic locations do not show up at all. This is why my first question was whether the absence of a variant should be considered a 0/0 genotype.

      Thanks,
      Ashwin



      • #4
        Is there a particular reason you are generating the SNP calls yourself rather than using our release data?

        ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/

        If there are no homozygous-reference genotypes anywhere in the genome, I would run your process again to make sure you didn't make an error, and then report the problem to the samtools-help mailing list to see if they know what is going on. If there is a problem, it lies either in the way you ran the caller or in the results the caller produced, not in the VCF file.
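The whole-genome check suggested in #4 (are there any homozygous-reference calls at all?) can be done by tallying one sample's GT values. This is a minimal sketch, assuming an uncompressed VCF; `genotype_counts` and the toy records are hypothetical. One explanation worth ruling out first: in the legacy samtools 0.1.x bcftools, the `-v` option of `bcftools view` outputs potential variant sites only, so a single-sample call set produced with `bcftools view -bvcg`, as in #3, would be expected to contain few or no 0/0 records, and `vcfutils.pl varFilter` filters that output further.

```python
from collections import Counter
from typing import Iterable


def genotype_counts(vcf_lines: Iterable[str], sample: str) -> Counter:
    """Count the GT values of one sample over every record in a plain-text VCF."""
    counts: Counter = Counter()
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("data line found before the #CHROM header")
        keys = fields[8].split(":")  # FORMAT column
        values = fields[sample_col].split(":")
        gt = dict(zip(keys, values)).get("GT", ".")
        # Treat phased/unphased alike and sort alleles so 1/0 and 0/1 are counted together.
        alleles = sorted(gt.replace("|", "/").split("/"))
        counts["/".join(alleles)] += 1
    return counts


# Hypothetical toy records for illustration only.
toy_vcf = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096\n"
    "1\t1000\t.\tG\tA\t50\tPASS\t.\tGT\t1/0\n"
    "1\t2000\t.\tC\tT\t40\tPASS\t.\tGT\t0/1\n"
    "1\t3000\t.\tA\tG\t60\tPASS\t.\tGT\t1/1\n"
).splitlines()

counts = genotype_counts(toy_vcf, "HG00096")
print(counts["0/1"], counts["1/1"], counts["0/0"])  # 2 1 0
```

A count of zero for "0/0" on a variants-only call set is therefore not by itself a sign of a broken run; it becomes suspicious only if the caller was asked to emit all sites.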



        • #5
          You might want to consider retitling this thread, or asking a new question about the best use of samtools mpileup, as this is not my area of expertise.



          • #6
            The reason I am calling SNPs myself is that I am trying to set up an analysis pipeline using the 1000 Genomes data. I want to make sure we have all the parameters correct, but the 0/0 issue remains unresolved.



            • #7
              Ask a question about samtools and SNP calling, rather than about 1000 Genomes, and I suspect you will get a lot more help.

              I would recommend looking at the genotypes we do have for the individual you are working with, as that will be a good indicator of how accurate your calls are.
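The comparison suggested in #7, checking your own calls against the released genotypes for the same individual, comes down to a concordance count. This is a minimal sketch, assuming both call sets have already been parsed into dictionaries keyed by (chromosome, position), for the same sample and the same reference build; `concordance` and `normalise` are hypothetical helpers. It compares GT strings only (it does not check that REF/ALT alleles match), and it keeps release sites that are missing from your calls in a separate count instead of treating them as 0/0.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def normalise(gt: str) -> str:
    """Make 1/0, 0/1 and 0|1 compare equal by ignoring phase and allele order."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def concordance(own: Dict[Site, str], release: Dict[Site, str]) -> Tuple[int, int, int]:
    """Compare one sample's genotypes at the release sites.

    Returns (agreeing, compared, not_called): `not_called` counts release sites
    with no record in `own`; they are kept apart rather than assumed to be 0/0.
    """
    agreeing = compared = not_called = 0
    for site, release_gt in release.items():
        own_gt = own.get(site)
        if own_gt is None:
            not_called += 1
            continue
        compared += 1
        if normalise(own_gt) == normalise(release_gt):
            agreeing += 1
    return agreeing, compared, not_called


# Hypothetical toy genotypes for illustration only.
release = {("1", 1000): "0|1", ("1", 2000): "1|1", ("1", 3000): "0|0"}
own = {("1", 1000): "1/0", ("1", 2000): "0/1"}

print(concordance(own, release))  # (1, 2, 1)
```

Reporting `not_called` separately shows at a glance how much of the disagreement comes from sites the pipeline never emitted, which is exactly the 0/0 question raised in this thread.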



              • #8
                Already posted, titled "Samtools mpileup". I will also check the genotypes as you suggested.



                • #9
                  I'm looking for a link to the genotype VCF of the latest 1000 Genomes release and the corresponding reference. I can see one at the link below, but it seems to be from the older release (629 individuals and VCF 4.0). Where can I find the same file for the newer release?

                  ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz

                  Thank you,
                  Teja
