
  • 1000 Genomes VCF Files

    Hi all,
    I am conducting whole-genome analysis on 1K Genomes data and need some help understanding the VCF file.

    First, my VCF file only contains 1/0 and 1/1 genotypes. Why am I not seeing 0/0? Also, if a genomic location (Chr:location) is not in the VCF, does that mean no variation was detected at that location (which would be the same as 0/0)?

    Second, how does one find the genotype at a particular location? We have some variants that are not in dbSNP; how can I view them in a VCF file?

    I am using SAMTools and VCFUtils, all under the standard algorithm settings.

    Thanks,
    Ashwin

  • #2
    First, the absence of variation at a particular site is not the same as an individual being 0/0. The site might not be variable at all, or the data you have may not contain evidence of variation, so be very careful about the assumptions you make from the absence of data.

    While it would seem unlikely that one individual is only ever homozygous non-ref or heterozygous, it isn't totally impossible, especially if you are only considering a small range.

    Can you give us your command lines and tell us which individual you are considering? That will make it a lot easier for us to explain what is going on.
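
To make the distinction between "0/0" and "site not in the file" concrete, here is a minimal sketch of a genotype lookup by position. It assumes an uncompressed, tab-delimited VCF with a single `#CHROM` header line; the function name `genotype_at` and the toy records in `EXAMPLE_VCF` are made up for illustration and are not real 1000 Genomes data.

```python
def genotype_at(vcf_lines, chrom, pos, sample):
    """Return the GT string for `sample` at chrom:pos.

    Returns None when the site is not in the file at all. None means
    "no record", which is NOT the same thing as a 0/0 call.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns follow the nine fixed VCF columns.
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line seen before data lines")
        if fields[0] != chrom or int(fields[1]) != pos:
            continue
        keys = fields[8].split(":")             # FORMAT column, e.g. GT:PL
        values = fields[sample_col].split(":")  # this sample's values
        return values[keys.index("GT")]
    return None


# Toy records, invented for this example.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "HG00096"]),
    "\t".join(["1", "1000", ".", "A", "G", "50", ".", ".", "GT:PL", "0/1:30,0,40"]),
    "\t".join(["1", "2000", ".", "C", "T", "60", ".", ".", "GT:PL", "1/1:60,10,0"]),
])

if __name__ == "__main__":
    lines = EXAMPLE_VCF.splitlines()
    print(genotype_at(lines, "1", 1000, "HG00096"))
    print(genotype_at(lines, "1", 1500, "HG00096"))
```

A `None` result only says the file has no record for that position; whether the individual is truly homozygous reference there depends on coverage and on how the calls were made. For a whole-genome file, an indexed lookup would be far faster than this linear scan.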


    • #3
      Hi there,
      I am using the following workflow:

      Sample ID: HG00096

      First, indexing the BAM file using samtools index
      Second, sorting the BAM file using samtools sort eg2.bam eg2.sorted
      Third, generating a BCF file using samtools mpileup -uf reference_genome_file eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

      The reference human genome file is the one obtained from 1kGenomes, Human_37_*.fasta

      Lastly, converting my BCF file to VCF using bcftools view *.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > *.vcf

      I am interested in knowing the genotypes for dbSNP variants with rs numbers and for other variants for which I have HG37 genomic coordinates.

      When I follow the above workflow, I do not get any 0/0 genotypes, and some of the genomic locations do not show up at all. This is why my first question was whether the absence of variation should be considered a 0/0 genotype.

      Thanks,
      Ashwin
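
One quick check on the output of a workflow like this is to tally which genotypes actually appear in the final VCF. The sketch below assumes an uncompressed, single-sample VCF; `genotype_counts` and the toy lines are illustrative names and data, not part of samtools. As far as I know, the samtools 0.1.x documentation describes the `-v` option of `bcftools view` as outputting potential variant sites only, which, if it applies to the version used here, might explain a file with no 0/0 records; that is worth verifying against the installed version's usage message.

```python
from collections import Counter


def genotype_counts(vcf_lines, sample_index=0):
    """Count GT values for one sample across all data lines of a VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")  # FORMAT column
        # Sample columns start at index 9 (0-based).
        gt = fields[9 + sample_index].split(":")[keys.index("GT")]
        counts[gt] += 1
    return counts


# Toy data lines, invented for this example.
TOY_LINES = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00096",
    "1\t1000\t.\tA\tG\t50\t.\t.\tGT:PL\t0/1:30,0,40",
    "1\t2000\t.\tC\tT\t60\t.\t.\tGT:PL\t1/1:60,10,0",
    "1\t3000\t.\tG\tA\t45\t.\t.\tGT:PL\t0/1:25,0,35",
]

if __name__ == "__main__":
    for gt, n in sorted(genotype_counts(TOY_LINES).items()):
        print(gt, n)
```

If the tally shows no 0/0 at all across a whole genome, that points to how the sites were selected for output rather than to the biology of the sample.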


      • #4
        Is there a particular reason you are generating the SNPs yourself rather than using our release data?

        ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/

        If there are no homozygous-ref SNPs across the whole genome, I would run your process again to make sure you didn't make an error. Then I would report the problem to the samtools-help mailing list to see if they know what is going on. If there is a problem, it lies either in the way you ran the caller or in the results the caller produced, not in the VCF file.


        • #5
          You might want to consider retitling this thread, or asking a new question about the best use of samtools mpileup, as this is not my area of expertise.


          • #6
            The reason I am calling SNPs is that I am trying to set up an analysis pipeline using the 1KGenomes data. I want to make sure we have all the parameters correct, but the 0/0 issue remains unresolved.


            • #7
              Ask a question about samtools and SNP calling rather than 1000 Genomes, and I suspect you will get a lot more help.

              I would recommend looking at the genotypes we do have for the individual you are working with, as that will be a good indicator of how accurate your calls are.
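
The comparison suggested here can be sketched as a simple concordance count between your own calls and the release genotypes. This is a minimal sketch under stated assumptions: both call sets have already been loaded into dictionaries keyed by (chromosome, position), and `normalize_gt` and `concordance` are hypothetical helper names. It compares genotype strings only and does not check that REF and ALT alleles agree between the two files, which a real comparison should do.

```python
def normalize_gt(gt):
    """Treat 0/1, 1/0, 0|1 and 1|0 as the same unordered genotype."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def concordance(own, release):
    """Compare two call sets given as {(chrom, pos): GT string}.

    Returns (matching, compared), counted only over sites present in both.
    Sites missing from either set are skipped, not treated as 0/0.
    """
    shared = own.keys() & release.keys()
    matching = sum(
        1 for site in shared
        if normalize_gt(own[site]) == normalize_gt(release[site])
    )
    return matching, len(shared)


if __name__ == "__main__":
    # Toy genotypes, invented for this example.
    own_calls = {("1", 1000): "1/0", ("1", 2000): "1/1", ("1", 3000): "0/1"}
    release_calls = {("1", 1000): "0|1", ("1", 2000): "0|1", ("1", 4000): "1|1"}
    matching, compared = concordance(own_calls, release_calls)
    print(f"{matching} of {compared} shared sites agree")
```

Sites found in only one of the two sets are worth listing separately, since they show where the pipeline either misses release variants or reports extra ones.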


              • #8
                Already posted, under the title Samtools mpileup. I will also check the genotypes as you suggested.


                • #9
                  I'm looking for a link to the genotypes VCF of the latest 1000 Genomes release and the corresponding reference. I can see one at the link below, but it seems to be the older release (629 individuals and VCF 4.0). Where can I find the same file in the newer release?

                  ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz

                  Thank you,
                  Teja
