  • 1000 Genomes VCF Files

    Hi all,
    I am conducting a whole-genome analysis on 1000 Genomes data and need some help understanding the VCF file.

    First, my VCF file only contains 1/0 and 1/1 genotypes. Why am I not seeing 0/0? Also, if a genomic location (Chr:location) is not in the VCF, does that mean no variation was detected at that location (which would be the same as 0/0)?

    Second, if I need to find out the genotype at a particular location, how do I do that? We have some variants that are not in dbSNP; how can I view them in a VCF file?
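    One way to answer "what is the genotype at this position?" is to scan the VCF for a record at that chromosome and position and read the GT field for the sample. The sketch below is a minimal, hypothetical helper (`open_vcf` and `genotype_at` are illustrative names, not part of samtools or vcfutils); it assumes a VCF with a standard #CHROM header line and returns None when the file has no record at that site, which is deliberately kept distinct from a 0/0 call.

```python
import gzip
from typing import Iterable, Iterator, Optional


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def genotype_at(lines: Iterable[str], sample: str, chrom: str, pos: int) -> Optional[str]:
    """Return the GT value for `sample` at chrom:pos, or None if no record exists.

    None means "this file has no record at that site"; it must not be read as 0/0.
    Only the first record at a position is used (a SNP and an indel can share one).
    """
    sample_col = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # ValueError if the sample is missing
            continue
        if sample_col is None:
            raise ValueError("data line seen before the #CHROM header")
        if fields[0] == chrom and int(fields[1]) == pos:
            keys = fields[8].split(":")  # FORMAT column, e.g. GT:PL:GQ
            values = fields[sample_col].split(":")
            return dict(zip(keys, values)).get("GT")
    return None
```

    For example, `genotype_at(open_vcf("eg2.vcf"), "HG00096", "1", 100)`, where the file name and coordinates are illustrative. A linear scan is slow on a whole-genome file; for a bgzip-compressed VCF indexed with tabix, a region query such as `tabix file.vcf.gz 1:100-100` retrieves only the records at that position.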

    I am using SAMtools and VCFUtils, both with the default algorithm settings.

    Thanks,
    Ashwin

  • #2
    First, there being no variation at a particular site is not the same as an individual being 0/0. The site might not be variable at all, or the data you have may not contain evidence of variation, so be very careful about the assumptions you make from the absence of data.

    While it would seem unlikely that one individual was only homozygous non-reference or heterozygous, it isn't totally impossible, especially if you are only considering a small range.
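    To check whether a call set really contains no 0/0 genotypes across the whole genome, rather than just in a small range, one can tally the GT values for the sample. The sketch below is a minimal, hypothetical helper (`genotype_counts` is an illustrative name) that assumes an uncompressed VCF read line by line; it treats 1/0, 0/1 and phased 0|1 as the same heterozygous genotype. A whole-genome file with no "0/0" key at all would suggest the caller only wrote variant sites.

```python
from collections import Counter
from typing import Iterable


def genotype_counts(lines: Iterable[str], sample: str) -> Counter:
    """Count GT values for one sample across all records in a VCF.

    Phased (|) and unphased (/) separators are normalised to '/', and alleles are
    sorted so that 1/0 and 0/1 are counted together. Missing calls stay as './.'.
    """
    counts: Counter = Counter()
    sample_col = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("data line seen before the #CHROM header")
        keys = fields[8].split(":")  # FORMAT column
        gt = dict(zip(keys, fields[sample_col].split(":"))).get("GT", ".")
        alleles = gt.replace("|", "/").split("/")
        # Sort allele indices numerically; '.' (missing) sorts first.
        alleles.sort(key=lambda a: -1 if a == "." else int(a))
        counts["/".join(alleles)] += 1
    return counts
```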

    Can you give us your command lines and tell us which individual you are considering? That will make it a lot easier for us to try to explain what is going on.



    • #3
      Hi there,
      I am using the following workflow:

      Sample ID: HG00096

      First, indexing the BAM file using samtools index
      Second, sorting the BAM file using samtools sort eg2.bam eg2.sorted
      Third, generating a BCF file using samtools mpileup -uf reference_genome_file eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

      The reference human genome file is the one obtained from 1000 Genomes, Human_37_*.fasta

      Lastly, converting my BCF file to VCF using bcftools view *.bcf | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > *.vcf
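      The workflow above can be written down as an ordered list of shell commands. This is a minimal sketch, assuming the legacy samtools/bcftools 0.1.x releases that accept `samtools sort in.bam prefix` and `bcftools view -bvcg`; `build_pipeline` and `run_pipeline` are hypothetical helpers. Two points are worth checking against your tool versions: `samtools index` expects a coordinate-sorted BAM, so in the sketch it runs after sorting rather than before; and, as we understand the legacy `bcftools view`, `-v` restricts output to potential variant sites, so homozygous-reference (0/0) sites would never be written. `varFilter -D100` additionally drops sites whose read depth exceeds 100, which is one more way a requested position can be missing from the final VCF.

```python
import shlex
import subprocess
from typing import List


def build_pipeline(bam: str, reference: str, prefix: str) -> List[str]:
    """Return the commands from the post, with indexing moved after sorting.

    Assumes legacy samtools/bcftools 0.1.x syntax, where `samtools sort in.bam prefix`
    writes prefix.bam. samtools index needs a coordinate-sorted BAM, so it runs second.
    """
    q = shlex.quote
    sorted_bam = f"{prefix}.sorted.bam"
    raw_bcf = f"{prefix}.raw.bcf"
    return [
        f"samtools sort {q(bam)} {q(prefix + '.sorted')}",
        f"samtools index {q(sorted_bam)}",
        # In legacy bcftools, view -v keeps variant sites only: 0/0 sites are not written.
        f"samtools mpileup -uf {q(reference)} {q(sorted_bam)}"
        f" | bcftools view -bvcg - > {q(raw_bcf)}",
        # varFilter -D100 removes sites with read depth above 100.
        f"bcftools view {q(raw_bcf)}"
        f" | perl /usr/share/samtools/vcfutils.pl varFilter -D100 > {q(prefix + '.vcf')}",
    ]


def run_pipeline(commands: List[str], dry_run: bool = True) -> None:
    """Print each command; execute it through bash when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # pipefail makes a failure anywhere in a pipe raise CalledProcessError.
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)
```

      Calling `run_pipeline(build_pipeline("eg2.bam", "reference_genome_file", "eg2"))` only prints the commands; pass `dry_run=False` to execute them, in which case a failing step stops the run instead of silently producing a partial VCF.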

      I am interested in knowing the genotypes for rs-numbered dbSNP variants and for other variants for which I have GRCh37 genomic coordinates.

      When I follow the above workflow, I am not getting any 0/0 genotypes, and some of the genomic locations do not show up at all. This is why my first question was whether the absence of variation should be considered a 0/0 genotype.

      Thanks,
      Ashwin



      • #4
        Is there a particular reason you are generating the snps yourself rather than using our release data?

        ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/

        If there are no homozygous-reference SNPs across the whole genome, I would run your process again to make sure you didn't make an error. Then I would report the problem to the samtools-help mailing list to see if they know what is going on: if there is a problem, it lies either in the way you ran the caller or in the results the caller produced, not in the VCF file.



        • #5
          You might want to consider retitling this thread or asking a new question about the best use of samtools mpileup, as this is not my area of expertise.



          • #6
            The reason I am calling SNPs myself is that I am trying to set up an analysis pipeline using the 1000 Genomes data. I want to make sure we have all the parameters correct, but the 0/0 issue remains unresolved.



            • #7
              Ask a question about samtools and SNP calling rather than about 1000 Genomes, then, and I suspect you will get a lot more help.

              I would recommend looking at the genotypes we do have for the individual you are working with, as that will be a good indicator of how accurate your calls are.
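              To follow this suggestion, one could compare your own calls for HG00096 with the release genotypes for the same individual at the sites both call sets contain. The sketch below assumes both call sets have already been parsed into dictionaries keyed by (chromosome, position) with GT strings as values; `concordance` and `normalise` are hypothetical helper names, and this simple per-site agreement rate is only a rough check, not the project's own evaluation method. Sites present only in the release are counted separately rather than being assumed to be 0/0 in your calls.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def normalise(gt: str) -> str:
    """Make 1/0, 0|1 and 0/1 compare equal by sorting alleles and unphasing."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def concordance(mine: Dict[Site, str], release: Dict[Site, str]) -> Tuple[float, int, int]:
    """Compare genotype calls at sites present in both call sets.

    Returns (fraction of shared sites that agree, number of shared sites,
    number of release sites absent from `mine`).
    """
    shared = mine.keys() & release.keys()
    agree = sum(normalise(mine[s]) == normalise(release[s]) for s in shared)
    missing = len(release.keys() - mine.keys())
    fraction = agree / len(shared) if shared else 0.0
    return fraction, len(shared), missing
```

              A low agreement rate, or a large number of release sites missing from your calls, would point to the calling or filtering steps rather than to the VCF format itself.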



              • #8
                Already posted, titled "Samtools mpileup". I will also check the genotypes as you suggested.



                • #9
                  I'm looking for a link to the genotype VCF of the latest 1000 Genomes release and the corresponding reference. I can see one at the link below, but it seems to be from the older release (629 individuals and VCF 4.0). Where can I find the same file for the newer release?

                  ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz

                  Thank you,
                  Teja
