  • ethnicity check

    Hello, I have sequenced 20 exomes with the Ion Proton system and have to do an ethnicity check on all samples as a quality control step. I have the reported ethnicities of all the samples.

    Is there any way I can use the variants from these samples to compare with the variants of the 1000 Genomes dataset? For instance, can I run a genotype concordance on the variants from my samples and those of a 1000 Genomes European/African/Asian etc. population?


    Thanks in advance!

  • #2
    I asked the same question and found the Similarity tool on GeneTalk's website helpful for this. You can download it from here, and it was easy to use:

    https://gene-talk.de/qc (and see referenced paper in the post below)

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    I was able to assign exomes to 1000 Genomes ethnic groups, and this works well if the exome is from an ethnic group represented in the 1000 Genomes data. The problems arise when the exome is from an ethnic group not represented in the data, e.g. in my case Aboriginal Australian.

    Filipino, Tongan and other Pacific Islander exomes tend to group loosely with Asians, which I guess seems reasonable. British people group with GBR in most cases, but some group more closely with CEU.
    Last edited by rbagnall; 10-03-2014, 02:54 PM.


    • #3
      This is great, thank you very much.
      I am getting an error message: "unable to access similarity.jar"
      I downloaded the QC software and the jar file is present.

      Did you come across this?


      • #4
        You need to move to the similarity folder where the jar file is located:

        1. Change directory to the similarity folder

        cd path/to/Similarity_05022013

        2. Make a new folder for results, called ethnicity

        mkdir ethnicity

        3. Create a VCF file of variants from a single BAM file and write it into the ethnicity folder. Call variants only in the regions listed in the '20110225.exome.consensus.bed' file that comes with the Similarity tool (I use GATK)

        java -jar /path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 10 -R /path/to/GRCh.37.fasta -I /path/to/bamfile.bam -o ethnicity/file1.vcf -L 20110225.exome.consensus.bed -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP

        4. Run similarity jar on the file1.vcf

        java -Xmx6g -jar similarity.jar -d ethnicity -o file1.txt

        5. Make plot, as per the manuscript

        Rscript --vanilla R/MDS.R ethnicity/file1.txt ethnicity/file1.pdf
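
Because the commands above use relative paths, a quick pre-flight check can show whether the shell is in the right place before Java is started. The following is only a sketch: the function name `check_similarity_dir` is made up for illustration, and the file names are simply the ones that appear in the steps above.

```shell
#!/bin/sh
# Check that the files the steps above refer to by relative path are
# readable inside a given Similarity folder, then create the results
# folder. The file names are taken from the commands in this post.
check_similarity_dir() {
    dir=$1
    status=0
    for f in similarity.jar 20110225.exome.consensus.bed R/MDS.R; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    # Only create the results folder when nothing is missing.
    if [ "$status" -eq 0 ]; then
        mkdir -p "$dir/ethnicity"
    fi
    return "$status"
}
```

Typical use would be `check_similarity_dir path/to/Similarity_05022013 && cd path/to/Similarity_05022013`, after which steps 3 to 5 can be run as written.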
        Last edited by rbagnall; 10-04-2014, 03:27 AM.


        • #5
          Thanks again, this is very helpful. I already have VCF files, which were called from the BAM files with the Torrent Variant Caller. Do I need to call the variants again, restricted to the provided .bed file with the -L argument?

          Thank you!


          • #6
            You could restrict variants in the vcf file to the provided .bed file using Bedtools (intersectBed)
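
With Bedtools installed, the usual form would be something like `bedtools intersect -header -u -a file1.vcf -b 20110225.exome.consensus.bed > ethnicity/file1.vcf`. For cases where Bedtools is not available, the same restriction can be sketched in plain awk for single-position variants. This is an illustrative stand-in, not Bedtools itself: the function name `restrict_vcf_to_bed` is hypothetical, both files are assumed to be tab-separated, and chromosome names must match between the two files (for example both `1` or both `chr1`).

```shell
#!/bin/sh
# Keep only VCF records whose POS falls inside a BED interval.
# BED is 0-based and half-open, VCF POS is 1-based, so a variant at
# POS p lies inside an interval when start < p <= end.
# Only POS is tested, so long deletions that merely overlap an
# interval are not handled the way bedtools would handle them.
restrict_vcf_to_bed() {
    bed=$1
    vcf=$2
    awk -F'\t' '
        NR == FNR {                        # first file: BED intervals
            if ($0 ~ /^(#|track|browser)/ || NF < 3) next
            n[$1]++
            s[$1, n[$1]] = $2 + 0
            e[$1, n[$1]] = $3 + 0
            next
        }
        /^#/ { print; next }               # keep VCF header lines
        {
            p = $2 + 0
            for (i = 1; i <= n[$1]; i++)
                if (p > s[$1, i] && p <= e[$1, i]) { print; next }
        }
    ' "$bed" "$vcf"
}
```

Usage would look like `restrict_vcf_to_bed 20110225.exome.consensus.bed file1.vcf > ethnicity/file1.vcf`. The linear scan over intervals is slow for large BED files, so this is meant for checking a handful of exomes rather than as a replacement for Bedtools.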


            • #7
              Hi again, may I ask what version of java you used to run similarity.jar? I am using 1.7 and I am getting a 'java.lang.NullPointerException'.

              Also, did your VCFs contain the homozygous reference (0/0) calls, or just heterozygous variant (0/1) and homozygous variant (1/1) calls?

              Thanks
              Last edited by Rabu; 10-06-2014, 10:36 AM.


              • #8
                java version "1.7.0_02"

                My vcf files were single sample, so no 0/0 calls.

                Perhaps post the full command that you ran.
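
To see which genotype classes a given single-sample VCF actually contains, the calls can be tallied with awk. This is a rough sketch with a made-up function name, `count_genotypes`; it assumes the sample sits in column 10 and that GT is the first FORMAT key, and it only reports the three biallelic classes discussed here.

```shell
#!/bin/sh
# Tally genotype classes in a single-sample VCF.
# Assumes the sample is in column 10 and that GT is the first FORMAT
# key, as the VCF specification requires when GT is present.
count_genotypes() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        NF >= 10 {
            split($10, f, ":")             # f[1] holds the GT string
            gt = f[1]
            gsub(/\|/, "/", gt)            # treat phased calls like unphased
            if (gt == "1/0") gt = "0/1"
            n[gt]++
        }
        END {
            printf "hom_ref(0/0)\t%d\n", n["0/0"]
            printf "het(0/1)\t%d\n", n["0/1"]
            printf "hom_alt(1/1)\t%d\n", n["1/1"]
        }
    ' "$1"
}
```

Running `count_genotypes ethnicity/file1.vcf` prints one tab-separated count per class, so a zero on the first line confirms that the file carries no 0/0 calls.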


                • #9
                  Hi,
                  I seem to have got everything working; there was a small mistake in my command line. I ran similarity.jar without first intersecting my VCFs with the provided consensus.bed file, and my genotype accuracies are quite low (<0.9999). I imagine that intersecting my VCFs with consensus.bed would improve the genotyping accuracy, since variants outside those regions would no longer be included in the analysis. Is this correct?

                  Thanks again!
                  Last edited by Rabu; 10-08-2014, 10:04 AM.
