  • ethnicity check

    Hello, I have sequenced 20 exomes with the Ion Proton system and have to do an ethnicity check on all samples as a quality control step. I have the reported ethnicities of all the samples.

    Is there any way I can compare the variants from these samples with the variants in the 1000 Genomes dataset? For instance, can I run a genotype concordance between the variants from my samples and those of the 1000 Genomes European, African, Asian, etc. populations?

    Thanks in advance!

  • #2
    I asked the same question and found the Similarity tool on Gentalk's website helpful for this. You can download it from here, and it was easy to use (see also the paper referenced in the post below).

    I was able to assign exomes to 1000 Genomes ethnic groups, and this works well if the exome is from an ethnic group represented in the 1000 Genomes data. The problems arise when the exome is from an ethnic group not represented in the data, e.g. in my case Aboriginal Australian.

    Filipino, Tongan, and Pacific Islander samples tend to group loosely with Asians, which seems reasonable. British samples group with GBR in most cases, but some group more closely with CEU.
    Last edited by rbagnall; 10-03-2014, 02:54 PM.


    • #3
      This is great, thank you very much.
      I am getting an error message: "unable to access similarity.jar"
      I downloaded the QC software and the jar file is present.

      Did you come across this?


      • #4
        You need to move to the similarity folder where the jar file is located:

        1. Change directory to the similarity folder

        cd path/to/Similarity_05022013

        2. Make a new folder for results, called ethnicity

        mkdir ethnicity

        3. Create a VCF file of variants from a single BAM file and write it into the ethnicity folder. Restrict variant calling to the intervals in the '20110225.exome.consensus.bed' file that comes with the Similarity tool (I use GATK)

        java -jar /path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 10 -R /path/to/GRCh.37.fasta -I /path/to/bamfile.bam -o ethnicity/file1.vcf -L 20110225.exome.consensus.bed -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP

        4. Run similarity jar on the file1.vcf

        java -Xmx6g -jar similarity.jar -d ethnicity -o file1.txt

        5. Make plot, as per the manuscript

        Rscript --vanilla R/MDS.R ethnicity/file1.txt ethnicity/file1.pdf
        Last edited by rbagnall; 10-04-2014, 03:27 AM.
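
For the 20 exomes in the original question, steps 3 to 5 above could be wrapped in a small shell function and called once per BAM file. This is only a sketch under stated assumptions: the function name `ethnicity_check`, the per-sample `ethnicity_<sample>` folders, and the BAM location in the usage line are hypothetical, and giving each sample its own folder is a choice made here so that every similarity.jar run sees a single VCF, as in the posted commands. The GATK and similarity.jar options are copied unchanged from the steps above.

```shell
# Sketch: run steps 3 to 5 for one BAM file. Run from inside the Similarity folder.
# The paths below are the placeholders used in the steps above.
GATK_JAR=/path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar
REFERENCE=/path/to/GRCh.37.fasta
TARGETS=20110225.exome.consensus.bed

ethnicity_check() {
    bam=$1
    sample=$(basename "$bam" .bam)
    # Assumption: one results folder per sample (not something the thread prescribes)
    outdir="ethnicity_$sample"
    mkdir -p "$outdir" &&
    # Step 3: call SNPs only at the intervals shipped with the Similarity tool
    java -jar "$GATK_JAR" -T UnifiedGenotyper -nt 10 -R "$REFERENCE" \
        -I "$bam" -o "$outdir/$sample.vcf" -L "$TARGETS" \
        -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 \
        -dcov 200 -l INFO -rf BadCigar -glm SNP &&
    # Step 4: same invocation as in the post, pointed at the per-sample folder
    java -Xmx6g -jar similarity.jar -d "$outdir" -o "$sample.txt" &&
    # Step 5: MDS plot; like the post, this reads the output from inside the -d folder
    Rscript --vanilla R/MDS.R "$outdir/$sample.txt" "$outdir/$sample.pdf"
}

# Usage (hypothetical BAM location):
#   for bam in /path/to/bams/*.bam; do ethnicity_check "$bam"; done
```

Chaining the steps with `&&` means a failed variant-calling run stops that sample before similarity.jar or the plot is attempted.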


        • #5
          Thanks again, this is very helpful. I already have VCF files, which were called from the BAM files with the Torrent Variant Caller. Do I still need to call the variants again with the provided .bed file using the -L argument?

          Thank you!


          • #6
            You could restrict the variants in the VCF file to the provided .bed file using Bedtools (intersectBed).
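
A minimal sketch of that intersection, using `bedtools intersect` (the current name for the legacy `intersectBed` command). The function name and the file names in the usage line are placeholders, not taken from the thread. Chromosome names have to match between the VCF and the BED file (for example `chr1` versus `1`), otherwise no records will overlap.

```shell
# Sketch: keep only the VCF records that overlap the consensus BED intervals.
restrict_vcf_to_bed() {
    # $1 = input VCF, $2 = BED file, $3 = output VCF
    # -header keeps the VCF header lines; -u writes each overlapping record once
    bedtools intersect -header -u -a "$1" -b "$2" > "$3"
}

# Usage (placeholder file names):
#   restrict_vcf_to_bed tvc_sample.vcf 20110225.exome.consensus.bed ethnicity/sample.vcf
```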


            • #7
              Hi again, may I ask what version of java you used to run similarity.jar? I am using 1.7 and I am getting a 'java.lang.NullPointerException'.

              Also, did your VCFs contain homozygous reference (0/0) calls, or just heterozygous (0/1) and homozygous variant (1/1) calls?

              Last edited by Rabu; 10-06-2014, 10:36 AM.


              • #8
                java version "1.7.0_02"

                My vcf files were single sample, so no 0/0 calls.

                Perhaps post the full command that you ran.
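
To see which genotype classes a single-sample VCF actually contains (for instance, whether any 0/0 calls are present), the genotypes can be tallied with awk. This is a sketch that assumes a single-sample file, so the sample column is column 10, and that GT is the first FORMAT key, which the VCF specification requires whenever GT is present. The function name is hypothetical.

```shell
# Sketch: count each genotype (GT) value in a single-sample VCF.
count_genotypes() {
    # Skip header lines, take the text before the first ':' in column 10, and tally it
    awk -F'\t' '!/^#/ { split($10, f, ":"); n[f[1]]++ }
                END   { for (g in n) print g, n[g] }' "$1" | sort
}

# Usage:
#   count_genotypes ethnicity/file1.vcf
```

Each output line is a genotype followed by the number of records carrying it, so a file without homozygous reference calls simply has no `0/0` line.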


                • #9
                  I seem to have everything working now; I had a small issue in my command line. I ran similarity.jar without first intersecting my VCFs with the provided consensus.bed file, and my genotype accuracies are quite low (<0.9999). I imagine that intersecting my VCFs with the consensus.bed file would improve the genotyping accuracy, since the variants that do not match would then be excluded from the analysis. Is this correct?

                  Thanks again!
                  Last edited by Rabu; 10-08-2014, 10:04 AM.

