  • ethnicity check

    Hello, I have sequenced 20 exomes with the Ion Proton system and have to do an ethnicity check on all samples as a quality control step. I have the reported ethnicities of all the samples.

    Is there any way I can compare the variants from these samples with the variants in the 1000 Genomes dataset? For instance, can I run a genotype concordance between the variants from my samples and those of the 1000 Genomes European, African, Asian, etc. populations?


    Thanks in advance!
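
    For the concordance idea in the question, here is a minimal Python sketch. It assumes the genotypes from both sources have already been loaded into dictionaries keyed by site; the function name and the (chrom, pos, ref, alt) key layout are hypothetical, and this is not how any particular tool computes concordance.

```python
def genotype_concordance(sample, reference):
    """Compare two genotype maps on the sites they share.

    Both arguments map a site key, e.g. (chrom, pos, ref, alt), to an
    alternate-allele count (0, 1 or 2). This layout is an assumption
    made for illustration only.

    Returns (number of shared sites, fraction with identical genotype).
    The fraction is None when no sites are shared.
    """
    shared = sample.keys() & reference.keys()
    if not shared:
        return 0, None
    matches = sum(1 for site in shared if sample[site] == reference[site])
    return len(shared), matches / len(shared)


if __name__ == "__main__":
    # Toy genotypes, invented purely to show the call.
    mine = {("1", 100, "A", "G"): 1, ("1", 200, "C", "T"): 2}
    other = {("1", 100, "A", "G"): 1, ("1", 200, "C", "T"): 1}
    print(genotype_concordance(mine, other))
```

    Only sites present in both maps are compared, so the result depends heavily on which regions were called in each dataset.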

  • #2
    I asked the same question and found the Similarity tool on GeneTalk's website helpful for this. You can download it from here, and it was easy to use:

    https://gene-talk.de/qc (and see referenced paper in the post below)


    I was able to assign exomes to 1000 Genomes ethnic groups, and this works well if the exome is from an ethnic group represented in the 1000 Genomes data. The problems arise when the exome is from an ethnic group not represented in the data, e.g. in my case Aboriginal Australian.

    Filipino, Tongan and Pacific Islander samples tend to group loosely with Asians, which seems reasonable. British people group with GBR in most cases, but some group more closely with CEU.
    Last edited by rbagnall; 10-03-2014, 02:54 PM.



    • #3
      This is great, thank you very much.
      I am getting an error message: "unable to access similarity.jar"
      I downloaded the QC software and the jar file is present.

      Did you come across this?



      • #4
        You need to move to the similarity folder where the jar file is located:

        1. Change directory to the similarity folder

        cd path/to/Similarity_05022013

        2. Make a new folder for results, called ethnicity

        mkdir ethnicity

        3. Create a VCF file of variants from a single BAM file and write it into the ethnicity folder. Restrict variant calling to the regions in the '20110225.exome.consensus.bed' file that comes with the Similarity tool (I use GATK)

        java -jar /path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 10 -R /path/to/GRCh.37.fasta -I /path/to/bamfile.bam -o ethnicity/file1.vcf -L 20110225.exome.consensus.bed -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP

        4. Run similarity.jar on file1.vcf

        java -Xmx6g -jar similarity.jar -d ethnicity -o file1.txt

        5. Make the plot, as per the manuscript

        Rscript --vanilla R/MDS.R ethnicity/file1.txt ethnicity/file1.pdf
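
        With many samples, the same steps can be generated per BAM file. The following is a rough Python sketch that only prints the commands (a dry run) and reuses the flags shown above unchanged. The function name, the example BAM paths and the one-subfolder-per-sample layout are assumptions; so is the idea that similarity.jar writes its -o file inside the -d folder, which is inferred from the plot command in step 5.

```python
import shlex
from pathlib import Path

# Placeholder paths copied from the steps above; adjust to your system.
GATK_JAR = "/path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar"
REFERENCE = "/path/to/GRCh.37.fasta"
TARGETS = "20110225.exome.consensus.bed"


def commands_for(bam, results_dir="ethnicity"):
    """Return the four commands (as argument lists) for one BAM file.

    Assumption: each sample gets its own subfolder so that the
    similarity tool sees a single VCF per run, as in the steps above.
    """
    sample = Path(bam).stem
    sample_dir = f"{results_dir}/{sample}"
    call = [
        "java", "-jar", GATK_JAR, "-T", "UnifiedGenotyper", "-nt", "10",
        "-R", REFERENCE, "-I", str(bam),
        "-o", f"{sample_dir}/{sample}.vcf", "-L", TARGETS,
        "-G", "StandardAnnotation", "-stand_emit_conf", "10.0",
        "-stand_call_conf", "20.0", "-dcov", "200", "-l", "INFO",
        "-rf", "BadCigar", "-glm", "SNP",
    ]
    similarity = ["java", "-Xmx6g", "-jar", "similarity.jar",
                  "-d", sample_dir, "-o", f"{sample}.txt"]
    plot = ["Rscript", "--vanilla", "R/MDS.R",
            f"{sample_dir}/{sample}.txt", f"{sample_dir}/{sample}.pdf"]
    return [["mkdir", "-p", sample_dir], call, similarity, plot]


if __name__ == "__main__":
    # Hypothetical BAM paths; run from inside the Similarity folder.
    for bam in ["/path/to/sample01.bam", "/path/to/sample02.bam"]:
        for cmd in commands_for(bam):
            print(shlex.join(cmd))
```

        Printing the commands first makes it easy to check them by eye before running anything; passing each list to subprocess.run would execute them.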
        Last edited by rbagnall; 10-04-2014, 03:27 AM.



        • #5
          Thanks again, this is very helpful. I already have my VCF files, which were called from the BAM files using the Torrent Variant Caller. So do I need to call the variants again, restricted to the provided .bed file with the -L argument?

          Thank you!



          • #6
            You could restrict the variants in the VCF file to the regions in the provided .bed file using Bedtools (intersectBed).
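
            To make the idea concrete, here is a small pure-Python sketch of the same kind of filtering: keep the VCF header plus every record whose position falls inside a BED interval. It is an illustration of the logic, not a replacement for Bedtools, and the function names are made up. It relies on BED coordinates being 0-based and half-open while VCF positions are 1-based.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Read BED lines into {chrom: (starts, ends)} with overlaps merged."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    targets = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:        # overlaps or touches previous
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if 1-based VCF position `pos` lies in a 0-based BED interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect.bisect_right(starts, pos - 1) - 1
    return i >= 0 and pos - 1 < ends[i]


def filter_vcf(vcf_lines, targets):
    """Yield header lines and the records that fall inside the targets."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
        else:
            chrom, pos = line.split("\t", 2)[:2]
            if in_targets(targets, chrom, int(pos)):
                yield line


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python restrict_vcf.py targets.bed in.vcf > out.vcf
    with open(sys.argv[1]) as bed, open(sys.argv[2]) as vcf:
        sys.stdout.writelines(filter_vcf(vcf, load_bed(bed)))
```

            Chromosome names must match between the two files ("chr1" versus "1"), otherwise nothing is kept. With Bedtools itself, the -header option of intersectBed keeps the VCF header in the output.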



            • #7
              Hi again, may I ask what version of Java you used to run similarity.jar? I am using 1.7 and I am getting a 'java.lang.NullPointerException'.

              Also, did your VCFs contain the homozygous reference (0/0) calls, or just heterozygous variant (0/1) and homozygous variant (1/1) calls?

              Thanks
              Last edited by Rabu; 10-06-2014, 10:36 AM.



              • #8
                java version "1.7.0_02"

                My vcf files were single sample, so no 0/0 calls.

                Perhaps show the full command that you ran.
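
                To check which genotype classes a single-sample VCF actually contains, a short Python sketch like the following can tally them. The function name and the category labels are hypothetical, and it only reads the first sample column.

```python
from collections import Counter


def genotype_counts(vcf_lines):
    """Tally hom_ref / het / hom_alt / missing calls for the first sample."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:                 # no FORMAT/sample columns
            continue
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt = fields[9].split(":")[keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")   # treat phased as unphased
        if "." in alleles:
            counts["missing"] += 1
        elif all(a == "0" for a in alleles):
            counts["hom_ref"] += 1           # e.g. 0/0
        elif len(set(alleles)) == 1:
            counts["hom_alt"] += 1           # e.g. 1/1
        else:
            counts["het"] += 1               # e.g. 0/1 or 1/2
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python gt_counts.py file1.vcf
    with open(sys.argv[1]) as vcf:
        for label, n in sorted(genotype_counts(vcf).items()):
            print(label, n)
```

                A count of zero for hom_ref would match the situation described above, where a single-sample VCF has no 0/0 calls.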



                • #9
                  Hi,
                  I seem to have got everything to work; I had a small issue in my command line. I ran similarity.jar without first intersecting my VCFs with the consensus.bed file provided, and my genotype accuracies are quite low (<0.9999). I imagine that intersecting my VCFs with the consensus.bed file will improve the genotyping accuracy, since the variants that do not match will not be included in the analysis. Is this correct?

                  Thanks again!
                  Last edited by Rabu; 10-08-2014, 10:04 AM.
