  • ethnicity check

    Hello, I have sequenced 20 exomes with the Ion Proton system and have to do an ethnicity check on all samples as a quality control step. I have the reported ethnicities of all the samples.

    Is there any way I can compare the variants from these samples with the variants of the 1000 Genomes dataset? For instance, can I run a genotype concordance between the variants from my samples and those of a 1000 Genomes European, African, Asian, etc. population?


    Thanks in advance!

  • #2
    I asked the same question and found the Similarity tool on GeneTalk's website helpful for this. You can download it from here, and it was easy to use:

    https://gene-talk.de/qc (and see referenced paper in the post below)



    I was able to assign exomes to 1000 Genomes ethnic groups, and this works well if the exome is from an ethnic group represented in the 1000 Genomes data. The problems arise when the exome is from an ethnic group not represented in the data, e.g. in my case Aboriginal Australian.

    Filipino, Tongan, and other Pacific Islander exomes tend to group loosely with Asians, which I guess seems reasonable. British people group with GBR in most cases, but some group more closely with CEU.
    Last edited by rbagnall; 10-03-2014, 02:54 PM.



    • #3
      This is great, thank you very much.
      I am getting an error message: "unable to access similarity.jar"
      I downloaded the QC software and the jar file is present.

      Did you come across this?



      • #4
        You need to move to the similarity folder where the jar file is located:

        1. Change directory to the similarity folder

        cd path/to/Similarity_05022013

        2. Make a new folder for results, called ethnicity

        mkdir ethnicity

        3. Create a VCF file of variants from a single BAM file and write it into the ethnicity folder. Restrict calling to the regions in the '20110225.exome.consensus.bed' file that comes with the Similarity tool (I use GATK)

        java -jar /path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 10 -R /path/to/GRCh.37.fasta -I /path/to/bamfile.bam -o ethnicity/file1.vcf -L 20110225.exome.consensus.bed -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 -dcov 200 -l INFO -rf BadCigar -glm SNP

        4. Run similarity.jar on file1.vcf

        java -Xmx6g -jar similarity.jar -d ethnicity -o file1.txt

        5. Make plot, as per the manuscript

        Rscript --vanilla R/MDS.R ethnicity/file1.txt ethnicity/file1.pdf
        Last edited by rbagnall; 10-04-2014, 03:27 AM.
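
For several samples, steps 2 to 5 above could be wrapped in a small shell function. This is only a sketch: the function names `run` and `ethnicity_check`, the `DRY_RUN` switch, and the one-folder-per-sample layout are assumptions made for illustration, and the `/path/to/...` placeholders are copied from the commands above and must be edited. Whether similarity.jar accepts several VCF files in one folder is not stated in this thread, so each sample gets its own folder here.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 2-5; run it from inside the Similarity folder.
# The placeholder paths below come from the post above and must be replaced.
GATK_JAR=${GATK_JAR:-/path/to/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar}
REFERENCE=${REFERENCE:-/path/to/GRCh.37.fasta}
TARGETS=${TARGETS:-20110225.exome.consensus.bed}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Call variants for one BAM file, run similarity.jar, and draw the MDS plot.
ethnicity_check() {
    bam=$1
    sample=$(basename "$bam" .bam)
    outdir="ethnicity_$sample"   # assumption: one results folder per sample

    run mkdir -p "$outdir"
    run java -jar "$GATK_JAR" -T UnifiedGenotyper -nt 10 \
        -R "$REFERENCE" -I "$bam" -o "$outdir/$sample.vcf" -L "$TARGETS" \
        -G StandardAnnotation -stand_emit_conf 10.0 -stand_call_conf 20.0 \
        -dcov 200 -l INFO -rf BadCigar -glm SNP
    run java -Xmx6g -jar similarity.jar -d "$outdir" -o "$sample.txt"
    run Rscript --vanilla R/MDS.R "$outdir/$sample.txt" "$outdir/$sample.pdf"
}

# Example use (commented out so nothing runs on load):
# for bam in /path/to/bams/*.bam; do ethnicity_check "$bam"; done
```

Setting `DRY_RUN=1` first prints the four commands for a sample without running them, which makes it easy to check the paths before starting a long GATK run.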



        • #5
          Thanks again, this is very helpful. I already have VCF files, which were called from my BAM files using the Torrent Variant Caller. Do I need to call the variants again with the provided .bed file using the -L argument?

          Thank you!



          • #6
            You could restrict variants in the vcf file to the provided .bed file using Bedtools (intersectBed)
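
With Bedtools installed, `bedtools intersect -header -u -a file1.vcf -b 20110225.exome.consensus.bed` is the usual way to do this. Where Bedtools is not available, the same restriction can be sketched in plain awk, as below. The function name `restrict_vcf_to_bed` is made up for this example, and the sketch assumes tab-separated files, the same chromosome naming in both files (for example `1` in both, not `1` in one and `chr1` in the other), and a BED file small enough to scan linearly.

```shell
# Keep VCF header lines plus records whose position lies inside a BED interval.
# BED intervals are 0-based and half-open, VCF positions are 1-based, so a
# record at POS is inside an interval when start < POS <= end.
restrict_vcf_to_bed() {
    bed=$1
    vcf=$2
    awk -F'\t' '
        NR == FNR {                          # first file: BED intervals
            if ($0 ~ /^(#|track|browser)/ || NF < 3) next
            n[$1]++
            lo[$1, n[$1]] = $2 + 0
            hi[$1, n[$1]] = $3 + 0
            next
        }
        /^#/ { print; next }                 # second file: keep VCF header
        {
            pos = $2 + 0
            for (i = 1; i <= n[$1]; i++)
                if (pos > lo[$1, i] && pos <= hi[$1, i]) { print; next }
        }
    ' "$bed" "$vcf"
}
```

For example, `restrict_vcf_to_bed 20110225.exome.consensus.bed file1.vcf > ethnicity/file1.vcf` would write the restricted file into the results folder.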



            • #7
              Hi again, may I ask what version of Java you used to run similarity.jar? I am using 1.7 and I am getting a 'java.lang.NullPointerException'.

              Also, did your VCFs contain the homozygous reference (0/0) calls, or just heterozygous variant (0/1) and homozygous variant (1/1) calls?

              Thanks
              Last edited by Rabu; 10-06-2014, 10:36 AM.



              • #8
                java version "1.7.0_02"

                My vcf files were single sample, so no 0/0 calls.

                Perhaps post the full command that you ran.
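
To see which genotype classes a single-sample VCF actually contains (0/0, 0/1, 1/1, and so on), a quick tally such as the following can help. It is a minimal sketch: the function name `count_genotypes` is invented here, and it assumes a tab-separated, single-sample VCF in which GT is the first entry of the sample column, as the VCF specification requires when GT is present.

```shell
# Tally the genotype strings found in the first sample column of a VCF.
count_genotypes() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        NF >= 10 {
            split($10, fields, ":")          # sample column, e.g. 0/1:35:99
            count[fields[1]]++
        }
        END { for (g in count) printf "%s\t%d\n", g, count[g] }
    ' "$1" | sort
}
```

Running `count_genotypes ethnicity/file1.vcf` prints one line per genotype with its count, so an unexpected 0/0 row is easy to spot.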



                • #9
                  Hi,
                  I got everything to work; I had a small issue in my command line. I ran similarity.jar without first intersecting my VCFs with the provided consensus.bed file, and my genotype accuracies are quite low (<0.9999). I imagine that intersecting my VCFs with consensus.bed would improve the genotyping accuracy, since the variants that do not match would no longer be included in the analysis. Is this correct?

                  Thanks again!
                  Last edited by Rabu; 10-08-2014, 10:04 AM.
