  • aforntacc
    Member
    • Jun 2011
    • 48

    Interpreting INDELs

    Hi all,
    Please, how can I interpret indels? I have no previous knowledge about this.

    This is what the data looks like:

    chr3 4466963 . TGGAG TGGAGGAG 999 PASS AC1=5;AF1=0.4167;DP4=7,197,3,84;DP=347;FQ=999;G3=0.1667,0.8333,8.319e-50;HWE=0.0465;INDEL;MQ=44;MfGt=0/1;MinDP=28;NeqMfGt=1;PV4=1,1.5e-70,2e-112,1 GT:PL:DP:SP:GQ 0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42:0:55 0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64

    thank you
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    Your question is too general -- e.g., what do you mean by "interpret"? I can tell you that you have a 'GAG' insert at a very low frequency (0.4); is that an "interpretation"?
    Plus you give no indication that you have looked at the manual for the program that created your file.

    In other words, if you want someone to help you then do your homework, give us more information and come back with a specific question(s).


    • aforntacc
      Member
      • Jun 2011
      • 48

      #3
      OK, I would like to say that I am not a bioinformatician, and I am sorry if I ask very stupid questions; it is my first time doing this. Having said that,
      my objective is to compare the genotypes and find out which indels are common and which are unique to each genotype. I have filtered the indels out of the VCF file using vcftools, so now I want to be able to compare the genotypes, i.e. wildtype consisting of 3 libraries (0/1:93,0,255:33:0:95, 0/0:0,122,255:59:0:99, 0/1:53,0,241:42:0:55) and mutant consisting of 3 libraries (0/1:139,0,250:59:7:99, 0/1:78,0,255:70:3:80, 0/1:62,0,238:28:0:64).

      I am sorry if this is also a stupid question, but I don't know how else to put it, or what the right thing to do is when you are looking for common and unique indels between genotypes without regard to the reference.


      • swbarnes2
        Senior Member
        • May 2008
        • 910

        #4
        Step 1) Understand your experiment well enough to ask sensible questions about it.

        It's going to be very hard for anyone here to help you with that.


        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #5
          Thank you for the additional information. At least it gets us closer to being able to help you. Additionally, I suspect that your native language is not English, so it may be hard to formulate good questions. However, your question is still too general. To put it in non-bioinformatic terms, your first post was like asking someone:

          "I would like to travel, please help me."

          Your second post, with more information, is like asking:

          "I would like to travel to Italy, do you have any suggestions?"

          In other words, it is a clearer question but still not suitable for a good answer.

          Going back to your post, it is still unclear to me what program created your VCF file. I do not know what all of those INFO tags mean. They should be clear from the header portion of the VCF file. All I can say at this point is that, yes, you have an indel on chromosome 3 that is of high quality (999) and passes all filtering. You could certainly look for other indels with similar characteristics, put them in your paper, and thus have reasonable results. However, I suspect that is not what you want to accomplish.


          • aforntacc
            Member
            • Jun 2011
            • 48

            #6
            OK, people, is it really about the English, or about understanding the project? I don't think so. Maybe it is because this is my first time handling such data and it is a learning curve, and I am not ashamed to ask more and more, or to shed more light, just to get much-needed help. Having said that:

            The variants were called using samtools. I only have two individuals (with 3 libraries each), which are Near Isogenic Lines. So I am only interested in seeing which indels (that passed the filtering criteria) are in the wildtype and not in the mutant, and vice versa, i.e. absolute differences, meaning present in only one individual but not the other.

            At the moment I have filtered out all the indels that pass the filtering criteria (i.e. with the PASS tag) using a command line.
            So my first question is: for SNPs I understand that REF = 0 and ALT = 1. Can I interpret the indels in the same way? E.g.:
            wt
            0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64
            TGGAG/ TGGAGGAG TGGAG/ TGGAGGAG TGGAG/ TGGAGGAG

            mt
            0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42
            TGGAG/ TGGAGGAG TGGAG/ TGGAG TGGAG/ TGGAGGAG
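The reading sketched above matches the VCF convention: in the GT field, index 0 always refers to the REF allele and 1 to the first ALT allele, whether the variant is a SNP or an indel. A minimal Python sketch of that lookup follows; the function name `decode_genotype` is only illustrative and is not part of samtools or vcftools.

```python
def decode_genotype(gt, ref, alts):
    """Translate a VCF GT string such as '0/1' into allele sequences.

    Index 0 is the REF allele; 1, 2, ... are the ALT alleles in order.
    A missing call ('.') is returned as None.
    """
    alleles = [ref] + list(alts)
    separator = "|" if "|" in gt else "/"
    return [None if idx == "." else alleles[int(idx)]
            for idx in gt.split(separator)]


# REF and ALT taken from the chr3:4466963 record quoted in this thread.
ref, alts = "TGGAG", ["TGGAGGAG"]

for sample_field in ["0/1:93,0,255:33:0:95", "0/0:0,122,255:59:0:99"]:
    gt = sample_field.split(":")[0]  # GT is the first FORMAT field here
    print(gt, decode_genotype(gt, ref, alts))
```

So a 0/1 call at this site means one TGGAG allele and one TGGAGGAG allele (a GAG insertion), and 0/0 means two reference alleles.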

            Secondly,
            I want to compare a VCF file containing the 3 wt libraries (0/1:139,0,250:59:7:99 0/1:78,0,255:70:3:80 0/1:62,0,238:28:0:64)
            to another VCF file containing the mt libraries
            (0/1:93,0,255:33:0:95 0/0:0,122,255:59:0:99 0/1:53,0,241:42) to find chr positions that harbour indels that are unique or common to the two individuals. But I do not know how to do this with vcftools, and I don't know if anybody has done this kind of thing before; that is why I came to this site.
            So sorry again, but we are all learning, aren't we?


            vcf header
            ##fileformat=VCFv4.1
            ##samtoolsVersion=0.1.17 (r973:277)
            ##INFO=<ID=DP Number=1 Type=Integer Description="Raw read depth">
            ##INFO=<ID=DP4 Number=4 Type=Integer Description="# high-quality ref-forward bases ref-reverse alt-forward and alt-reverse bases">
            ##INFO=<ID=MQ Number=1 Type=Integer Description="Root-mean-square mapping quality of covering reads">
            ##INFO=<ID=FQ Number=1 Type=Float Description="Phred probability of all samples being the same">
            ##INFO=<ID=AF1 Number=1 Type=Float Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
            ##INFO=<ID=AC1 Number=1 Type=Float Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
            ##INFO=<ID=G3 Number=3 Type=Float Description="ML estimate of genotype frequencies">
            ##INFO=<ID=HWE Number=1 Type=Float Description="Chi^2 based HWE test P-value based on G3">
            ##INFO=<ID=CLR Number=1 Type=Integer Description="Log ratio of genotype likelihoods with and without the constraint">
            ##INFO=<ID=UGT Number=1 Type=String Description="The most probable unconstrained genotype configuration in the trio">
            ##INFO=<ID=CGT Number=1 Type=String Description="The most probable constrained genotype configuration in the trio">
            ##INFO=<ID=PV4 Number=4 Type=Float Description="P-values for strand bias baseQ bias mapQ bias and tail distance bias">
            ##INFO=<ID=INDEL Number=0 Type=Flag Description="Indicates that the variant is an INDEL.">
            ##INFO=<ID=PC2 Number=2 Type=Integer Description="Phred probability of the nonRef allele frequency in group1 samples being larger ( smaller) than in group2.">
            ##INFO=<ID=PCHI2 Number=1 Type=Float Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
            ##INFO=<ID=QCHI2 Number=1 Type=Integer Description="Phred scaled PCHI2.">
            ##INFO=<ID=PR Number=1 Type=Integer Description="# permutations yielding a smaller PCHI2.">
            ##FORMAT=<ID=GT Number=1 Type=String Description="Genotype">
            ##FORMAT=<ID=GQ Number=1 Type=Integer Description="Genotype Quality">
            ##FORMAT=<ID=GL Number=3 Type=Float Description="Likelihoods for RR RA AA genotypes (R=ref A=alt)">
            ##FORMAT=<ID=DP Number=1 Type=Integer Description="# high-quality bases">
            ##FORMAT=<ID=SP Number=1 Type=Integer Description="Phred-scaled strand bias P-value">
            ##FORMAT=<ID=PL Number=. Type=Integer Description="List of Phred-scaled genotype likelihoods number of values is (#ALT+1)*(#ALT+2)/2">
            ##FILTER=<ID=StrandBias Description="Min P-value for strand bias (given PV4) [0.0001]">
            ##FILTER=<ID=EndDistBias Description="Min P-value for end distance bias [0.0001]">
            ##FILTER=<ID=MaxDP Description="Maximum read depth [10000000]">
            ##FILTER=<ID=BaseQualBias Description="Min P-value for baseQ bias [1e-100]">
            ##FILTER=<ID=MinMQ Description="Minimum RMS mapping quality for SNPs [10]">
            ##FILTER=<ID=Qual Description="Minimum value of the QUAL field [10]">
            ##FILTER=<ID=MinAB Description="Minimum number of alternate bases [2]">
            ##FILTER=<ID=GapWin Description="Window size for filtering adjacent gaps [10]">
            ##FILTER=<ID=MapQualBias Description="Min P-value for mapQ bias [0]">
            ##FILTER=<ID=SnpGap Description="SNP within INT bp around a gap to be filtered [10]">
            ##FILTER=<ID=MinDP Description="Minimum read depth [5]">
            ##FILTER=<ID=RefN Description="Reference base is N []">
            ##source_20120121.1=./bin/vcftools/perl/vcf-annotate #NOME? +/d=5
            ##INFO=<ID=MinDP Number=1 Type=Integer Description="The smallest sample DP">
            ##INFO=<ID=MfGt Number=1 Type=String Description="The Most Frequent GenoType in the Cohort">
            ##INFO=<ID=NeqMfGt Number=1 Type=Integer Description="Number of sample mismatching The Most Frequent GenoType in the Cohort">
            ##FILTER=<ID=MfGtMis Description="Less than [0] (or equal) samples have a genotype mismatching MfGt">
            ##FILTER=<ID=AltSup Description="According to DP4 the alternative is observed in less than [2] reads in one of the direction">
            ##FILTER=<ID=SynAA Description="The mutation affect a CDS but all the alternatives imply a amino-acid sequence identical to the reference amino-acid sequence">
            #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 110506_SN132_A_s_3_seq_GQJ-3 110506_SN132_A_s_3_seq_GQJ-4 110506_SN132_A_s_4_seq_GQJ-5 110506_SN132_A_s_4_seq_GQJ-6 110616_SN365_A_s_7_seq_GQJ-1 110616_SN365_A_s_7_seq_GQJ-2
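The FORMAT definitions in this header are what give meaning to each colon-separated value in a sample column. As a rough sketch (the helper name `label_sample_fields` is made up for illustration), the FORMAT column can be zipped against a sample column, and for a site with a single ALT allele the PL values are ordered 0/0, 0/1, 1/1, with the smallest value marking the most likely genotype:

```python
def label_sample_fields(format_col, sample_col):
    """Pair each FORMAT key with the matching value in a sample column."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


# FORMAT column and first sample column of the record quoted in this thread.
fields = label_sample_fields("GT:PL:DP:SP:GQ", "0/1:93,0,255:33:0:95")
print(fields)

# With one ALT allele, PL lists Phred-scaled likelihoods for 0/0, 0/1, 1/1.
pl = [int(value) for value in fields["PL"].split(",")]
most_likely = ["0/0", "0/1", "1/1"][pl.index(min(pl))]
print(most_likely)
```

For this sample the PL values are 93,0,255, so the heterozygous call 0/1 is the most likely one, in agreement with the GT field.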


            • Lluc
              Member
              • Aug 2010
              • 12

              #7
              So, if I understood well, this is about comparing samples and identifying the mutations (indels) that differentiate them, right? By the title of this thread, I would not have guessed. Here you have my suggestions.

              If you have two individuals and three libraries from each, you should pool together the libraries before discovering the indels, and have a vcf file with only two (not 6) samples. That way, you could easily use the most likely genotype of each sample to make the comparison, for example.
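A minimal sketch of that comparison in Python, assuming a tab-separated VCF with exactly two sample columns (the pooled wild type first, the pooled mutant second) and GT as the first FORMAT field. The two records below are invented purely to exercise the code; they are not data from this thread, and the function names are illustrative.

```python
def carries_alt(sample_col):
    """True if the sample's GT contains at least one non-reference allele."""
    gt = sample_col.split(":")[0]
    return any(allele not in ("0", ".")
               for allele in gt.replace("|", "/").split("/"))


def classify_site(vcf_line):
    """Label a two-sample VCF record by which sample carries the ALT allele."""
    cols = vcf_line.rstrip("\n").split("\t")
    in_first, in_second = carries_alt(cols[9]), carries_alt(cols[10])
    if in_first and in_second:
        return "common"
    if in_first:
        return "first_only"
    if in_second:
        return "second_only"
    return "neither"


# Hypothetical records (made-up positions and genotypes), for illustration only.
records = [
    "\t".join(["chr1", "100", ".", "A", "AT", "50", "PASS", "INDEL", "GT", "0/1", "0/0"]),
    "\t".join(["chr1", "200", ".", "CG", "C", "60", "PASS", "INDEL", "GT", "1/1", "0/1"]),
]
for line in records:
    print(line.split("\t")[1], classify_site(line))
```

Note that "common" here only means that both samples carry the ALT allele; it does not distinguish 0/1 from 1/1, which may matter for near-isogenic lines.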

              If you want to be very specific and avoid as many false positives as possible, you may want to try tools designed for cancer genomics, which compare a tumor sample and a normal tissue sample and search for tumor-specific mutations. Even if your samples are not tumor and normal, you can just label them as such and give it a try. For indels, you have the SomaticIndelDetector in GATK. You feed it the BAM files of your samples, identified as either normal or tumor, and you obtain a VCF file with all the indels (common and specific), but with the indels specific to the "tumor" sample labelled as "SOMATIC" in the INFO field.

              A limitation of this approach is that indels specific to the sample labelled as "normal" may not be marked. But you can just switch labels and repeat the analysis.


              • clarissaboschi
                Member
                • Apr 2010
                • 63

                #8
                Dear aforntacc

                I know that this post is old, but reading about the filters that you used to filter the variants, I noticed that you have one particular filter called "AltSup" that I am interested in.

                ##FILTER=<ID=AltSup Description="According to DP4 the alternative is observed in less than [2] reads in one of the direction">

                I did not find this option in vcftools, so I think you created or edited this one. I am trying to do the same; can you tell me how this filter works, please?
                I think what this filter does is require sufficient alternate reads on both strands.

                Thanks
                Clarissa
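Going only by the header description quoted above, and by the DP4 definition in the same header (ref-forward, ref-reverse, alt-forward, alt-reverse), a check of this kind could look like the Python sketch below. This is a guess at the logic, not the actual code behind the AltSup filter, and the function names are invented for illustration.

```python
def parse_info(info_col):
    """Split a VCF INFO column into a dict; flags without '=' map to True."""
    info = {}
    for item in info_col.split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True
    return info


def alt_supported_on_both_strands(info_col, min_reads=2):
    """True if DP4 shows at least min_reads ALT reads in each direction.

    Assumed interpretation: a record would be flagged when this is False.
    """
    dp4 = [int(x) for x in parse_info(info_col)["DP4"].split(",")]
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    return alt_fwd >= min_reads and alt_rev >= min_reads


# INFO values taken from the chr3:4466963 record quoted at the top of the thread.
info = "AC1=5;AF1=0.4167;DP4=7,197,3,84;DP=347;INDEL;MQ=44"
print(alt_supported_on_both_strands(info))
```

For that record DP4 is 7,197,3,84, so the ALT allele has 3 forward and 84 reverse reads and would pass a threshold of 2 on both strands.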
