Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • sbaheti
    Member
    • Jul 2010
    • 12

    Variant Calling for Exome Capture Analysis

    HI

    I am trying various tools to call variants in the exome data i have to get to standardize our work flow. I have few parameters to discuss, like what should be the min read depth to get less false positive variant calls. Ans when we do indel calls what should we use for indel-supported reads parameter.
    I am analyzing GATK and samtools for the same. If any body having experience in this area please let me know the acceptable parameters to use for comparison.
    Thanks

    Saurabh
  • JohnK
    Senior Member
    • Feb 2010
    • 106

    #2
    I'm interested in this too. Keep me posted on the indel/SNP options please? I'm not finding BioScope to be satisfactory for clients' needs.

    Comment

    • sbaheti
      Member
      • Jul 2010
      • 12

      #3
      I tried both samtools and GATK for indel calling for BWA alignment results. The numbers from Samtools are way more than by GATK. But when i see the overlap between them that looks promising. Samtools is covering allmost all the indel calls from GATK. I didn't check the exculsive calls from Samtools.
      For this I used 6X coverage criterion and min 5 indel-supported reads to call as a indel. Probably best thing is to check the exculsive calls from samtools using IGV and flag them as false positive.

      Comment

      • Lee Sam
        Member
        • Oct 2008
        • 57

        #4
        Originally posted by sbaheti View Post
        I tried both samtools and GATK for indel calling for BWA alignment results. The numbers from Samtools are way more than by GATK. But when i see the overlap between them that looks promising. Samtools is covering allmost all the indel calls from GATK. I didn't check the exculsive calls from Samtools.
        For this I used 6X coverage criterion and min 5 indel-supported reads to call as a indel. Probably best thing is to check the exculsive calls from samtools using IGV and flag them as false positive.
        I'm running both samtools and GATK as well for our exome-captured data. I'm curious why you would consider samtools-exclusive indels as false positives.

        Comment

        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.

          Comment

          • sbaheti
            Member
            • Jul 2010
            • 12

            #6
            On spot check on the exculsive Indels from Samtools, i have noticed that there are two indels close to each other and samtools call one as a indel not the other. I am not sure why and GATK discards both of them. But i cant generalize this as i just checked few of those only. I am not sure we can get the comparision if we run default Samtools and GATK as both have different defulat parameters.

            Comment

            • sbaheti
              Member
              • Jul 2010
              • 12

              #7
              Thanks lh3, i will definately try threshold for Indel as 50 and will look how Dindel works..
              Last edited by sbaheti; 08-27-2010, 01:34 PM.

              Comment

              • Lee Sam
                Member
                • Oct 2008
                • 57

                #8
                Originally posted by lh3 View Post
                Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.
                How did the belief that dindel was the best indel caller came about? I haven't seen any comparison papers (at the least I am not aware of them) so I would honestly like to know.

                Comment

                • Nomijill
                  Member
                  • Sep 2009
                  • 25

                  #9
                  Hi all,

                  For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

                  Thanks,
                  Naomi

                  Comment

                  • sbaheti
                    Member
                    • Jul 2010
                    • 12

                    #10
                    Originally posted by Nomijill View Post
                    Hi all,

                    For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

                    Thanks,
                    Naomi
                    I am looking for accuarte indel calls with less false positives. We want to use the indel caller for exome capture analysis. The read length we normally use are 50 and 75. I am not sure about the size of the indels, but we always target for short indels.

                    Thanks,
                    Saurabh

                    Comment

                    • sbaheti
                      Member
                      • Jul 2010
                      • 12

                      #11
                      I have one more question
                      We were looking at MAQ aligned reads for a dataset, calling variants using MAQ and Samtools, and the 2 results are not as similar as we expected. Could you comment on something we are missing here, or this is what you would expect too!

                      Results : (These numbers are for SNPs)
                      I am following convention as (Aligner_Variant caller)
                      MAQ_MAQ = 13525(Exclusive)
                      MAQ_SAMTOOLS = 21701 (Exclusive)
                      COMMON = 69689

                      SCRIPTS USED:
                      $ samtools pileup -vcf *.sorted.bam > *.pileup

                      Call the variant using samtool.pl perl script and using Varfilter and –N 3 max SNPs in a window
                      $ samtools.pl varFilter -N 3 -D 100 *.pileup > *.raw.variant

                      Discard the calls for variant if phred is less than 40
                      $ cat *.raw.snp | awk ‘$6>=40’ > *.raw.filtered.snp


                      $VER/maq cns2snp consensus.cns > cns.snp
                      $VER/maq indelpe $genome $map_file > cns.indel.pe
                      $VER/scripts/maq.pl SNPfilter -f cns.indel -F cns.indel.pe cns.snp > cns.filter.snp
                      $VER/maq cns2win consensus.cns > cns.win
                      awk '$5>=40' cns.filter.snp > cns.final.snp

                      Thanks,
                      Saurabh

                      Comment

                      • lh3
                        Senior Member
                        • Feb 2008
                        • 686

                        #12
                        Originally posted by Lee Sam View Post
                        How did the belief that dindel was the best indel caller came about? I haven't seen any comparison papers (at the least I am not aware of them) so I would honestly like to know.
                        If you were part of the 1000 genomes project, you would know. The paper is coming.

                        Comment

                        • dawe
                          Senior Member
                          • Apr 2009
                          • 258

                          #13
                          Originally posted by lh3 View Post
                          If you were part of the 1000 genomes project, you would know. The paper is coming.
                          Doh! I've just spent a weekend running GATK :-(

                          [ flame mode on ]
                          BTW, I'm not a master of variations and I'm probably the last who can say anything about, nevertheless I don't think 1000 Genomes team holds the Truth. I wish I was part of that project, but unfortunately I have to work somewhere else...
                          [ flame mode off ]

                          d
                          Last edited by dawe; 08-31-2010, 01:23 AM.

                          Comment

                          • nilshomer
                            Nils Homer
                            • Nov 2008
                            • 1283

                            #14
                            Originally posted by lh3 View Post
                            If you were part of the 1000 genomes project, you would know. The paper is coming.
                            I have to politely disagree, since I do not know and I am a member of the project. I have not seen a re-alignment method comparison within the 1000 genomes project.

                            Comment

                            • SeqAnswerSeeker
                              Junior Member
                              • Apr 2010
                              • 3

                              #15
                              Just to make sure -- we are talking about PINDEL here, right? Or does the 1000 genomes have a new indel caller that is still unpublished (called DINDEL)

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 11:58 AM
                              0 responses
                              13 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              25 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              36 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              60 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...