  • Jane M
    Senior Member
    • Aug 2011
    • 239

    Models and software for SNP and indel detection

    Hello,

    I'm rather new to the NGS field. I previously did an internship on RNA-seq data: gene and isoform expression level estimation and differential expression between two conditions. So I know some of the models and tools used for these problems.

    I am now starting my PhD and I have to work with DNA-seq data. I must first focus on SNP and indel detection. I haven't found much information yet, because I don't know where to look. I have only found SNVMix, "predicting single nucleotide variants from next-generation sequencing of tumors", which seems interesting.

    Is there some kind of blog for DNA-seq, like the RNA-Seq blog?

    My PhD is in the cancer field. I'm interested in the models and software developed to solve these problems.

    Can you give me references to papers or software that you have read or used in this field?

    Thanks for your help,
    Jane
  • cjp
    Member
    • Jun 2011
    • 58

    #2
    Two SNP and indel callers that you can search for in seqAnswers are samtools mpileup:



    and GATK:



    sections: 5.1, 5.4 (Unified Genotyper) and 5.5.

    Chris

    Comment

    • Jane M
      Senior Member
      • Aug 2011
      • 239

      #3
      Thanks for your answer.

      I was also wondering about the reliability of the Illumina pipeline, especially for SNP and indel detection: I have the results of two DNA-seq experiments and, for each, the list of SNPs and indels.
      The results were produced by the Illumina pipeline. I haven't managed to find information about the model Illumina uses for such analyses.

      Do you know where I can find this information? Do you have any information about the quality of these analyses?

      Comment

      • cjp
        Member
        • Jun 2011
        • 58

        #4
        There is something about CASAVA vs GATK here:



        It's probably worth reading all the replies - and there is a pre-publication paper as well about the comparison.

        Chris

        Comment

        • Jane M
          Senior Member
          • Aug 2011
          • 239

          #5
          Thanks for your answers, Chris! I read the papers comparing CASAVA and GATK and I am starting to get an overview of the matter.
          I also read the paper about SNVMix and it seems to be a very interesting model!

          I will try to summarize what I've understood so far. Please correct me or add extra information. As I understand it, Illumina's tool for alignment, SNP and indel detection is CASAVA, and the newest version seems to be CASAVA1.8.

          Other tools are:
          • GATK, which is related to BWA (an aligner), for the three tasks. What is the relation between GATK and BWA? I understood that both of them can be used to align the reads.
          • SNVMix for SNV detection, which seems suited to cancer data (like mine)
          • mpileup for SNV and indel detection.


          Does anyone know other tools?


          About the comparison between GATK and CASAVA, the conclusion is:
          Code:
          We conclude that CASAVA1.8 has come a long way and can be considered a mature SNP calling approach. However, CASAVA1.8 does not deliver the same quality in the indel calling set compared to the newly incorporated Dindel-algorithm of GATK. It hence remains the best practice to use CASAVA1.8 for producing fastq files and switch at this stage to the academic tools for mapping, alignment improvement and variant calling.
          It seems that I should study indel detection with a tool other than the one from Illumina, but the results for SNP detection should be acceptable.

          Finally, do you know whether models not adapted to cancer data should be avoided when working in this field?

          Comment

          • Jane M
            Senior Member
            • Aug 2011
            • 239

            #6
            Thanks for your answers, Chris! I read the papers comparing CASAVA and GATK and I am starting to get an overview of the matter.
            I also read the paper about SNVMix and it seems to be a very interesting model!

            I will summarize what I've understood so far. Please correct me or add extra information. Illumina's tool for alignment, SNP and indel detection is CASAVA, and the newest version seems to be CASAVA1.8.

            Other tools are:
            • GATK, which is related to BWA (an aligner), for the three tasks. What is the relation between GATK and BWA? I understood that both of them can be used to align the reads.
            • SNVMix for SNV detection, which seems suited to cancer data (like mine)
            • mpileup for SNV and indel detection.


            Does anyone know other tools?


            About the comparison between GATK and CASAVA, the conclusion is:
            We conclude that CASAVA1.8 has come a long way and can be considered a mature SNP calling approach. However, CASAVA1.8 does not deliver the same quality in the indel calling set compared to the newly incorporated Dindel-algorithm of GATK. It hence remains the best practice to use CASAVA1.8 for producing fastq files and switch at this stage to the academic tools for mapping, alignment improvement and variant calling.
            It seems that I should study indel detection with a tool other than the one from Illumina, but the results for SNP detection should be acceptable.


            Finally, do you know whether models not adapted specifically to cancer data should be avoided when working in this field?

            Comment

            • cjp
              Member
              • Jun 2011
              • 58

              #7
              GATK, which is related to BWA (an aligner), for the three tasks. What is the relation between GATK and BWA? I understood that both of them can be used to align the reads.

              BWA is for mapping reads to a genomic reference. There are other tools like bowtie, stampy, novoalign that can do this as well. BWA is the standard in many places as it is open source, fast and can do gapped alignments.

              GATK is a set of tools for analysing exome and genomic DNA datasets. It does things like realign reads using multiple alignments where the mapper on its own doesn't do as well. This will eliminate some false positive SNPs by correcting alignments. GATK also recalibrates the theoretical base qualities. Mainly it is used to call SNPs and indels.

              The usual pipeline is:

              sequencer -> fastq -> BWA -> SAM -> samtools/picard -> sorted, dedupped BAM -> GATK -> realigned, recalibrated BAM -> GATK -> SNPs and indels (in VCF format) -> GATK -> recalibrated, filtered SNPs and indels.
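The chain of stages above can be written down as data, which makes it easy to check that each tool consumes what the previous one produced. This is only a sketch of the ordering described in this post: the names `STAGES`, `check_chain` and `describe` are made up for illustration, and no real command lines or tool options are implied.

```python
# Each stage is (tool, input, output), copied from the pipeline described above.
# The tuple layout and all function names are illustrative assumptions.
STAGES = [
    ("sequencer", None, "fastq"),
    ("BWA", "fastq", "SAM"),
    ("samtools/picard", "SAM", "sorted, dedupped BAM"),
    ("GATK", "sorted, dedupped BAM", "realigned, recalibrated BAM"),
    ("GATK", "realigned, recalibrated BAM", "SNPs and indels (VCF)"),
    ("GATK", "SNPs and indels (VCF)", "recalibrated, filtered SNPs and indels"),
]


def check_chain(stages):
    """Return True if every stage consumes exactly what the previous one produced."""
    for (_, _, produced), (_, consumed, _) in zip(stages, stages[1:]):
        if produced != consumed:
            return False
    return True


def describe(stages):
    """Render the pipeline as one numbered line per stage."""
    lines = []
    for step, (tool, consumed, produced) in enumerate(stages, start=1):
        source = consumed if consumed is not None else "(sample)"
        lines.append(f"{step}. {tool}: {source} -> {produced}")
    return lines


if __name__ == "__main__":
    assert check_chain(STAGES)
    print("\n".join(describe(STAGES)))
```

Note that GATK appears three times with different inputs, which is why it is easier to think of it as a toolkit than as a single variant caller.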

              Chris

              Comment

              • ulz_peter
                Senior Member
                • Feb 2010
                • 219

                #8
                Feels strange to promote that, but have a look here:

                Comment

                • cjp
                  Member
                  • Jun 2011
                  • 58

                  #9
                  Thanks, I'd not seen that page on the SeqAnswers wiki. Looks like a nice summary of the GATK stuff on one page!

                  Chris

                  Comment

                  • swbarnes2
                    Senior Member
                    • May 2008
                    • 910

                    #10
                    The one bad thing about CASAVA is that since it was quite bad for some time (there were bugs in the perl scripts!), no one got in the habit of using it, so everyone learned Samtools or Maq or GATK. Now that CASAVA is better, no one really wants to go back and use it when they already know other suites. So if you have questions about CASAVA, you won't get nearly as much support here as you would if you were asking about SAMtools or GATK. You could always ask Illumina...but in my experience, they aren't terribly helpful.

                    Comment

                    • Jane M
                      Senior Member
                      • Aug 2011
                      • 239

                      #11
                      Thanks for your answers.
                      I didn't know about Maq. I read a few things about it and I understood that it's a mapping tool. Is it also for SNP and indel detection?

                      I have the feeling that there are not many tools for SNP and indel detection: 3-5. What would you suggest I start with? Which is the simplest to install and run?

                      Finally, has anyone tried SNVMix? Do you know of any forums about NGS tools in the cancer field?

                      Comment

                      • Simon Anders
                        Senior Member
                        • Feb 2010
                        • 995

                        #12
                        Just for completeness: rather recently, papers describing the variant callers in both GATK and samtools have appeared. Reading them might clarify some of the questions.

                        on samtools:
                        Heng Li: A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 27 (21): 2987-2993. doi:10.1093/bioinformatics/btr509

                        on GATK:
                        Mark A DePristo et al.: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43, 491–498 (2011). doi:10.1038/ng.806

                        Comment

                        • cjp
                          Member
                          • Jun 2011
                          • 58

                          #13
                          Heng Li mentions a few SNP callers in this biostar thread:



                          maq is for mapping and SNP detection, I believe. It was written by Heng Li, who also did BWA and samtools mpileup, so maq may be out of date now, especially for SNP detection, but I'm not sure as I have never used it. It may still be good for mapping, but it is slower than BWA.

                          Also, I have no experience with SNVMix or SNP detection in cancer. It may be harder to look for SNPs in cancer cells, as they may no longer be diploid and you may have heterogeneous cell populations: samtools and GATK rely on the fact that you are looking for SNPs and indels in diploid cells (so caution needs to be applied when looking for SNPs in X and Y chromosomes as well).
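To make the diploid point concrete, here is a deliberately simplified sketch. It is not the model used by samtools or GATK: it assumes a plain binomial read-count model, a single fixed per-base error rate of 0.01, and only three genotypes with expected alternate-allele fractions of 0, 0.5 and 1. All names (`GENOTYPES`, `genotype_log_likelihoods`, `best_genotype`) are hypothetical.

```python
from math import comb, log10

# Expected fraction of reads carrying the alternate allele under each diploid
# genotype. This three-genotype assumption is the "diploid" part of the model.
GENOTYPES = {"hom-ref": 0.0, "het": 0.5, "hom-alt": 1.0}


def genotype_log_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Return log10 P(read counts | genotype) under a simple binomial model."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be strictly between 0 and 1")
    total = ref_reads + alt_reads
    scores = {}
    for name, alt_fraction in GENOTYPES.items():
        # Probability that a single read shows the alternate base, allowing
        # for sequencing errors in either direction.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        # Work in log space so deep coverage does not underflow to zero.
        scores[name] = (
            log10(comb(total, alt_reads))
            + alt_reads * log10(p_alt)
            + ref_reads * log10(1 - p_alt)
        )
    return scores


def best_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    scores = genotype_log_likelihoods(ref_reads, alt_reads, error_rate)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(best_genotype(50, 50))  # balanced counts fit a heterozygous site
    print(best_genotype(90, 10))  # 10% alternate reads fit hom-ref better than het
```

Under these assumptions, a variant present in only about 10% of the reads, as could happen with a subclone or a sample diluted by normal cells, is scored as homozygous reference, because none of the three diploid genotypes expects an allele fraction near 0.1.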

                          One of the main differences between GATK and samtools is that GATK tends to give many more SNPs and relies on variant recalibration to find the better-quality SNPs. But in my experience, both perform well for finding most of the likely candidate SNPs in exome data. You'll still need to verify any novel SNPs found with something like Sanger sequencing.

                          Chris

                          Comment

                          • Jane M
                            Senior Member
                            • Aug 2011
                            • 239

                            #14
                            Thanks again !

                            From the given list, I've found new tools for SNP or indel detection:
                            • Atlas SNP
                            • Dindel
                            • FreeBayes
                            • QCALL
                            • Slider II
                            • SNP Seeker
                            • SPLINTER
                            • Syzygy
                            • VARiD
                            • VarScan


                            And there are probably more...
                            I will go through this list to see what is worth trying, especially with my cancer data.

                            Comment

                            • Heisman
                              Senior Member
                              • Dec 2010
                              • 534

                              #15
                              Here's a new one I stumbled across. No idea if it's any good, but it might be applicable to cancer data if you have a heterogeneous population. http://nar.oxfordjournals.org/conten...kr599.abstract

                              Comment
