
  • #16
    Is GATK really suited to cancer genomes? Are there options to make the Unified Genotyper call alleles in a population (of heterogeneous cancer cells) rather than in an individual, which would have at most 2 alleles?



    • #17
      The Broad have written some SNP calling software (syzygy) for pooled heterogeneous samples:



      They're using it to look for SNPs where many individuals are in the same library and for targeting a smaller set of genes rather than doing whole exomes. So this may be better for cancer genomes than GATK, but I have no experience there.

      Chris



      • #18
        Thanks Chris. This looks interesting; I just need to figure out how it can plug into our existing GATK-based pipeline.



        • #19
          If you have matched somatic tumour and germline data, see also:



          Note, however, that the MuTect somatic SNP caller is still in restricted beta, and the somatic indel detector is based on the old GATK Indel Genotyper v2 (superseded for all other purposes by the Unified Genotyper's indel mode).



          • #20
            Thanks. I am looking forward to trying MuTect when it is available. And here is a possibly silly question: should there be a different approach to genotyping heterogeneous tumors versus pooled populations? In a pooled population you typically know how many genomes are included. Cells in a tumor are likely to be more closely related, and probably differ only in a handful of mutations that helped facilitate tumorigenesis and in genomically unstable regions. Any thoughts?



            • #21
              Hi, I am also looking for SNP differences between 2 different cell lines, obtained from 3 disease and 3 normal individuals. First I aligned with bowtie. Then I used samtools mpileup to compare multiple BAM files.

              My question: how can I change the mpileup command-line parameters to get higher-quality SNPs between these 12 files?

              The default is mpileup -uf

              Do you suggest any other parameters? How can I find a good paper on this?

              mpileup -6 -uDSf ?

              Could you explain? I really appreciate any help.

              Thanks



              • #22
                from samtools mpileup:

                -6 assume the quality is in the Illumina-1.3+ encoding

                Depends on which quality values are in the BAM file:



                -D output per-sample DP in BCF (require -g/-u)

                This is an output option - depth per sample if you give samtools multiple samples to call SNPs. Usually depth is total depth over all samples.

                -S output per-sample strand bias P-value in BCF (require -g/-u)

                This is also an output option - strand bias per sample and not just over all samples.

                So none of these options change the SNP calling algorithm, unless you end up calling SNPs with Illumina Phred scores when the file actually holds Sanger Phred scores (or vice versa). Getting the encoding wrong will probably make a big difference.
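To illustrate why the -6 choice matters, here is a minimal Python sketch (not part of samtools; the function names are made up for this example). It decodes the same quality string under the Sanger offset (33) and the Illumina 1.3+ offset (64), and guesses the offset from the lowest character seen. The guess is only a heuristic and assumes the reads contain at least some low-quality bases.

```python
def decode_quals(qual_string, offset):
    """Convert an ASCII quality string to Phred scores for a given offset."""
    return [ord(ch) - offset for ch in qual_string]


def guess_offset(qual_strings):
    """Heuristically guess the offset: 33 (Sanger) or 64 (Illumina 1.3+).

    Characters below ';' (ASCII 59) cannot occur in offset-64 data,
    so seeing one implies offset 33. Otherwise we assume 64.
    """
    lowest = min(ord(ch) for q in qual_strings for ch in q)
    return 33 if lowest < 59 else 64


if __name__ == "__main__":
    quals = "IIIIHHGF#"  # hypothetical quality string from one read
    print("guessed offset:", guess_offset([quals]))
    # 'I' is Phred 40 with offset 33, but only Phred 9 with offset 64
    print("as Sanger  :", decode_quals(quals, 33))
    print("as Illumina:", decode_quals(quals, 64))
```

The same character means a very different error probability under the two offsets, which is why a wrong -6 setting can distort the calls.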

                Read the post of user ulz_peter where he suggests this link if you want to optimise your SNP calling parameters and the papers suggested by the user Simon Anders in this thread.



                Chris



                • #23
                  Hi Chris,

                  Thanks so much for your comments. I will read them. I hope I will figure it out soon.

                  For example, I would like to see whether there are individual and tissue differences between samples, and would like to get a table of DP4 values for every sample so I can compare them with each other.

                  So first, after bowtie alignment:

                  mpileup -Euf ref.fa sample1.bam sample2.bam sample3.bam (and so on)
                  view -bcvg
                  -D 100 for filtering

                  And to get DP4 values specifically, I ran mpileup for each sample alone with the same parameters, like
                  mpileup -Euf ref.fa sample1.bam

                  So do you recommend any other parameters to get good DP4 values?

                  Am I missing anything in my parameters? Should I also get AF1 to see SNP differences between samples?

                  Should I also exclude indels by adding -I to the command line?

                  I really appreciate any help. Thanks Chris

                  Aslihan



                  • #24
                    Hello Dear Chris

                    We received our exome data and I now have 2 files (SNPs and indels) in text format.
                    I have pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for the next stage? The file's header does not seem suitable for ANNOVAR.

                    #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
                    chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
                    chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
                    chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
                    chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


                    Best



                    • #25
                      @aslihan

                      To get better data, I'd recommend first to use BWA or Bowtie2 rather than the original Bowtie. See this post for the latest info about these different alignment programs:

                      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                      In terms of DP4: these values are pretty much determined by the alignments you get from bowtie/bwa, although you can filter so that only reads with good alignments (-q flag, see below) and only good-quality bases (-Q flag below) are counted.

                      -q INT skip alignments with mapQ smaller than INT [0]
                      -Q INT skip bases with baseQ/BAQ smaller than INT [13]

                      In GATK, they set -Q to be 17 by default.
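As a rough illustration of how the -q and -Q thresholds change DP4, here is a small Python sketch. It is not samtools code: the `Obs` record and the `dp4` function are hypothetical, and BAQ is ignored. DP4 is counted in the order samtools reports it (ref-forward, ref-reverse, alt-forward, alt-reverse).

```python
from collections import namedtuple

# One base observed at a site: the called base, strand, base quality, mapping quality.
# This record layout is an assumption made for the example.
Obs = namedtuple("Obs", "base reverse base_q map_q")


def dp4(observations, ref_base, min_map_q=0, min_base_q=13):
    """Count ref-fwd, ref-rev, alt-fwd, alt-rev bases that pass both thresholds."""
    counts = [0, 0, 0, 0]
    for o in observations:
        if o.map_q < min_map_q or o.base_q < min_base_q:
            continue  # skipped, like mpileup's -q / -Q filters
        is_alt = int(o.base != ref_base)
        counts[2 * is_alt + int(o.reverse)] += 1
    return tuple(counts)


if __name__ == "__main__":
    site = [
        Obs("G", False, 30, 60),
        Obs("G", True, 30, 60),
        Obs("A", False, 35, 60),
        Obs("A", True, 10, 60),  # low base quality
        Obs("A", True, 35, 5),   # low mapping quality
    ]
    print(dp4(site, "G"))                               # mpileup-like defaults
    print(dp4(site, "G", min_map_q=20, min_base_q=17))  # stricter filters
```

With stricter thresholds the alt-reverse count drops, which is the kind of shift to expect when comparing DP4 between runs that used different -q/-Q settings.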

                      The definition of AF1 is this:

                      AF1 EM [expectation-maximization] estimate of the site allele frequency of the strongest non-reference allele.

                      there is a section about this on the mpileup page:



                      "the procedure to estimate AFS is:
                      bcftools view -NIbl cond.txt data.bcf > cond.bcf
                      bcftools view -cGP cond2 cond.bcf > round1.vcf 2> round1.afs
                      bcftools view -cGP round1.afs cond.bcf > /dev/null 2> round2.afs
                      bcftools view -cGP round2.afs cond.bcf > /dev/null 2> round3.afs
                      ......
                      until the AFS converges, which usually takes less than 10 rounds of EM iterations. The first command line above extracts sites in cond.txt for efficiency in later steps. Option -P specifies the initial AFS (in SNP calling, this is prior), which can be a file (as in the 3rd and 4th command lines) or 'full', 'cond2' or 'flat' (as in the 2nd command line). Choosing the right initial AFS helps accuracy and reduces iterations and potential overfitting"
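For intuition about the per-site AF1 value defined above, here is a generic EM sketch in Python. It is not bcftools' implementation: it assumes diploid samples, Hardy-Weinberg genotype priors, and per-sample genotype likelihoods given as (L0, L1, L2) for 0, 1 or 2 copies of the non-reference allele. The function name and defaults are made up for this example.

```python
def em_allele_frequency(genotype_likelihoods, start=0.5, max_iter=100, tol=1e-6):
    """EM estimate of the non-reference allele frequency at one site."""
    f = start
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # E-step: posterior weight of each genotype under an HWE prior
            p0 = (1 - f) ** 2 * l0
            p1 = 2 * f * (1 - f) * l1
            p2 = f * f * l2
            total = p0 + p1 + p2
            if total > 0:
                expected_alt += (p1 + 2 * p2) / total
        # M-step: expected alt alleles divided by total alleles
        new_f = expected_alt / (2 * n)
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


if __name__ == "__main__":
    # Hypothetical likelihoods for three samples: likely hom-ref, het, hom-ref
    gls = [(0.9, 0.1, 0.0), (0.05, 0.9, 0.05), (0.8, 0.2, 0.0)]
    print(round(em_allele_frequency(gls), 3))
```

The iterated bcftools commands quoted above follow the same idea at the level of the whole allele frequency spectrum: each round feeds the previous estimate back in as the prior until it stops changing.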

                      The -I option is only relevant if you are interested in SNPs only. Indels can sometimes be relevant in exome data, so it's probably worth not setting -I.

                      Chris



                      • #26
                        Originally posted by afaghalavi
                        Hello Dear Chris

                        We received our exome data and I now have 2 files (SNPs and indels) in text format.
                        I have pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for the next stage? The file's header does not seem suitable for ANNOVAR.

                        #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
                        chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
                        chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
                        chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
                        chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


                        Best

                        Do you know what software made this data? I think annovar can start from VCF files, so some of your data could be converted to something like this in VCF (some of the columns in your data need more explanation before they can go into the VCF format, though):

                        #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
                        chr1 12783 . G A
                        chr1 13057 . G C
                        chr1 13351 . T G
                        chr1 14673 . G C

                        e.g., it looks like they are specifying heterozygous or homozygous SNPs in this way: "AC" or "CC" (where the reference base is A). In VCF, this would be written as ref=A, alt=C, genotype=0/1 for "AC" or genotype=1/1 for "CC". And sometimes the best call may be something like ref=G, best allele=GG, but I can't tell from your file format without some more explanation.
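The conversion described above could be sketched in Python roughly as follows. This is only an illustration under assumptions that are not confirmed by the file's producer: that the 9th column (max_gt|poly_site) is the genotype to report, and that Q(snp) can stand in for QUAL. The function name `row_to_vcf` is made up for this example.

```python
def row_to_vcf(line):
    """Turn one whitespace-separated row of the custom table into a VCF-like line.

    Returns None when the chosen genotype contains no non-reference allele.
    """
    f = line.split()
    chrom, pos, ref, q_snp, genotype = f[0], f[1], f[4], f[5], f[8]
    # ALT alleles are the genotype bases that differ from the reference
    alts = sorted({b for b in genotype if b != ref})
    if not alts:
        return None
    # VCF genotype indices: 0 = REF, 1.. = ALT alleles in listed order
    idx = sorted(0 if b == ref else alts.index(b) + 1 for b in genotype)
    gt = "/".join(str(i) for i in idx)
    return "\t".join([chrom, pos, ".", ref, ",".join(alts), q_snp, ".", ".", "GT", gt])


if __name__ == "__main__":
    rows = [
        "chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0",
        "chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0",
        "chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0",
        "chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0",
    ]
    print("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE")
    for row in rows:
        record = row_to_vcf(row)
        if record is not None:
            print(record)
```

A real conversion would also need the proper VCF meta-information header lines, and the column meanings should be checked against the documentation of whatever program produced the table.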

                        Chris



                        • #27
                          You might also consider the BSNP Bayesian genotype caller. It has been tested on Illumina, 454, SOLiD and Sanger human alignments and has some technology-specific bias correction. It requires a samtools pileup as input, but it is fully Bayesian, considers both alignment and sequence quality, doesn't bias towards the reference, and was designed for comparing data from differing technologies. If it's helpful, have a look at: http://compgen.bscb.cornell.edu/GPhoCS/BSNP/
                          -- Brad



                          • #28
                            Originally posted by RDW
                            If you have matched somatic tumour and germline data, see also:



                            Note, however, that the MuTect somatic SNP caller is still in restricted beta, and the somatic indel detector is based on the old GATK Indel Genotyper v2 (superseded for all other purposes by the Unified Genotyper's indel mode).

                            Hi,

                            Do you know how to annotate the output from MuTect? I have 3800 mutation calls and I have been stuck for almost a day.



                            • #29
                              Originally posted by cjp
                              Two SNP and indel callers that you can search for in seqAnswers are samtools mpileup:



                              and GATK:



                              sections: 5.1, 5.4 (Unified Genotyper) and 5.5.

                              Chris

                              Does mpileup detect SNPs, indels and the like by finding regions of high homology to each other during alignment, or does it look at the chromatographic data and detect peak-under-peak and offset peaks to come up with alternate calls for regions? Or something else? If it doesn't do peak-under-peak and offsets, is there a tool out there that DOES?

                              My base data is in ab1, but I'm assuming that can be converted to whatever format is needed.
