  • #16
    Is GATK really suited for cancer genomes? Are there options to set the Unified Genotyper to call alleles in a population (of heterogeneous cancer cells) rather than in an individual, which would have up to 2 alleles?



    • #17
      The Broad have written some SNP calling software (syzygy) for pooled heterogeneous samples:



      They're using it to look for SNPs where many individuals are in the same library and for targeting a smaller set of genes rather than doing whole exomes. So this may be better for cancer genomes than GATK, but I have no experience there.

      Chris



      • #18
        Thanks Chris. This looks interesting; I just need to figure out how it can plug into our existing GATK-based pipeline.



        • #19
          If you have matched somatic tumour and germline data, see also:



          Note, however, that the MuTect somatic SNP caller is still in restricted beta, and the somatic indel detector is based on the old GATK Indel Genotyper v2 (superseded for all other purposes by the Unified Genotyper's indel mode).



          • #20
            Thanks. I am looking forward to trying MuTect when it is available. And here is a possibly silly question: should there be a different approach to genotyping heterogeneous tumors versus pooled populations? In a pooled population you typically know how many genomes are included. Cells in a tumor are likely to be more closely related, and probably differ in a handful of mutations that helped facilitate tumorigenesis and in genomically unstable regions. Any thoughts?



            • #21
              Hi, I am also looking for SNP differences between 2 different cell lines, which were obtained from 3 diseased and 3 normal individuals. First I ran bowtie for alignment. Then I used samtools mpileup to compare multiple BAM files.

              My question is: how can I change my mpileup command-line parameters to get higher-quality SNPs between these 12 files?

              The default is mpileup -uf

              Do you suggest any other parameters? How can I find a good paper on this?

              mpileup -6 -uDSf ?

              Could you explain? I really appreciate any help.

              Thanks



              • #22
                from samtools mpileup:

                -6 assume the quality is in the Illumina-1.3+ encoding

                Depends on which quality values are in the BAM file:



                -D output per-sample DP in BCF (require -g/-u)

                This is an output option - depth per sample if you give samtools multiple samples to call SNPs. Usually depth is total depth over all samples.

                -S output per-sample strand bias P-value in BCF (require -g/-u)

                This is also an output option - strand bias per sample and not just over all samples.

                So -D and -S make no change to the SNP calling algorithm. The exception is -6: if you call SNPs treating the qualities as Illumina Phred scores when they are really Sanger Phred scores (or the other way round), it will probably make a big difference.
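
                To make the encoding point concrete: Sanger-style qualities are stored as the ASCII code of each character minus 33, while Illumina 1.3+ qualities use an offset of 64. Below is a minimal Python sketch that guesses the offset from a FASTQ quality string and decodes it. It is not part of samtools, the function names are made up for illustration, and the thresholds are a heuristic that can be fooled by short or uniformly high-quality reads.

```python
def guess_phred_offset(quality_string):
    """Guess the ASCII offset (33 or 64) of a FASTQ quality string.

    Heuristic only. Characters below ';' (ASCII 59) cannot occur with an
    offset of 64, so they imply offset 33. Characters above 'J' (ASCII 74)
    are assumed here to be too high for offset-33 data, so they imply
    offset 64. Anything else is reported as ambiguous (None).
    """
    codes = [ord(ch) for ch in quality_string]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


def decode_qualities(quality_string, offset):
    """Convert a quality string to a list of integer quality scores."""
    return [ord(ch) - offset for ch in quality_string]


if __name__ == "__main__":
    for qualities in ("IIII##", "hhhhB", "IIII"):
        offset = guess_phred_offset(qualities)
        if offset is None:
            print(qualities, "ambiguous")
        else:
            print(qualities, offset, decode_qualities(qualities, offset))
```

                If the guess comes out as 64, that is the situation the -6 option exists for; if it comes out as 33, leave -6 off.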

                Read the post of user ulz_peter where he suggests this link if you want to optimise your SNP calling parameters and the papers suggested by the user Simon Anders in this thread.



                Chris



                • #23
                  Hi Chris,

                  Thanks so much for your comments. I will read them. I hope I will figure it out soon.

                  For example, I would like to see whether there are differences between individuals and between tissues, and I would like to get a table of DP4 values for every sample so that I can compare them with each other.

                  So first, after the bowtie alignment:

                  mpileup -Euf ref.fa sample1.bam sample2.bam sample3.bam (and so on)
                  view -bcvg
                  for filtering -D 100

                  And to get DP4 values specifically, I ran mpileup for each sample alone with the same parameters, like
                  mpileup -Euf ref.fa sample1.bam

                  So do you recommend any other parameters to get good DP4 values?

                  Am I missing anything in my parameters? Should I also get AF1 to see SNP differences between samples?

                  Should I turn off indel calling by adding -I to the command line?

                  I really appreciate any help. Thanks Chris

                  Aslihan



                  • #24
                    Hello Dear Chris

                    We received our exome data, and now I have 2 files (SNPs and indels) in text format.
                    I have pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for the next stage? Its header is not suitable for ANNOVAR.

                    #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
                    chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
                    chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
                    chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
                    chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


                    Best



                    • #25
                      @aslihan

                      To get better data, I'd first recommend using BWA or Bowtie2 rather than the original Bowtie. See this post for the latest info about these different alignment programs:

                      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                      In terms of DP4, these values are pretty much determined by the alignments you get from bowtie/bwa, although you can filter them to make sure you only get reads that have good alignments (-q flag, see below) and bases that are good quality (-Q flag, see below).

                      -q INT skip alignments with mapQ smaller than INT [0]
                      -Q INT skip bases with baseQ/BAQ smaller than INT [13]

                      In GATK, the Unified Genotyper's equivalent minimum base quality setting is 17 by default.
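
                      For building a table to compare samples, the DP4 tag in the INFO column holds four counts of high-quality bases: reference forward, reference reverse, alternate forward and alternate reverse. Here is a minimal Python sketch for pulling it out of a VCF line, assuming plain-text VCF as written by the samtools/bcftools pipeline. The function names and the example line are illustrative only, not real data.

```python
def parse_dp4(info_field):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from a VCF INFO string.

    Returns None when the INFO string has no DP4 tag.
    """
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP4":
            counts = tuple(int(x) for x in value.split(","))
            if len(counts) != 4:
                raise ValueError("DP4 should hold exactly four counts")
            return counts
    return None


def alt_fraction(dp4):
    """Fraction of DP4 bases supporting the alternate allele, or None if empty."""
    ref_count = dp4[0] + dp4[1]
    alt_count = dp4[2] + dp4[3]
    total = ref_count + alt_count
    return alt_count / total if total else None


if __name__ == "__main__":
    # Made-up VCF line; INFO is the eighth tab-separated column (index 7).
    example_line = "chr1\t13057\t.\tG\tC\t31\t.\tDP=3;DP4=0,1,1,1;MQ=40"
    info = example_line.split("\t")[7]
    dp4 = parse_dp4(info)
    print(dp4, alt_fraction(dp4))
```

                      Running this over each per-sample VCF and joining the results on chromosome and position would give the kind of side-by-side table asked about above.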

                      The definition of AF1 is this:

                      AF1: EM [expectation-maximization] estimate of the site allele frequency of the strongest non-reference allele.

                      there is a section about this on the mpileup page:



                      "the procedure to estimate AFS is:
                      bcftools view -NIbl cond.txt data.bcf > cond.bcf
                      bcftools view -cGP cond2 cond.bcf > round1.vcf 2> round1.afs
                      bcftools view -cGP round1.afs cond.bcf > /dev/null 2> round2.afs
                      bcftools view -cGP round2.afs cond.bcf > /dev/null 2> round3.afs
                      ......
                      until the AFS converges, which usually takes less than 10 rounds of EM iterations. The first command line above extracts sites in cond.txt for efficiency in later steps. Option -P specifies the initial AFS (in SNP calling, this is prior), which can be a file (as in the 3rd and 4th command lines) or 'full', 'cond2' or 'flat' (as in the 2nd command line). Choosing the right initial AFS helps accuracy and reduces iterations and potential overfitting"
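
                      The quoted procedure iterates over a spectrum across many sites, but the EM idea behind a single-site estimate such as AF1 fits in a few lines. The following is a rough Python sketch, assuming diploid samples, Hardy-Weinberg genotype priors, and per-sample genotype likelihoods that have already been computed. It illustrates the general technique and is not the bcftools implementation; the function name and inputs are hypothetical.

```python
def em_allele_frequency(genotype_likelihoods, start=0.5, tol=1e-8, max_iter=100):
    """Estimate the alternate allele frequency at one site by EM.

    genotype_likelihoods: one (L0, L1, L2) tuple per diploid sample, the
    likelihood of that sample's reads given 0, 1 or 2 alternate alleles.
    """
    n_samples = len(genotype_likelihoods)
    if n_samples == 0:
        raise ValueError("need at least one sample")
    eps = 1e-9
    freq = start
    for _ in range(max_iter):
        # Keep the frequency strictly inside (0, 1) so no prior is exactly zero.
        freq = min(max(freq, eps), 1.0 - eps)
        # Hardy-Weinberg prior for 0, 1 and 2 alternate alleles.
        priors = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
        expected_alt = 0.0
        for likelihoods in genotype_likelihoods:
            # E-step: posterior weight of each genotype for this sample.
            weights = [lik * prior for lik, prior in zip(likelihoods, priors)]
            total = sum(weights)
            expected_alt += (weights[1] + 2.0 * weights[2]) / total
        # M-step: expected alternate alleles divided by total alleles.
        new_freq = expected_alt / (2.0 * n_samples)
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq


if __name__ == "__main__":
    # Three samples with confident calls: hom-ref, het, hom-alt.
    print(em_allele_frequency([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]))
```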

                      The -I option is only relevant if you are interested in SNPs only. Indels can sometimes be relevant in exome data, so it's probably worth not setting -I.

                      Chris



                      • #26
                        Originally posted by afaghalavi:
                        Hello Dear Chris

                        We received our exome data, and now I have 2 files (SNPs and indels) in text format.
                        I have pasted part of one below. Please let me know what the next stage of data analysis is and what I should do. Can I use ANNOVAR for the next stage? Its header is not suitable for ANNOVAR.

                        #$ COLUMNS seq_name pos bcalls_used bcalls_filt ref Q(snp) max_gt Q(max_gt) max_gt|poly_site Q(max_gt|poly_site) A_used C_used G_used T_used
                        chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0
                        chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0
                        chr1 13351 1 0 T 1 TT 10 GT 3 0 0 1 0
                        chr1 14673 2 0 G 32 CC 5 CC 5 0 2 0 0


                        Best

                        Do you know what software produced this data? I think ANNOVAR can start from VCF files, so some of your data could be converted to something like this in VCF (although some of the columns in your data would need more explanation before they could go into VCF format):

                        #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
                        chr1 12783 . G A
                        chr1 13057 . G C
                        chr1 13351 . T G
                        chr1 14673 . G C

                        E.g., it looks like they are specifying heterozygous or homozygous SNPs in this way: "AC" or "CC" (where the reference base is A). In VCF, this would be written as ref=A, alt=C, genotype=0/1 for "AC", or genotype=1/1 for "CC". Sometimes the best call may simply match the reference, e.g. ref=G with best genotype GG, but I can't tell from your file format without some more explanation.
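
                        The genotype translation described above can be sketched in Python. This assumes the column layout shown in the pasted header and takes the max_gt|poly_site column as the genotype, which is a guess about the format rather than something confirmed by its documentation. The function names are invented for illustration.

```python
def genotype_to_vcf(ref, genotype):
    """Translate a two-letter genotype such as 'CG' into (ALT, GT) for VCF.

    ALT is '.' when both alleles match the reference base.
    """
    alts = []
    for allele in genotype:
        if allele != ref and allele not in alts:
            alts.append(allele)
    # Index 0 is the reference; ALT alleles are numbered from 1.
    indices = sorted(0 if a == ref else alts.index(a) + 1 for a in genotype)
    alt_field = ",".join(alts) if alts else "."
    return alt_field, "/".join(str(i) for i in indices)


def row_to_vcf(row):
    """Turn one whitespace-separated row of the pasted table into a VCF-style line."""
    fields = row.split()
    chrom, pos, ref = fields[0], fields[1], fields[4]
    genotype = fields[8]  # assumed to be the max_gt|poly_site column
    alt, gt = genotype_to_vcf(ref, genotype)
    # ID, QUAL, FILTER and INFO are left as missing values ('.').
    return "\t".join([chrom, pos, ".", ref, alt, ".", ".", ".", "GT", gt])


if __name__ == "__main__":
    rows = [
        "chr1 12783 2 0 G 24 AA 5 AA 5 2 0 0 0",
        "chr1 13057 3 1 G 3 GG 4 CG 31 0 1 2 0",
    ]
    for row in rows:
        print(row_to_vcf(row))
```

                        A real VCF would also need the ##fileformat line and a #CHROM header with a sample column, and the quality columns would have to be mapped once their meaning is clear.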

                        Chris



                        • #27
                          You might also consider the BSNP Bayesian genotype caller. It's been tested on Illumina, 454, SOLiD and Sanger human alignments and has some technology-specific bias correction. It requires a samtools pileup as input, but it is fully Bayesian, considers both alignment and sequence quality, doesn't bias towards the reference, and was designed for comparing data from differing technologies. If it's helpful, have a look at: http://compgen.bscb.cornell.edu/GPhoCS/BSNP/
                          -- Brad



                          • #28
                            Originally posted by RDW:
                            If you have matched somatic tumour and germline data, see also:



                            Note, however, that the MuTect somatic SNP caller is still in restricted beta, and the somatic indel detector is based on the old GATK Indel Genotyper v2 (superseded for all other purposes by the Unified Genotyper's indel mode).

                            Hi,

                            Do you know how to annotate the output from MuTect? I have 3800 mutation calls and I have been stuck for almost a day.



                            • #29
                              Originally posted by cjp:
                              Two SNP and indel callers that you can search for in seqAnswers are samtools mpileup:



                              and GATK:



                              sections: 5.1, 5.4 (Unified Genotyper) and 5.5.

                              Chris

                              Does mpileup detect SNPs, indels and the like by finding regions of high homology to each other during alignment, or does it look at the chromatographic data and detect peak-under-peak and offset peaks to come up with alternate calls for regions? Or something else? If it doesn't do peak-under-peak and offsets, is there a tool out there that DOES?

                              My base data is in ab1, but I'm assuming that can be converted to whatever format is needed.
