  • bioinfosm
    Senior Member
    • Jan 2008
    • 483

    SNP calling on 454 data

    Does anyone have ideas on how to make variant calls on 454 re-sequencing data?

    Perhaps using the AllDiffs or HCDiffs files from the gsMapper software, or some other tools? I believe some downstream analysis is needed after the Marth lab's MOSAIK tool in order to get variant positions and % calls for A, C, G and T.
    --
    bioinfosm
  • cariaso
    Member
    • Jan 2008
    • 31

    #2
    I'd hoped we were working on similar things, but it seems not. Your problem seems to be more about recognizing novel SNPs, which is substantially different from my need to recognize named SNPs.

    Specifically I need to turn the PGP10 exome fasta into a series of dbSNP rs#s and report observed genotypes. Results will be tab delimited and look something like

    Since this is about recognizing named entities, I'd like to extend it to also recognize non-SNP features such as Huntington's, and possibly CNVs.

    Sorry I can't be more helpful, but if anyone has code or advice on either topic I'm interested in both.

    • Tom Bair
      Member
      • Oct 2008
      • 28

      #3
      We are working on this too. We mostly use the HCDiffs file, with a lot of post-processing. The key thing we look at is read depth: HCDiffs uses a depth of 3 (2 reads one way, 1 the other), but I would say 5 is a better minimum, and 15 if you are looking for hets. We also filter for known SNPs using the dbSNP track from the UCSC database, and check whether the variant is in an exon (also from UCSC), since most people I am working with are looking at NimbleGen capture experiments focused primarily on exons. If you are looking outside exons, the conservation score appears somewhat useful.

      Don't know if that helps at all.
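
The depth cutoffs and track lookups described in this post could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual pipeline: the `Variant` record, the `known_snps` set and the `exons` interval list are hypothetical stand-ins for parsed HCDiffs calls and for the dbSNP and exon tracks downloaded from UCSC. Only the depth cutoffs (5, or 15 when looking for hets) come from the post. Because the post does not say whether known SNPs are kept or discarded, the sketch annotates each call instead of dropping it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one parsed variant call."""
    chrom: str
    pos: int    # reference position (assumed 1-based)
    depth: int  # total read depth at that position


def in_any_exon(v: Variant, exons: Iterable[Tuple[str, int, int]]) -> bool:
    # Exons are assumed to be (chrom, start, end) with inclusive coordinates.
    return any(c == v.chrom and start <= v.pos <= end for c, start, end in exons)


def annotate(
    variants: Iterable[Variant],
    known_snps: Set[Tuple[str, int]],
    exons: List[Tuple[str, int, int]],
    looking_for_hets: bool = False,
) -> List[Tuple[Variant, bool, bool]]:
    """Drop low-depth calls, then flag each survivor as (variant, is_known, in_exon)."""
    min_depth = 15 if looking_for_hets else 5  # cutoffs suggested in the post
    rows = []
    for v in variants:
        if v.depth < min_depth:
            continue
        rows.append((v, (v.chrom, v.pos) in known_snps, in_any_exon(v, exons)))
    return rows
```

A linear scan over exons is fine for a handful of capture targets; for a whole-genome exon track an interval index would be a more sensible choice.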

      • bioinfosm
        Senior Member
        • Jan 2008
        • 483

        #4
        Thanks Tom, that was helpful.

        Is anyone else looking for SNPs in 454 data? I heard that a brute-force BLAST approach with no gaps also works! Lots of try-it-out-yourself.
        --
        bioinfosm

        • timread
          Member
          • Oct 2008
          • 14

          #5
          We are primarily looking for SNPs in bacterial genomes (i.e. no heterozygotes). For a first look we parse the HCDiffs file for differences with >85% agreement. We then proceed to validation. Most of the single-base insertions and deletions turn out to be false positives.

          • Tom Bair
            Member
            • Oct 2008
            • 28

            #6
            timread,

            Could you give some parameters on read depth for the false vs. true positives? Or do you find no correlation?

            Thanks

            Tom

            • timread
              Member
              • Oct 2008
              • 14

              #7
              No correlation that I can see in the differences called by Newbler runmapper that we validated (which are generally high-quality calls). I don't think we have a large enough sample size, though. We have noted trends in the raw runmapper output for calls that fall below our cutoff filter: for example, a large number of the 1 bp insertions and deletions have <25-fold read coverage and <50% concordance.

              tim

              • Josliu
                Junior Member
                • Nov 2008
                • 4

                #8
                SNP calling for 454 data

                You may use NextGENe software to call SNPs from 454 data. The software links the calls to the dbSNP database if GenBank format is provided. SoftGenetics may provide a demo so you can try NextGENe on your own data.

                josliu

                • Layla
                  Member
                  • Sep 2008
                  • 58

                  #9
                  Capture and beyond

                  This is quite a tricky process, especially without the support of bioinformaticians. The downstream analysis is much more complex than carrying out the capture array itself. The HCDiffs file does seem very promising for extracting useful information for SNPs.

                  Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.
                  (Thank you)
                  Has anybody attempted de novo contig assembly from their capture array data?

                  Layla

                  • timread
                    Member
                    • Oct 2008
                    • 14

                    #10
                    Originally posted by Layla:
                    Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.

                    Layla

                    Layla - by '85% agreement', I mean that 85% of the 454 reads agree with the variant call. This is the final column on the header line of the HCDiffs file. Verification is by Sanger sequencing.
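
A minimal sketch of that parsing step, assuming (this is not confirmed in the thread) that each variant's header line starts with `>`, that its columns are whitespace-separated, and that the final column is written as a percentage such as `90%`. The function name `high_agreement_headers` is hypothetical. Lines that do not fit this pattern, such as alignment lines or a column-title line, are skipped.

```python
from typing import Iterable, Iterator, Tuple


def high_agreement_headers(
    lines: Iterable[str], min_percent: float = 85.0
) -> Iterator[Tuple[str, float]]:
    """Yield (header_line, percent) for headers whose final column exceeds min_percent."""
    for line in lines:
        if not line.startswith(">"):
            continue  # not a header line under the assumed layout
        fields = line[1:].split()
        if not fields:
            continue
        try:
            # Final column is assumed to be the percent of reads agreeing with the call.
            percent = float(fields[-1].rstrip("%"))
        except ValueError:
            continue  # e.g. a column-title line whose last field is text
        if percent > min_percent:
            yield line.rstrip("\n"), percent
```

To apply it to a file, pass an open file handle, for example `with open(path) as fh: hits = list(high_agreement_headers(fh))`, where `path` points at your own HCDiffs file.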

                    • Layla
                      Member
                      • Sep 2008
                      • 58

                      #11
                      Thank you Tim, I realized what you meant 2 seconds after I had posted the question! Yes, I have been focusing on that file and using diffs > 75% agreement. Cheers, Layla

                      • RockChalkJayhawk
                        Senior Member
                        • Mar 2009
                        • 192

                        #12
                        De novo contig assembly from capture array

                        Originally posted by Layla:
                        This is quite a tricky process, especially without the support of bioinformaticians. The downstream analysis is much more complex than carrying out the capture array itself. The HCDiffs file does seem very promising for extracting useful information for SNPs.

                        Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.
                        (Thank you)
                        Has anybody attempted de novo contig assembly from their capture array data?

                        Layla

                        Layla,

                        Have you found anyone that has done the contig assembly? I'm curious...

                        • Layla
                          Member
                          • Sep 2008
                          • 58

                          #13
                          Nope, sorry! I am no longer working on this project.

                          • Tuxido
                            Member
                            • Jun 2009
                            • 22

                            #14
                            We also use the HCDiffs file in combination with our own downstream analysis, in which we annotate the data with known SNPs and other useful info. It seems to work fine as long as you have sufficient coverage and there aren't too many variants close to each other. With lower coverage you start getting more false positives, but you also start missing variants. We once compared the HCDiffs output of version 1.0 of the mapper software with a SNP array, and that didn't look that good, as we were missing quite a few variants.
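
A comparison of that kind might be tallied along these lines. This is a hedged sketch, not the poster's method: `compare_to_array` is a hypothetical helper, and both inputs are assumed to be collections of `(chrom, pos)` tuples, with the array set restricted to sites where the array reports a non-reference genotype.

```python
import math
from typing import Iterable, List, Tuple

Site = Tuple[str, int]


def compare_to_array(
    called: Iterable[Site], array_variants: Iterable[Site]
) -> Tuple[float, List[Site]]:
    """Return (fraction of array variants also called, sorted list of missed sites)."""
    called_set = set(called)
    array_set = set(array_variants)
    found = called_set & array_set
    missed = sorted(array_set - called_set)
    # Sensitivity is undefined when the array reports no variant sites.
    sensitivity = len(found) / len(array_set) if array_set else math.nan
    return sensitivity, missed
```

Listing the missed sites, and not only the fraction recovered, makes it possible to check afterwards whether they cluster in low-coverage regions.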
