  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    Slider - Maximum use of probability information for alignment of short sequence reads

    A new paper describing an improved Solexa aligner / SNP caller just came out. Looks interesting.

    *****************************

    Slider - Maximum use of probability information for alignment of short sequence reads and SNP detection.


    Malhis N, Butterfield Y, Ester M, Jones SJ.

    Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada.

    MOTIVATION: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this paper, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files.

    RESULTS: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.

    CONTACT: nmalhis *(<AT>)*bcgsc.ca

    Supplementary information and availability: http://www.bcgsc.ca/platform/bioinfo/software/slider.
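
To make the idea of "each read base's probabilities" concrete, here is a minimal sketch of turning one prb line into per-base probabilities and listing the bases that are plausible at each position. It is not Slider's code. It assumes the usual prb layout (four whitespace-separated integer scores per base, in A C G T order, on the Solexa scale Q = 10*log10(p/(1-p))); the function names and the `min_prob` cutoff are made up for illustration.

```python
from typing import List

BASES = "ACGT"  # assumed channel order within a prb line


def solexa_to_prob(score: int) -> float:
    """Convert a Solexa-scaled score Q = 10*log10(p/(1-p)) into a probability p."""
    odds = 10 ** (score / 10.0)
    return odds / (1.0 + odds)


def parse_prb_line(line: str) -> List[List[float]]:
    """Return normalized [pA, pC, pG, pT] for every base of one read."""
    values = [int(tok) for tok in line.split()]
    if len(values) % 4 != 0:
        raise ValueError("expected four scores per base")
    read = []
    for i in range(0, len(values), 4):
        probs = [solexa_to_prob(q) for q in values[i:i + 4]]
        total = sum(probs)
        # Normalize so the four channels sum to 1 for this position.
        read.append([p / total for p in probs])
    return read


def candidate_bases(probs, min_prob=0.1):
    """Bases with probability >= min_prob (hypothetical cutoff), most probable first."""
    ranked = sorted(zip(BASES, probs), key=lambda bp: bp[1], reverse=True)
    return [(base, p) for base, p in ranked if p >= min_prob]


if __name__ == "__main__":
    # Made-up two-base read: a confident A, then an ambiguous G/A call.
    line = "40 -40 -40 -40\t-5 -40 2 -40"
    for position, probs in enumerate(parse_prb_line(line), start=1):
        print(position, candidate_bases(probs))
```

An aligner that only sees the most probable sequence would treat the second base as G; keeping the runner-up base around is what allows a match against a reference A at that position.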
  • bioinfosm
    Senior Member
    • Jan 2008
    • 483

    #2
    Looks interesting... using .prb instead of the fastq. There are tools that optionally take .prb files as input, but I am not sure whether they use the probability information for each base!
    --
    bioinfosm

    Comment

    • nmalhis
      Member
      • Nov 2008
      • 11

      #3
      from the author

      This release of Slider was prepared for the Oxford Bioinformatics paper reviewers as a proof of concept:


      I’m now working on a beta release with many improvements and new capabilities. This new release should be ready by the end of this month (Nov. 2008).

      Nawar Malhis
      Last edited by nmalhis; 11-05-2008, 09:15 AM.

      Comment

      • nmalhis
        Member
        • Nov 2008
        • 11

        #4
        SliderII: High Quality SNP Calling Using Illumina Data at Shallow Coverage:

        is now available from:

        High quality SNP calling using Illumina data at minimal coverage


        Sorry for the delay,

        Nawar

        Comment

        • ohofmann
          Member
          • Jan 2009
          • 37

          #5
          Also going to follow up via email, but just in case: Illumina seems to be moving towards a change in the .prb files; the new workflow does not seem to produce the four-channel probabilities anymore.

          Is there a workaround? This would also affect other probabilistic aligners.

          -- Oliver

          Comment

          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            Oliver,

            You can rerun the base calling, starting the pipeline with Bustard using the intensity files generated by RTA. Bustard will accept as optional arguments --with-seq, --with-qval, --with-sig2 and --with-prb which will instruct Bustard to generate these legacy files. You can also add these arguments to the goat.py command line if you are restarting the pipeline from the image analysis step.

            Comment

            • ohofmann
              Member
              • Jan 2009
              • 37

              #7
              Glad to hear, thanks for the information! Going to report back on how SliderII handles very deep sequence coverage soon-ish.

              -- Oliver

              Comment

              • sparks
                Senior Member
                • Mar 2008
                • 126

                #8
                Hi,

                Novoalign will take prb format read files. It will use prb values as probabilities both when generating seeds and in calculating penalties for the Needleman-Wunsch alignment. This usually gives more alignments than running off the fastq files, but it has been criticised by some because the Illumina fastq files have been quality calibrated whereas the prb files have not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.

                Colin
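
As a rough illustration of using base probabilities as alignment penalties (the idea in post #8), the sketch below scores a read against a reference segment with a per-position penalty of -10*log10(P(reference base)). This is not Novoalign's scoring scheme: the cap of 30, the function names, and the example probabilities are assumptions, and gaps are left out for brevity. In a Needleman-Wunsch matrix, a value like this would take the place of the fixed match/mismatch term.

```python
import math

BASES = "ACGT"  # assumed order of the four probabilities per position


def base_penalty(probs, ref_base, max_penalty=30.0):
    """Penalty -10*log10(P(ref_base)), capped at max_penalty (hypothetical cap)."""
    p = probs[BASES.index(ref_base)]
    if p <= 0.0:
        return max_penalty
    return min(max_penalty, -10.0 * math.log10(p))


def ungapped_score(read_probs, ref_segment):
    """Sum of per-position penalties for an ungapped placement; lower is better."""
    if len(read_probs) != len(ref_segment):
        raise ValueError("read and reference segment must have equal length")
    return sum(base_penalty(p, r) for p, r in zip(read_probs, ref_segment))


if __name__ == "__main__":
    # Made-up read: a confident A followed by an ambiguous G/A call.
    read_probs = [
        [0.97, 0.01, 0.01, 0.01],
        [0.30, 0.01, 0.68, 0.01],
    ]
    for ref in ("AG", "AA", "AC"):
        print(ref, round(ungapped_score(read_probs, ref), 2))
```

With a fixed mismatch penalty, "AA" and "AC" would cost the same against a most-probable read of "AG"; with probability-derived penalties, the placement over "AA" is penalized much less because A was the second most likely call.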

                Comment

                • zee
                  NGS specialist
                  • Apr 2008
                  • 249

                  #9
                  Wouldn't it be better in the long run to use calibrated base calls rather than second-guessing with the PRB base calls?
                  The 1000 genomes project recalibrated their FASTQ files using prior alignment information to improve the data quality.


                  Originally posted by sparks:
                  gives more alignments than running off the fastq files but has been criticised by some as the Illumina fastq files have been quality calibrated but the prb files are not. I have never seen any test comparing SNP calls with Genotype that would show whether using prb files improves SNP calls.
                  Colin

                  Comment

                  • ohofmann
                    Member
                    • Jan 2009
                    • 37

                    #10
                    Colin, good meeting you at ISMB! Should have some comparative data for FASTQ vs PRB files soon. Zee, tend to agree, but we are looking at data with 2+ SNPs per read on average, and in many cases at high frequency, and from more than two clones. Was hoping that in these cases the underlying PRB data might be informative.

                    Comment

                    • nmalhis
                      Member
                      • Nov 2008
                      • 11

                      #11
                      I’d like to add that Slider II calibrates prb data before calling SNPs.
                      Regarding the storage space of prb files: since these files contain repetitive data, compressing them to .gz will reduce their size by 7 to 10 times. Slider II reads .gz files.
                      When we have more than 2 SNPs in a read, Slider II, like other SNP calling tools, filters dense SNPs, so results might not be good.

                      Nawar
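
For anyone scripting around compressed prb files, the following minimal sketch reads a plain or gzip-compressed text file line by line and measures the gzip ratio of a string. The file name and the toy data are made up, and the ratio printed for the toy sample says nothing about real prb files.

```python
import gzip
import os
import tempfile


def iter_prb_lines(path):
    """Yield lines (without the trailing newline) from a plain or .gz text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            yield line.rstrip("\n")


def compression_ratio(text):
    """Uncompressed size divided by gzip-compressed size for the given text."""
    raw = text.encode("ascii")
    return len(raw) / len(gzip.compress(raw))


if __name__ == "__main__":
    # Made-up, highly repetitive sample; real prb data will compress differently.
    sample = "40 -40 -40 -40\t-40 40 -40 -40\n" * 1000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_prb.txt.gz")  # hypothetical file name
        with gzip.open(path, "wt") as out:
            out.write(sample)
        print(sum(1 for _ in iter_prb_lines(path)), "lines read")
    print(f"ratio on this toy sample: {compression_ratio(sample):.1f}x")
```

Reading through `gzip.open` in text mode means downstream parsing code does not need to know whether the input was compressed.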

                      Comment

                      • ohofmann
                        Member
                        • Jan 2009
                        • 37

                        #12
                        Originally posted by nmalhis:
                        When we have more than 2 SNPs in a read, Slider II, like other SNP calling tools, filters dense SNPs, so results might not be good.

                        Nawar
                        Yep, that's going to be a problem no matter what tool we use -- four to five SNPs per read on average. Having said that, as we are only aligning against 10kb of reference sequence most reads should still be align-able. Now, if we could stop the genomic center from deleting the intensity and PRB files after each run...

                        Comment

                        • nmalhis
                          Member
                          • Nov 2008
                          • 11

                          #13
                          "four to five SNPs per read on average" and "10kb of reference sequence": this means about 10% of the reference is unknown. I would assemble these reads, since the reference is short enough not to have repeat issues.

                          Comment
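
The "about 10%" figure in post #13 follows from dividing SNPs per read by read length. The thread does not state the read length, so the 36 and 50 bp values below are assumptions (typical for Illumina runs of that period), and the function names are illustrative.

```python
def snp_density(snps_per_read, read_length):
    """Fraction of read positions that differ from the reference."""
    return snps_per_read / read_length


def expected_variant_sites(density, reference_length):
    """Rough count of variant positions if the density held across the reference."""
    return round(density * reference_length)


if __name__ == "__main__":
    reference_length = 10_000  # "10kb of reference sequence" from the thread
    for read_length in (36, 50):  # assumed read lengths, not given in the thread
        for snps in (4, 5):  # "four to five SNPs per read on average"
            density = snp_density(snps, read_length)
            sites = expected_variant_sites(density, reference_length)
            print(f"{snps} SNPs / {read_length} bp = {density:.1%}, "
                  f"roughly {sites} sites over {reference_length} bp")
```

Under these assumed read lengths, the density lands in the high single digits to low teens of percent, which is consistent with the roughly 10% estimate and with the suggestion to assemble rather than align.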

                          • ohofmann
                            Member
                            • Jan 2009
                            • 37

                            #14
                            Interesting. Hadn't even thought about reference-based or de novo assemblies as an alternative. Will keep it in mind, thanks again!

                            Comment

                            • korifuenc7933
                              Junior Member
                              • Sep 2010
                              • 1

                              #15
                              Very useful... I heard about using .prb instead of the fastq. Now working on it.

                              Comment
