SEQanswers

  • Third Party Software for Colorspace data?

    Hi:

    Besides ABI software, what software are people using for colorspace data? I heard some people are working with MAQ (is this true?). Can Bowtie work with colorspace data?

    Thanks

  • #2


    There are some others, but I don't think they are using CS correctly.



    • #3
      Thanks for your response. Have you had any experience with NextGENe? It doesn't seem that many people are using it. I'm trying it as well, but it seems to translate CS data to FASTA before matching. Doesn't this lose the ability to handle mismatches properly (since a single CS mismatch changes all the subsequent nucleotides)?



      • #4
        Disclaimer: I work at CLC bio

        We have just included native color-space assembly in our NGS Cell software.


        You can grab a white paper with benchmarks at http://clcbio.com/index.php?id=1368

        Cheers

        Roald



        • #5
          I am now trying NextGENe, and it seems that it translates colorspace data to FASTA before the analysis. Is this correct? If so, it seems that there is much potential for error (0-2) and that this is not a method recommended by ABI. Does this seem right?



          • #6
            Thanks, Roald. I'll take a look at CLC bio.



            • #7
              Originally posted by doxologist
              I am now trying NextGENe, and it seems that it translates colorspace data to FASTA before the analysis. Is this correct? If so, it seems that there is much potential for error (0-2) and that this is not a method recommended by ABI. Does this seem right?

              I am not familiar with NextGENe, but if they are indeed translating to base space instead of doing their work within color space, then yes, there is great potential for error. Unlike traditional sequencing technologies, where a single miscall affects only that particular base, on the SOLiD a miscall affects all downstream bases. Also, by not working in color space, one misses a major strength of the SOLiD: great SNP calling.
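The error-propagation point can be illustrated with a minimal Python sketch. It assumes the standard SOLiD two-base code, in which bases get 2-bit values (A=0, C=1, G=2, T=3) and each color is the XOR of two adjacent base values; the function names and the example sequence are hypothetical.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def encode(primer: str, seq: str) -> str:
    """Return the primer base followed by one color digit per base."""
    prev = BASES.index(primer)
    colors = []
    for base in seq:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))  # color = XOR of adjacent base codes
        prev = cur
    return primer + "".join(colors)


def decode(read: str) -> str:
    """Translate a color-space read (primer + digits) into bases."""
    prev = BASES.index(read[0])
    out = []
    for color in read[1:]:
        prev ^= int(color)  # each base depends on the previously decoded base
        out.append(BASES[prev])
    return "".join(out)


if __name__ == "__main__":
    truth = "GATTACAG"         # hypothetical sequence
    read = encode("T", truth)  # "T12303112"
    # Simulate a single miscalled color at the third color position.
    bad = read[:3] + str((int(read[3]) + 1) % 4) + read[4:]
    wrong = decode(bad)
    diffs = [i for i, (a, b) in enumerate(zip(truth, wrong)) if a != b]
    print(read, truth)
    print(bad, wrong, "mismatched positions:", diffs)
```

The altered read differs from the true read in one color, yet its decoded bases disagree with the original at every position from the miscall onward (positions 2 through 7 here), because every later base is XORed with the same wrong value.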



              • #8
                You are both absolutely right that a huge amount of information is lost by aligning SOLiD data in sequence space, rather than in color space.
                The benchmarks we ran (see http://clcbio.com/index.php?id=1368) showed that the number of aligned reads increases by over 80% when reads are aligned in color space rather than in sequence space. This example is for reads of length 35, and the difference will only grow as reads get longer.



                • #9
                  Hmm... great, thanks for the info. Perhaps this has already been addressed, but how does CLC bio compare with ZOOM and BFAST?



                  • #10
                    Allow me to ask what may be a dumb question...

                    If I "double encode" (to use the ABI term) both my reads and my reference sequence (so that colors are represented by ACGTs), then why can't I use bowtie, blat, blastall or whatever alignment program I like and expect success? Sure there would be some post-alignment work involved in distinguishing biological variants from sequencing errors but I don't see why the alignment itself wouldn't be valid and useful.

                    Thanks



                    • #11
                      Originally posted by Mr Mutundes
                      Allow me to ask what may be a dumb question...

                      If I "double encode" (to use the ABI term) both my reads and my reference sequence (so that colors are represented by ACGTs), then why can't I use bowtie, blat, blastall or whatever alignment program I like and expect success? Sure there would be some post-alignment work involved in distinguishing biological variants from sequencing errors but I don't see why the alignment itself wouldn't be valid and useful.

                      Thanks

                      Hey! Your answer is in Post #7 above!



                      • #12
                        No, no, no! "Double encoding" doesn't put you in base space!

                        Let me put the question again: a sequence of colors is often represented by digits, but it can just as easily be represented by the characters A, C, G, T (somewhere in the AB Corona Lite material this is referred to as "double encoding"). If a query sequence and a target sequence are both encoded this way, then because they both "look like" nucleotide sequences, they are acceptable as input to standard nucleotide alignment programs. But what is being aligned are two color sequences, not two base sequences. So if there is a color sequencing error, the alignment will NOT be perturbed as it would be in an alignment done in "base space". (I think...) So why can't we use traditional alignment programs?

                        Happy to be corrected!
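The premise of this question can be checked with a short Python sketch: once color digits are written as letters, a single miscalled color produces exactly one mismatch against the double-encoded reference, even though the decoded bases diverge. The sketch assumes the usual digit-to-letter mapping (0=A, 1=C, 2=G, 3=T) and the XOR color code (A=0, C=1, G=2, T=3); the reference colors and helper names are made up for illustration.

```python
BASES = "ACGT"  # assumed base codes A=0..T=3, also used as double-encoding letters


def double_encode(colors: str) -> str:
    """Write color digits 0-3 as the letters A, C, G, T."""
    return "".join(BASES[int(c)] for c in colors)


def decode(primer: str, colors: str) -> str:
    """Decode color digits into bases, starting from the primer base."""
    prev = BASES.index(primer)
    out = []
    for c in colors:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    ref_colors = "2303112"   # hypothetical reference colors
    read_colors = "2003112"  # same read with one miscalled color
    print("double-encoded mismatches:",
          mismatches(double_encode(ref_colors), double_encode(read_colors)))
    print("decoded-base mismatches:",
          mismatches(decode("G", ref_colors), decode("G", read_colors)))
```

Here the double-encoded strings differ at one position while the decoded bases differ at six, which is the sense in which a color error does not perturb an alignment done on double-encoded strings.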



                        • #13
                          There are at least three problems, Mr Mutundes, with using double-encoded sequences with traditional alignment programs.

                          (1) As I mentioned above, a single color (or double-encoded) change at the start of the sequence decodes to an entirely different base sequence.

                          (2) Related to the above, opposite strands do not match. Thus you have to tell your traditional program to align to one strand at a time.

                          (3) Traditional programs expect a SNP to be a single-base change, and sequencing errors are also single-base changes. In color space (and thus in double-encoded space), however, a SNP shows up as two adjacent color changes, while a sequencing error is a single color change.

                          In summary, the problem is not double encoding per se: as you point out, it should not matter whether the alphabet 0, 1, 2, 3 or the alphabet A, C, G, T is used. Rather, the problem is that traditional programs do not know how to cope with the strengths and weaknesses of color space.

                          Sitting down in front of a chalkboard with another person does a lot for the 'aha!' moment. Since I cannot do that with you, I will instead use my next couple of messages to convey the above ideas. I assume that you know how color-space encoding is done by the sequencer. Also, for ease of typing, I will use runs of 7 bases instead of the usual 25 or 35 or (eventually) more.
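Point (3) can be made concrete: a real SNP changes the two colors on either side of the variant base, and both changes are the same XOR shift, whereas a single sequencing error changes one color in isolation. The Python sketch below assumes the XOR color code (A=0, C=1, G=2, T=3); `classify` is a hypothetical helper that compares one pair of aligned color strings and does not handle adjacent errors that happen to mimic a SNP.

```python
from __future__ import annotations

BASES = "ACGT"  # assumed codes A=0, C=1, G=2, T=3


def to_colors(seq: str) -> list[int]:
    """Colors between adjacent bases (no primer), as XOR of base codes."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def classify(ref: list[int], read: list[int]) -> str:
    """Rough label for the color differences between ref and read."""
    diff = [i for i, (a, b) in enumerate(zip(ref, read)) if a != b]
    if not diff:
        return "match"
    if len(diff) == 1:
        return "single color change: likely sequencing error"
    if len(diff) == 2 and diff[1] == diff[0] + 1:
        i, j = diff
        # A true SNP shifts both flanking colors by the same XOR amount.
        if ref[i] ^ read[i] == ref[j] ^ read[j]:
            return "adjacent consistent changes: candidate SNP"
        return "adjacent inconsistent changes: likely errors"
    return "multiple changes"


if __name__ == "__main__":
    ref = "GATTACAG"
    snp = "GATGACAG"  # hypothetical T->G substitution at position 3
    print(to_colors(ref), to_colors(snp))
    print(classify(to_colors(ref), to_colors(snp)))
    err = to_colors(ref)
    err[4] ^= 1  # hypothetical single miscalled color
    print(classify(to_colors(ref), err))
```

The consistency check works because changing base b to b XOR d changes both colors that touch it by the same d, so two adjacent changes with different shifts cannot come from a single base substitution.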



                          • #14
                            Single change causes big problems.

                            If I have two reads in color space:

                            (1 CS) T3232032
                            (2 CS) T1232032

                            which correspond to these bases in base space:

                            (1 BS) AGCTTAG
                            (2 BS) GATCCGA

                            And in double-encoded space without primer trimming:

                            (1 DEN) TTGTGATG
                            (2 DEN) TCGTGATG

                            Or in the more proper primer-trimmed double encoding, which drops both the primer base and the first color (the primer means something different from the double encoding; e.g., the 'T' primer is actually a 'T' and not a substitute for the number '3', and the first color only records the step from the primer into the read):

                            (1 DET) GTGATG
                            (2 DET) GTGATG

                            So now you take the double-encoded trimmed (DET) reads and put them into a traditional assembler. Congratulations, you have now assembled AGCTTAG and GATCCGA together, even though they differ at every base!

                            Even if you put the double-encoded non-trimmed reads through a traditional assembler, you end up with the same incorrect assembly, since 7 of the 8 double-encoded bases align. Note that this percentage works even more against you with 25- or 35-base reads. If you insist that your assembler make exact matches (8 of 8 in this case), then you never get adjacent overlaps and thus no contigs.
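The two reads in this example can be checked with a short Python sketch. It assumes the XOR color code (A=0, C=1, G=2, T=3), double encoding 0=A, 1=C, 2=G, 3=T, and that "trimmed" means dropping both the primer base and the first color, as the six-letter DET strings imply; the helper names are hypothetical.

```python
BASES = "ACGT"  # assumed codes A=0, C=1, G=2, T=3


def decode(read: str) -> str:
    """Decode primer base + color digits into bases."""
    prev = BASES.index(read[0])
    out = []
    for c in read[1:]:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)


def double_encode(read: str, trimmed: bool = False) -> str:
    """Double-encode colors; optionally drop the primer and first color."""
    letters = "".join(BASES[int(c)] for c in read[1:])
    return letters[1:] if trimmed else read[0] + letters


def same(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    r1, r2 = "T3232032", "T1232032"
    b1, b2 = decode(r1), decode(r2)
    den1, den2 = double_encode(r1), double_encode(r2)
    det1, det2 = double_encode(r1, trimmed=True), double_encode(r2, trimmed=True)
    print("BS ", b1, b2, same(b1, b2), "of", len(b1), "bases agree")
    print("DEN", den1, den2, same(den1, den2), "of", len(den1), "agree")
    print("DET", det1, det2, "identical:", det1 == det2)
```

It reports that 0 of 7 decoded bases agree, 7 of 8 untrimmed double-encoded letters agree, and the trimmed strings are identical, which is exactly the situation in which a traditional assembler merges the wrong reads.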



                            • #15
                              Opposite strand reads do not align

                              I am using a repetitive sequence here, but the same idea holds for non-repeat regions.

                              In color space there are two reads:

                              (CS 1) T0000000
                              (CS 2) T3000000

                              These represent in base space:

                              (BS 1) TTTTTTT
                              (BS 2) AAAAAAA

                              If these are reads on opposite strands, then they should align. So let's convert them into double encoding and put them through a traditional alignment program.

                              (DET 1) AAAAAA
                              (DET 2) AAAAAA

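                              Oops! A traditional program looking for the opposite-strand match will reverse-complement DET 2 into TTTTTT, which does not match DET 1, so it is going to be hard to find any alignment that way!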
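One way to see the strand problem: complementing both bases of a pair leaves its color unchanged, so the reverse complement of a color read is simply the reversed color string. A traditional aligner instead complements the double-encoded letters, which is wrong for colors. The Python sketch below assumes the XOR color code and double encoding (0=A, 1=C, 2=G, 3=T); the sequence GATTACA and the helper names are illustrative only.

```python
BASES = "ACGT"  # assumed codes A=0, C=1, G=2, T=3
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Ordinary base-space reverse complement."""
    return seq.translate(COMPLEMENT)[::-1]


def double_encoded_colors(seq: str) -> str:
    """Colors between adjacent bases, written as letters (no primer)."""
    codes = [BASES.index(b) for b in seq]
    return "".join(BASES[a ^ b] for a, b in zip(codes, codes[1:]))


if __name__ == "__main__":
    seq = "GATTACA"
    forward = double_encoded_colors(seq)
    opposite = double_encoded_colors(revcomp(seq))  # same locus, other strand
    print(forward, opposite)
    print("naive revcomp matches:", revcomp(forward) == opposite)
    print("reversed colors match:", forward[::-1] == opposite)
```

For GATTACA the naive reverse complement of the double-encoded read fails to match the other strand, while simply reversing it matches. For the TTTTTTT/AAAAAAA reads above, both strands give AAAAAA; reversing leaves that unchanged, but a naive reverse complement turns it into TTTTTT.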
