  • Compare contigs between libraries (Newbler)

    Hello all,
    I am using a 454 FLX Titanium sequencer to sequence 4 libraries. The libraries represent the control and the stressed transcriptome of 2 tissues of the same plant.
    Using Newbler, I compared the contigs that I generated, but I found no sequence similarity among them.

    Does anybody have an idea of why this is happening?

    Thank you in advance.

  • #2
    Hey FelipeAd,

    Can you perhaps explain what you have done in more detail? How did you compare the contigs? Did you use the -cdna option in the assembly?


    • #3
      Thank you for your reply.

      Well, I analyzed the 4 libraries separately with Newbler but the sequence of the resulting contigs had no similarity.


      • #4
        No similarity using what? Sequence-level, using some program like BLAST? Or do you mean something else?

        How many contigs do you have and what total amount of contig bases?

        Two libraries from the same organism should have some sequence similarity.

        --
        Phillip


        • #5
          I compared the libraries using blastx similarity and the result was no similarity, particularly in one pair of libraries (in the other there was similarity but with a very relaxed threshold).


          • #6
            Hi FelipeAd,
            We are trying to help you out, but you are not answering all the questions we ask. Would you please re-read my previous post and answer all the questions?

            --
            Phillip


            • #7
              Well, I have a small number of contigs. The number of large contigs (meaning >100 bp in length) was 120 and 140 for the two libraries, with a total of 78,000 and 100,000 bases in these contigs, respectively.

              A large number of bases, though, seems to be unassembled (?), at least in large contigs, since I start with almost 20,000,000 bases for each library...


              • #8
                My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.

                Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?

                --
                Phillip
                Last edited by pmiguel; 01-12-2011, 08:41 AM.
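The comparison described in this post (build a nucleotide database from one contig set, then search the other set against it with blastn) can be sketched roughly as below. This is only a minimal illustration, assuming the NCBI BLAST+ programs `makeblastdb` and `blastn` are installed; the file names, database name and e-value cutoff are hypothetical placeholders, not values from this thread. The script only builds the command lines and summarises tabular output, it does not run BLAST itself.

```python
def build_blast_commands(db_fasta, query_fasta,
                         db_name="control_contigs",
                         out_tsv="stress_vs_control.tsv",
                         evalue="1e-10"):
    """Return two BLAST+ command lines: make a nucleotide db, then search it.

    All default names and the e-value are illustrative assumptions.
    """
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl",
               "-out", db_name]
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def queries_with_hits(tabular_lines):
    """Collect query IDs (first column of -outfmt 6 output) with >= 1 hit."""
    return {line.split("\t")[0] for line in tabular_lines if line.strip()}


if __name__ == "__main__":
    # Hypothetical contig files from the two separate assemblies.
    make_db, search = build_blast_commands("control_contigs.fna",
                                           "stress_contigs.fna")
    print(" ".join(make_db))
    print(" ".join(search))
```

The printed commands can be pasted into a shell or passed to `subprocess.run`. Comparing the size of the set returned by `queries_with_hits` with the total number of query contigs gives a quick answer to "do I get any significant hits at all?".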


                • #9
                  Originally posted by FelipeAd
                  I compared the libraries using blastx similarity and the result was no similarity, particularly in one pair of libraries (in the other there was similarity but with a very relaxed threshold).
                  If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.

                  As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you are either (1) doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would suspect that some similarity would occur so I am leaning towards case #1.

                  Perhaps if you can write out the specific steps and commands you are using then we can help.

                  As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs."
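The arithmetic behind the 200x figure, and the suggested "percentage of reads in contigs" metric, can be written out as a small sketch. The base counts come from this thread; the read counts in the second call are made-up placeholders, since no read numbers were posted.

```python
def fold_reduction(input_bases, contig_bases):
    """Input bases divided by contig bases.

    This equals the mean coverage only if every read ended up in a contig.
    """
    return input_bases / contig_bases


def percent_reads_assembled(reads_in_contigs, total_reads):
    """Share of reads placed in contigs, as a percentage."""
    return 100.0 * reads_in_contigs / total_reads


# Figures quoted in the thread: ~20,000,000 input bases, ~100,000 contig bases.
print(fold_reduction(20_000_000, 100_000))        # 200.0

# Hypothetical read counts, for illustration only.
print(percent_reads_assembled(5_000, 50_000))     # 10.0
```

If the second number turns out to be low, the 200x reading does not hold and most of the data is simply unassembled.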


                  • #10
                    I took his post to mean that most of the reads had not, in fact, assembled. If this is the case, and you take the (say) 1% most highly expressed transcripts from a tissue that is stressed and from one that is not and blast them against each other, it is not completely impossible that you would get no hits. But it would be surprising.

                    --
                    Phillip


                    • #11
                      What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...
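The scripting step mentioned here could look roughly like the sketch below. It relies on the standard ACE layout, where a `CO` line starts each contig and each `AF` line names one read placed in it. The `read_to_library` lookup is hypothetical: it would have to be built beforehand from the read names in each library's input file. The sketch also assumes that any suffix the assembler appends to a read name starts with a dot, so everything after the first dot is dropped; check that against your own ACE file.

```python
from collections import defaultdict


def reads_per_contig(ace_lines):
    """Map contig name -> list of read names, from CO and AF lines."""
    contigs = defaultdict(list)
    current = None
    for line in ace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "CO":
            current = fields[1]
            contigs[current]              # register contig even if empty
        elif fields[0] == "AF" and current is not None:
            contigs[current].append(fields[1])
    return dict(contigs)


def library_counts(contig_reads, read_to_library):
    """Count reads per library for every contig.

    read_to_library is a hypothetical dict: read name -> library label.
    """
    counts = {}
    for contig, reads in contig_reads.items():
        per_lib = defaultdict(int)
        for read in reads:
            base_name = read.split(".")[0]   # assumed suffix convention
            per_lib[read_to_library.get(base_name, "unknown")] += 1
        counts[contig] = dict(per_lib)
    return counts


if __name__ == "__main__":
    # Tiny made-up ACE fragment, for illustration only.
    ace = """CO contig00001 10 2 1 U
ACGTACGTAC

AF readA U 1
AF readB.1-50 C 3
CO contig00002 8 1 1 U
ACGTACGT

AF readC U 1
""".splitlines()
    lookup = {"readA": "control", "readB": "stress", "readC": "stress"}
    print(library_counts(reads_per_contig(ace), lookup))
```

Because a read that the assembler splits across contigs appears under more than one `CO` block, it is counted once per contig here. That is worth keeping in mind when interpreting the per-library counts.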


                      • #12
                        Actually that is what we do at Purdue. I like it because you annotate the contigs and then have read counts and depths from each component sample. So you get DGE-like results with zero reference information.

                        Of course Newbler's read tearing behavior makes this questionable -- but you can force Newbler to assemble a read into only one contig.

                        For someone who can't script though, this would be a tough one. The assembly would be trivial if you use the GUI to gsAssembler. But then figuring out which read came from which sample afterward...

                        --
                        Phillip


                        • #13
                          Originally posted by pmiguel
                          My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.

                          Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?

                          --
                          Phillip
                          Thank you for your reply. My initial approach (using blastx) was to apply a Fisher's exact test between the two libraries, say stressed vs. unstressed. In this test I get no significant matches. I am now running blastn to see if I can get some similarity.

                          I am very concerned, though, about the non-similarity even with the blastx, since both of the libraries were blasted against known databases with several GO hits.
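For reference, a Fisher's exact test between two libraries boils down to a 2x2 table. The sketch below is a plain-Python two-sided version built on the hypergeometric distribution. It is a generic illustration, not a reproduction of what blast2go computes internally, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two libraries; columns are "in category" / "not in category"
    (for example reads in one contig vs. all other reads).
    """
    row1, row2 = a + b, c + d
    col1, total = a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Invented counts: 30 of 50,000 control reads and 5 of 50,000 stress
    # reads fall in the same contig.
    print(fisher_exact_two_sided(30, 49_970, 5, 49_995))
```

Note that the test needs a shared category in both libraries (the same contig, or the same GO term). With two separate assemblies that share no contigs there is nothing to put in the table, which is one reason a joint assembly is easier to work with.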


                          • #14
                            Originally posted by flxlex
                            What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...
                            This is my next step. I am now re-running all the libraries as a single assembly. This is the reason I started another thread asking how to find which reads comprise which contigs. I could not find this in the output files, though I did not check the .ace files.
                            I suppose it is also better to have each contig in a separate .ace file, is that right?


                            • #15
                              Originally posted by westerman
                              If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.

                              As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you are either (1) doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would suspect that some similarity would occur so I am leaning towards case #1.

                              Perhaps if you can write out the specific steps and commands you are using then we can help.

                              As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs."

                              More specifically, I initially used blast2go (blastx) and then ran a Fisher's exact test on my two libraries of interest. I do not know if this was the wrong part, but I will try a blastn of one against the other.

                              Another issue that I did not mention is that I used very strict assembly parameters. Namely, I set the Minimum overlap length to 90 and the Minimum overlap identity to 95.

                              Maybe this gives highly specific contigs (???) for each library?
                              I mean, could the fact that only a small part of the genome is highly expressed, combined with such strict alignment requirements, lead to this problem?
