  • FelipeAd
    Member
    • Jan 2011
    • 17

    Compare contigs between libraries (Newbler)

    Hello all,
    I am using a 454 Flx Titanium sequencer to sequence data from 4 libraries. The libraries represent the control and the stressed transcriptome of 2 tissues of the same plant.
    Using Newbler, I compared the contigs that I generated, but I found no sequence similarity among them.

    Does anybody have an idea of why this is happening?

    Thank you in advance.
  • dschika
    Member
    • Mar 2010
    • 56

    #2
    Hey FelipeAd,

    Can you perhaps explain what you have done in more detail? How have you compared the contigs? Did you use the -cdna option in the assembly?


    • FelipeAd
      Member
      • Jan 2011
      • 17

      #3
      Thank you for your reply.

      Well, I analyzed the 4 libraries separately with Newbler but the sequence of the resulting contigs had no similarity.


      • pmiguel
        Senior Member
        • Aug 2008
        • 2328

        #4
        No similarity using what? Sequence-level, using some program like BLAST? Or do you mean something else?

        How many contigs do you have and what total amount of contig bases?

        Two libraries from the same organism should have some sequence similarity.

        --
        Phillip


        • FelipeAd
          Member
          • Jan 2011
          • 17

          #5
          I compared the libraries using blastx similarity and the result was no similarity, particularly in one pair of libraries (in the other there was similarity but with a very relaxed threshold).

          Comment

          • pmiguel
            Senior Member
            • Aug 2008
            • 2328

            #6
            Hi FelipeAd,
            We are trying to help you out, but you are not answering all the questions we ask. Would you please re-read my previous post and answer all the questions?

            --
            Phillip


            • FelipeAd
              Member
              • Jan 2011
              • 17

              #7
              Well, I have a small number of contigs. The numbers of large contigs (meaning >100 bp in length) were 120 and 140 for the two libraries, with 78,000 and 100,000 total bases in these contigs, respectively.

              A large number of bases, though, seem to be unassembled (?), at least in large contigs, since I start with almost 20,000,000 bases for each library...


              • pmiguel
                Senior Member
                • Aug 2008
                • 2328

                #8
                My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.

                Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?
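As a rough sketch of the comparison described above, the two BLAST+ steps (build a nucleotide database from one library's contigs, then search the other library's contigs against it) could be scripted like this. The file names, database name, and E-value cutoff are placeholders, not values from this thread. The script only assembles the command lines and counts distinct queries in tabular output; it does not run BLAST itself.

```python
import shlex


def build_blastn_commands(db_fasta, query_fasta, db_name="lib1_contigs",
                          out_tsv="lib2_vs_lib1.tsv", evalue="1e-10"):
    """Return the two BLAST+ command lines as argument lists (nothing is run)."""
    # Step 1: nucleotide database from the first library's contigs.
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", "nucl", "-out", db_name]
    # Step 2: search the second library's contigs against it, tabular output.
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def count_queries_with_hits(tabular_text):
    """Count distinct query IDs (column 1) in BLAST tabular (-outfmt 6) output."""
    queries = set()
    for line in tabular_text.splitlines():
        if line.strip() and not line.startswith("#"):
            queries.add(line.split("\t")[0])
    return len(queries)


if __name__ == "__main__":
    # Hypothetical file names for the control and stressed contig sets.
    for cmd in build_blastn_commands("control_contigs.fna", "stress_contigs.fna"):
        print(shlex.join(cmd))
```

If the number of queries with hits is zero for two libraries from the same plant, the input files and commands deserve a second look before the biology does.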

                --
                Phillip
                Last edited by pmiguel; 01-12-2011, 08:41 AM.


                • westerman
                  Rick Westerman
                  • Jun 2008
                  • 1104

                  #9
                  Originally posted by FelipeAd View Post
                  I compared the libraries using blastx similarity and the result was no similarity, particularly in one pair of libraries (in the other there was similarity but with a very relaxed threshold).
                  If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.

                  As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you are either (1) doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would suspect that some similarity would occur so I am leaning towards case #1.

                  Perhaps if you can write out the specific steps and commands you are using then we can help.

                  As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs?"
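To put numbers on this, here is a small sketch of both calculations. The 200x figure is just input bases divided by contig bases, and it only reflects coverage if essentially all reads ended up in the contigs. The second function assumes a tab-delimited read status table with the read accession in column 1 and its status in column 2 (Newbler writes a file of this kind, 454ReadStatus.txt; the exact status labels used below are an assumption and may differ between versions).

```python
def fold_reduction(input_bases, contig_bases):
    """Input bases divided by contig bases (an upper bound on mean coverage)."""
    return input_bases / contig_bases


def percent_reads_assembled(read_status_text,
                            assembled_labels=("Assembled", "PartiallyAssembled")):
    """Percentage of reads whose status (column 2) counts as assembled."""
    total = 0
    assembled = 0
    for line in read_status_text.splitlines():
        fields = line.split("\t")
        # Skip blank or malformed lines and the header row.
        if len(fields) < 2 or fields[0] == "Accession":
            continue
        total += 1
        if fields[1] in assembled_labels:
            assembled += 1
    return 100.0 * assembled / total if total else 0.0


if __name__ == "__main__":
    # Figures quoted in the thread: ~20,000,000 input bases, ~100,000 contig bases.
    print(fold_reduction(20_000_000, 100_000))
```

A low percentage of assembled reads together with a high fold reduction would suggest that most reads were left out of the contigs rather than stacked deeply on them.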


                  • pmiguel
                    Senior Member
                    • Aug 2008
                    • 2328

                    #10
                    I took his post to mean that most of the reads had not, in fact, assembled. If this is the case, and you take the (say) 1% most highly expressed transcripts from a stressed tissue and from an unstressed one and blast them against each other, it is not completely impossible that you would get no hits. But it would be surprising.

                    --
                    Phillip


                    • flxlex
                      Moderator
                      • Nov 2008
                      • 412

                      #11
                      What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...
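The scripting part could look roughly like the sketch below. It relies on two features of the ACE format: each contig starts with a `CO <contig name> ...` line, and each read placed in that contig has an `AF <read name> ...` line. The `read_to_library` dictionary is hypothetical and would have to be built beforehand, for example from the read IDs in each library's input file. Stripping everything after the first "." in a read name is an assumption, meant to handle read names that carry a range suffix.

```python
from collections import defaultdict


def reads_per_contig_by_library(ace_text, read_to_library):
    """Count, for every contig in an ACE file, how many reads come from each library."""
    counts = defaultdict(lambda: defaultdict(int))
    contig = None
    for line in ace_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and sequence lines have fewer than two fields
        if fields[0] == "CO":
            contig = fields[1]  # start of a new contig record
        elif fields[0] == "AF" and contig is not None:
            # Assumption: drop any ".start-end" style suffix from the read name.
            read_id = fields[1].split(".")[0]
            library = read_to_library.get(read_id, "unknown")
            counts[contig][library] += 1
    return {name: dict(libs) for name, libs in counts.items()}
```

Called with the text of the combined assembly's ACE file, this returns a mapping such as `{"contig00001": {"control": 12, "stress": 3}}` (illustrative values), which can then be turned into a per-contig, per-library count table.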


                      • pmiguel
                        Senior Member
                        • Aug 2008
                        • 2328

                        #12
                        Actually that is what we do at Purdue. I like it because you annotate the contigs and then have read counts and depths from each component sample. So you get DGE-like results with zero reference information.
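One simple way to turn such per-sample read counts into a DGE-like comparison is a Fisher's exact test per contig, sketched below using only the standard library. This is an illustration under stated assumptions, not necessarily the method used at Purdue: the 2x2 table is (reads from sample A in the contig, other reads from sample A; reads from sample B in the contig, other reads from sample B), and the two-sided p-value sums all tables no more probable than the observed one. It ignores replicates and needs multiple-testing correction when applied to many contigs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

For a contig with 30 reads from a 50,000-read control sample and 5 reads from a 50,000-read stressed sample, the call would be `fisher_exact_two_sided(30, 49970, 5, 49995)`; these counts are made up for illustration.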

                        Of course Newbler's read tearing behavior makes this questionable -- but you can force Newbler to assemble a read into only one contig.

                        For someone who can't script though, this would be a tough one. The assembly would be trivial if you use the GUI to gsAssembler. But then figuring out which read came from which sample afterward...

                        --
                        Phillip


                        • FelipeAd
                          Member
                          • Jan 2011
                          • 17

                          #13
                          Originally posted by pmiguel View Post
                          My guess is that you are not sequencing very deep into those libraries. But, that said, I would expect at least some of the contigs to be similar unless this "stress" is very drastically changing the transcription profile of the tissue you are studying.

                          Just to be clear, if you make a blast database from one set of contigs and blast the other set of contigs against it (just blastn -- tblastx should not be necessary), you are saying you get no significant hits? Or is your analysis different from what I describe here?

                          --
                          Phillip
                          Thank you for your reply. My initial approach (using blastx) was to apply a Fisher's exact test between the two libraries, say stressed vs. unstressed. In this test I get no significant matches. I am now running blastn to see if I can get some similarity.

                          I am very concerned, though, about the non-similarity even with the blastx, since both of the libraries were blasted against known databases with several GO hits.


                          • FelipeAd
                            Member
                            • Jan 2011
                            • 17

                            #14
                            Originally posted by flxlex View Post
                            What about assembling all reads in one go, and tearing apart which contig contains reads from what library afterward (using the ace file)? Requires some scripting though...
                            This is my next step. I am currently re-running all the libraries as one. This is the reason I posted another thread asking how to find which reads comprise which contigs. I could not find this in the output files; I did not check the .ace files, though.
                            I suppose it is also better to have each contig in a separate .ace file, is that right?


                            • FelipeAd
                              Member
                              • Jan 2011
                              • 17

                              #15
                              Originally posted by westerman View Post
                              If you are truly using blastx to compare the contigs of two libraries then I believe you are using the wrong tool. blastx compares nucleotide sequences against a protein database. You would want to use either tblastx or blastn to compare nucleotides vs. nucleotides.

                              As Phillip says, comparing tissues of the same plant to each other should bring up some level of similarity. Ergo you are either (1) doing something wrong or (2) created libraries (and thus sequences) from two different species. Even in case #2 I would suspect that some similarity would occur so I am leaning towards case #1.

                              Perhaps if you can write out the specific steps and commands you are using then we can help.

                              As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs?"

                              More specifically, I initially used blast2go (blastx) and then ran a Fisher's exact test on my two libraries of interest. I do not know if this was the wrong part, but I will try a blastn of one library against the other.

                              Another issue that I did not mention is that I used very strict assembly parameters. Namely, I set the minimum overlap length to 90 and the minimum overlap identity to 95.

                              Maybe this gives highly specific contigs (???) for each library?
                              I mean, could the fact that only a small part of the genome is highly expressed, combined with a requirement for strict alignment, lead to this problem?
