  • pmiguel
    Senior Member
    • Aug 2008
    • 2328

    #16
    Originally posted by FelipeAd
    More specifically, I first used blast2go (blastx) and then ran a Fisher's exact test comparing my two libraries of interest. I do not know if this was the wrong step, but I will try a blastn of one library against the other.

    Another thing I did not mention is that I used very strict assembly parameters: I set the minimum overlap length to 90 bp and the minimum overlap identity to 95%.

    Maybe this produces contigs that are highly specific to each library?
    I mean, could the fact that a small part of the genome is highly expressed, combined with a requirement for strict alignments, lead to this problem?

    Okay, non-mammal sequence generally does not get very comprehensive annotation from blast2go. It gets some, but it is limited. Further, your assembly parameters may have resulted in a lower percentage of your reads being assembled into contigs. So, in effect, you may be looking at a tiny percentage of your total sequence, and it doesn't happen to match.

    Of course that does not rule out your sample getting mixed up with something else in the sequencing core. But blastn will tell you that.

    --
    Phillip

    • pmiguel
      Senior Member
      • Aug 2008
      • 2328

      #17
      Originally posted by FelipeAd
      This is my next step. I am now re-running all the libraries together as a single assembly. This is why I started another thread asking how to find which reads comprise which contigs. I could not find this in the output files, though I did not check the .ace files.
      I suppose it would also be better to have each contig in a separate .ace file; is that right?

      You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior -- with an RPKM approach, or such. But otherwise I would advise that you turn it off.
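As an illustration of the RPKM route, the minimal Python sketch below normalizes a contig's read count by contig length (per kilobase) and by the total number of mapped reads (per million). Splitting each ripped read's count evenly across the contigs it was placed in is an assumption made here for illustration, not something Newbler or the posters prescribe, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rpkm(read_count: float, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of contig per Million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def fractional_counts(read_to_contigs: Dict[str, List[str]]) -> Dict[str, float]:
    """Hypothetical compensation for ripping: a read placed in k contigs adds 1/k to each."""
    counts: Dict[str, float] = defaultdict(float)
    for contigs in read_to_contigs.values():
        if not contigs:
            continue  # read not placed in any contig
        share = 1.0 / len(contigs)
        for contig in contigs:
            counts[contig] += share
    return dict(counts)


# Usage: 500 reads on a 2,000 bp contig out of 1,000,000 mapped reads.
print(rpkm(500, 2000, 1_000_000))  # 250.0
```

The fractional counts can be passed to rpkm in place of whole read counts, so a read torn across two contigs is not counted twice.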

      .ace is pretty easy to parse. (You can find the specs at phrap.org -- Roche follows them fairly well.)

      Even without splitting the output into individual .ace files, you can list which reads belong to which contigs with

      egrep '^(CO|RD) ' 454Contigs.ace

      Not that you want to. But the above shows the record structure of .ace is easy to parse.

      --
      Phillip

      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #18
        Originally posted by pmiguel
        You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior -- with an RPKM approach, or such. But otherwise I would advise that you turn it off.

        If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.

        • pmiguel
          Senior Member
          • Aug 2008
          • 2328

          #19
          Originally posted by kmcarr
          If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.

          I am talking about gsAssembler, not gsMapper.

          The switch works, even in combination with -cdna.

          --
          Phillip

          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #20
            Originally posted by pmiguel
            I am talking about gsAssembler, not gsMapper.

            The switch works, even in combination with -cdna.

            --
            Phillip

            I am talking about gsAssembler too. I have tested using the -rip option in a -cdna project and can confirm, by checking the 454ReadStatus.txt file, that there are reads split between contigs.
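To repeat that check in a script, the minimal Python sketch below lists reads whose 5' and 3' ends were placed in different contigs. It assumes 454ReadStatus.txt is tab-separated with one header row and the columns Accno, Read Status, 5' Contig, 5' Position, 5' Strand, 3' Contig, 3' Position, 3' Strand; verify this against the file your Newbler version writes. Only the two ends are compared, so a read torn into three or more pieces is reported once, by its end contigs. The function name is hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def split_reads(status_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (accno, 5' contig, 3' contig) for reads whose ends lie in different contigs."""
    reader = csv.reader(status_lines, delimiter="\t")
    next(reader, None)  # skip the header row
    split: List[Tuple[str, str, str]] = []
    for row in reader:
        if len(row) < 6:
            continue  # rows without contig placements (e.g. singletons) are skipped
        accno, five_prime, three_prime = row[0], row[2], row[5]
        if five_prime and three_prime and five_prime != three_prime:
            split.append((accno, five_prime, three_prime))
    return split


# Usage (path assumed):
# with open("454ReadStatus.txt", newline="") as handle:
#     print(len(split_reads(handle)), "reads have ends in different contigs")
```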

            • dschika
              Member
              • Mar 2010
              • 56

              #21
              In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
              (Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)

              • kmcarr
                Senior Member
                • May 2008
                • 1181

                #22
                Originally posted by dschika
                In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
                (Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)

                Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got that warning.

                I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.
                Last edited by kmcarr; 01-14-2011, 07:53 AM.

                • pmiguel
                  Senior Member
                  • Aug 2008
                  • 2328

                  #23
                  Complaint about Newbler intra-read ripping.

                  Originally posted by kmcarr
                  Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got that warning.

                  I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.

                  Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

                  But why is it desirable to ignore the information implicit in a read as to what sequence is directly juxtaposed next to another with this ripping behavior?

                  --
                  Phillip

                  • kmcarr
                    Senior Member
                    • May 2008
                    • 1181

                    #24
                    Originally posted by pmiguel
                    Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

                    But why is it desirable to ignore the information implicit in a read as to what sequence is directly juxtaposed next to another with this ripping behavior?

                    --
                    Phillip

                    I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.

                    • pmiguel
                      Senior Member
                      • Aug 2008
                      • 2328

                      #25
                      Originally posted by kmcarr
                      I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.

                      But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?

                      --
                      Phillip

                      • flxlex
                        Moderator
                        • Nov 2008
                        • 412

                        #26
                        Originally posted by pmiguel
                        But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?

                        The actual tearing does not happen until the contigs are built. Newbler creates a graph of all read alignments, with the contigs as nodes and the reads that run from one contig into the next as edges. Repeats are one reason a genome assembles into a graph in the first place.

                        In a way, creating contigs, i.e. tearing apart the graph and listing only the nodes, is a necessary evil, but the actual assembly is the whole graph. In contrast to other assemblers, Newbler chooses to tear reads apart rather than assign each read to a single contig. The 'tearing' information can be used to find which contigs are neighbors of each other.

                        Transcriptome assembly is somewhat special, as each gene is expected to produce a small contig graph of its own. This graph is then traversed in order to create isotigs (transcript variants).
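The contig-graph idea can be made concrete with a toy Python sketch: contigs are nodes, an edge is kept when enough reads run from the end of one contig into the next, and every source-to-sink path is a candidate isotig. This is a simplified illustration, not Newbler's actual algorithm: it ignores strand and contig-end orientation, assumes the graph has no cycles, and the min_reads threshold of 2 and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(links: Iterable[Tuple[str, str]], min_reads: int = 2) -> Dict[str, Set[str]]:
    """Keep edge a->b only if at least min_reads reads link the end of a to b."""
    support = Counter(links)
    graph: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), n in support.items():
        if n >= min_reads:
            graph[a].add(b)
    return graph


def enumerate_paths(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """List every path from a source (no incoming edge) to a sink (no outgoing edge).

    Assumes the graph is acyclic, as in a simple alternative-splicing bubble.
    """
    targets = {b for nexts in graph.values() for b in nexts}
    nodes = set(graph) | targets
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        nexts = sorted(graph.get(node, ()))
        if not nexts:
            paths.append(path)  # reached a sink: one candidate isotig
            return
        for nxt in nexts:
            walk(nxt, path + [nxt])

    for source in sorted(nodes - targets):
        walk(source, [source])
    return paths
```

For example, with links c1->c2 and c2->c3 (three reads each), c1->c3 (two reads) and c3->c4 (one read), the weak c3->c4 edge is dropped and the exon-skipping bubble yields two candidate isotigs, [c1, c2, c3] and [c1, c3].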

                        Pardon the self-promotion, but I try to explain all of this here and here

                        • FelipeAd
                          Member
                          • Jan 2011
                          • 17

                          #27
                          Originally posted by westerman
                          As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs."

                          A more detailed description of the metrics for my four libraries is as follows:
                          Bases: 20,000,000
                          contigs: 150
                          Bases in contigs: 100,000

                          Bases: 26,000,000
                          contigs: 220
                          Bases in contigs: 140,000

                          Bases: 25,000,000
                          contigs: 80
                          Bases in contigs: 40,000

                          Bases: 30,000,000
                          contigs: 120
                          Bases in contigs: 60,000
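Applying westerman's rule of thumb to all four libraries: total bases divided by bases in contigs gives the average depth the contigs would need if every sequenced base had been assembled into them. The short Python sketch below does only that arithmetic on the numbers listed above; it cannot tell you what fraction of reads actually assembled, which is why the percentage-of-reads metric is the better check. The function name is hypothetical.

```python
# (total bases sequenced, bases in contigs) for the four libraries listed above
libraries = [
    (20_000_000, 100_000),
    (26_000_000, 140_000),
    (25_000_000, 40_000),
    (30_000_000, 60_000),
]


def implied_fold(total_bases: int, contig_bases: int) -> float:
    """Average depth the contigs would need if every sequenced base were assembled into them."""
    return total_bases / contig_bases


for i, (total, in_contigs) in enumerate(libraries, start=1):
    print(f"library {i}: {implied_fold(total, in_contigs):.1f}x")
```

This prints 200.0x, 185.7x, 625.0x and 500.0x, so the last two libraries would need far deeper coverage than the first two for the contig totals to be explained by depth alone.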


                          Again, I assume something is wrong with my assembly, unless there is something I did not take into consideration.

                          • dschika
                            Member
                            • Mar 2010
                            • 56

                            #28
                            Originally posted by kmcarr
                            I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.

                            v2.3 didn't ignore it. I started a thread about that a while ago...


                            FelipeAd:
                            What is the numAlignedReads or numAlignedBases value in your 454NewblerMetrics.txt file (consensusResults section)?

                            • FelipeAd
                              Member
                              • Jan 2011
                              • 17

                              #29
                              Originally posted by dschika
                              FelipeAd:
                              What is the numAlignedReads or numAlignedBases value in your 454NewblerMetrics.txt file (consensusResults section)?

                              The percentage of aligned reads ranges from 75% to 90% across all my libraries. But I would not rely on this very much, since the output includes contigs as short as 2 nt. That is why I refer only to the number of bases included in 'large' contigs.
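To compute the "bases in large contigs" figure the same way for every library, a minimal Python sketch is to read a contig FASTA (454AllContigs.fna is assumed here as the input), record each contig's length, and sum only the contigs at or above a cutoff. The 500 bp default for min_length is a hypothetical choice for illustration; set it to whatever you count as 'large'. Both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Length of each sequence in a FASTA file, keyed by the first word of its header."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may be wrapped over several lines
    return lengths


def bases_in_large_contigs(lengths: Dict[str, int], min_length: int = 500) -> int:
    """Total bases in contigs at least min_length long."""
    return sum(n for n in lengths.values() if n >= min_length)


# Usage (path assumed):
# with open("454AllContigs.fna") as handle:
#     print(bases_in_large_contigs(contig_lengths(handle)))
```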

                              • Jeremy
                                Senior Member
                                • Nov 2009
                                • 190

                                #30
                                Since your samples all come from the same plant, an assembly using all of the samples together will give you the most information. This also has downstream benefits, as you will have only one set of isotigs to annotate using Blast2GO or similar.

                                I have done something very similar: I mapped the raw reads from each sample back against the contigs from the combined assembly to get read counts. Summing the read counts for each contig used in an isotig or isogroup gives you isotig and isogroup read counts. Since gsMapper maps each read only once, this should get around the problem of multiple contig assignment that you run into by just using the output file (although it potentially introduces another problem: some reads will not be counted at all).
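The summing step can be sketched in Python as follows, assuming you already have per-contig read counts from gsMapper and a list of member contigs for each isotig. The dictionaries and contig names below are hypothetical placeholders, not parsed from any Newbler file. An isogroup count here sums over the distinct contigs of all its isotigs, so a contig shared by two isotigs is counted once for the group.

```python
from typing import Dict, Iterable


def summed_count(contig_counts: Dict[str, int], member_contigs: Iterable[str]) -> int:
    """Sum read counts over distinct member contigs; contigs with no mapped reads count as 0."""
    return sum(contig_counts.get(contig, 0) for contig in set(member_contigs))


# Hypothetical per-contig read counts from gsMapper.
contig_counts = {"contig00001": 40, "contig00002": 10, "contig00003": 25}

# Hypothetical isotig membership: two splice variants of one gene (one isogroup).
isotigs = {
    "isotig00001": ["contig00001", "contig00002", "contig00003"],
    "isotig00002": ["contig00001", "contig00003"],
}

for name, members in isotigs.items():
    print(name, summed_count(contig_counts, members))

isogroup_members = {c for members in isotigs.values() for c in members}
print("isogroup", summed_count(contig_counts, isogroup_members))
```

With these placeholder numbers the isotigs get 75 and 65 reads and the isogroup 75; note that reads on contigs shared by both variants are credited to each isotig.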

                                Also note that the 454Contigs.fna file has an error: for status=isotig contigs, it appends the previous contigs' sequence, as mentioned about halfway through this thread:
                                Pyrosequencing in picotiter plates, custom arrays for enrichment/decomplexing. (Roche)


                                This thread is somewhat related to a question I have recently posted:
                                Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)
