  • #16
    Originally posted by FelipeAd View Post
    More specifically, I initially used blast2go (blastx) and then ran a Fisher's exact test on my two libraries of interest. I do not know if this was the wrong part, but I will try a blastn of one against the other.

    Another issue that I did not mention is that I used very strict assembly parameters: I set the Minimum overlap length to 90 and the Minimum overlap identity to 95.

    Maybe this gives highly specific contigs (???) for each library?
    I mean, could the fact that only a small part of the genome is highly expressed, combined with a requirement for strict alignment, lead to this problem?
    Okay, non-mammal sequence generally does not get very comprehensive annotation from blast2go. It gets some, but it is limited. Further, your assembly parameters may have resulted in a lower percentage of your reads being assembled into contigs. So, in effect, you may be looking at a tiny percentage of your total sequence, and it doesn't happen to match.

    Of course that does not rule out your sample getting mixed up with something else in the sequencing core. But blastn will tell you that.

    --
    Phillip

    Comment


    • #17
      Originally posted by FelipeAd View Post
      This is my next step. I am now rerunning all the libraries as one. This is the reason I posted another thread asking how to find which reads comprise which contigs. I could not find this in the output files, though I did not check the .ace files.
      I suppose it is better also to have each contig in a different .ace file, is that right?
      You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior with an RPKM approach, or such. But otherwise I would advise that you turn it off.

      .ace is pretty easy to parse. (You can find the specs at phrap.org -- Roche follows them fairly well.)

      Even without doing individual .ace files you can find which reads belong to which contigs with

      egrep '^(CO|RD) '

      Not that you want to. But the above shows the record structure of .ace is easy to parse.
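As a minimal sketch of that record structure, assuming the usual .ace layout in which a `CO <contig name> ...` line opens a contig and each `RD <read name> ...` line names a read placed in it, the same extraction can be written in Python. The function name and the sample data (contig and read names included) are made up for illustration; they are not taken from a real Newbler run.

```python
import io
from collections import defaultdict

# Hypothetical, heavily trimmed .ace content, only to exercise the parser.
SAMPLE_ACE = """AS 2 3

CO contig00001 8 2 1 U
ACGTACGT

AF readA U 1
AF readB U 3

RD readA 8 0 0
ACGTACGT

RD readB 6 0 0
GTACGT

CO contig00002 6 1 1 U
TTGACC

AF readC U 1

RD readC 6 0 0
TTGACC
"""


def reads_by_contig(ace_lines):
    """Map each contig name (CO record) to the read names (RD records) under it."""
    contigs = defaultdict(list)
    current = None
    for line in ace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and single-token sequence lines
        if fields[0] == "CO":
            current = fields[1]
            contigs[current]  # register the contig even if no RD record follows
        elif fields[0] == "RD" and current is not None:
            contigs[current].append(fields[1])
    return dict(contigs)


mapping = reads_by_contig(io.StringIO(SAMPLE_ACE))
for contig, reads in mapping.items():
    print(contig, len(reads), ",".join(reads))
```

Passing an open file handle instead of the `io.StringIO` wrapper would work the same way, since the function only iterates over lines.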

      --
      Phillip

      Comment


      • #18
        Originally posted by pmiguel View Post
        You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior with an RPKM approach, or such. But otherwise I would advise that you turn it off.
        If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.

        Comment


        • #19
          Originally posted by kmcarr View Post
          If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.
          I am talking gsAssembler, not gsMapper.

          The switch works, even in combination with -cdna.

          --
          Phillip

          Comment


          • #20
            Originally posted by pmiguel View Post
            I am talking gsAssembler, not gsMapper.

            The switch works, even in combination with -cdna.

            --
            Phillip
            I am talking about gsAssembler too. I have tested using the -rip option in a -cdna project and can confirm by checking the 454ReadStatus.txt file that there are reads split between contigs.

            Comment


            • #21
              In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
              (Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)

              Comment


              • #22
                Originally posted by dschika View Post
                In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
                (Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)
                Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got the warning.

                I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.
                Last edited by kmcarr; 01-14-2011, 07:53 AM.

                Comment


                • #23
                  Complaint about Newbler intra-read ripping.

                  Originally posted by kmcarr View Post
                  Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got the warning.

                  I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.
                  Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

                  But why is it desirable for this ripping behavior to ignore the information implicit in a read about which sequences are directly juxtaposed?

                  --
                  Phillip

                  Comment


                  • #24
                    Originally posted by pmiguel View Post
                    Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

                    But why is it desirable for this ripping behavior to ignore the information implicit in a read about which sequences are directly juxtaposed?

                    --
                    Phillip
                    I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.

                    Comment


                    • #25
                      Originally posted by kmcarr View Post
                      I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.
                      But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?

                      --
                      Phillip

                      Comment


                      • #26
                        Originally posted by pmiguel View Post
                        But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?
                        The actual tearing does not happen until the contigs are built. Newbler creates a graph of all read alignments, with the contigs as nodes and the reads that go from one contig to the next as edges. Repeats are one reason a genome assembles into a graph in the first place.

                        In a way, creating contigs, i.e. tearing apart the graph and listing only the nodes, is a necessary evil, but the actual assembly is the whole graph. In contrast to other assemblers, Newbler chooses to tear apart within the reads, instead of assigning each read to a single contig. The 'tearing' information can be used to find which contigs are neighbors of each other.

                        Transcriptome assembly is somewhat special, as each gene is expected to result in a small contig graph of its own. This graph is then traversed in order to create isotigs (transcript variants).

                        Pardon the self-promotion, but I try to explain all of this here and here
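The graph picture described in this post can be illustrated with a toy sketch: contigs are nodes, and a read whose pieces land in two consecutive contigs contributes an edge between them. This is only an illustration of the idea, not Newbler's actual algorithm; the function names, the input layout (read name mapped to the ordered list of contigs its pieces fall in), and the example reads are all assumptions made for this sketch.

```python
from collections import defaultdict


def contig_graph(read_parts):
    """Count, for each ordered contig pair, the reads torn across that junction."""
    edges = defaultdict(int)
    for contigs in read_parts.values():
        for left, right in zip(contigs, contigs[1:]):
            if left != right:
                edges[(left, right)] += 1
    return dict(edges)


def traversals(edges, start):
    """List simple paths from `start` to contigs that have no unvisited successor."""
    successors = defaultdict(list)
    for left, right in edges:
        successors[left].append(right)
    paths = []

    def walk(path):
        options = sorted(c for c in successors.get(path[-1], []) if c not in path)
        if not options:
            paths.append(path)
            return
        for contig in options:
            walk(path + [contig])

    walk([start])
    return paths


# Hypothetical gene with a skipped middle exon: some reads join c1 to c3 directly.
read_parts = {
    "r1": ["c1", "c2"],
    "r2": ["c2", "c3"],
    "r3": ["c1", "c3"],
    "r4": ["c1", "c2"],
}
edges = contig_graph(read_parts)
paths = traversals(edges, "c1")
print(edges)
print(paths)
```

In this toy case the two paths play the role of two transcript variants built from the same three contigs, which is the sense in which a torn read still carries the adjacency information.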

                        Comment


                        • #27
                          Originally posted by westerman View Post
                          As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs."
                          A more detailed description of my 'metrics' for my four libraries is the following:
                          Bases: 20,000,000
                          contigs: 150
                          Bases in contigs: 100,000

                          Bases: 26,000,000
                          contigs: 220
                          Bases in contigs: 140,000

                          Bases: 25,000,000
                          contigs: 80
                          Bases in contigs: 40,000

                          Bases: 30,000,000
                          contigs: 120
                          Bases in contigs: 60,000


                          Again, I assume there is something wrong with my assembly, unless there is something that I did not take into consideration.
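For reference, the ratios implied by these four sets of numbers can be worked out directly. The sketch below is plain arithmetic on the figures posted above; the "depth" value assumes that every sequenced base ended up in the listed contigs, so it is an upper bound rather than a measured coverage, and the variable and function names are illustrative only.

```python
# (total bases sequenced, number of contigs, bases in contigs) per library, as posted.
libraries = [
    (20_000_000, 150, 100_000),
    (26_000_000, 220, 140_000),
    (25_000_000, 80, 40_000),
    (30_000_000, 120, 60_000),
]


def summarize(total_bases, n_contigs, contig_bases):
    """Return simple ratios for one library."""
    return {
        # Consensus length as a percentage of the raw bases.
        "pct_of_input": 100 * contig_bases / total_bases,
        # Depth if all raw bases had assembled into these contigs (upper bound).
        "max_depth": total_bases / contig_bases,
        "mean_contig_len": contig_bases / n_contigs,
    }


summaries = [summarize(*lib) for lib in libraries]
for number, s in enumerate(summaries, start=1):
    print(
        f"library {number}: {s['pct_of_input']:.2f}% of input, "
        f"up to {s['max_depth']:.0f}x depth, "
        f"mean contig {s['mean_contig_len']:.0f} bases"
    )
```

The first library reproduces the 200x figure mentioned in the quoted post; the others come out at roughly 186x, 625x and 500x under the same assumption.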

                          Comment


                          • #28
                            Originally posted by kmcarr View Post
                            I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.
                            v2.3 didn't ignore it. I started a thread about that a while ago...


                            FelipeAd:
                            What is the numAlignedReads or numAlignedBases in your 454NewblerMetrics.txt file (section consensusResults)?

                            Comment


                            • #29
                              Originally posted by dschika View Post
                              FelipeAd:
                              What is the numAlignedReads or numAlignedBases in your 454NewblerMetrics.txt file (section consensusResults)?
                              The number of aligned reads ranges from 75% to 90% across my libraries. But maybe I would not count on this very much, since the output includes contigs as short as 2 nt. That is why I refer only to the number of bases that are included in 'large' contigs.

                              Comment


                              • #30
                                Since your samples all come from the same plant, an assembly using all of the samples together will give you the most information. This has downstream benefits too, as you will have only one set of isotigs to annotate using Blast2GO or similar.

                                I have done something very similar and mapped the raw reads from each sample back against the contigs from the combined assembly in order to get read counts. Summing read counts for each contig used in an isotig or isogroup gives you isotig and isogroup read counts. Since gsMapper only maps each read once, it should get around the problem of multiple contig assignment that you run into by just using the output file (although it potentially introduces another problem, where some reads will not be counted at all).

                                Also note that the 454Contigs.fna file has an error in the sequence where it appends the previous contig's sequence for status=isotig contigs, as mentioned about halfway through this thread:
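The summing step described above can be sketched as follows. The per-contig counts and the isotig and isogroup layouts are hand-written, hypothetical dictionaries standing in for whatever the mapping and assembly output actually provide. One assumption is made explicit: for an isogroup, each contig is counted once even when several isotigs share it.

```python
def isotig_counts(contig_counts, isotig_layout):
    """Sum per-contig read counts over the contigs each isotig uses."""
    return {
        isotig: sum(contig_counts.get(contig, 0) for contig in contigs)
        for isotig, contigs in isotig_layout.items()
    }


def isogroup_counts(contig_counts, isotig_layout, isogroup_members):
    """Sum read counts over the distinct contigs used by any isotig in a group."""
    totals = {}
    for group, isotigs in isogroup_members.items():
        contigs = set()
        for isotig in isotigs:
            contigs.update(isotig_layout[isotig])
        # Each contig counted once, even if shared between isotigs.
        totals[group] = sum(contig_counts.get(contig, 0) for contig in contigs)
    return totals


# Hypothetical inputs for one sample.
contig_counts = {"contig00001": 10, "contig00002": 4, "contig00003": 6}
isotig_layout = {
    "isotig00001": ["contig00001", "contig00002", "contig00003"],
    "isotig00002": ["contig00001", "contig00003"],
}
isogroup_members = {"isogroup00001": ["isotig00001", "isotig00002"]}

per_isotig = isotig_counts(contig_counts, isotig_layout)
per_isogroup = isogroup_counts(contig_counts, isotig_layout, isogroup_members)
print(per_isotig)
print(per_isogroup)
```

Note that isotig counts overlap by construction (a shared contig contributes to every isotig that uses it), so they should not be added together to get the isogroup total.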
                                Pyrosequencing in picotiter plates, custom arrays for enrichment/decomplexing. (Roche)


                                This thread is somewhat related to a question I have recently posted:
                                Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)

                                Comment
