  • bioben
    Junior Member
    • Sep 2010
    • 6

    newbler problem

    Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but differ in length (i.e. one sequence fully contains another, shorter one). Isn't this not supposed to happen? I mean, shouldn't they be assembled as one? Thanks ...
    Last edited by bioben; 09-30-2010, 07:49 PM.
  • bioben
    Junior Member
    • Sep 2010
    • 6

    #2
    Forgot to say that I am trying to assemble ~10 million 454 ESTs and ~1 million Sanger ESTs. I also tried CAP3 and TGICL. Both also output identical sequences, to a greater or lesser extent, in their contigs and singlets files.
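
A minimal sketch of how one might list such contained sequences, assuming the isotigs or contigs are in plain FASTA. The helper names (`read_fasta`, `reverse_complement`, `find_contained`) are hypothetical and not part of Newbler, CAP3 or TGICL. It only reports exact containment on either strand, and the all-against-all comparison is quadratic, so it is meant for a few thousand sequences rather than millions of reads.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_contained(records):
    """Return (shorter_id, longer_id) pairs where the shorter sequence is
    an exact substring of the longer one on either strand."""
    by_length = sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True)
    pairs = []
    for i, (long_id, long_seq) in enumerate(by_length):
        for short_id, short_seq in by_length[i + 1:]:
            if short_seq in long_seq or reverse_complement(short_seq) in long_seq:
                pairs.append((short_id, long_id))
    return pairs
```

Usage would be something like `find_contained(read_fasta(open("454Isotigs.fna")))`, which returns the id pairs to inspect by hand.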

    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      Originally posted by bioben
      Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but differ in length (i.e. one sequence fully contains another, shorter one). Isn't this not supposed to happen? I mean, shouldn't they be assembled as one? Thanks ...
      This is gsAssembler (Newbler) saying that it believes there are two isoforms of the gene, one shorter than the other. Is it correct? That's where your biological expertise comes in. Personally, I would bet a large number of donuts that it's not correct. gsAssembler seems to be overzealous in finding isoforms.

      • bioben
        Junior Member
        • Sep 2010
        • 6

        #4
        Thanks, kmcarr. I think you are right. They are probably splice variants.

        Then how about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, these should have been assembled into one sequence and show up in the isotigs file. Do people usually care about singlets or not? Thanks ...
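
For reference, a rough sketch of the kind of 454ReadStatus.txt parsing described above. It rests on assumptions that should be checked against your own file: that the file is tab-delimited, that the read accession is in the first column and the status in the second, and that unassembled reads are labelled `Singleton`. The function name `singleton_ids` is hypothetical.

```python
import csv


def singleton_ids(lines, status_label="Singleton"):
    """Collect read accessions whose status column equals status_label.

    Assumes tab-delimited rows with the accession in column 1 and the
    read status in column 2; a header row is skipped naturally because
    its second field does not equal the status label.
    """
    ids = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2 and row[1].strip() == status_label:
            ids.append(row[0].strip())
    return ids
```

Calling `singleton_ids(open("454ReadStatus.txt"))` gives the accession list, which can then be used to pull the matching reads out of the original read files.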

        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #5
          Originally posted by bioben
          Then how about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, these should have been assembled into one sequence and show up in the isotigs file. Do people usually care about singlets or not? Thanks ...
          I suspect that the singletons are not assembled together simply because they are identical and are thus considered technical duplicates. It is hard to build a contig from copies of one identical read; if the reads merely overlap, then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
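
Lacking such a file, one workaround is to group the singleton reads by exact sequence yourself. The sketch below assumes the singleton sequences are already loaded into a dict of `{read_id: sequence}`; `group_duplicates` is a hypothetical helper, and it treats only exact, same-strand matches as duplicates, which may be stricter or looser than what Newbler does internally.

```python
from collections import defaultdict


def group_duplicates(records):
    """Group read ids that share exactly the same sequence.

    Returns a list of id lists, one per sequence seen more than once.
    Reads whose sequence is unique (true singletons) are left out.
    """
    groups = defaultdict(list)
    for read_id, seq in records.items():
        groups[seq.upper()].append(read_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Any read id that does not appear in the returned groups has a unique sequence among the singletons.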

          • Jeremy
            Senior Member
            • Nov 2009
            • 190

            #6
            Hi bioben,
            I think you should read this thread: "Detection of alternative splicing events from 454 output".
            It should answer a lot of your questions.

            • martin2
              Member
              • Nov 2010
              • 42

              #7
              Originally posted by westerman
              I suspect that the singletons are not assembled together simply because they are identical and are thus considered technical duplicates. It is hard to build a contig from copies of one identical read; if the reads merely overlap, then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
              I don't think so. Singletons are reads from regions poorly covered by emPCR. Also, if reads did overlap but were trimmed or contained sequencing errors, newbler may not have found the overlap. Set these in 454AssemblyProject.xml before you start the assembly:

              <minimumReadLength>45</minimumReadLength>
              <overlapSeedStep>1</overlapSeedStep>
              <overlapMinMatchLength>60</overlapMinMatchLength>
              <overlapMinMatchIdentity>96</overlapMinMatchIdentity>
              <ripMode>true</ripMode>

              Make a new cDNA assembly; do not re-run it from the current assembly directory, because in my opinion newbler does not recompute the overlaps, and hence not all changes will take effect. With these settings I got 50% more assembled contigs than with the loose defaults!
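
If you prefer not to edit the XML by hand, the settings listed above could be applied with a short script. This is only a sketch: the element names are taken from the list above, but the overall layout of 454AssemblyProject.xml is not assumed, so the hypothetical `apply_settings` helper only updates elements that already exist (wherever they sit in the tree) and reports the ones it could not find instead of guessing where to create them.

```python
import xml.etree.ElementTree as ET

# Values copied from the settings listed above.
SETTINGS = {
    "minimumReadLength": "45",
    "overlapSeedStep": "1",
    "overlapMinMatchLength": "60",
    "overlapMinMatchIdentity": "96",
    "ripMode": "true",
}


def apply_settings(root, settings):
    """Set the text of every element whose tag matches a settings key.

    Returns the list of tags that were not found anywhere under root,
    so they can be added manually in the right place.
    """
    missing = []
    for tag, value in settings.items():
        matches = list(root.iter(tag))
        if not matches:
            missing.append(tag)
        for element in matches:
            element.text = value
    return missing
```

A typical use would be `tree = ET.parse("454AssemblyProject.xml")`, then `apply_settings(tree.getroot(), SETTINGS)`, then `tree.write(...)` to a copy of the file, keeping the original as a backup.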
