  • newbler problem

    Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but of different lengths (i.e. one sequence contains another, shorter one). Isn't this not supposed to happen? I mean, shouldn't they be assembled as one? Thanks ...
    Last edited by bioben; 09-30-2010, 07:49 PM.

  • #2
    Forgot to say that I am trying to assemble ~10 million 454 ESTs and ~1 million Sanger ESTs. I also tried CAP3 and TGICL. Both also output more or less identical sequences in their contigs and singlets files.


    • #3
      Originally posted by bioben View Post
      Hi, I found that the 454Isotigs.fna file contains many sequences that are 100% identical but of different lengths (i.e. one sequence contains another, shorter one). Isn't this not supposed to happen? I mean, shouldn't they be assembled as one? Thanks ...
      This is the gsAssembler (Newbler) saying that it believes there are two isoforms of the gene, one being shorter than the other. Is it correct?? That's where your biological expertise comes in. Personally I would bet a large number of donuts that it's not correct. gsAssembler seems to be overzealous in finding isoforms.
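A quick way to see how many isotigs are fully contained in a longer one is to test each sequence as an exact substring of the others. The sketch below is only an illustration, not anything Newbler does internally: the function names are made up for this example, the record names are illustrative, and the all-against-all comparison is quadratic, so it suits a spot check on a subset rather than a full assembly.

```python
# Illustrative sketch: list sequences that are exact substrings of a longer one.
# Function names and the sample records are hypothetical.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: uppercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_contained(records):
    """Return (shorter_id, longer_id) pairs where the shorter sequence occurs
    exactly inside the longer one, on either strand."""
    by_length = sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True)
    pairs = []
    for i, (long_id, long_seq) in enumerate(by_length):
        for short_id, short_seq in by_length[i + 1:]:
            if not short_seq:
                continue  # skip empty records
            if short_seq in long_seq or reverse_complement(short_seq) in long_seq:
                pairs.append((short_id, long_id))
    return pairs


example = """>isotig00001
ACGTACGTTTGACCA
>isotig00002
GTACGTTTG
>isotig00003
GGGGCCCC
"""
print(find_contained(parse_fasta(example)))
# prints [('isotig00002', 'isotig00001')]
```

Whether a contained sequence is a real shorter isoform or an assembly artifact still needs to be judged from the biology; the check only tells you which pairs to look at.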


      • #4
        Thanks, kmcarr. I think you are right. They are probably splice variants.

        Then what about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, they should have been assembled into one sequence and shown up in the isotigs file. Do people usually care about singlets or not? Thanks ...
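For reference, pulling singleton read names out of a status file can be sketched roughly as below. This assumes the file is tab-delimited with one header row, the read accession in the first column and the read status in the second, and that singletons carry the label "Singleton"; check these assumptions against your own 454ReadStatus.txt, since the layout may differ between Newbler versions. The function name and the sample rows are made up for the example.

```python
import csv
import io


def singleton_accessions(handle, status_label="Singleton"):
    """Return accessions whose status column equals status_label.

    Assumed layout (verify against your file): tab-delimited, one header row,
    accession in column 1, read status in column 2.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    accessions = []
    for row in reader:
        if len(row) >= 2 and row[1].strip() == status_label:
            accessions.append(row[0].strip())
    return accessions


# Made-up sample rows standing in for a real status file.
sample = io.StringIO(
    "Accno\tRead Status\n"
    "READ0001\tAssembled\n"
    "READ0002\tSingleton\n"
    "READ0003\tSingleton\n"
)
print(singleton_accessions(sample))
# prints ['READ0002', 'READ0003']
```

With a real file, the same function can be called on `open("454ReadStatus.txt", newline="")`, and the returned accessions used to pull the matching reads from the input FASTA.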


        • #5
          Originally posted by bioben View Post
          Then what about singlets? I tried to recover them by parsing the 454ReadStatus.txt file. The resulting singlets file also contains many identical reads. To me, they should have been assembled into one sequence and shown up in the isotigs file. Do people usually care about singlets or not? Thanks ...
          I suspect that the singletons are not assembled together simply because they are identical and thus considered to be technical duplicates. It is hard to have a contig made up of exactly one identical read. If the reads overlap then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
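In the absence of such a file, one rough workaround is to group the singleton reads by exact sequence and treat any group with more than one member as duplicate singletons. This is a sketch under that assumption only: it compares sequences as exact, case-insensitive strings on the given strand, it is not how Newbler itself flags duplicates, and the function names and read ids are hypothetical.

```python
from collections import defaultdict


def group_identical(reads):
    """Group read ids by exact (case-insensitive) sequence.

    reads: dict of {read id: sequence}
    returns: dict of {uppercase sequence: [read ids]}
    """
    groups = defaultdict(list)
    for read_id, seq in reads.items():
        groups[seq.upper()].append(read_id)
    return dict(groups)


def split_singletons(reads):
    """Split reads into unique singletons and groups of identical duplicates."""
    unique, duplicated = [], []
    for ids in group_identical(reads).values():
        if len(ids) == 1:
            unique.append(sorted(ids))
        else:
            duplicated.append(sorted(ids))
    return unique, duplicated


# Hypothetical reads: r1 and r2 are identical apart from letter case.
reads = {"r1": "ACGT", "r2": "acgt", "r3": "TTGA"}
unique, duplicated = split_singletons(reads)
print(unique)      # prints [['r3']]
print(duplicated)  # prints [['r1', 'r2']]
```

Keeping one representative per duplicate group gives a non-redundant singleton set; reads that differ by a single sequencing error will still land in separate groups.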


          • #6
            Hi bioben
            I think you should read this thread: Detection of alternative splicing events from 454 output.
            It should answer a lot of questions.


            • #7
              Originally posted by westerman View Post
              I suspect that the singletons are not assembled together simply because they are identical and thus considered to be technical duplicates. It is hard to have a contig made up of exactly one identical read. If the reads overlap then they could be assembled. Unfortunately, I do not know of a 454 file that describes which reads are true singletons and which are duplicate singletons.
              I don't think so. Singletons are reads from regions poorly covered by emPCR. Also, some reads did overlap, but after trimming, or because of sequencing errors, newbler did not find the overlap. Set these in 454AssemblyProject.xml before you start the assembly:

              <minimumReadLength>45</minimumReadLength>
              <overlapSeedStep>1</overlapSeedStep>
              <overlapMinMatchLength>60</overlapMinMatchLength>
              <overlapMinMatchIdentity>96</overlapMinMatchIdentity>
              <ripMode>true</ripMode>

              Make a new cDNA assembly; do not re-run it from the current assembly directory, because in my opinion newbler does not re-compute the overlaps, so not all of the changes will take effect. With these settings I got 50% more assembled contigs than with the loose defaults!
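If you would rather script the change than edit the file by hand, something like the sketch below could work. The five tag names and values are the ones listed above; everything else is an assumption. In particular, the `<Project><Config>` wrapper in the demo is a made-up stand-in, not the real layout of 454AssemblyProject.xml, and the function only overwrites elements that already exist, reporting the ones it could not find instead of guessing where they belong.

```python
import xml.etree.ElementTree as ET

# Tag names and values taken from the settings listed in the post.
SETTINGS = {
    "minimumReadLength": "45",
    "overlapSeedStep": "1",
    "overlapMinMatchLength": "60",
    "overlapMinMatchIdentity": "96",
    "ripMode": "true",
}


def apply_settings(root, settings):
    """Overwrite the text of every existing element whose tag is in settings.

    Returns the list of tags that were not found anywhere under root, so they
    can be added by hand in the right place.
    """
    missing = []
    for tag, value in settings.items():
        matches = list(root.iter(tag))
        if not matches:
            missing.append(tag)
        for element in matches:
            element.text = value
    return missing


# Hypothetical miniature document; the real project file layout may differ.
demo = ET.fromstring(
    "<Project><Config>"
    "<minimumReadLength>40</minimumReadLength>"
    "<overlapMinMatchIdentity>90</overlapMinMatchIdentity>"
    "</Config></Project>"
)
missing = apply_settings(demo, SETTINGS)
print(demo.find(".//minimumReadLength").text)  # prints 45
print(missing)  # prints ['overlapSeedStep', 'overlapMinMatchLength', 'ripMode']
```

For a real project, load the file with `ET.parse(path)`, pass `tree.getroot()` to the function, and write a copy with `tree.write(new_path)` so the original stays untouched.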
