  • De novo assembly

    Hi everyone, I need to know what the real meaning of the concept "Unigene" is in a de novo assembly context.

    Best regards!

  • #2
    De novo assembly

    I don't think there is a de novo assembly concept of 'Unigene'.

    I think you may be confusing UniGene with unitig.

    Download Whole-Genome Shotgun Assembler for free. Celera Assembler (CA) is a whole-genome shotgun (WGS) assembler for the reconstruction of genomic DNA sequence from WGS sequencing data.



    • #3
      @mastal, unigene is a concept in transcriptome assembly, exactly the same as in the NCBI definition.

      @mruizm, a unigene is a hypothetical gene represented by a cluster of similar transcripts that are thought to be isoforms in a de-novo transcriptome assembly.

      see for example this Safflower transcriptome paper:
      We obtained a total of 4.69 Gb in clean nucleotides comprising 52,119,104 clean sequencing reads, 195,320 contigs, and 120,778 unigenes. Based on similarity searches with known proteins, we annotated 70,342 of the unigenes (about 58% of the identified unigenes) with cut-off E-values of 10^-5. In total, 21,943 of the safflower unigenes were found to have COG classifications, and BLAST2GO assigned 26,332 of the unigenes to 1,754 GO term annotations. In addition, we assigned 30,203 of the unigenes to 121 KEGG pathways.
      I think it's a confusing name, because there's also the concept of a unigene in the phylogenetic context, where it refers to a gene which always occurs in a single copy in any genome.
      Last edited by Blahah404; 09-13-2013, 03:24 AM.



      • #4
        Yeah! I agree with you @Blahah404, the concept is redundant. Thanks for your answer!



        • #5
          @mruizm I also just remembered that some authors define unigenes as all contigs + unassembled reads.

          See for example http://www.biomedcentral.com/1471-2164/14/465:
          The notions of contig and singleton are straightforward for perfect assemblies: a contig is any sequence produced by two or more overlapping reads, while singletons are the remaining isolated reads. By contrast, the assembler we compare with produces a variety of output types: first, portions of overlapping reads are assembled into “contigs” representing putative exons. Groups of contigs that appear to constitute a single gene are then arranged to form “isotigs” representing putative splice variants of the gene. Note that an isotig may consist of only a single contig. When this splice variant reconstruction fails, some “orphan” contigs may be unused in isotigs. Thus, unique sequence in a Newbler assembly is represented by unassembled singleton reads, (orphan) contigs, and isotigs. For our purposes we consider both Newbler orphan contigs and isotigs as unique assembled sequence comparable to perfectly assembled contigs. We shall refer to this combined set of orphan contigs and isotigs as c-isotigs. Further, we shall refer to the combined set of perfect contigs and singletons (and non-perfect c-isotigs and singletons) for a single assembly as the set of unigenes.
          So there are at least two conflicting definitions in use for transcriptome assembly - and to add to the confusion they are pretty much opposites! The first involves collapsing the contigs, the second adds to them.
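To make the contrast concrete, here is a minimal sketch on toy data. The contig names, read names, and the contig-to-gene grouping are all made up for illustration, and the two functions are hypothetical helpers, not the output format or API of any real assembler.

```python
from collections import defaultdict

# Toy assembly output; every identifier here is hypothetical.
contigs = {
    "contig1": "geneA",  # contig id -> cluster (putative gene) it was assigned to
    "contig2": "geneA",
    "contig3": "geneB",
}
singletons = ["read7", "read9"]  # reads that did not assemble into any contig


def unigenes_by_clustering(contig_to_cluster):
    """Definition 1: one unigene per cluster of similar transcripts."""
    clusters = defaultdict(list)
    for contig_id, cluster_id in contig_to_cluster.items():
        clusters[cluster_id].append(contig_id)
    return dict(clusters)


def unigenes_by_union(contig_ids, singleton_ids):
    """Definition 2: every contig plus every unassembled read is a unigene."""
    return list(contig_ids) + list(singleton_ids)


collapsed = unigenes_by_clustering(contigs)
combined = unigenes_by_union(contigs, singletons)

# Number of contigs, then the unigene count under each definition.
print(len(contigs), len(collapsed), len(combined))
```

With three contigs and two singletons, the first definition gives fewer unigenes than contigs (2) and the second gives more (5), so a reported "number of unigenes" is hard to interpret unless the paper says which definition it used.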



          • #6
            Originally posted by Blahah404
            @mruizm I also just remembered that some authors define unigenes as all contigs + unassembled reads.

            See for example http://www.biomedcentral.com/1471-2164/14/465:


            So there are at least two conflicting definitions in use for transcriptome assembly - and to add to the confusion they are pretty much opposites! The first involves collapsing the contigs, the second adds to them.
            See also: http://www.plosone.org/article/info%...l.pone.0038653

            There they use the word "unigene" to mean a single transcript!

