  • Bang_Didi
    Junior Member
    • Sep 2014
    • 5

    Trinity Assembly

    Dear All..
    As a newbie in transcriptome analysis, I would like to ask a question about whole-transcriptome assembly with Trinity.

    Is it possible to get two different sets of transcripts depending on whether we concatenate the read files before assembly or list them separately (comma-separated, as the Trinity manual describes)? I am not sure about my results: the two ways of preparing the reads seem to give different transcripts (I can tell from the differing sizes). Any thoughts on what went wrong?

    Cheers
    Didi
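
One check that may be worth making before comparing the two assemblies is that both inputs really contain the same reads. The sketch below is only an illustration and not part of Trinity: the function names are hypothetical, and it assumes plain four-line FASTQ records (optionally gzip-compressed).

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming strict 4-line records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def same_read_total(concatenated, parts):
    """True if the concatenated file holds as many reads as the parts combined."""
    total_in_parts = sum(count_fastq_reads(p) for p in parts)
    return count_fastq_reads(concatenated) == total_in_parts
```

For example, `same_read_total("all_1.fq", ["a_1.fq", "b_1.fq"])` (file names made up) should return `True` if the concatenation lost or duplicated nothing. A matching count does not prove the inputs are identical, but a mismatch would point to a problem in how the reads were prepared.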
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    Trinity is non-deterministic, so some variation between runs is expected. Not a lot, but some.

    • Bang_Didi
      Junior Member
      • Sep 2014
      • 5

      #3
      Thanks for that, westerman. Should I worry that this variation will also show up significantly when I compute the metrics for evaluating the transcripts?

      • Bang_Didi
        Junior Member
        • Sep 2014
        • 5

        #4
        FYI:

        The Trinity stats for the transcripts assembled from the concatenated data:
        ################################
        ## Counts of transcripts, etc.
        ################################
        Total trinity 'genes': 236322
        Total trinity transcripts: 518647
        Percent GC: 45.98

        ########################################
        Stats based on ALL transcript contigs:
        ########################################

        Contig N10: 8296
        Contig N20: 6856
        Contig N30: 5744
        Contig N40: 4826
        Contig N50: 4031

        Median contig length: 1217
        Average contig: 2100.35
        Total assembled bases: 1,089,337,664


        #####################################################
        ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
        #####################################################

        Contig N10: 6119
        Contig N20: 4351
        Contig N30: 3248
        Contig N40: 2367
        Contig N50: 1635

        Median contig length: 367
        Average contig: 799.05
        Total assembled bases: 188,834,004

        The Trinity stats for the transcripts assembled by listing all of the read files with comma separation:

        ################################
        ## Counts of transcripts, etc.
        ################################
        Total trinity 'genes': 244,160
        Total trinity transcripts: 301,140
        Percent GC: 44.75

        ########################################
        Stats based on ALL transcript contigs:
        ########################################

        Contig N10: 6864
        Contig N20: 5185
        Contig N30: 4130
        Contig N40: 3303
        Contig N50: 2581

        Median contig length: 448
        Average contig: 1115.03
        Total assembled bases: 335,781,132


        #####################################################
        ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
        #####################################################

        Contig N10: 5852
        Contig N20: 4184
        Contig N30: 3122
        Contig N40: 2305
        Contig N50: 1623

        Median contig length: 374
        Average contig: 806.19
        Total assembled bases: 196,840,230
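
To make the comparison above easier to read, here is a minimal sketch of how an Nx value such as N50 is commonly defined, plus the transcripts-per-gene ratio computed from the counts posted above. The function names are hypothetical and this is not TrinityStats.pl itself, only an illustration of the arithmetic.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx contig length.

    Nx is the length L such that contigs of length >= L together
    contain at least `fraction` of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


def transcripts_per_gene(n_transcripts, n_genes):
    """Average number of transcripts (isoforms) reported per Trinity 'gene'."""
    return n_transcripts / n_genes


# Counts taken from the two stats reports above.
concatenated = transcripts_per_gene(518647, 236322)
comma_list = transcripts_per_gene(301140, 244160)
print(f"{concatenated:.2f} vs {comma_list:.2f}")  # prints "2.19 vs 1.23"
```

Read this way, the two runs report similar gene counts and similar longest-isoform stats (N50 of 1635 vs 1623), while the concatenated run reports nearly twice as many transcripts per gene. That suggests the difference lies mainly in the number of isoforms per gene rather than in the set of genes.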

        • westerman
          Rick Westerman
          • Jun 2008
          • 1104

          #5
          Those variations are more than I would expect, and I can see why you are concerned. I'll see if I can rerun a recent Trinity assembly (I almost always use comma-separated files) with combined reads and see what differences I get.

          • ltutar
            Junior Member
            • Dec 2013
            • 1

            #6
            Dear Bang_Didi,

            Did you decide which approach is better: comma separation or combining?




            • Nanu
              Member
              • Sep 2014
              • 30

              #7
              Greetings to all!

              I would like to know about the reads/k-mers per transcript. TrinityStats.pl reports the total assembled bases, the contig lengths, and the number of transcripts, with separate stats for the longest isoform per gene. So I would like to know the difference between Trinity.fasta and single.fasta.
              When we run TrinityStats.pl, it reports:
              1. Stats based on ONLY LONGEST ISOFORM per 'GENE'
              2. Stats based on ALL transcript contigs

              May I know whether Trinity.fasta contains only transcripts, or genes as well?
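
On the question above: Trinity.fasta holds the assembled transcripts (all isoforms), and the 'gene' is a grouping encoded in each transcript identifier rather than a separate sequence. The sketch below shows one way the "longest isoform per gene" view could be derived from such a file. It is an assumption-laden illustration, not Trinity's own code: it assumes identifiers end in an isoform suffix such as `_i1` (or `_seq1` in older releases), and the function names are hypothetical.

```python
import re

# Assumed isoform suffixes: "_i<N>" (newer IDs) or "_seq<N>" (older IDs).
_ISOFORM_SUFFIX = re.compile(r"_(?:i|seq)\d+$")


def gene_id(transcript_id):
    """Strip the isoform suffix to obtain the Trinity 'gene' identifier."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def read_fasta_lengths(lines):
    """Yield (transcript_id, sequence_length) for each FASTA record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The identifier is the first whitespace-delimited token.
            name, length = line[1:].split()[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def longest_isoform_per_gene(lines):
    """Map each gene ID to (transcript_id, length) of its longest isoform."""
    best = {}
    for transcript_id, length in read_fasta_lengths(lines):
        gene = gene_id(transcript_id)
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript_id, length)
    return best
```

With a real file this could be called as `longest_isoform_per_gene(open("Trinity.fasta"))`; the number of keys in the result then corresponds to the 'genes' count, and the number of FASTA records to the transcripts count.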
