  • I got two different Trinity.fasta files using two versions of Trinity

    Hi guys,
    We have RNA-seq data from an insect, sequenced in 2012 and assembled at the time with one of the 2011 versions of Trinity (which gave us a trinity.fasta). I have now analyzed the sequence length distribution in this file and got the following result:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Downloads/gene.fa
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2875	0.2118	0.2067	0.2940	0.0000	0.0000	0.0000	0.4186	0.0894
    
    Main genome scaffold total:         	144777
    Main genome contig total:           	144777
    Main genome scaffold sequence total:	67.067 MB
    Main genome contig sequence total:  	67.067 MB  	0.000% gap
    Main genome scaffold N/L50:         	15033/1.075 KB
    Main genome contig N/L50:           	15033/1.075 KB
    Max scaffold length:                	24.081 KB
    Max contig length:                  	24.081 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        100 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        250 	        56,929	        56,929	    53,670,774	    53,670,774	 100.00%
        500 	        30,137	        30,137	    44,518,044	    44,518,044	 100.00%
       1 KB 	        16,207	        16,207	    34,757,505	    34,757,505	 100.00%
     2.5 KB 	         4,183	         4,183	    15,894,549	    15,894,549	 100.00%
       5 KB 	           588	           588	     3,942,668	     3,942,668	 100.00%
      10 KB 	            28	            28	       353,549	       353,549	 100.00%
    In this file the minimum sequence length is 101; the longest one is 22181.

    Over the past several days I used the latest Trinity version, trinityrnaseq-2.0.6, to assemble the raw data once again (after trimming low-quality reads, of course). This time the length distribution of the file is:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Desktop/data_from_server/2015_6_04_assembled_CD_and_CK/Trinity.fasta
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2932	0.2083	0.2114	0.2871	0.0000	0.0000	0.0000	0.4197	0.0823
    
    Main genome scaffold total:         	56130
    Main genome contig total:           	56130
    Main genome scaffold sequence total:	57.963 MB
    Main genome contig sequence total:  	57.963 MB  	0.000% gap
    Main genome scaffold N/L50:         	9036/1.861 KB
    Main genome contig N/L50:           	9036/1.861 KB
    Max scaffold length:                	30.733 KB
    Max contig length:                  	30.733 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        100 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        250 	        50,921	        50,921	    56,731,956	    56,731,956	 100.00%
        500 	        29,025	        29,025	    49,248,962	    49,248,962	 100.00%
       1 KB 	        18,003	        18,003	    41,494,038	    41,494,038	 100.00%
     2.5 KB 	         5,541	         5,541	    21,499,015	    21,499,015	 100.00%
       5 KB 	           900	           900	     5,895,754	     5,895,754	 100.00%
      10 KB 	            35	            35	       466,389	       466,389	 100.00%
      25 KB 	             1	             1	        30,733	        30,733	 100.00%
    In this second Trinity.fasta file the minimum sequence length is 224; the longest one is 30733.
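
The minimum, maximum, and N50 figures quoted above can be cross-checked independently of stats.sh. Below is a minimal Python sketch, not the method stats.sh itself uses; the function names `fasta_lengths` and `summarize` are made up for this illustration. It reports the conventional N50, meaning the length at which sequences of that length or longer hold at least half of all bases.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first ">" header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Compute count, total bases, min, max, and N50 for a list of lengths."""
    if not lengths:
        raise ValueError("no sequences found")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # First length at which the running sum reaches half of all bases.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "min": ordered[-1],
        "max": ordered[0],
        "n50": n50,
    }
```

To use it on one of the assemblies, open the file and pass the handle in, for example `with open("Trinity.fasta") as handle: print(summarize(fasta_lengths(handle)))`. Running it on both files gives directly comparable numbers.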

    My questions are:
    1. Why are the two assembly results different? For example, the older version assembled many sequences between 101 and ~200 in length, but the shortest sequence assembled by the latest version of Trinity is 224.
    2. Which trinity.fasta file should I use in the downstream analysis, and why?

    Could you please give me a somewhat detailed explanation?
    Thanks.
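
For scale, the two tables above show that most of the difference in sequence count lies below 250: the older file has 144,777 - 56,929 = 87,848 sequences shorter than 250, while the newer one has 56,130 - 50,921 = 5,209. One way to put the two files on the same footing before comparing them is to drop the short sequences from the older assembly. The following is a minimal sketch under that assumption; the cutoff of 224 is chosen only to match the shortest sequence in the newer file, and `read_fasta` and `filter_min_length` are hypothetical helper names, not part of Trinity or BBMap.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(
    records: Iterable[Tuple[str, str]], min_length: int
) -> Iterator[Tuple[str, str]]:
    """Keep only records whose sequence has at least min_length bases."""
    for header, sequence in records:
        if len(sequence) >= min_length:
            yield header, sequence
```

For example, `filter_min_length(read_fasta(handle), 224)` applied to the older file would yield only the sequences that are at least as long as the shortest one in the newer file; the kept records can then be written back out as `>header` and sequence lines.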
