  • I got two different Trinity.fasta files using two versions of Trinity

    Hi guys,
    We have RNA-seq data from an insect, sequenced in 2012 and assembled at the time with one of the 2011 Trinity releases (which produced a Trinity.fasta file). I have now analyzed the sequence length distribution in this file and got the following result:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Downloads/gene.fa
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2875	0.2118	0.2067	0.2940	0.0000	0.0000	0.0000	0.4186	0.0894
    
    Main genome scaffold total:         	144777
    Main genome contig total:           	144777
    Main genome scaffold sequence total:	67.067 MB
    Main genome contig sequence total:  	67.067 MB  	0.000% gap
    Main genome scaffold N/L50:         	15033/1.075 KB
    Main genome contig N/L50:           	15033/1.075 KB
    Max scaffold length:                	24.081 KB
    Max contig length:                  	24.081 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        100 	       144,777	       144,777	    67,066,997	    67,066,997	 100.00%
        250 	        56,929	        56,929	    53,670,774	    53,670,774	 100.00%
        500 	        30,137	        30,137	    44,518,044	    44,518,044	 100.00%
       1 KB 	        16,207	        16,207	    34,757,505	    34,757,505	 100.00%
     2.5 KB 	         4,183	         4,183	    15,894,549	    15,894,549	 100.00%
       5 KB 	           588	           588	     3,942,668	     3,942,668	 100.00%
      10 KB 	            28	            28	       353,549	       353,549	 100.00%
    In this file the minimum sequence length is 101; the longest one is 22181.

    Over the past several days I used the latest Trinity version, trinityrnaseq-2.0.6, to assemble the raw data once again (after trimming low-quality reads, of course). This time the length distribution of the file is:

    Code:
    kurban@kurban-X550VC:~/Downloads/bbmap$ sh stats.sh in=~/Desktop/data_from_server/2015_6_04_assembled_CD_and_CK/Trinity.fasta
    stats.sh: 52: stats.sh: Bad substitution
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 59: stats.sh: [[: not found
    stats.sh: 65: stats.sh: source: not found
    stats.sh: 66: stats.sh: parseXmx: not found
    A	C	G	T	N	IUPAC	Other	GC	GC_stdev
    0.2932	0.2083	0.2114	0.2871	0.0000	0.0000	0.0000	0.4197	0.0823
    
    Main genome scaffold total:         	56130
    Main genome contig total:           	56130
    Main genome scaffold sequence total:	57.963 MB
    Main genome contig sequence total:  	57.963 MB  	0.000% gap
    Main genome scaffold N/L50:         	9036/1.861 KB
    Main genome contig N/L50:           	9036/1.861 KB
    Max scaffold length:                	30.733 KB
    Max contig length:                  	30.733 KB
    Number of scaffolds > 50 KB:        	0
    % main genome in scaffolds > 50 KB: 	0.00%
    
    
    Minimum 	Number        	Number        	Total         	Total         	Scaffold
    Scaffold	of            	of            	Scaffold      	Contig        	Contig  
    Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
    --------	--------------	--------------	--------------	--------------	--------
        All 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        100 	        56,130	        56,130	    57,962,594	    57,962,594	 100.00%
        250 	        50,921	        50,921	    56,731,956	    56,731,956	 100.00%
        500 	        29,025	        29,025	    49,248,962	    49,248,962	 100.00%
       1 KB 	        18,003	        18,003	    41,494,038	    41,494,038	 100.00%
     2.5 KB 	         5,541	         5,541	    21,499,015	    21,499,015	 100.00%
       5 KB 	           900	           900	     5,895,754	     5,895,754	 100.00%
      10 KB 	            35	            35	       466,389	       466,389	 100.00%
      25 KB 	             1	             1	        30,733	        30,733	 100.00%
    In this second Trinity.fasta file the minimum sequence length is 224; the longest one is 30733.

    My questions are:
    1. Why are the two assembly results different? For example, the older version assembled many sequences with lengths from 101 to ~200 bp, but the shortest sequence assembled by the latest version of Trinity is 224 bp.
    2. Which Trinity.fasta file should I use in the downstream analysis, and why?

    Could you please give me a reasonably detailed explanation?
    Thanks.
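
One way to put the two files on an equal footing before comparing them is to recompute the length statistics with the same minimum-length cutoff applied to both. The following is only a minimal sketch in plain Python, not part of Trinity or BBMap; the function names (`fasta_lengths`, `n50`, `summarize`) and the choice of 224 as the cutoff (the shortest sequence reported for the newer file above) are assumptions made for illustration. Here "N50" means the contig length at which the longest contigs together cover at least half of the total bases.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int], min_len: int = 0) -> Dict[str, int]:
    """Summary statistics for the contigs that are at least min_len long."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "count": len(kept),
        "total": sum(kept),
        "min": min(kept) if kept else 0,
        "max": max(kept) if kept else 0,
        "n50": n50(kept),
    }


# Hypothetical usage: apply the same cutoff to both assemblies.
# with open("old_assembly.fasta") as handle:
#     print(summarize(fasta_lengths(handle), min_len=224))
```

Running `summarize` on both FASTA files with the same `min_len` would show how much of the difference in contig count and N50 comes only from the short (101 to ~200 bp) sequences present in the older file.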
