Hi all,
I am using Trinity for de novo assembly of mRNA from Illumina HiSeq 2000, 100 bp paired-end reads. We have more than 11M clean reads per sample and 139M clean reads in total (about 28G of clean data). Read quality is good: according to FastQC, all bases have quality scores above 30.
I ran the de novo assembly with Trinity using default parameters and got 920k transcripts (201 bp to 24477 bp; counting only transcripts from 500 bp to 24477 bp gives 350k). I then filtered the assembly with CD-HIT-EST, which left 789k clusters. That is still too many transcripts.
The clusters range from 201 bp to 24477 bp, and the N50 is 2549 bp.
I then divided the data into two groups of 57.5M and 81M reads, and assembled the 57.5M reads again. This time I got 518387 transcripts in 358886 components. The longest transcript is 17339 bp and the N50 is 2900 bp. After filtering with CD-HIT-EST, there are 443765 Trinity transcripts in 358695 components, and the N50 is 1618 bp, which is shorter than before filtering.
Does anybody have any suggestions on how to reduce the number of transcripts?
Is it OK that the N50 becomes shorter after filtering (2900 bp to 1618 bp)?
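For reference, this is how I understand N50 to be computed: sort the transcript lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. Below is a minimal Python sketch of that definition. The function name `n50` and the toy length lists are made up for illustration and are not my real data; they only show that dropping several long sequences from a set can lower the N50, without claiming that this is what happened in my assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy data (hypothetical): four long sequences plus seven short ones.
before = [3000] * 4 + [500] * 7
# Same set after three of the long sequences are removed.
after = [3000] + [500] * 7

print("N50 before:", n50(before))
print("N50 after: ", n50(after))
```

In this toy case the long sequences hold most of the bases before removal, so the N50 sits at their length; once most of them are gone, the short sequences carry more than half of the remaining bases and the N50 falls to their length.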