
  • RNA de novo assembly - blasts - KEGG - GO

    Hello,

    I am a PhD candidate in bioinformatics with (almost) zero guidance, so I am seeking help here. I was asked to do a de novo transcriptome assembly from total RNA sequencing data. After running FastQC I trimmed my original FASTQ files and then ran Trinity, which gave me trinity_trimmed.fasta. Some of the things I was asked to do are:

    1) Fill out a table like this one:

    |         | total number | total length (nt) | mean length (nt) | N50 | total consensus sequences | Distinct Clusters | Distinct Singletons |
    | Contig  |              |                   |                  |     |                           |                   |                     |
    | Unigene |              |                   |                  |     |                           |                   |                     |

    I ran TrinityStats.pl on trinity_trimmed.fasta and got this:

    ## Counts of transcripts, etc.
    ################################
    Total trinity 'genes': 87177
    Total trinity transcripts: 169974
    Percent GC: 40.18

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 3290
    Contig N20: 2503
    Contig N30: 2049
    Contig N40: 1713
    Contig N50: 1413

    Median contig length: 529
    Average contig: 869.67
    Total assembled bases: 147821426

    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
    #####################################################

    Contig N10: 3087
    Contig N20: 2301
    Contig N30: 1816
    Contig N40: 1414
    Contig N50: 1029

    Median contig length: 348
    Average contig: 632.11
    Total assembled bases: 55105774

    My question has two parts: a) Can I fill out this table with this information? b) Some people use the CAP3 assembly tool, and I have already run it too in case I need it. Is that the way to go? Do I also need to check the quality of trinity_trimmed.fasta?
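For part a), the first four columns of the table (total number, total length, mean length, N50) can be recomputed directly from any FASTA file, which also gives a way to cross-check the TrinityStats.pl numbers. The following is a minimal sketch, not the TrinityStats.pl implementation: the function names `fasta_lengths`, `nx` and `summarize` are made up for illustration, and N50 is assumed to mean the length L such that sequences of length L or longer contain at least half of all assembled bases.

```python
from typing import Dict, Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first header is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: List[int], percent: int = 50) -> int:
    """Length L such that sequences >= L hold at least `percent` % of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * percent:  # integer comparison, no rounding
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Collect the basic assembly statistics asked for in the table."""
    total = sum(lengths)
    return {
        "total number": len(lengths),
        "total length (nt)": total,
        "mean length (nt)": round(total / len(lengths), 2) if lengths else 0.0,
        "N50": nx(lengths, 50),
    }


# Tiny made-up FASTA, only to show the call pattern.
demo = [">a", "ACGTACGTAC", ">b", "ACGT", "ACGT", ">c", "ACG"]
stats = summarize(fasta_lengths(demo))
print(stats)
```

On a real assembly the same call would take an open file handle, for example `summarize(fasta_lengths(open("trinity_trimmed.fasta")))`. If the definitions match, the result should agree with the "Total trinity transcripts", "Total assembled bases", "Average contig" and "Contig N50" lines reported above.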

    For the CAP3 output I also ran TrinityStats.pl and got this:

    For contigs:

    Total trinity 'genes': 23017
    Total trinity transcripts: 23017
    Percent GC: 40.42

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 3885
    Contig N20: 3082
    Contig N30: 2598
    Contig N40: 2254
    Contig N50: 1971

    Median contig length: 1318
    Average contig: 1522.23
    Total assembled bases: 35037102

    - note: not reporting gene-based longest isoform info since couldn't parse Trinity accession info.
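The note above is likely explained by how the "longest isoform per gene" statistics work: they need a gene identifier that can be read out of each sequence name. Below is a minimal sketch of that grouping, assuming the usual Trinity naming scheme `TRINITY_DN<n>_c<n>_g<n>_i<n>`, where everything before `_i<n>` identifies the gene. The regular expression and the helper names are assumptions for illustration, not the code TrinityStats.pl uses, and `Contig1` stands in for a CAP3-style name with no gene/isoform part.

```python
import re
from typing import Dict, Optional

# Assumed Trinity naming: <prefix>_c<n>_g<n>_i<n>; the gene is the part before _i<n>.
TRINITY_ID = re.compile(r"^(?P<gene>.+_c\d+_g\d+)_i\d+$")


def gene_of(accession: str) -> Optional[str]:
    """Return the gene part of a Trinity accession, or None if it does not parse."""
    match = TRINITY_ID.match(accession)
    return match.group("gene") if match else None


def longest_isoform_lengths(lengths_by_id: Dict[str, int]) -> Dict[str, int]:
    """Keep, for each gene, the length of its longest isoform."""
    longest: Dict[str, int] = {}
    for accession, length in lengths_by_id.items():
        gene = gene_of(accession)
        if gene is None:
            raise ValueError(f"cannot parse gene from accession: {accession!r}")
        longest[gene] = max(length, longest.get(gene, 0))
    return longest


# Made-up accessions and lengths, only to show the grouping.
demo = {
    "TRINITY_DN10_c0_g1_i1": 500,
    "TRINITY_DN10_c0_g1_i2": 820,
    "TRINITY_DN10_c0_g2_i1": 300,
}
print(longest_isoform_lengths(demo))
print(gene_of("Contig1"))  # a name without the gene/isoform pattern
```

Names that were rewritten by a downstream tool no longer carry the gene information, so a per-gene summary cannot be built from them, whereas sequences that keep their original Trinity names can still be grouped.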

    For singletons:

    ## Counts of transcripts, etc.
    ################################
    Total trinity 'genes': 67695
    Total trinity transcripts: 81478
    Percent GC: 38.77

    ########################################
    Stats based on ALL transcript contigs:
    ########################################

    Contig N10: 1906
    Contig N20: 1347
    Contig N30: 1007
    Contig N40: 751
    Contig N50: 572

    Median contig length: 333
    Average contig: 490.70
    Total assembled bases: 39981353

    #####################################################
    ## Stats based on ONLY LONGEST ISOFORM per 'GENE':
    #####################################################

    Contig N10: 1853
    Contig N20: 1284
    Contig N30: 917
    Contig N40: 671
    Contig N50: 508

    Median contig length: 317
    Average contig: 461.01
    Total assembled bases: 31207973


    2) Provide the blastp/blastx results as Excel files.

    Should I use -outfmt 16?

    (Is hmmscan/Pfam also needed for KEGG/GO terms?)
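On the Excel question: in BLAST+, `-outfmt 16` is the single-file XML2 format, while `-outfmt 6` is a plain tab-separated table with one hit per line, which is much closer to a spreadsheet. The sketch below assumes the search was run with the default `-outfmt 6` columns and converts the result to a CSV file with a header row that Excel can open. The function name `blast_tab_to_csv` and the demo hit are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast_tab_to_csv(tab_lines: Iterable[str], out: TextIO) -> int:
    """Write tab-separated BLAST hits to `out` as CSV with a header; return the hit count."""
    writer = csv.writer(out)
    writer.writerow(OUTFMT6_COLUMNS)
    hits = 0
    for raw in tab_lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):  # skip blanks and -outfmt 7 comment lines
            continue
        fields = line.split("\t")
        if len(fields) != len(OUTFMT6_COLUMNS):
            raise ValueError(
                f"expected {len(OUTFMT6_COLUMNS)} columns, got {len(fields)}"
            )
        writer.writerow(fields)
        hits += 1
    return hits


# Made-up hit line, only to show the shape of the data.
demo = [
    "TRINITY_DN10_c0_g1_i1\tsp|P12345|EXAMPLE\t85.5\t200\t29\t0\t1\t600\t10\t209\t1e-50\t250\n"
]
buffer = io.StringIO()
count = blast_tab_to_csv(demo, buffer)
print(count)
print(buffer.getvalue())
```

With real files, the input would be an open handle on the BLAST output and the output a file opened with `newline=""`, as the `csv` module recommends, for example `open("blastx.csv", "w", newline="")` (file name hypothetical). If custom columns were requested in `-outfmt 6`, the header list would need to be adjusted to match.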

    3) Do a KEGG and GO analysis. Should I annotate the assembly (but which one, trinity_trimmed.fasta or the CAP3 one?) using Trinotate and then use GOseq for GO? Or could I use Blast2GO (7-day trial version) with the blastx/blastp files in -outfmt 16? Should I do KEGG in Blast2GO as well, or could I use something like this: https://www.kegg.jp/blastkoala/ ?

    I know this was long, sorry about that.
