  • Selecting most abundant transcript per gene (Trinity, RSEM)

    Is anyone aware of a program or simple script that selects the most abundant transcript for each gene (for example, based on FPKM values) from a Trinity-assembled transcriptome that has been run through RSEM? I have the RSEM output file RSEM.isoform.results, which looks like this:

    transcript_id        gene_id         length  effective_length  expected_count  TPM   FPKM  IsoPct
    comp1000093_c0_seq1  comp1000093_c0  257     180.57            2.00            0.33  0.31  100.00
    comp1000100_c0_seq1  comp1000100_c0  308     231.21            4.00            0.51  0.49  100.00
    comp1000106_c0_seq1  comp1000106_c0  279     202.37            2.00            0.29  0.28  100.00
    comp135533_c0_seq1   comp135533_c0   233     156.94            0.00            0.00  0.00  0.00
    comp135533_c0_seq2   comp135533_c0   288     211.31            4.00            0.56  0.54  48.65
    comp135533_c0_seq3   comp135533_c0   235     158.90            0.00            0.00  0.00  0.00
    comp135533_c0_seq4   comp135533_c0   426     349.02            7.00            0.60  0.57  51.35

    I also have a FASTA file with all the transcripts.

    So I would like to end up with a FASTA file containing only a single transcript_id per gene_id.

  • #2
    Hello, gevieir! You can use the RSEM.gene.results file.

    • #3
      Hi gevielr,
      I am just wondering how to run RSEM correctly. Sorry if this does not help with your problem; I am still a newbie here.

      I tried running RSEM on my transcripts, but somehow it did not work (I ran it using RSEM/1.12.15).

      Firstly I prepared the reference:

      -bash-4.1$ rsem-prepare-reference trinity_out_dir.Trinity.fasta refAB
      rsem-synthesis-reference-transcripts refAB 0 0 trinity_out_dir.Trinity.fasta
      Transcript Information File is generated!
      Group File is generated!
      Chromosome List File is generated!
      Extracted Sequences File is generated!

      rsem-preref refAB.transcripts.fa 0 refAB -l 125
      Refs.makeRefs finished!
      Refs.saveRefs finished!
      refAB.idx.fa is generated!
      refAB.n2g.idx.fa is generated!

      This step produced several output files (refAB.n2g.idx.fa, refAB.seq, refAB.ti, refAB.transcripts.fa and a few more). Sadly, I have no idea which one I should use as the reference when I run this command:

      rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name

      I then ran rsem-calculate-expression as follows:
      -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB

      and this was the output I got:

      bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
      Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2
      Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
      Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
      [samopen] no @SQ lines in the header.
      [sam_read1] missing header? Abort!
      "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!


      Any suggestions about what went wrong, based on your experience running RSEM?

      Thanks
      Didi

      • #4
        This line looks like your answer:

        Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2

        You have put A_1_C4HUHACXX_AGTCAA_L003_R1.fastq instead of A_1_C4HUHACXX_AGTCAA_L003_R2.fastq as the first reverse read.
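A mix-up like this is easy to miss in a long comma-separated list, so it can help to check the two lists before launching the run. The sketch below is only an illustration: `check_mates` is a hypothetical helper, and it assumes the mates differ only by `_R1.` versus `_R2.` in the file name, as they do in this thread. The file lists are rebuilt from the command in post #3.

```python
def check_mates(upstream, downstream):
    """Return (index, r1, r2, problem) tuples for suspicious mate pairs.

    Assumption: the downstream name equals the upstream name with
    "_R1." replaced by "_R2." (the naming used in this thread).
    """
    r1_files = upstream.split(",")
    r2_files = downstream.split(",")
    problems = []
    if len(r1_files) != len(r2_files):
        problems.append((None, None, None, "lists differ in length"))
    for i, (r1, r2) in enumerate(zip(r1_files, r2_files)):
        expected = r1.replace("_R1.", "_R2.")
        if r1 == r2:
            problems.append((i, r1, r2, "same file used for both mates"))
        elif r2 != expected:
            problems.append((i, r1, r2, "expected " + expected))
    return problems


# Sample prefixes taken from the command in post #3.
samples = [
    "A_1_C4HUHACXX_AGTCAA", "A_2_C4HUHACXX_AGTTCC",
    "A_3_C4HUHACXX_ATGTCA", "A_4_C4HUHACXX_CCGTCC",
    "B_1_C4HUHACXX_GTCCGC", "B_2_C4HUHACXX_GTGAAA",
    "B_3_C4HUHACXX_GTGGCC", "B_4_C4HUHACXX_GTTTCG",
]
upstream = ",".join(s + "_L003_R1.fastq" for s in samples)
downstream_files = [s + "_L003_R2.fastq" for s in samples]
downstream_files[0] = samples[0] + "_L003_R1.fastq"    # the R1/R2 mix-up
downstream_files[-1] = samples[-1] + "_L003_R2.fasta"  # extension as typed in the post
downstream = ",".join(downstream_files)

problems = check_mates(upstream, downstream)
for index, r1, r2, problem in problems:
    print(index, r2, "->", problem)
```

On the lists from post #3 this flags the first pair, and also the last downstream name, which ends in `.fasta` rather than `.fastq`. That may only be a typo in the post, but it is worth confirming against the files on disk.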

        • #5
          Thanks, kopi-o.

          I have corrected that line but still cannot get past the error. Here is my log:
          -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB
          bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
          Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
          Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
          [samopen] no @SQ lines in the header.
          [sam_read1] missing header? Abort!
          "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!
          -bash-4.1$


          Any suggestions? Did I give RSEM the correct information, or is something wrong with my RSEM installation?

          Thanks
          Didi
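One thing the log itself suggests: Bowtie reports that it could not locate an index for the basename "refAB.transcripts.fa", which is the string passed in the reference_name position. The usage line in post #3 asks for a reference name, i.e. the prefix given to rsem-prepare-reference (refAB here), not one of the generated files. Whether Bowtie index files were actually built for that prefix may depend on the RSEM version and options (newer releases have a `--bowtie` flag for rsem-prepare-reference), so it seems worth checking. The sketch below is an assumption-laden illustration, not part of RSEM: `bowtie_index_exists` and `build_rsem_command` are hypothetical helpers, the read file names are placeholders, and the check looks only for the first Bowtie 1 index file (`.1.ebwt`, or `.1.ebwtl` for large indexes).

```python
import tempfile
from pathlib import Path

# File suffixes of the first Bowtie 1 index file (small and large index).
BOWTIE1_SUFFIXES = (".1.ebwt", ".1.ebwtl")


def bowtie_index_exists(reference_name, directory="."):
    """Return True if a Bowtie 1 index file exists for this basename."""
    base = Path(directory) / reference_name
    return any(Path(str(base) + suffix).exists() for suffix in BOWTIE1_SUFFIXES)


def build_rsem_command(upstream, downstream, reference_name, sample_name):
    """Assemble the argument list following the usage line in post #3."""
    return [
        "rsem-calculate-expression", "--paired-end",
        upstream, downstream, reference_name, sample_name,
    ]


# Demonstration in a scratch directory with an empty stand-in index file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "refAB.1.ebwt").touch()
    found_prefix = bowtie_index_exists("refAB", tmp)
    found_fasta = bowtie_index_exists("refAB.transcripts.fa", tmp)

print("index for 'refAB':", found_prefix)
print("index for 'refAB.transcripts.fa':", found_fasta)

# Placeholder read file names; the reference is passed as the prefix.
command = build_rsem_command("reads_R1.fastq", "reads_R2.fastq", "refAB", "AB")
print(" ".join(command))
```

If no index file is found for the prefix in the real working directory, the reference preparation step is the place to look before rerunning rsem-calculate-expression.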

          • #6
            There is no software available that allows you to do that. I read the isoforms file into R, group the isoforms by the gene names of their BLAST hits, and then choose the isoform I want. The gene_id is unreliable for identifying genes, because multiple IDs can correspond to the same gene. You should identify the genes by BLAST matching.
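The same grouping can also be sketched in Python. This is a minimal illustration rather than an established tool: `top_isoform_per_gene` and `filter_fasta` are hypothetical helpers, the results file is assumed to be tab-delimited with the column names shown in the first post, and `gene_of` is an optional, user-supplied mapping from transcript_id to a BLAST-derived gene name (when it is missing, the RSEM gene_id is used, with the caveat above). The FASTA sequences in the demo are dummy placeholders.

```python
import csv
import io


def top_isoform_per_gene(results_handle, metric="FPKM", gene_of=None):
    """Return {group: (transcript_id, value)} for the highest-metric isoform.

    gene_of: optional dict mapping transcript_id to a gene name, e.g. from
    BLAST hits. Transcripts not in the dict are grouped by RSEM gene_id.
    Ties keep the isoform that appears first in the file.
    """
    gene_of = gene_of or {}
    best = {}
    for row in csv.DictReader(results_handle, delimiter="\t"):
        group = gene_of.get(row["transcript_id"], row["gene_id"])
        value = float(row[metric])
        if group not in best or value > best[group][1]:
            best[group] = (row["transcript_id"], value)
    return best


def filter_fasta(fasta_lines, keep_ids):
    """Yield the lines of FASTA records whose first header word is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line


# Rows copied from the first post, joined with tabs.
demo_results = "\n".join("\t".join(fields) for fields in [
    ["transcript_id", "gene_id", "length", "effective_length",
     "expected_count", "TPM", "FPKM", "IsoPct"],
    ["comp1000093_c0_seq1", "comp1000093_c0", "257", "180.57", "2.00", "0.33", "0.31", "100.00"],
    ["comp135533_c0_seq1", "comp135533_c0", "233", "156.94", "0.00", "0.00", "0.00", "0.00"],
    ["comp135533_c0_seq2", "comp135533_c0", "288", "211.31", "4.00", "0.56", "0.54", "48.65"],
    ["comp135533_c0_seq4", "comp135533_c0", "426", "349.02", "7.00", "0.60", "0.57", "51.35"],
]) + "\n"

best = top_isoform_per_gene(io.StringIO(demo_results))
keep_ids = {transcript_id for transcript_id, _ in best.values()}

# Dummy sequences, only to show the filtering step.
demo_fasta = ">comp135533_c0_seq2\nAAAA\n>comp135533_c0_seq4\nCCCC\n"
kept = "".join(filter_fasta(io.StringIO(demo_fasta), keep_ids))
print(kept, end="")
```

For real data, the two `io.StringIO` objects would be replaced by open file handles for the isoforms results file and the Trinity FASTA, and the kept lines written to a new FASTA file.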
