  • gevielr
    Member
    • Oct 2013
    • 14

    Selecting most abundant transcript per gene (Trinity, RSEM)

    Is anyone aware of a program or simple script to select the most abundant transcript (for example, based on FPKM values) for each gene from a Trinity-assembled transcriptome that has been run through RSEM? I have the RSEM output file RSEM.isoform.results, which looks like this:

    transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct

    comp1000093_c0_seq1 comp1000093_c0 257 180.57 2.00 0.33 0.31 100.00
    comp1000100_c0_seq1 comp1000100_c0 308 231.21 4.00 0.51 0.49 100.00
    comp1000106_c0_seq1 comp1000106_c0 279 202.37 2.00 0.29 0.28 100.00
    comp135533_c0_seq1 comp135533_c0 233 156.94 0.00 0.00 0.00 0.00
    comp135533_c0_seq2 comp135533_c0 288 211.31 4.00 0.56 0.54 48.65
    comp135533_c0_seq3 comp135533_c0 235 158.90 0.00 0.00 0.00 0.00
    comp135533_c0_seq4 comp135533_c0 426 349.02 7.00 0.60 0.57 51.35

    I also have a FASTA file with all the transcripts.

    So I would like to end up with a FASTA file containing only a single transcript_id per gene_id.
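
One way to do what the question describes, sketched in Python under a few assumptions: the RSEM.isoform.results file is tab-delimited with the header shown above, ties are resolved by keeping the first isoform listed, and FASTA record IDs are the first word of each header line. The function names `best_isoform_per_gene` and `filter_fasta` are made up for this illustration, and the demo sequences are placeholders, not real data.

```python
import csv
import io


def best_isoform_per_gene(results_handle, column="FPKM"):
    """Return {gene_id: transcript_id}, keeping the isoform with the highest `column`."""
    best = {}
    for row in csv.DictReader(results_handle, delimiter="\t"):
        value = float(row[column])
        gene = row["gene_id"]
        # Strict '>' keeps the first isoform seen when values tie.
        if gene not in best or value > best[gene][0]:
            best[gene] = (value, row["transcript_id"])
    return {gene: tid for gene, (value, tid) in best.items()}


def filter_fasta(fasta_handle, keep_ids, out_handle):
    """Copy only the records whose ID (first word of the header) is in keep_ids."""
    writing = False
    for line in fasta_handle:
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep_ids
        if writing:
            out_handle.write(line)


# Demo with rows copied from the post (spaces converted to tabs).
demo_results = """transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
comp1000093_c0_seq1 comp1000093_c0 257 180.57 2.00 0.33 0.31 100.00
comp135533_c0_seq1 comp135533_c0 233 156.94 0.00 0.00 0.00 0.00
comp135533_c0_seq2 comp135533_c0 288 211.31 4.00 0.56 0.54 48.65
comp135533_c0_seq3 comp135533_c0 235 158.90 0.00 0.00 0.00 0.00
comp135533_c0_seq4 comp135533_c0 426 349.02 7.00 0.60 0.57 51.35
""".replace(" ", "\t")

# Placeholder sequences, invented for the demo only.
demo_fasta = (
    ">comp1000093_c0_seq1 len=8\nACGTACGT\n"
    ">comp135533_c0_seq2\nGGGG\nCCCC\n"
    ">comp135533_c0_seq4\nTTTTAAAA\n"
)

chosen = best_isoform_per_gene(io.StringIO(demo_results))
filtered = io.StringIO()
filter_fasta(io.StringIO(demo_fasta), set(chosen.values()), filtered)
print(chosen)
print(filtered.getvalue(), end="")
```

With real data, the same two calls would take open file handles for RSEM.isoform.results and the Trinity FASTA instead of the `io.StringIO` demo strings. Passing `column="TPM"` would rank by TPM instead of FPKM.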
  • lzy5117830
    Junior Member
    • May 2014
    • 1

    #2
    Hello gevielr! You can choose the RSEM.gene.results file.


    • Bang_Didi
      Junior Member
      • Sep 2014
      • 5

      #3
      Hi gevielr,
      I am wondering how to run RSEM correctly. Sorry that this does not help with your problem; I am still a newbie here.

      I tried running RSEM on my transcripts, but it did not work (I ran it using RSEM/1.12.15).

      Firstly I prepared the reference:

      -bash-4.1$ rsem-prepare-reference trinity_out_dir.Trinity.fasta refAB
      rsem-synthesis-reference-transcripts refAB 0 0 trinity_out_dir.Trinity.fasta
      Transcript Information File is generated!
      Group File is generated!
      Chromosome List File is generated!
      Extracted Sequences File is generated!

      rsem-preref refAB.transcripts.fa 0 refAB -l 125
      Refs.makeRefs finished!
      Refs.saveRefs finished!
      refAB.idx.fa is generated!
      refAB.n2g.idx.fa is generated!

      This step produced several output files (refAB.n2g.idx.fa, refAB.seq, refAB.ti, refAB.transcripts.fa and a few more). Sadly, I have no idea which one I should use as the reference when I run this command:

      rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name

      I then ran rsem-calculate-expression as follows:
      -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB

      Here is the output that I got:

      bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
      Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2
      Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
      Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
      [samopen] no @SQ lines in the header.
      [sam_read1] missing header? Abort!
      "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!


      Any suggestions about what went wrong, based on your experience running RSEM?

      Thanks
      Didi


      • kopi-o
        Senior Member
        • Feb 2008
        • 319

        #4
        This line looks like your answer:

        Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2

        You have put A_1_C4HUHACXX_AGTCAA_L003_R1.fastq instead of A_1_C4HUHACXX_AGTCAA_L003_R2.fastq as the reverse read
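
A mistake like this can be caught before launching the run. The following is a minimal Python sketch, assuming the naming convention used in this thread (mates differ only by `_R1` versus `_R2` in front of the extension). The function name `check_mate_lists` is hypothetical, and the check only inspects file names, not file contents.

```python
import re


def check_mate_lists(upstream, downstream):
    """Return a list of problems found in two comma-separated mate-file lists."""
    first = upstream.split(",")
    second = downstream.split(",")
    problems = []
    if len(first) != len(second):
        problems.append(
            f"{len(first)} upstream files but {len(second)} downstream files"
        )
    # The same file given as both upstream and downstream read.
    for name in sorted(set(first) & set(second)):
        problems.append(f"{name} appears in both lists")
    for r1, r2 in zip(first, second):
        # Assumed convention: the mate's name is the R1 name with _R1 -> _R2.
        expected = re.sub(r"_R1(?=\.)", "_R2", r1)
        if r2 != expected and r2 not in first:
            problems.append(f"expected {expected} as mate of {r1}, got {r2}")
    return problems


# The first two pairs from the command in post #3.
demo_upstream = (
    "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq"
)
demo_downstream = (
    "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq"
)
demo_problems = check_mate_lists(demo_upstream, demo_downstream)
for problem in demo_problems:
    print(problem)
```

Run on the full lists from the post, the same check would also flag the last downstream file, which ends in `.fasta` while its mate ends in `.fastq`. Whether that is a typo or the real file name cannot be determined from the thread.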


        • Bang_Didi
          Junior Member
          • Sep 2014
          • 5

          #5
          Thanks kopi-o

          I have corrected that line, but it still does not run. Here is my log:
          -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB
          bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
          Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
          Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
          [samopen] no @SQ lines in the header.
          [sam_read1] missing header? Abort!
          "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!
          -bash-4.1$


          Any suggestions? Did I provide the correct information to RSEM, or is something wrong with my RSEM installation?

          Thanks
          Didi
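
The remaining error in the log is `Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"`. The logged bowtie command shows that RSEM hands the reference_name argument to bowtie as the index basename, so the name given to rsem-prepare-reference (`refAB`) may be what belongs there rather than the FASTA file name. A small Python sketch for checking which basename actually has a Bowtie 1 index on disk follows. The function name `missing_bowtie_index_files` is made up, the sketch ignores large indexes (which use `.ebwtl`), and the demo fakes an index with empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path

# File suffixes that make up a (small) Bowtie 1 index.
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_bowtie_index_files(basename, directory="."):
    """Return the Bowtie 1 index files that do not exist for `basename`."""
    directory = Path(directory)
    return [
        basename + suffix
        for suffix in BOWTIE1_SUFFIXES
        if not (directory / (basename + suffix)).exists()
    ]


# Demo: simulate an index built under the name "refAB" using empty files.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in BOWTIE1_SUFFIXES:
        Path(tmp, "refAB" + suffix).touch()
    demo_ok = missing_bowtie_index_files("refAB", tmp)
    demo_bad = missing_bowtie_index_files("refAB.transcripts.fa", tmp)

print("refAB:", demo_ok)
print("refAB.transcripts.fa:", demo_bad)
```

In the working directory of the real run, an empty list for `refAB` would suggest that passing `refAB` as reference_name is worth trying. If files are reported missing for `refAB` as well, the index may not have been built during the reference preparation step.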


          • Dario1984
            Senior Member
            • Jun 2011
            • 166

            #6
            There is no software available that allows you to do that. I read the isoforms file into R, group the isoforms by the gene names of their BLAST hits, and then choose the isoform I want. The gene_id is unreliable for identifying genes, because multiple IDs can belong to the same gene. You should identify the genes by BLAST matching.
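
The grouping described above is done in R in the post. A rough Python equivalent is sketched below, assuming the BLAST results have already been reduced to a mapping from transcript_id to a gene name. That mapping (`blast_gene_of`), the gene names `geneA` and `geneB`, and the function name are all hypothetical and chosen for the demo only. Transcripts without a BLAST hit are simply skipped here.

```python
import csv
import io


def best_isoform_per_blast_gene(results_handle, blast_gene_of, column="FPKM"):
    """Return {blast_gene: transcript_id} with the highest `column` per gene."""
    best = {}
    for row in csv.DictReader(results_handle, delimiter="\t"):
        gene = blast_gene_of.get(row["transcript_id"])
        if gene is None:
            continue  # no BLAST hit: skipped in this sketch
        value = float(row[column])
        if gene not in best or value > best[gene][0]:
            best[gene] = (value, row["transcript_id"])
    return {gene: tid for gene, (value, tid) in best.items()}


# Rows copied from the first post (spaces converted to tabs).
demo_rows = """transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
comp1000093_c0_seq1 comp1000093_c0 257 180.57 2.00 0.33 0.31 100.00
comp1000100_c0_seq1 comp1000100_c0 308 231.21 4.00 0.51 0.49 100.00
comp135533_c0_seq2 comp135533_c0 288 211.31 4.00 0.56 0.54 48.65
comp135533_c0_seq4 comp135533_c0 426 349.02 7.00 0.60 0.57 51.35
""".replace(" ", "\t")

# Invented mapping: two Trinity gene_ids hitting the same BLAST gene.
demo_blast_gene_of = {
    "comp1000093_c0_seq1": "geneA",
    "comp1000100_c0_seq1": "geneA",
    "comp135533_c0_seq4": "geneB",
}

picked = best_isoform_per_blast_gene(io.StringIO(demo_rows), demo_blast_gene_of)
print(picked)
```

The demo shows the point made in the post: two different gene_id values can collapse into one gene once BLAST hits are used as the grouping key.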
