  • Selecting most abundant transcript per gene (Trinity, RSEM)

    Is anyone aware of a program or simple script to select the most abundant transcript (for example, based on FPKM values) for each gene from a Trinity-assembled transcriptome that has been run through RSEM? I have the RSEM output file RSEM.isoform.results, which looks like this:

    transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct

    comp1000093_c0_seq1 comp1000093_c0 257 180.57 2.00 0.33 0.31 100.00
    comp1000100_c0_seq1 comp1000100_c0 308 231.21 4.00 0.51 0.49 100.00
    comp1000106_c0_seq1 comp1000106_c0 279 202.37 2.00 0.29 0.28 100.00
    comp135533_c0_seq1 comp135533_c0 233 156.94 0.00 0.00 0.00 0.00
    comp135533_c0_seq2 comp135533_c0 288 211.31 4.00 0.56 0.54 48.65
    comp135533_c0_seq3 comp135533_c0 235 158.90 0.00 0.00 0.00 0.00
    comp135533_c0_seq4 comp135533_c0 426 349.02 7.00 0.60 0.57 51.35

    I also have a FASTA file with all the transcripts.

    I would like to end up with a FASTA file containing only a single transcript_id per gene_id.

  • #2
    Hello gevieir! You can use the RSEM.gene.results file.


    • #3
      Hi gevielr,
      I am just wondering how to run RSEM correctly. Sorry that this does not help with your problem; I am still a newbie here.

      I tried running RSEM on my transcripts, but it did not work (I ran it using RSEM/1.12.15).

      First, I prepared the reference:

      -bash-4.1$ rsem-prepare-reference trinity_out_dir.Trinity.fasta refAB
      rsem-synthesis-reference-transcripts refAB 0 0 trinity_out_dir.Trinity.fasta
      Transcript Information File is generated!
      Group File is generated!
      Chromosome List File is generated!
      Extracted Sequences File is generated!

      rsem-preref refAB.transcripts.fa 0 refAB -l 125
      Refs.makeRefs finished!
      Refs.saveRefs finished!
      refAB.idx.fa is generated!
      refAB.n2g.idx.fa is generated!

      I have several output files from this process (refAB.n2g.idx.fa, refAB.seq, refAB.ti, refAB.transcripts.fa, and a few more), but I have no idea which one I should use as the reference when I run this command:

      rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name

      I then ran rsem-calculate-expression as follows:
      -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB

      Here is the output I got:

      bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
      Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2
      Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
      Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
      [samopen] no @SQ lines in the header.
      [sam_read1] missing header? Abort!
      "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!


      Any suggestions on what went wrong, based on your experience running RSEM?

      Thanks
      Didi


      • #4
        This line looks like your answer:

        Warning: Same mate file "A_1_C4HUHACXX_AGTCAA_L003_R1.fastq" appears as argument to both -1 and -2

        You passed A_1_C4HUHACXX_AGTCAA_L003_R1.fastq instead of A_1_C4HUHACXX_AGTCAA_L003_R2.fastq as the first reverse-read file.
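
A mix-up like this is easy to miss in a long comma-separated list. As a rough sketch only, the two lists could be compared before launching RSEM. The function name `check_mate_lists` is hypothetical, and the check assumes that mates differ only by an `_R1`/`_R2` token in the file name, as the files in this thread appear to.

```python
def check_mate_lists(upstream, downstream):
    """Return a list of problems found in two comma-separated mate lists."""
    left = upstream.split(",")
    right = downstream.split(",")
    problems = []
    if len(left) != len(right):
        problems.append(f"list lengths differ: {len(left)} vs {len(right)}")
    # A file given as both upstream and downstream triggers the Bowtie warning.
    for name in sorted(set(left) & set(right)):
        problems.append(f"{name} appears in both lists")
    for l, r in zip(left, right):
        # Assumption: the expected mate name is the upstream name with _R1 -> _R2.
        if l != r and l.replace("_R1", "_R2") != r:
            problems.append(f"{l} is paired with unexpected file {r}")
    return problems


if __name__ == "__main__":
    for problem in check_mate_lists("a_R1.fastq,b_R1.fastq", "a_R1.fastq,b_R2.fastq"):
        print(problem)
```

Under the same naming assumption, the check would also flag a downstream file whose extension differs from its mate (for example .fasta against .fastq), which may or may not be intentional.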


        • #5
          Thanks, kopi-o.

          I have corrected that line, but it still fails. Here is my log:
          -bash-4.1$ rsem-calculate-expression --paired-end A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa AB
          bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -
          Could not locate a Bowtie index corresponding to basename "refAB.transcripts.fa"
          Command: bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta refAB.transcripts.fa
          [samopen] no @SQ lines in the header.
          [sam_read1] missing header? Abort!
          "bowtie -q --phred33-quals -n 2 -e 99999999 -l 25 -I 1 -X 1000 -p 1 -a -m 200 -S refAB.transcripts.fa -1 A_1_C4HUHACXX_AGTCAA_L003_R1.fastq,A_2_C4HUHACXX_AGTTCC_L003_R1.fastq,A_3_C4HUHACXX_ATGTCA_L003_R1.fastq,A_4_C4HUHACXX_CCGTCC_L003_R1.fastq,B_1_C4HUHACXX_GTCCGC_L003_R1.fastq,B_2_C4HUHACXX_GTGAAA_L003_R1.fastq,B_3_C4HUHACXX_GTGGCC_L003_R1.fastq,B_4_C4HUHACXX_GTTTCG_L003_R1.fastq -2 A_1_C4HUHACXX_AGTCAA_L003_R2.fastq,A_2_C4HUHACXX_AGTTCC_L003_R2.fastq,A_3_C4HUHACXX_ATGTCA_L003_R2.fastq,A_4_C4HUHACXX_CCGTCC_L003_R2.fastq,B_1_C4HUHACXX_GTCCGC_L003_R2.fastq,B_2_C4HUHACXX_GTGAAA_L003_R2.fastq,B_3_C4HUHACXX_GTGGCC_L003_R2.fastq,B_4_C4HUHACXX_GTTTCG_L003_R2.fasta | samtools view -S -b -o AB.temp/AB.bam -" failed! Plase check if you provide correct parameters/options for the pipeline!
          -bash-4.1$


          Any suggestions? Did I provide the correct information to RSEM, or is something wrong with my RSEM installation?

          Thanks
          Didi
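
One possible reading of the log above, not confirmed anywhere in this thread: Bowtie was asked for an index with basename "refAB.transcripts.fa", whereas rsem-prepare-reference was given the name refAB, so the reference_name argument may need to be refAB rather than one of the generated files. A minimal sketch for checking which basename actually has an index on disk follows. The helper name `bowtie_index_exists` is hypothetical, and it assumes Bowtie 1 naming, where the first index file is `<basename>.1.ebwt` (or `.1.ebwtl` for large indexes).

```python
import os


def bowtie_index_exists(basename, directory="."):
    """Return True if a Bowtie 1 index file for `basename` is found in `directory`."""
    # Assumption: Bowtie 1 writes <basename>.1.ebwt, or .1.ebwtl for large indexes.
    return any(
        os.path.exists(os.path.join(directory, basename + suffix))
        for suffix in (".1.ebwt", ".1.ebwtl")
    )


if __name__ == "__main__":
    for candidate in ("refAB", "refAB.transcripts.fa"):
        print(candidate, bowtie_index_exists(candidate))
```

If neither basename has an index, the index may not have been built during reference preparation, which would be a separate problem from the argument name.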


          • #6
            As far as I know, there is no software available that lets you do that. I read the isoforms file into R, group the isoforms by the gene names of their BLAST hits, and then choose the isoform I want. The gene_id is unreliable for identifying genes, because multiple IDs can belong to the same gene. You should identify the genes by BLAST matching.
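
For the simpler selection the original question asks about, a short script may be enough. The following is only a sketch: it groups by the gene_id column as RSEM reports it (which, per the caveat above, may split one real gene across several IDs), assumes the isoform results file is tab-delimited with the header row shown in the first post, and uses hypothetical function names. To group by BLAST-derived gene names instead, the gene column would need to be replaced with that mapping.

```python
import csv


def top_isoform_per_gene(results_path, key="FPKM"):
    """Map each gene_id to the transcript_id with the highest `key` value."""
    best = {}  # gene_id -> (value, transcript_id)
    with open(results_path, newline="") as handle:
        # Assumption: tab-delimited file with a header row naming the columns.
        for row in csv.DictReader(handle, delimiter="\t"):
            value = float(row[key])
            gene = row["gene_id"]
            # Strict comparison: on ties, the first isoform listed is kept.
            if gene not in best or value > best[gene][0]:
                best[gene] = (value, row["transcript_id"])
    return {gene: tid for gene, (_, tid) in best.items()}


def filter_fasta(fasta_in, fasta_out, keep_ids):
    """Copy only the records whose ID (first word of the header) is in keep_ids."""
    keep = set(keep_ids)
    writing = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:].split()
                writing = bool(header) and header[0] in keep
            if writing:
                dst.write(line)
```

A call such as `filter_fasta("Trinity.fasta", "top_isoforms.fasta", top_isoform_per_gene("RSEM.isoform.results").values())` would then write one transcript per gene_id; the FASTA file names here are placeholders.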
