Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • STAR input to RSEM

    Hi

    I am using STAR to alingn my paired-end reads and use the resulting sam files as input to RSEM.
    However, I find the sam files generated by STAR to be incompatible with RSEM.
    Based on the RSEM group discussions, I ran my STAR with the following options:

    Code:
    --alignIntronMax 1 --alignIntronMin 2  --scoreDelOpen -10000  --scoreInsOpen -10000 --alignEndsType EndToEnd
    This becasue RSEM requires input that has been mapped to the transcriptome instead of the geneome.

    However, I still cannot get RSEM to accept these files as the input. It says:
    Code:
    .sam -t 3 -tag XM
    [samopen] SAM header is present: 13 sequences.
    Number of transcripts does not match! Please check if the reads are aligned to a transcript set (instead of a genome)!
    I checked my sam files using the RSEM-validator, and its output says that the sam files are verified.

    Can someone kindly suggest a solution.

    Thanks.

    Best
    M

  • #2
    Given that there are only 13 contigs to which your mapped your reads (according to the header at least), did you actually map to the transcriptome? My guess would be that the reference you mapped against and the one you're feeding to rsem aren't the same.

    Comment


    • #3
      Hi Devon
      (again! )
      Thanks for your reply.

      For generating the sam files (using STAR) I used the genome.fa file from UCSC and the assocaited gene annotation file (.gtf). Accordingly, the mapping is based on these annotations.

      When I input these sam files to RSEM it throws me the error (I am not using RSEM for mapping, which it does via Bowtie, but for calculating expression.)
      For running RSEM for calculating expression, I need to use the rsem-prepare-reference option in which I give the same USCS two files (genome.fa and .gtf) as the input. It then generates a trancriptome.fa (and others) which are used in rsem-calculate-expression.

      And I am using -p 8 (not --p8, it was a typo, my mistake).

      Suggestions welcome.

      Thanks again and Best
      M

      Comment


      • #4
        Then the SAM/BAM file that you're giving to rsem doesn't have mappings to tanscripts, but to the genome. That's not what rsem is expecting. I expect that the normal route would be to use rsem-prepare-reference with the --no-bowtie option to generate the reference_name.idx.fa file, index that with STAR (don't use a GTF, since that would no longer be meaningful), map to the result and then use that as input.

        Comment


        • #5
          I am going to revive this long dead thread because I am seeing something similar.

          I've tracked it down to a mismatch in the number of transcripts that STAR gets as opposed to RSEM. There are off by 1. RSEM finds 1 more transcript than STAR.

          I looked at other species where we have done the exact same steps to generate references, and in those cases, the transcript numbers agree. The rsem-star combo works just find for those species.

          Any thoughts?

          Comment


          • #6
            Originally posted by Gators View Post
            I am going to revive this long dead thread because I am seeing something similar.

            I've tracked it down to a mismatch in the number of transcripts that STAR gets as opposed to RSEM. There are off by 1. RSEM finds 1 more transcript than STAR.

            I looked at other species where we have done the exact same steps to generate references, and in those cases, the transcript numbers agree. The rsem-star combo works just find for those species.

            Any thoughts?
            Could you post an example for this problem? Are you mapping to the transcriptome, or converting genomic alignments to transcriptomic with
            --quantMode TranscriptomeSAM ?

            Cheers
            Alex

            Comment


            • #7
              Hi Alex,

              Here is the nitty gritty details

              STAR reference made like this:
              /opt/bin/STAR-2.3 --runMode genomeGenerate --genomeFastaFiles mm10.fa --genomeDir ./ --sjdbGTFfile ../this.gtf --sjdbOverhang 200 --runThreadN 12
              Followed by:
              /opt/pkg/rsem-1.2.14/rsem-prepare-reference --gtf ../this.gtf mm10.fa rsem --no-polyA > rsem_generate.log 2>&1
              The rsem_generate.log has this key line:
              61026 transcripts are extracted and 0 transcripts are omitted.
              Meanwhile the star log says:
              found:
              61025 transcripts
              I then run this STAR alignment command:
              /opt/bin/STAR-2.4 --readFilesIn file1_1.clipped.fastq.gz file1_2.clipped.fastq.gz --genomeDir [path to reference] --runThreadN 22 --readFilesCommand zcat --chimSegmentMin 15 --chimJunctionOverhangMin 17 --chimScoreMin 1 --chimScoreDropMax 20 --chimScoreSeparation 10 --quantMode TranscriptomeSAM --outSAMattributes NH --alignSJoverhangMin 8 --sjdbScore 3 --outSAMunmapped Within --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonicalUnannotated --sjdbOverhang 200 --outFileNamePrefix file1
              RSEM is then run like this:
              rsem-calculate-expression -p 22 --fragment-length-mean 159.559312802775 --fragment-length-sd 30.1145388424399 --no-bam-output --paired-end --bam file1.star.transcriptome.bam [path_to_reference] file1.rsem
              The error is this:
              Number of reference sequences does not match! Please check if the reads are aligned to a transcript set (instead of a genome)!
              It is worth noting that for human and rat these steps work just fine. However, the RSEM and STAR transcript counts that come out of the reference file creation step are equal for those two species. So again I go back to RSEM and STAR finding differing numbers of transcripts...in spite of using the same reference and the same gtf.

              Comment


              • #8
                Okay I've narrowed it down to the transcript that shows up twice in RSEM and once in STAR...it is an error in our GTF in which the transcript shows up twice. STAR was smart enough to only make one, but RSEM made both.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Recent Advances in Sequencing Analysis Tools
                  by seqadmin


                  The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                  05-06-2024, 07:48 AM
                • seqadmin
                  Essential Discoveries and Tools in Epitranscriptomics
                  by seqadmin




                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                  04-22-2024, 07:01 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Today, 02:46 PM
                0 responses
                8 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 05-07-2024, 06:57 AM
                0 responses
                13 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 05-06-2024, 07:17 AM
                0 responses
                16 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 05-02-2024, 08:06 AM
                0 responses
                23 views
                0 likes
                Last Post seqadmin  
                Working...
                X