  • ulz_peter
    Senior Member
    • Feb 2010
    • 219

    Expression in RNA-seq

    Hi everyone,

    We recently sent some samples for RNA sequencing (SOLiD, 50 bp, strand-specific) and realized that a gene we consider important is not covered by sequencing reads. A few reads align to intronic sequence and one to an exon, but these look incidental. We analyzed the alignments with TopHat and Cufflinks and, not surprisingly, the FPKM for this gene is 0.
    Does anyone know how likely it is that this is caused by technical difficulties? The samples are tumor tissue, and according to UCSC the normal tissue of that organ should express this gene quite highly. Most other genes we've looked at had quite high coverage.
    We got ~100M reads.
    Any suggestions?

    Best regards
  • mgogol
    Senior Member
    • Mar 2008
    • 197

    #2
    Is it possible that the reads that would have aligned there were thrown out because they mapped to too many places in the genome?

    You could test this by cutting the gene up into read-sized pieces and trying to align them to the genome. TopHat by default will throw out reads that map to more than 40 places.
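A minimal sketch of the "cut up the gene" test, assuming you already have the exon or transcript sequence as a plain string. The function names, the 10 bp step, and the toy sequence are illustrative choices, not part of any tool mentioned in this thread. Note that the output is base-space FASTA; for a color-space (SOLiD) index the fragments would need converting first, which is not shown here.

```python
def tile_sequence(seq, read_len=50, step=10):
    """Yield (start, fragment) pairs covering seq with overlapping windows."""
    seq = seq.upper()
    # max(..., 1) ensures a sequence shorter than read_len still yields one fragment.
    for start in range(0, max(len(seq) - read_len + 1, 1), step):
        yield start, seq[start:start + read_len]


def to_fasta(seq, name="gene", read_len=50, step=10):
    """Format the tiled fragments as FASTA records that an aligner can read."""
    records = []
    for start, frag in tile_sequence(seq, read_len, step):
        records.append(f">{name}_{start}_{start + len(frag)}\n{frag}")
    return "\n".join(records) + "\n"


# Toy 120 bp sequence standing in for the real gene sequence (hypothetical data).
toy_gene = "ACGT" * 30
fasta_text = to_fasta(toy_gene, name="toy_gene")
n_fragments = fasta_text.count(">")
```

Aligning the resulting fragments with the same settings as the real run and counting how many come back uniquely mapped gives a rough picture of which parts of the gene are mappable at all.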

    If you think it's expressed, you could test it with qPCR too.


    • malachig
      Senior Member
      • Aug 2010
      • 117

      #3
      I agree with mgogol. One possible explanation is mapping, although with 50mers you would think some part of the gene would be mappable. How big is this gene? How many exons?

      Try a BLAT of some of this gene's exons against the genome (do they hit everywhere?). You could also try mapping reads directly against a database of transcripts to see whether any reads match; if they do, that would again suggest they are missing from your TopHat analysis because they are ambiguous in the genome. It would also give you a subset of reads on which to try different mapping approaches, and might rule out a technical issue with your TopHat run.


      • bioinfosm
        Senior Member
        • Jan 2008
        • 483

        #4
        Another possibility could be small exons, where junctions come into play. I am not sure whether TopHat/Cufflinks takes a de novo approach that overcomes this.
        --
        bioinfosm


        • mnkyboy
          Member
          • Mar 2009
          • 87

          #5
          We have seen that trimming reads down to 25 bp before running Bowtie and TopHat can help with this.
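A rough sketch of the trimming step, assuming base-space FASTQ with exactly four lines per record. SOLiD color-space reads (csfasta/qual files, where each read starts with a primer base) need different handling that is not covered here. The function name and the example read are made up for illustration.

```python
def trim_fastq_records(lines, keep=25):
    """Trim each 4-line FASTQ record to its first `keep` bases and qualities."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    trimmed = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Sequence and quality strings must be cut to the same length.
        trimmed.extend([header, seq[:keep], plus, qual[:keep]])
    return trimmed


# Hypothetical 40 bp read, trimmed to 25 bp.
example = ["@read1", "ACGT" * 10, "+", "I" * 40]
result = trim_fastq_records(example)
```

Shorter reads are more likely to fit entirely inside a small exon, at the cost of more ambiguous placements elsewhere, so it is worth comparing the trimmed and untrimmed runs rather than replacing one with the other.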


          • RockChalkJayhawk
            Senior Member
            • Mar 2009
            • 192

            #6
            One easy way to check if it is a mappability issue is to go to UCSC for hg19,
            click the Mapability link under "Mapping and Sequence Tracks"
            select the 50bp option,
            look at your gene.


            Otherwise, maybe you just found some biology in your experiment.


            • malachig
              Senior Member
              • Aug 2010
              • 117

              #7
              TopHat/Cufflinks does try to predict novel exons and junctions. I'm not sure how well it handles small exons... Mapping short reads to the genome and then trying to find small exons is a challenging problem. A read that overlaps a small exon may require an alignment in three or more short blocks with potentially large gaps between them (e.g. <exon>-intron-<exon>-intron-<exon>). Doing this with full-length cDNAs (never mind short reads) can be difficult. I suspect that most methods in use right now have fairly low sensitivity for short exons. One strategy to at least capture the short exons from known transcripts is to integrate alignment to transcripts and to the genome (as advocated in Griffith et al.). This way, reads that hit short exons can be aligned to a transcript sequence without gaps, which is much easier. This of course does not work for novel transcripts and relies on the accuracy and completeness of the transcript annotations for the species being analyzed.
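To make the "three or more short blocks" point concrete, here is a small sketch that works out how a read placed on a spliced transcript breaks into per-exon blocks. The exon lengths and read position are invented for illustration, and this is not how TopHat or any other aligner works internally.

```python
def read_blocks(exon_lengths, read_start, read_len):
    """Return [(exon_index, n_bases), ...] for a read placed on a spliced transcript.

    exon_lengths: exon sizes in transcript order.
    read_start:   0-based start of the read in transcript coordinates.
    """
    blocks = []
    exon_start = 0
    read_end = read_start + read_len
    for index, length in enumerate(exon_lengths):
        exon_end = exon_start + length
        # Number of read bases that fall inside this exon.
        overlap = min(read_end, exon_end) - max(read_start, exon_start)
        if overlap > 0:
            blocks.append((index, overlap))
        exon_start = exon_end
    return blocks


# A 50 bp read straddling a hypothetical 20 bp exon flanked by two 100 bp exons.
blocks = read_blocks([100, 20, 100], read_start=85, read_len=50)
```

Against the transcript, this read is one ungapped 50 bp alignment. Against the genome, the same read has to be split into three blocks of 15, 20 and 15 bp with an intron between each pair, which is why short exons are hard to recover from genome alignment alone.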


              • bioinfosm
                Senior Member
                • Jan 2008
                • 483

                #8
                Exactly, and I think aligning to both genome and transcripts raises an important question: which one gets priority? There would certainly be reads that align well to both, in which case I suppose the transcript alignment would get preference.

                A similar issue comes up when using a separate junction database or contamination database: given equal alignments of a read to multiple databases, which one should get preference?
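One way to make the preference explicit is a tie-breaking rule: take the hit with the fewest mismatches, and only fall back to a database ranking when hits are equally good. The sketch below assumes a made-up `Hit` record and a made-up priority order; neither comes from a specific pipeline, and the right order depends on the analysis.

```python
from collections import namedtuple

# Hypothetical alignment record: which database was hit, where, and how well.
Hit = namedtuple("Hit", ["database", "target", "mismatches"])

# Assumed preference order for equally good hits (an illustration, not a standard).
PRIORITY = ["transcripts", "junctions", "genome", "contaminants"]


def pick_hit(hits, priority=PRIORITY):
    """Choose the hit with the fewest mismatches; break ties by database priority."""
    if not hits:
        return None
    rank = {name: i for i, name in enumerate(priority)}
    # Databases not listed in `priority` are ranked last.
    return min(hits, key=lambda h: (h.mismatches, rank.get(h.database, len(rank))))


hits = [
    Hit("genome", "chr1:1000", 0),
    Hit("transcripts", "TX1:250", 0),
    Hit("contaminants", "phiX:10", 1),
]
best = pick_hit(hits)
```

Keeping the rule in one place like this makes it easy to rerun the assignment with a different order and see how many reads change category.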
                --
                bioinfosm


                • malachig
                  Senior Member
                  • Aug 2010
                  • 117

                  #9
                  Regarding the comment that the observation may simply be due to biology: good point! Once you have convinced yourself that the observation is not an artifact of the analysis (mappability, small exons, etc.), you should definitely consider this.

                  One of the benefits of RNA-seq over microarrays (IMHO) is the excellent signal-to-noise ratio. I have observed many cases where a gene appears to be turned off in one condition and on in another (tissue type, treatment, etc.). Even in very deep RNA-seq libraries, I have been amazed to see just how few reads are reported for the 'off' state. And since in these cases we have RNA-seq libraries (analyzed by the same method) for alternate conditions in which the gene does get covered by reads, we can reassure ourselves that the lack of coverage is not due to some artifact of the analysis.

                  For example, consider this data set (ALEXA-seq by Griffith et al.) consisting of four cell types (normal luminal breast, normal myoepithelial breast, hESCs, and vHMECs). These libraries have ~150 million paired-end 75-mers each. The quality of these libraries was very high. After performing differential expression analysis we can find many examples where a particular gene has 0 reads (or just a few) in one of these conditions, and thousands in one or more of the others. And we can find 'off' genes for any of the four cell types. For example, CCL2 has 0 reads in the vHMECs library and 124,664 in the myoepithelial breast library. Similarly, COL17A1 has 776,907 reads in the vHMECs library but only 142 in the luminal epithelial breast tissue. You can explore further examples in this list of DE genes.
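To put the read counts quoted above on a common scale, here is a minimal sketch that turns them into log2 ratios. It assumes the libraries are of comparable depth (they are described as ~150 million reads each), so no library-size normalization is applied, and the pseudocount of 1 is an arbitrary choice to avoid dividing by zero.

```python
import math


def log2_ratio(count_a, count_b, pseudocount=1.0):
    """log2 ratio of two raw read counts, with a pseudocount so 0 reads is allowed."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Read counts quoted in the post above.
ccl2 = log2_ratio(124664, 0)       # myoepithelial vs vHMECs
col17a1 = log2_ratio(776907, 142)  # vHMECs vs luminal
```

Both ratios correspond to differences of more than a thousandfold, which is the kind of on/off contrast that is hard to explain by mapping noise alone when both libraries went through the same analysis.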


                  • bioinfosm
                    Senior Member
                    • Jan 2008
                    • 483

                    #10
                    Yes. A paired or comparative analysis removes a lot of sequencing and mapping biases, so differential expression comparisons are more or less accurate.

                    I would really like to give the Trans-ABySS tool a try. It seems like a one-stop solution for RNA-seq analysis, but it has a huge list of prerequisites and config files to figure out.
                    --
                    bioinfosm

