Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expression in RNA-seq

    Hi everyone,

    We recently sent some samples for RNA-sequencing (Solid 50bp strand-specific) and we realized a gene that we consider important is not covered with sequencing reads. There are some reads aligning to intronic sequences and one to exonic but it seems that this happened by accident. We analyzed the alignments with Tophat and Cufflinks and not surprisingly the FPKM for this gene is 0.
    Does anyone know the chances that this can happen by technical difficulties (it's tumor tissue and normal tissue of that organ should have that gene expressed quite highly according to UCSC). Most other genes we've looked at had quite a high coverage.
    We got ~100M reads.
    Any suggestions?

    Best regards

  • #2
    The reads that would have aligned there could have been thrown out because they mapped too many places in the genome, is that possible?

    You could test this by cutting up the gene and trying to align it to the genome. Tophat by default will throw out reads that map > 40 places.

    If you think it's expressed, you could test it with qPCR too.

    Comment


    • #3
      I agree with mgogol. One possible explanation is mapping, although with 50mers you would think some part of the gene would be mappable. How big is this gene? How many exons?

      Try a BLAT of some of the exons of this gene against the genome (do they hit everywhere?). You could also try mapping reads directly against a database of transcripts to see whether there are any matching reads (this would again suggest that the reason they are not coming through in your Tophat analysis is that they are ambiguous in the genome...). It would also allow you to identify a subset of reads to try different mapping approaches on and maybe eliminate the possibility of some technical issue(s) with your tophat run...

      Comment


      • #4
        Another possibility could be small exons, and then junctions come into play. I am not sure is tophat/cufflinks take a de novo approach that overcomes this?
        --
        bioinfosm

        Comment


        • #5
          We have seen trimming down to 25 with bowtie and tophat can help with this.

          Comment


          • #6
            One easy way to check if it is a mappability issue it to go to UCSC for hg19,
            click the Mapability link under "Mapping and Sequence Tracks"
            select the 50bp option,
            look at your gene.


            Otherwise, maybe you just found some biology in your experiment.

            Comment


            • #7
              Tophat/cufflinks does try to predict novel exons and junctions. I'm not sure how well it handles small exons... Mapping small reads to the genome and then trying to find small exons is a challenging problem. A read that overlaps a small exon may require an alignment into three or more short blocks with potentially large gaps between (e.g. <exon>-intron-<exon>-intron-<exon>). Doing this with full length cDNAs (never mind short reads) can be a difficult. I suspect that most methods in use right now have fairly low sensitivity for short exons. One strategy to at least capture the short exons from known transcripts is to integrate alignment to transcripts and the genome (as advocated in Griffith et al.). This way, reads that hit short exons can be aligned to a transcript sequence without gaps which is much easier. This of course does not work for novel transcripts and relies on the accuracy and completion of transcript annotations for the species being analyzed.

              Comment


              • #8
                Exactly, and I think doing alignments to genome and transcripts raises an important question of which one to get priority. There would certainly be reads well aligned to both, then I suppose transcript alignment would get preference.

                Similar issue comes when using a separate reads junction database or contamination database.. given equal alignments of a read to multiple datasets, which one should get preference
                --
                bioinfosm

                Comment


                • #9
                  Regarding the comment that the observation may simply be due to biology. Good point! Once you convince yourself that the observation is not due to an artifact of the analysis (mapability, small exons, etc.) you should definitely consider this.

                  One of the benefits of RNA-seq over microarrays (IMHO) is the excellent signal-to-noise ratio. I have observed many cases where a gene appears to be turned off in one condition and on in another condition (tissue type, treatment, etc.). Even in very deep RNA-seq libraries, i have been amazed to see just how few reads are reported for the 'off' state. And since in this case we have RNA-seq libraries (analyzed by the same method) for alternate conditions that do get covered by reads we can reassure ourselves that the lack of coverage is not due to some artifact of the analysis.

                  For example, consider this data set (ALEXA-seq by Griffith et al) consisting of four cell types (normal luminal breast, normal myepithelial breast, hESCs, and vHMECs). These libraries have ~150 million paired-end 75-mers each. The quality of these libraries was very high. After performing differential expression analysis we can find many examples where a particular gene has 0 reads (or just a few) in one of these conditions, and thousands in one or more of the others. And we can find 'off' genes for any of the four cell types. For example, CCL2 has 0 reads in the vHMECs library and 124,664 in the myoepithelial breast library. Similarly, COL17A1 has 776,907 reads in the vHMECs library but only 142 in the luminal epithelial breast tissue. You can explore further examples in this list of DE genes.

                  Comment


                  • #10
                    Yes. A paired or comparison analysis really reduces a lot of sequencing and mapping biases, so differential expression comparisons are more or less accurate.

                    I really wish to give this trans-Abyss tool a try, seems like a one-stop solution to rna-seq analysis, but has a huge list of pre-reqs and config files to figure out
                    --
                    bioinfosm

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Non-Coding RNA Research and Technologies
                      by seqadmin




                      Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                      Nobel Prize for MicroRNA Discovery
                      This week,...
                      10-07-2024, 08:07 AM
                    • seqadmin
                      Recent Developments in Metagenomics
                      by seqadmin





                      Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                      09-23-2024, 06:35 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, Today, 06:35 AM
                    0 responses
                    7 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, Yesterday, 02:44 PM
                    0 responses
                    7 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 10-11-2024, 06:55 AM
                    0 responses
                    15 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 10-02-2024, 04:51 AM
                    0 responses
                    111 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X