  • Nicolas
    replied
    I think that, first of all, it means that microRNAs and small RNAs cannot be studied in a canonical RNA-Seq experiment, which is aimed at "longer" RNAs (mRNA, lincRNA).
    For small RNAs, you do not need to assemble spliced transcripts, so you could use another type of analysis, something along the lines of:
    1) remove the adapter
    2) map with Bowtie/BWA
    3) count the reads mapping to miRBase hairpins + your smallRNA genes of interest
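
A minimal sketch of steps 1 and 3 above, in plain Python. The adapter sequence, the `min_overlap` threshold, and both function names are assumptions made for illustration, and step 2 (mapping) is assumed to be done separately with Bowtie or BWA, with its output reduced to (read id, reference name) pairs.

```python
from collections import Counter

# Assumed 3' adapter; replace it with the sequence used in your own library prep.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Clip a read at a full adapter match, or at a partial match at its 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The read may end partway through the adapter, so test adapter prefixes,
    # longest first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def count_by_reference(alignments):
    """Count reads per reference (e.g. a miRBase hairpin) from (read_id, reference) pairs."""
    return Counter(reference for _, reference in alignments)
```

In practice a dedicated trimmer and a counting tool would replace both functions; the point is only that small RNA quantification reduces to trimming, mapping, and counting, with no transcript assembly.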



  • honey
    replied
    Small RNA

    So that means Cufflinks cannot be used for microRNAs or other small RNAs. Does anyone disagree with this?



  • dietmar13
    replied
    steven, you are lucky,

    neither data set is from us, i am only playing around...

    12 normal vs. 12 colon cancer samples, paired (SRA accession):
    SRP007584



  • steven
    replied
    Originally posted by dietmar13
    Thanks! Is the corresponding RNA-seq data available too by any chance?



  • dietmar13
    replied
    @arvid

    publicly available: GSE25070

    http://www.ncbi.nlm.nih.gov/projects...i?acc=GSE25070



  • arvid
    replied
    @dietmar

    What method did you use to calculate expression on the microarrays and what kind of microarrays were they?



  • dietmar13
    replied
    short genes

    hello cole and epi,

    i have compared FPKM values (calculated from count data, not with cufflinks) with the corresponding microarray data as a function of gene length, and found some interesting details.

    it is true that FPKM values for genes shorter than 500 bp correlate much less well with microarray-derived expression values. but the differences between tissues (i.e. normal vs. cancer) for short genes correlate even better (NGS vs. microarray) than the differences for longer genes (interestingly, these correlations decrease continuously as gene length increases; see figure).

    and in most study designs, the differences between two conditions are what matter, not the absolute expression values. therefore, i would not exclude short genes from statistical analysis!

    in green and blue are the correlations of lg2_FPKM values from 12 normal and 12 cancer tissues, respectively, with the corresponding lg2_microarray_expression values from 26 normal and 26 cancer tissues. in red are the correlations of the differences (lg2_FPKM_NORMAL - lg2_FPKM_CANCER vs. lg2_microarray_NORMAL - lg2_microarray_CANCER). on the x-axis, genes are grouped by gene length (the number of genes in each bin is shown), e.g. 190 genes are shorter than 500 bp.
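
The comparison described above could be sketched roughly as follows. This is not dietmar13's actual script: the function names, the pseudocount of 1 added before taking log2, and the rule of skipping bins with fewer than three genes are all assumptions made for illustration.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)


def log2_expr(value, pseudocount=1.0):
    """log2 with a pseudocount (an assumption here) so that zero values stay finite."""
    return math.log2(value + pseudocount)


def pearson(xs, ys):
    """Pearson correlation; returns None when either vector has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_length_bin(genes, bin_edges):
    """genes: iterable of (length_bp, ngs_value, array_value).

    bin_edges is a sorted list such as [500, 1000, 2000]; bin 0 holds genes
    shorter than the first edge. Returns {bin_index: r} for bins with >= 3 genes.
    """
    bins = defaultdict(lambda: ([], []))
    for length_bp, x, y in genes:
        xs, ys = bins[bisect_right(bin_edges, length_bp)]
        xs.append(x)
        ys.append(y)
    return {b: pearson(xs, ys) for b, (xs, ys) in sorted(bins.items()) if len(xs) >= 3}
```

To reproduce the red curve, pass the per-gene differences (log2 normal minus log2 cancer) from each platform as the two values; for the green and blue curves, pass the per-tissue log2 values themselves.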
    Attached Files



  • epi
    replied
    Hi Cole, thanks for your post. I keep reading your comments here, which are useful for many people, including me. I asked a similar question, with a twist, here: http://seqanswers.com/forums/showthread.php?t=17992

    Could you comment, please? In short, it is about how to deal with larger (>300 bp) transcripts with high FPKMs.







  • Cole Trapnell
    replied
    This issue has been discussed elsewhere on this board. As Nicolas points out, RNA-Seq really isn't reliable for very short transcripts. The reason is that all the fragments that map to these transcripts come from the "tail" of the distribution of library fragment lengths. That is, fragments that map to microRNAs are much, much shorter than most fragments in the library. This is by design in the RNA-Seq protocol, which size-selects away very short inserts. Thus, Cufflinks infers that even though relatively few fragments actually mapped to the microRNAs, there were probably TONS of individual microRNA molecules in the transcriptome before all of the various size selection parts of the protocol kicked in. Cufflinks accordingly increases the FPKM of these short transcripts to compensate for the bias against short fragments in the library.

    This compensation was designed to improve accuracy for transcripts that are in the 500bp-1kb range - for longer transcripts, the "edge effects" due to library fragment size aren't much of an issue. However, I wouldn't trust FPKM values for transcripts shorter than your average fragment length. There's really just not enough data in most standard RNA-Seq libraries to say much about small RNA abundance.

    I should also point out that other methods use this same bias correction technique (RSEM for example). As far as I'm aware, the "count-based" methods don't, but that doesn't mean they shouldn't. Most of those methods are strictly for differential analysis, where any edge effects are assumed to be affecting each condition the same way. That may or may not be the case in your data.

    In any case, the quick answer to this problem is to simply remove or ignore transcripts shorter than around 300bp from your GTF. In a future version, we will be flagging these transcripts as too short for reliable quantification where appropriate.
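
The suggested filtering step could look something like the sketch below. It assumes a standard 9-column, tab-separated GTF with `exon` features carrying a `transcript_id` attribute; the function names and the default 300 bp cutoff are illustrative rather than part of Cufflinks. Transcript length is taken as the sum of exon lengths, so a transcript with no exon lines counts as length 0 and is dropped.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_lengths(gtf_lines):
    """Sum exon lengths per transcript_id (GTF coordinates are 1-based, inclusive)."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return lengths


def filter_short_transcripts(gtf_lines, min_length=300):
    """Return the GTF lines whose transcript has a spliced length of at least min_length.

    Comment lines and records without a transcript_id are kept unchanged.
    """
    gtf_lines = list(gtf_lines)
    lengths = transcript_lengths(gtf_lines)
    kept = []
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if line.startswith("#") or match is None or lengths.get(match.group(1), 0) >= min_length:
            kept.append(line)
    return kept
```

Setting `min_length` to the expected fragment size of the library instead of a fixed 300 bp follows the same logic.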



  • Xiaobin
    replied
    These genes don't seem to be that short. There must be other reasons.
    I suggest you try a count-based method first. Cufflinks is complex enough that its results are hard to interpret.



  • honey
    replied
    High RPKM

    Originally posted by Nicolas
    That does not make sense to me, unless it is an option in either Cufflinks or Cuffdiff; I have never seen a log relationship between Cufflinks and Cuffdiff outputs.

    Honey, how did you run Cufflinks? RABT mode or simple "quantification" mode? How long are the genes with super-high RPKM?

    It seems to me that Cufflinks has a tendency to report super-high RPKM for very short transcripts (such as microRNA). I now routinely filter transcripts shorter than the expected fragment size out of the GTF annotation file. I think there is a good rationale for filtering them out, because they cannot be accurately captured by the RNA-Seq protocol.

    In RABT mode, Cufflinks also reports a large number of short transcripts with crazy high values. A solution could be to re-quantify the discovered transcripts with something like BEDtools or HTSeq-count.

    I used simple quantification.

    So you mean the count method is probably better?



  • honey
    replied
    very high RPKM

    Here are three examples with genomic coordinates

    CGA: 87795222-87804824
    KISS1: 204159469-204165619
    TFPI2: 93515745-93520065

    Should I then go back to the count method?

    Thanks for all your help.



  • Xiaobin
    replied
    Are those genes very short?
    Because Cufflinks subtracts the fragment length from the gene length when calculating FPKM, it can sometimes give this kind of result for short genes.
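
A rough illustration of why this inflates values for short genes. This is a simplification, not Cufflinks' actual model (which works with the full fragment length distribution): the effective length is taken here as transcript length minus mean fragment length plus one, and the floor of 1 bp is an assumption added to avoid division by zero. Both function names are hypothetical.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Approximate number of positions where a fragment of mean length can start.

    Floored at 1 (an assumption) so that transcripts shorter than the mean
    fragment length do not yield a zero or negative denominator.
    """
    return max(transcript_length - mean_fragment_length + 1, 1)


def fpkm_from_length(fragments, length_bp, total_mapped_fragments):
    """FPKM for a given fragment count, using whichever length is passed in."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)
```

For a 250 bp transcript and a 200 bp mean fragment length, the effective length is 51 bp, so the same fragment count gives an FPKM about 4.9 times higher than dividing by the full 250 bp. For a 3 kb transcript the same correction changes the value by only about 7%.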



  • Nicolas
    replied
    Originally posted by peromhc
    I think that these values should be taken to the log(10). This is not documented; it is only my suspicion.

    log(10) values from cufflinks roughly equal FPKM values from cuffdiff.

    That does not make sense to me, unless it is an option in either Cufflinks or Cuffdiff; I have never seen a log relationship between Cufflinks and Cuffdiff outputs.

    Honey, how did you run Cufflinks? RABT mode or simple "quantification" mode? How long are the genes with super-high RPKM?

    It seems to me that Cufflinks has a tendency to report super-high RPKM for very short transcripts (such as microRNA). I now routinely filter transcripts shorter than the expected fragment size out of the GTF annotation file. I think there is a good rationale for filtering them out, because they cannot be accurately captured by the RNA-Seq protocol.

    In RABT mode, Cufflinks also reports a large number of short transcripts with crazy high values. A solution could be to re-quantify the discovered transcripts with something like BEDtools or HTSeq-count...



  • honey
    replied
    very large RPKM

    It is the human genome, so it is not small.
    The genes that have very high RPKM values are relevant to the biology of the tissue samples, but my problem is how to provide a scientific rationale that our results are not nonspecific.
    Thanks for the input

