  • Classify genes as expressed or not expressed

    Hello all,

    This is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can point me in the right direction.

    Imagine we have a microarray or an annotated and quantified RNA-seq experiment. There are about 24k genes, each with a normalized numerical expression value (or FPKM) assigned to it.

    What is the most statistically sound way to automatically classify genes as "expressed" and "not expressed"? People often use an empirical cutoff for this, e.g. an FPKM of 1, but that's not what I'm interested in.

    Thank you for any input.

  • #2
    I was wondering this as well... I have multi-species/multi-individual RNA-seq data, so it's easy to bin genes into on and off if they are expressed highly in all individuals of one species and very little in another... The difficulty is that many genes might be expressed stochastically and at low levels while still being functional. If both species have a low FPKM, then I have trouble distinguishing mapping errors from actual low expression.

    I was thinking of just choosing a cut-off based on the distribution of FPKMs that makes sense to me
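
One way to take the cut-off from the distribution itself rather than by eye is sketched below. It assumes, and this is an assumption rather than anything established in this thread, that log2(FPKM) values form two roughly Gaussian populations (a low "background" group and a higher "expressed" group), fits a two-component mixture by EM, and places the cut-off where both components are equally likely. The function names and the synthetic data are hypothetical. Genes with an FPKM of exactly 0 would need a pseudocount or to be set aside before the log transform.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds), ordered so that index 0 is the
    lower-mean ("background") component.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Start the two means at the lower and upper quartiles.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    spread = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of the upper component per value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weight, mean and sd of each component.
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - p for p in resp]
            mass = sum(r)
            if mass < 1e-9:
                continue
            weights[k] = mass / n
            means[k] = sum(ri * x for ri, x in zip(r, values)) / mass
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / mass
            sds[k] = max(math.sqrt(var), 1e-3)
    if means[0] > means[1]:
        weights.reverse()
        means.reverse()
        sds.reverse()
    return weights, means, sds


def posterior_cutoff(weights, means, sds):
    """Value between the two means where both components are equally likely."""
    def diff(x):
        return (weights[1] * normal_pdf(x, means[1], sds[1])
                - weights[0] * normal_pdf(x, means[0], sds[0]))

    lo, hi = means[0], means[1]
    if diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("components overlap too much for a clear cut-off")
    for _ in range(60):  # bisection on the sign change of diff
        mid = (lo + hi) / 2.0
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Synthetic log2(FPKM) values, for illustration only (not real data).
random.seed(0)
log_fpkm = ([random.gauss(-3.0, 1.0) for _ in range(500)]
            + [random.gauss(4.0, 1.5) for _ in range(1000)])
weights, means, sds = fit_two_gaussians(log_fpkm)
cutoff = posterior_cutoff(weights, means, sds)
print(f"log2 cut-off: {cutoff:.2f}, i.e. FPKM of about {2 ** cutoff:.2f}")
```

If the real distribution is not clearly bimodal, the fit may be unstable or the function may raise, which is itself a hint that an on/off split is not well supported by the data.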

    • #3
      How can you ever say something is 'not expressed'? Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique. Unless you're doing ridiculously deep sequencing on your samples, any cut-off you put in place is pretty arbitrary.
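
The sampling argument can be made concrete with a small calculation. The sketch below assumes fragment counts per gene are Poisson distributed (a simplification; real data are usually overdispersed) and inverts the FPKM definition, fragments = FPKM × length in kb × mapped fragments in millions. The function names and the example gene (2 kb, true abundance of 0.1 FPKM) are hypothetical.

```python
import math


def expected_fragments(fpkm, length_bp, mapped_fragments):
    """Invert the FPKM definition to get the expected fragment count."""
    return fpkm * (length_bp / 1e3) * (mapped_fragments / 1e6)


def prob_zero_fragments(fpkm, length_bp, mapped_fragments):
    """Poisson probability of sampling no fragments at all from the gene."""
    return math.exp(-expected_fragments(fpkm, length_bp, mapped_fragments))


def fragments_needed(fpkm, length_bp, detect_prob=0.95):
    """Mapped fragments needed to see at least one fragment with detect_prob."""
    return -math.log(1.0 - detect_prob) / (fpkm * length_bp / 1e3) * 1e6


# Hypothetical 2 kb gene with a true abundance of 0.1 FPKM.
for depth in (1e6, 20e6):
    p0 = prob_zero_fragments(0.1, 2000, depth)
    print(f"{depth / 1e6:.0f}M fragments: P(no fragments) = {p0:.3f}")
print(f"needed for 95% detection: {fragments_needed(0.1, 2000) / 1e6:.1f}M")
```

Under these assumptions the gene yields 0.2 expected fragments at 1M mapped fragments and 4 at 20M, so a zero count at low depth says very little about whether the transcript is present.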

      • #4
        Yes, I see your point. Furthermore, if I remember the ENCODE papers correctly, they came to the conclusion that a good fraction of mRNAs are present in some cells of a given type and absent in others. Thus we would only see a certain average, which could be quite low.

        On the other hand, especially for microarrays, there is quite a big (numerical) difference between something obviously expressed and something that's not. From a practical standpoint (I need it for gene expression clustering), it would be useful to restrict the gene list to only the significantly expressed ones. How would one go about it? I do have some ideas, but I'm curious what others think.

          • #5
          "Absence of evidence is not evidence of absence, and this is especially true with RNA-Seq data, because it's a sampling technique"

          I agree wholeheartedly, but there is still a distribution for that sample you take, and I like to think that taking a cut-off from that distribution is 'less arbitrary' than the "FPKM<1" route.

          I put up a script on GitHub after being asked about a method I use in a seminar. If you want to have a look, I would appreciate ideas about this. A workable solution, I thought, though I am frequently wrong!

          • #6
            Hi, I hope to revive this thread, because I need some advice:
            I got into a debate with colleagues about the very same question: which genes are expressed and which are not. When I mentioned that I used an FPKM cutoff to select the genes I analyzed further, they demanded statistics to show that my selected genes are significant.
            Could someone propose a statistical approach to show whether my genes are expressed and whether that is "significant"?
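
One possible framing, offered as a sketch rather than an established answer from this thread: treat "expressed" as "more fragments than background would produce". This assumes a background rate in fragments per kb is available (for example estimated from intergenic regions, which is an assumption here), models background counts as Poisson, and corrects the per-gene p-values with the Benjamini-Hochberg procedure. The gene names, counts, and background rate below are made up for illustration.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First term is P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# name: (fragment count, gene length in bp); made-up values.
genes = {"geneA": (50, 2000), "geneB": (2, 4000), "geneC": (6, 1000)}
background_per_kb = 0.5  # assumed background rate, not from the thread

names = list(genes)
pvals = [poisson_upper_tail(genes[g][0], background_per_kb * genes[g][1] / 1e3)
         for g in names]
qvals = benjamini_hochberg(pvals)
for g, p, q in zip(names, pvals, qvals):
    call = "above background" if q < 0.05 else "not distinguishable from background"
    print(f"{g}: p={p:.3g} q={q:.3g} {call}")
```

Because real counts tend to be overdispersed relative to Poisson, p-values from this sketch are likely too optimistic. Note also that a gene failing the test is "not distinguishable from background" at this depth, not shown to be unexpressed.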
