  • alexcele
    Junior Member
    • Jun 2025
    • 2

    Whether to apply an FPKM threshold after DESeq2

    Hi everyone,
    I'm currently working on an mRNA-seq experiment that was performed by a private company. After obtaining the data, I'm filtering it and trying to understand what the DEGs in my samples mean. However, as far as I can see, many of the most differentially expressed genes have only a small number of counts in both groups. Considering this, is it correct or necessary to apply a threshold on FPKM or counts, and what is the recommended threshold? If filtering is needed, how do I do it? I have 6 samples per group; what is the minimum number of samples that should be over the threshold for a gene to be considered a DEG?

    I know these are a lot of questions, but I'm new to this 😀
    Thanks a lot.
  • fchatonnet
    Member
    • Sep 2014
    • 30

    #2
    Dear alexcele,

    I'm sorry to read (once again) that a bioinformatics analysis by a private company was not up to expectations. It happens a lot!
    It matches my own experience: I usually end up restarting everything from the beginning...

    To answer your questions, and assuming you're using R to analyze your data: yes, you can apply some kind of filtering on raw counts. I strongly advise you NOT to use FPKM with DESeq2, whose statistical assumptions and model are designed to work with raw counts. You can build a DGEList object from your counts and samples, including the group / condition of each sample, and use it to filter out lowly expressed genes with the filterByExpr() function from the edgeR package. You can find all the relevant information in the edgeR manual, which is well written.
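    The edgeR route described above could look roughly like the sketch below. The count matrix is simulated purely for illustration, and the object names (counts, group, y_filt, filtered_counts) are hypothetical; replace counts with your own raw count matrix (genes in rows, samples in columns). It assumes edgeR is installed from Bioconductor.

```r
library(edgeR)

set.seed(1)

# Hypothetical data: 1000 genes x 12 samples (6 per group).
# The first 500 genes are simulated with very low counts, the rest with high counts.
n_genes <- 1000
group <- factor(rep(c("control", "treated"), each = 6))
mu <- rep(c(2, 200), each = n_genes / 2)
counts <- matrix(
  rnbinom(n_genes * length(group), mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0(group, "_", 1:6))
)

# Build the DGEList with the group information attached to the samples
y <- DGEList(counts = counts, group = group)

# filterByExpr() returns one TRUE/FALSE per gene; with no design given,
# it uses the group stored in the DGEList
keep <- filterByExpr(y)
table(keep)

# Keep only the genes that pass and recompute library sizes
y_filt <- y[keep, , keep.lib.sizes = FALSE]

# Filtered raw counts, still integers, so they can be handed to DESeq2
filtered_counts <- y_filt$counts
dim(filtered_counts)
```

    The filtered matrix still contains raw counts, so it should be usable as the count input for a DESeq2 analysis.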
    I would also advise you to read the DESeq2 vignette, which covers pre-filtering of low-count genes as well as the independent filtering that results() applies by default.
    You can also install and use the HTSFilter package, which will directly output a DESeqDataSet object with filtered counts. The only drawback of this package is that the thresholds are sometimes quite high, removing even genes with a "reasonable" level of expression. However, HTSFilter provides a graph showing how it computed the threshold, so you can decide whether to use it or not.
    As a last option, there is a crude method where you decide from your own data which threshold to apply. For that, I would draw a histogram of your normalized read counts and set a threshold from it (usually you get quite a lot of lowly expressed genes, then a gap, and then a more or less (log)normal distribution of the read counts). Then you decide how many samples in each condition need to be over the threshold (in your case, I would choose 4 or 5 out of 6) and remove all the genes that do not comply with this rule. Here the rowSums() function and the selection tools (brackets or the which() function) will be quite useful!
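    A minimal base-R sketch of this crude method, histogram included, might look like the following. The data are simulated for illustration only, and the names norm_counts, threshold and min_samples are hypothetical, as is the threshold value of 10; pick yours from your own histogram. One assumption to flag: the sketch keeps a gene if it passes in at least one condition, so that genes expressed in only one group are not thrown away; change `|` to `&` if you want the rule to hold in both conditions.

```r
set.seed(42)

# Hypothetical normalized counts: 2000 genes x 12 samples (6 per group).
# 800 genes are simulated as barely expressed, the rest as clearly expressed.
n_genes <- 2000
n_low <- 800
group <- factor(rep(c("control", "treated"), each = 6))
mu <- c(rep(1, n_low), rep(300, n_genes - n_low))
norm_counts <- matrix(
  rnbinom(n_genes * length(group), mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0(group, "_", 1:6))
)

# Arbitrary example values: choose them after looking at the histogram
threshold <- 10
min_samples <- 4

# Histogram of mean expression per gene on a log2 scale,
# with the chosen threshold drawn as a dashed red line
hist(log2(rowMeans(norm_counts) + 1), breaks = 50,
     main = "Mean normalized counts per gene",
     xlab = "log2(mean normalized count + 1)")
abline(v = log2(threshold + 1), col = "red", lty = 2)

# For each gene, count how many samples per condition reach the threshold
above <- norm_counts >= threshold
n_control <- rowSums(above[, group == "control", drop = FALSE])
n_treated <- rowSums(above[, group == "treated", drop = FALSE])

# Keep a gene if enough samples pass in at least one condition
keep <- n_control >= min_samples | n_treated >= min_samples
table(keep)

# Subset with brackets; which(!keep) would list the removed rows instead
filtered <- norm_counts[keep, , drop = FALSE]
dim(filtered)
```

    With a real DESeq2 object, the normalized matrix can be obtained with counts(dds, normalized = TRUE) once size factors have been estimated; the same keep vector can then be used to subset the object before testing.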

    Good luck, hope that helps!


    • alexcele
      Junior Member
      • Jun 2025
      • 2

      #3
      Hi, thanks a lot for replying. Actually, I'm not redoing everything from scratch, partly because I don't have a computer that can properly handle this amount of data. I've worked with web tools that run DESeq2, just to confirm that the data coming from the company is reliable. And yes, I'm only using raw counts for this. The filtering option that sounds best to me is the 'crude method', but I'm not sure how to apply it. Do you know how I could draw the histogram? If there is a simple piece of R code for it, I would use it; I know a little R.

      Again, thanks a lot for your help !
