  • sum of FPKMs?

    I'm analyzing the expression levels of certain genes in different tissues using data from a database, and I need to count two different genes as one because I know from experimental data that they were erroneously annotated.

    The expression levels in the database are in FPKM, and I know I can't simply sum the two genes to count them as one.

    If I had raw counts, this is what I would do:

    gene A = 400, gene B = 300, counting them as a single gene = 700.

    What would be the best way to do this with FPKMs?

    gene A = 12 FPKM
    gene B = 20 FPKM
    as single gene = x ?
    Last edited by dlepe; 07-23-2014, 11:51 AM.

  • #2
    FPKM is fragments per kilobase of transcript per million mapped reads.

    So then
    x = total number of fragments / ((total number of bases of transcript / 1000) * (mapped fragments / 1000000))
    = (fragments mapped to gene A + fragments mapped to gene B) / ((bases of gene A + bases of gene B) / 1000 * (mapped fragments / 1000000))

    This holds if gene A and gene B do not overlap (by that I mean that no read maps to both gene A and gene B). If they do, you'll have to use something like the inclusion-exclusion principle. As you mentioned, you can't simply add the two FPKM values.
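The formula above can be sketched in Python as follows. This is a minimal illustration assuming non-overlapping genes, so that fragments and lengths simply add; the function names are made up for this example, and the gene lengths and library size are hypothetical values (only the 400 and 300 counts come from the question).

```python
def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments / ((length_bp / 1_000) * (total_mapped / 1_000_000))


def merged_fpkm(frags_a: float, frags_b: float,
                len_a: float, len_b: float, total_mapped: float) -> float:
    """FPKM of genes A and B counted as one gene.

    Assumes A and B do not overlap (no fragment is counted for both),
    so their fragment counts and lengths can simply be added.
    """
    return fpkm(frags_a + frags_b, len_a + len_b, total_mapped)


if __name__ == "__main__":
    # Counts from the question; lengths and library size are hypothetical.
    frags_a, frags_b = 400, 300
    len_a, len_b = 2_000, 1_000      # bp (hypothetical)
    total_mapped = 20_000_000        # mapped fragments (hypothetical)

    print(fpkm(frags_a, len_a, total_mapped))   # 10.0
    print(fpkm(frags_b, len_b, total_mapped))   # 15.0
    print(round(merged_fpkm(frags_a, frags_b, len_a, len_b, total_mapped), 2))  # 11.67
```

With these made-up numbers the merged value (about 11.67) falls between the two individual FPKMs rather than at their sum of 25.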

    • #3
      The thing is, I don't have the total number of mapped fragments for the libraries. I would have to check whether the raw data is available somewhere and do the mapping myself.

      Since I'm trying to estimate how the expression of the gene in question correlates with that of another gene, a friend suggested simply using the average of gene A and gene B as the expression value I'm trying to find.

      His reasoning is that since FPKMs are normalized by length, and assuming that the raw counts for genes A and B are similar, the FPKM of gene A or gene B alone should be very close to the FPKM we'd get by treating both as a single gene.
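To see when the plain average works, here is a small Python sketch with entirely hypothetical numbers (counts, lengths and library size are all made up). Even with equal raw counts, the simple average of the two FPKMs differs from the FPKM of the merged gene when the lengths differ; the two agree in special cases such as equal lengths.

```python
def fpkm(fragments, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments / ((length_bp / 1_000) * (total_mapped / 1_000_000))


# Hypothetical case: A and B get the same number of fragments
# but differ threefold in length.
total_mapped = 20_000_000
frags = 300
len_a, len_b = 1_000, 3_000

a = fpkm(frags, len_a, total_mapped)                    # 15.0
b = fpkm(frags, len_b, total_mapped)                    # 5.0
simple_avg = (a + b) / 2                                # 10.0
pooled = fpkm(2 * frags, len_a + len_b, total_mapped)   # 7.5, treating A+B as one gene

print(simple_avg, pooled)  # 10.0 7.5
```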

      • #4
        I suppose you could do an average. I think a weighted average would be better suited for this. You could weight each FPKM value by the length of the corresponding gene.
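A minimal Python sketch of the length-weighted average, assuming all FPKMs come from the same sample (same number of mapped fragments), that the genes do not overlap, and that each length is the one used to compute the corresponding FPKM. The function name is made up for illustration; the FPKMs 12 and 20 are from the question, but the gene lengths are hypothetical.

```python
from typing import Sequence


def length_weighted_fpkm(fpkms: Sequence[float], lengths: Sequence[float]) -> float:
    """Length-weighted mean of FPKM values from the same sample.

    Each FPKM is weighted by the length it was computed with;
    genes are assumed not to overlap.
    """
    if len(fpkms) != len(lengths) or not fpkms:
        raise ValueError("need one length per FPKM value, and at least one gene")
    return sum(f * length for f, length in zip(fpkms, lengths)) / sum(lengths)


# FPKMs from the question; the gene lengths are hypothetical.
a, b = 12.0, 20.0
len_a, len_b = 1_500, 2_500  # bp (hypothetical)
print(length_weighted_fpkm([a, b], [len_a, len_b]))  # 17.0
```

Taking sequences rather than two fixed arguments lets the same function merge more than two mis-annotated pieces, under the same no-overlap assumption.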

        • #5
          Yeah I guess, I'll see how that goes, thanks.

          • #6
            I just did the math, and the weighted average is what you want, provided the genes don't overlap, as I stated earlier. So if gene A has FPKM a and gene B has FPKM b, you want:

            x = (a * |A| + b * |B|) / (|A| + |B|)

            where |x| is the length of gene x.

            Edit: If you want, I can type up my reasoning in LaTeX. I just don't know of a nice way to display fractions on seqanswers.
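One way to write out that reasoning, assuming a and b were computed from the same library with N mapped fragments and with the same lengths |A| and |B|. The symbols f_A and f_B (the fragment counts of A and B) and N are introduced here only for the derivation:

```latex
\begin{aligned}
a &= \frac{f_A}{(|A|/10^3)\,(N/10^6)} = \frac{10^9 f_A}{|A|\,N}
  \quad\Longrightarrow\quad f_A = \frac{a\,|A|\,N}{10^9},
  \qquad f_B = \frac{b\,|B|\,N}{10^9} \\[4pt]
x &= \frac{f_A + f_B}{\bigl((|A| + |B|)/10^3\bigr)\,(N/10^6)}
   = \frac{10^9\,(f_A + f_B)}{(|A| + |B|)\,N}
   = \frac{a\,|A| + b\,|B|}{|A| + |B|}
\end{aligned}
```

N cancels in the last step, which is why the merged value can be computed from the FPKMs and lengths alone, without knowing the number of mapped fragments in the library.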

            • #7
              Awesome, I'll look into it, thanks again.
