  • hypno

    Procedure for computing RPKM

    I need to calculate the expression of a particular gene from RNA-seq data. I mapped the reads with TopHat and then used Cuffdiff to analyse differential gene expression.
    I want to check whether I get the same results using raw counts. With Cufflinks, sample A shows a low quantity of a particular gene, but with a raw-count estimate from my script I see the opposite result.
    I merged the hg19 RefSeq annotation to obtain one region per gene, and then I used intersectBed.

    Code:
    intersectBed -bed -abam example.bam -b ../RefSeq/hg19_merged.bed -wa -wb | cut -f 16 | sort | uniq -c > example.counts.txt
    Then I use a Perl script that computes:
    (read count for each gene / length of that gene in kb) / number of mapped fragments
    For example: (read count on TP53 / length of TP53 in kb) / number of mapped fragments
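    The `uniq -c` step leaves one line per gene: a count followed by the gene name. Since the Perl script itself is not shown, here is a minimal Python sketch of only the first step it would need, reading that file into a dictionary. `parse_uniq_counts` is a hypothetical helper, and the sketch assumes gene names contain no whitespace.

```python
from typing import Dict, Iterable


def parse_uniq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse `uniq -c` output ("<count> <gene>") into a gene -> count dict.

    Assumes the gene name is a single whitespace-free token.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        count, gene = fields
        # Accumulate in case the same gene appears on more than one line.
        counts[gene] = counts.get(gene, 0) + int(count)
    return counts
```

    Calling `parse_uniq_counts(open("example.counts.txt"))` would then give the per-gene counts to be normalized.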

    What is wrong?
    Thanks for any help or suggestions!
  • mknut

    #2
    First of all, Cufflinks uses FPKM (Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM (Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data.
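    To make the read/fragment distinction concrete: with paired-end data both mates of a pair share a read name, so counting alignments gives reads while counting distinct names gives fragments. A minimal Python sketch, assuming one primary alignment per mapped mate (no multi-mapping or secondary alignments); `count_reads_and_fragments` is a hypothetical helper:

```python
from typing import Iterable, Tuple


def count_reads_and_fragments(read_names: Iterable[str]) -> Tuple[int, int]:
    """Return (reads, fragments) for paired-end alignments to one gene.

    Both mates of a pair carry the same read name, so each distinct
    name is one fragment, while every aligned mate counts as one read.
    Assumes one primary alignment per mapped mate.
    """
    names = list(read_names)
    return len(names), len(set(names))
```

    For example, `count_reads_and_fragments(["r1", "r1", "r2", "r2", "r3"])` returns `(5, 3)`: `r3` has only one aligned mate, so five reads correspond to three fragments.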

    Secondly, Cufflinks applies corrections when calculating FPKM, so a simple calculation will not match Cufflinks' values. Anyway, the crude calculation for a gene (NOT the one Cufflinks uses) would be:

    FPKM = [f / (e / 1000)] / (m / 1,000,000)

    f - number of fragments mapping to gene
    e - exonic length of gene (in bases)
    m - total number of mapped fragments
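    The crude formula above translates directly into code. This is a minimal Python sketch of that formula only, not of Cufflinks' corrected estimate; `crude_fpkm` is a hypothetical name, and `e` is assumed to be in bases:

```python
def crude_fpkm(f: float, e: int, m: float) -> float:
    """Crude FPKM = [f / (e / 1000)] / (m / 1,000,000).

    This is NOT the corrected estimate Cufflinks reports.
    f: fragments mapping to the gene
    e: exonic length of the gene, in bases
    m: total mapped fragments in the sample
    """
    if e <= 0 or m <= 0:
        raise ValueError("exonic length and mapped fragments must be positive")
    return (f / (e / 1000)) / (m / 1_000_000)
```

    For example, 500 fragments on a gene with 2,000 bases of exon in a library of 20 million mapped fragments gives 500 / 2 / 20 = 12.5.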

    Your crude calculation is wrong in two places:
    - you are using the length of the gene instead of the combined length of the exons it is made of
    - you are not dividing the total number of mapped fragments by 1,000,000
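    For the first point, the exonic length is the number of bases covered by the gene's exons, with overlaps counted once, rather than the span from gene start to gene end. A minimal Python sketch, assuming the exons of one gene (possibly from several transcripts) are given as half-open BED-style `(start, end)` pairs on one chromosome; `exonic_length` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def exonic_length(exons: Iterable[Tuple[int, int]]) -> int:
    """Total bases covered by a gene's exons, counting overlaps once.

    Exons are half-open (start, end) intervals, as in BED files.
    Overlapping exons (e.g. from different transcripts) are merged
    first so that shared bases are not counted twice.
    """
    merged: List[List[int]] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged)
```

    Here `exonic_length([(100, 200), (150, 300), (500, 600)])` merges the first two exons into 200 bases and adds 100 bases from the third, for 300 in total. Note that both mistakes listed above multiply a given gene's value by the same factor in every sample, so on their own they rescale the numbers without changing which sample is higher.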

    If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper:
    Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
    Nature Biotechnology doi:10.1038/nbt.1621

    Supplementary Text and Figures, 3. Transcript abundance estimation

    Also, have a look at the Cufflinks FAQ.


    • hypno

      #3
      Thanks for the answer, but I still have a doubt about RPKM suggesting a different result from FPKM. If I express a fusion protein, I would expect RNA-seq to show the same order of magnitude of expression whatever method I use. But in my case FPKM shows the overexpression and RPKM does not. Any idea?


      • giorgifm

        #4
        hypno, can you please rephrase the question and turn the Italian auto-correction off?


        • hypno

          #5
          I'm sorry!
          If I overexpress a particular protein (e.g. PAX8), I should see the PAX8 gene in my RNA-seq data (and much more of it than in the control cell line). With FPKM normalization I can see the overexpression, but not with RPKM. Is that possible? Any idea? Thanks in advance for any help!


          • giorgifm

            #6
            Is the difference astonishingly big? Can you visualize your BAM file with a tool like Tablet and check which one is right? What are the counts for that gene? Are you using paired-end or single-end reads?


            • hypno

              #7
              Thanks for your fast reply! I use paired-end reads. The difference is too big (the control shows 18 times more than the overexpression sample).


              • giorgifm

                #8
                Cufflinks/Cuffdiff also generates a count tracking file (.count_tracking) during the analysis. Can you post the values you find for your gene of interest in the count file and in the FPKM file?


                • hypno

                  #9
                  Sorry for the late answer...

                  This is the count file for my gene:

                  control    siGene
                  0          4.15044
                  0.972804   2.31611
                  0.472964   5.44177
                  36.7692    134.833
                  2.05042    57.0802
                  110.835    319.281

                  This is the FPKM file:

                  0          0.100004
                  0.126068   0.0594524
                  0.0630838  0.1437
                  4.03458    2.94116
                  0.303802   1.6647
                  16.2359    9.21288

                  Any idea?
                  Last edited by hypno; 04-29-2013, 02:03 AM.
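                  One way to read the two tables side by side is to compute, for each record, the siGene/control ratio of the counts and of the FPKM values. A minimal Python sketch using the numbers exactly as posted, assuming both tables list the same records in the same control/siGene column order (the FPKM table has no header); `ratio` and `compare` are hypothetical helpers:

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], Optional[float], Optional[float]]

# Values as posted in the thread, as (control, siGene) pairs.
# Assumption: both tables list the same records in the same order.
COUNTS = [(0, 4.15044), (0.972804, 2.31611), (0.472964, 5.44177),
          (36.7692, 134.833), (2.05042, 57.0802), (110.835, 319.281)]
FPKM = [(0, 0.100004), (0.126068, 0.0594524), (0.0630838, 0.1437),
        (4.03458, 2.94116), (0.303802, 1.6647), (16.2359, 9.21288)]


def ratio(control: float, treated: float) -> Optional[float]:
    """siGene / control, or None when the control value is zero."""
    return None if control == 0 else treated / control


def compare(counts: List[Tuple[float, float]],
            fpkm: List[Tuple[float, float]]) -> List[Row]:
    """Per record: (count ratio, FPKM ratio, FPKM ratio / count ratio)."""
    rows: List[Row] = []
    for (c_ctl, c_si), (f_ctl, f_si) in zip(counts, fpkm):
        cr, fr = ratio(c_ctl, c_si), ratio(f_ctl, f_si)
        scale = None if cr is None or fr is None else fr / cr
        rows.append((cr, fr, scale))
    return rows


def fmt(x: Optional[float]) -> str:
    """Format a ratio to 3 decimals, or 'n/a' when undefined."""
    return "n/a" if x is None else f"{x:.3f}"


if __name__ == "__main__":
    for cr, fr, scale in compare(COUNTS, FPKM):
        print(fmt(cr), fmt(fr), fmt(scale))
```

                  On the posted numbers, the siGene count is higher than the control count in every record (319.281 / 110.835 is about 2.88 in the last one), while the FPKM ratio is below 1 in three records (about 0.57 in the last one). The third column comes out near 0.2 for every record with a nonzero control, which would be consistent with a single per-sample factor (for example a different total number of mapped fragments) about five times larger in siGene; that reading is an assumption, not something established in the thread.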


                  • pengchy

                    #10
                    This issue has been discussed here and here, and it is still not solved.
