  • Procedure for count RPKM

    I need to measure the expression of a particular gene from RNA-seq data. I mapped the reads with TopHat and then used Cuffdiff to analyse differential gene expression.
    I want to check whether raw counts give me the same results. With Cufflinks, sample A shows a low level of a particular gene, but with the raw-count estimate from my own script I see the opposite result.
    I merged the hg19 annotation to obtain a single length for each gene, and then I used intersectBed.

    Code:
    intersectBed -bed -abam example.bam -b ../RefSeq/hg19_merged.bed -wa -wb | cut -f 16 | sort | uniq -c > example.counts.txt
    Then I use a Perl script that does the following:
    (read count for each gene / length of that gene in kb) / number of fragments.
    For example: (read count on TP53 / length of TP53 in kb) / number of mapped fragments.
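The Perl script itself is not shown in the post. As a rough illustration only, here is a minimal Python sketch of reading the `uniq -c` output written to example.counts.txt by the command above. The gene names and counts are hypothetical, and the function name `parse_uniq_counts` is an assumption made for this example.

```python
def parse_uniq_counts(lines):
    """Parse `sort | uniq -c` output lines into a {gene: count} dict.

    Each line looks like "     12 GENE_A": leading spaces, a count,
    one space, then the value that was counted.
    """
    counts = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip empty or malformed lines
        count, gene = parts
        counts[gene.strip()] = int(count)
    return counts


# Hypothetical lines, in the shape `uniq -c` produces.
example_lines = [
    "     12 GENE_A\n",
    "      3 GENE_B\n",
]
counts = parse_uniq_counts(example_lines)
print(counts)
```

With a real file, the same function could be called as `parse_uniq_counts(open("example.counts.txt"))`.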

    What is wrong?
    Thanks for any help or suggestions!

  • #2
    First of all, Cufflinks uses FPKM (Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM (Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data.
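To make the read/fragment distinction concrete, here is a minimal Python sketch. It assumes each aligned mate is available as a (read_name, gene) pair and that both mates of a pair share the same read name; the data and the function name are hypothetical.

```python
from collections import defaultdict


def count_reads_and_fragments(alignments):
    """Count reads and fragments per gene.

    alignments: iterable of (read_name, gene) tuples, one per aligned mate.
    A read is one aligned mate; a fragment is one unique read name.
    """
    reads = defaultdict(int)
    fragment_names = defaultdict(set)
    for read_name, gene in alignments:
        reads[gene] += 1
        fragment_names[gene].add(read_name)
    fragments = {gene: len(names) for gene, names in fragment_names.items()}
    return dict(reads), fragments


# Hypothetical paired-end alignments: frag1 and frag3 have both mates
# aligned, frag2 has only one mate aligned.
alignments = [
    ("frag1", "GENE_A"),
    ("frag1", "GENE_A"),
    ("frag2", "GENE_A"),
    ("frag3", "GENE_B"),
    ("frag3", "GENE_B"),
]
reads, fragments = count_reads_and_fragments(alignments)
print(reads)
print(fragments)
```

Counting lines of per-mate output, as a `sort | uniq -c` pipeline does, gives the read counts rather than the fragment counts.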

    Secondly, Cufflinks applies corrections when calculating FPKM, so a simple calculation will not match the Cufflinks values. Anyway, the crude calculation for a gene (NOT the one that Cufflinks uses) would be:

    FPKM = [f / (e / 1000)] / (m / 1,000,000)

    f - number of fragments mapping to gene
    e - exonic length of gene
    m - total number of mapped fragments
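The crude formula can be written directly as a small Python function. This is only the simple calculation given above, not what Cufflinks computes; the input numbers are hypothetical.

```python
def crude_fpkm(f, e, m):
    """Crude FPKM = [f / (e / 1000)] / (m / 1,000,000).

    f: number of fragments mapping to the gene
    e: exonic length of the gene, in bases
    m: total number of mapped fragments
    """
    if e <= 0 or m <= 0:
        raise ValueError("exonic length and mapped fragments must be positive")
    per_kilobase = f / (e / 1000)
    return per_kilobase / (m / 1_000_000)


# Hypothetical values: 100 fragments on a gene with 2,000 exonic bases,
# in a library of 10 million mapped fragments.
value = crude_fpkm(100, 2000, 10_000_000)
print(value)
```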

    Your crude calculation is wrong in 2 places:
    - you are using length of a gene instead of combined length of exons that a gene is comprised of
    - you are not dividing total number of mapped fragments by 1,000,000
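The first point can be illustrated with a minimal Python sketch that compares the genomic span of a gene with the combined length of its exons. It assumes exons are given as BED-style half-open (start, end) intervals on one chromosome and merges overlapping exons so that shared bases are counted once. The coordinates are hypothetical.

```python
def exonic_length(exons):
    """Combined length of exons, counting overlapping bases once.

    exons: list of (start, end) half-open intervals on one chromosome.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical gene with four annotated exons, two of which overlap.
exons = [(1000, 1200), (1150, 1300), (5000, 5400), (9000, 9300)]
gene_span = max(end for _, end in exons) - min(start for start, _ in exons)
exon_len = exonic_length(exons)
print(gene_span, exon_len)
```

In this made-up example the gene spans 8,300 bases but has only 1,000 exonic bases, so dividing by the gene span would understate the per-kilobase value by a factor of 8.3.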

    If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper:
    Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
    Nature Biotechnology doi:10.1038/nbt.1621

    Supplementary Text and Figures, 3. Transcript abundance estimation

    Also, have a look at the Cufflinks FAQ.


    • #3
      Thanks for the answer, but I still have a doubt about the fact that RPKM suggests a different result from FPKM. If I express a fusion protein, I would expect RNA-seq to show the same magnitude of expression whichever method I use. But in my case FPKM shows the overexpression and RPKM does not. Any idea?


      • #4
        hypno, can you please reformulate the question and turn the Italian auto-correction off?


        • #5
          I'm sorry!
          If I overexpress a particular protein (e.g. PAX8), I should see the PAX8 gene in my RNA-seq data, and at a much higher level than in the control cell line. With FPKM normalization I can see the overexpression, but not with RPKM. Is that possible? Any idea? Thanks in advance for any help!


          • #6
            Is the difference astonishingly big? Can you visualize your BAM file with a tool like Tablet and check which one is right? What are the counts for that gene? Are you using paired-end or single-end reads?


            • #7
              Thanks for your fast reply! I use paired-end reads. The difference is too big (the control shows 18 times more than the overexpression sample).


              • #8
                Cufflinks/Cuffdiff also generates a count tracking file during the analysis. Can you post the values you find for your gene of interest in the count file and in the FPKM file?


                • #9
                  Sorry for the late answer...

                  This is the count file for my gene:

                  control     siGene
                  0           4.15044
                  0.972804    2.31611
                  0.472964    5.44177
                  36.7692     134.833
                  2.05042     57.0802
                  110.835     319.281

                  This is the FPKM:

                  control     siGene
                  0           0.100004
                  0.126068    0.0594524
                  0.0630838   0.1437
                  4.03458     2.94116
                  0.303802    1.6647
                  16.2359     9.21288

                  Any idea?
                  Last edited by hypno; 04-29-2013, 02:03 AM.
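One rough way to compare the two tables posted above is to look at how much FPKM each count is converted into in each sample. Under the crude formula FPKM = [f / (e / 1000)] / (m / 1,000,000), the ratio FPKM / count depends only on e and m, so a large difference between samples would point at the normalization rather than at the counts. The Python sketch below assumes that the rows of the two tables correspond one to one and that the columns are control and siGene in both. Cufflinks applies further corrections, so this is a diagnostic only.

```python
# Values copied from the post: (control, siGene) per row.
counts = [
    (0, 4.15044),
    (0.972804, 2.31611),
    (0.472964, 5.44177),
    (36.7692, 134.833),
    (2.05042, 57.0802),
    (110.835, 319.281),
]
fpkms = [
    (0, 0.100004),
    (0.126068, 0.0594524),
    (0.0630838, 0.1437),
    (4.03458, 2.94116),
    (0.303802, 1.6647),
    (16.2359, 9.21288),
]


def fpkm_per_count(count, fpkm):
    """FPKM obtained per counted fragment; None when the count is zero."""
    return fpkm / count if count > 0 else None


ratios = []
for (c_ctrl, c_si), (f_ctrl, f_si) in zip(counts, fpkms):
    r_ctrl = fpkm_per_count(c_ctrl, f_ctrl)
    r_si = fpkm_per_count(c_si, f_si)
    if r_ctrl is None or r_si is None:
        ratios.append(None)  # cannot compare rows with a zero count
    else:
        ratios.append(r_ctrl / r_si)

for row, ratio in enumerate(ratios, start=1):
    print(row, "n/a" if ratio is None else round(ratio, 2))
```

On these numbers, every row with non-zero counts gives a ratio of roughly 5: one count in the control is worth about five times more FPKM than one count in siGene. That would be consistent with a roughly five-fold difference in the normalization factor between the two samples, which could explain why siGene has higher counts but lower FPKM in some rows.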


                  • #10
                    This issue has been discussed here and here, but it is still not solved.
