  • Procedure for count RPKM

    I need to measure the expression of a particular gene from RNA-seq data. I mapped the reads with TopHat and then used Cuffdiff to analyse differential gene expression.
    I want to check whether I obtain the same result from raw counts. With Cufflinks, sample A shows a low level of a particular gene, but with the raw-count estimate from my own script I see the opposite result.
    I merged the hg19 annotation to obtain a single merged region (one length) for each gene and then used intersectBed.

    Code:
    intersectBed -bed -abam example.bam -b ../RefSeq/hg19_merged.bed -wa -wb | cut -f 16 | sort | uniq -c > example.counts.txt
    Then I use a Perl script that performs this calculation:
    (read count for each gene / length of that gene in kb) / number of fragments.
    For example: (read count on TP53 / length of TP53 in kb) / number of mapped fragments.

    What is wrong?
    Thanks for any help or suggestions!

  • #2
    First of all, Cufflinks uses FPKM (Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM (Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data.

    Secondly, Cufflinks applies corrections when calculating FPKM, so a simple calculation will not match the Cufflinks value. Anyway, the crude calculation for a gene would be (NOT the one that Cufflinks uses):

    FPKM = [f / (e / 1000)] / (m / 1,000,000)

    f - number of fragments mapping to gene
    e - exonic length of gene
    m - total number of mapped fragments

    Your crude calculation is wrong in 2 places:
    - you are using length of a gene instead of combined length of exons that a gene is comprised of
    - you are not dividing total number of mapped fragments by 1,000,000
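    The crude formula and the two corrections above can be sketched in Python. This is only an illustration of the uncorrected calculation, not what Cufflinks does; the function names, the exon coordinates and the fragment numbers are made up for the example, and the intervals are assumed to be half-open, BED-style (start, end) pairs.

```python
def merged_exonic_length(exons):
    """Length in bp of the union of (start, end) exon intervals.

    Overlapping or adjacent exons are counted once, so this gives the
    combined exonic length of a gene rather than its genomic span.
    Intervals are assumed to be half-open, as in BED files.
    """
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Gap before this exon: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap or adjacency: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def crude_fpkm(fragments, exonic_length_bp, total_mapped_fragments):
    """Crude FPKM = [f / (e / 1000)] / (m / 1,000,000), without any correction."""
    per_kb = fragments / (exonic_length_bp / 1000)
    return per_kb / (total_mapped_fragments / 1_000_000)


# Hypothetical gene with three exons, two of which overlap.
example_exons = [(100, 600), (500, 1100), (2000, 2500)]
e = merged_exonic_length(example_exons)
print(e, crude_fpkm(300, e, 20_000_000))
```

    Note that the genomic span of this hypothetical gene is 2400 bp while its exonic length is 1500 bp, so using the gene length instead of the exonic length would change the result noticeably.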

    If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper:
    Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
    Nature Biotechnology doi:10.1038/nbt.1621

    Supplementary Text and Figures, 3. Transcript abundance estimation

    Also, have a look at the Cufflinks FAQ.


    • #3
      Thanks for the answer, nut i continue to have a d'ombra on the fact RPKM suggest different result on fpkm. If I express a fusion protein I suppose RNA-seq show every metodo I use the same magnitudine of expression. Nut in my case fpkm show the over expression but no the rpkm.any idea?


      • #4
        hypno, can you please rephrase the question and turn the Italian auto-correction off?


        • #5
          I'm sorry!
          If I overexpress a particular protein (e.g. PAX8), I should see the PAX8 gene in my RNA-seq data, and at a much higher level than in the control cell line. With FPKM normalization I can see the overexpression, but not with RPKM. Is that possible? Any ideas? Thanks in advance for any help!


          • #6
            Is the difference very large? Can you visualize your BAM file with a tool like Tablet and check which one is right? What are the read counts for that gene? Are you using paired-end or single-end reads?


            • #7
              Thanks for your fast reply! I use paired-end reads. The difference is too big (the control shows 18 times more of the protein than the overexpression sample).


              • #8
                Cufflinks/Cuffdiff also generates a count tracking file during the analysis. Can you post the values you find for your gene of interest in the count file and in the FPKM file?


                • #9
                  Sorry for the late answer...

                  This is the count file for my gene:

                  control      siGene
                  0            4.15044
                  0.972804     2.31611
                  0.472964     5.44177
                  36.7692      134.833
                  2.05042      57.0802
                  110.835      319.281

                  This is the FPKM:

                  0            0.100004
                  0.126068     0.0594524
                  0.0630838    0.1437
                  4.03458      2.94116
                  0.303802     1.6647
                  16.2359      9.21288

                  Any idea?
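                  One way to look at the two tables above is to divide each FPKM value by its count, which gives the scaling factor that was applied to each entry. The sketch below does this in Python. It assumes that the rows of the two tables correspond one to one and that the FPKM columns are in the same order (control, siGene) as the count columns; neither is stated explicitly in the post, and the names used here are made up for the example.

```python
# Values copied from the post: (control, siGene) per row.
counts = [
    (0, 4.15044),
    (0.972804, 2.31611),
    (0.472964, 5.44177),
    (36.7692, 134.833),
    (2.05042, 57.0802),
    (110.835, 319.281),
]
fpkm = [
    (0, 0.100004),
    (0.126068, 0.0594524),
    (0.0630838, 0.1437),
    (4.03458, 2.94116),
    (0.303802, 1.6647),
    (16.2359, 9.21288),
]


def fpkm_per_count(count, fpkm_value):
    """FPKM divided by count, or None when the count is zero."""
    return fpkm_value / count if count > 0 else None


def scaling_ratios(count_rows, fpkm_rows):
    """For each row, control factor / siGene factor (None if undefined)."""
    ratios = []
    for (c_ctrl, c_si), (f_ctrl, f_si) in zip(count_rows, fpkm_rows):
        k_ctrl = fpkm_per_count(c_ctrl, f_ctrl)
        k_si = fpkm_per_count(c_si, f_si)
        ratios.append(k_ctrl / k_si if k_ctrl and k_si else None)
    return ratios


for row, ratio in enumerate(scaling_ratios(counts, fpkm), start=1):
    print(row, "n/a" if ratio is None else round(ratio, 2))
```

                  Under these assumptions, the FPKM-per-count factor of the control column comes out roughly five times larger than that of the siGene column in every row where both are defined. If the effective lengths are similar in both samples, this would point to a difference in the per-sample normalization rather than in the counts themselves, which could be enough to reverse the apparent direction of the change.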
                  Last edited by hypno; 04-29-2013, 02:03 AM.


                  • #10
                    This issue has been discussed here and here, but it has still not been solved.
