  • fabio25
    replied
    Dear all,
    I would be very interested to know how you determine the miRNA targets.
    Do you just use miRBase, or do you have a nicer way?
    Thanks a lot,
    Fabio


  • vruotti
    replied
    normalization

    Hi Chema,
    Yeah, I see what you mean. I guess most of the conditions we have run so far do not have a huge number of genes downregulated relative to the rest of the genes studied. We are doing whole-genome RNA-seq studies, and I think the housekeeping genes help avoid this problem.

    If a huge number of genes really are different, then maybe we have to come up with a slight modification of the RPKM formula/method to cope with this elevated expression, or lack of it. Maybe include a coverage variable based on the estimated level of expression.

    Has anybody else used the RPKM method? Can you share with us the other normalization methods you use for RNA-seq besides RPKM?

    Victor


  • Chema
    replied
    Hi Victor,

    What I wanted to say is that if you have a group of genes that are less expressed in condition 2 than in condition 1, you will have fewer reads representing these genes in condition 2. The genes outside this group will therefore be represented by more reads in condition 2 than in condition 1, and not necessarily because they are differentially expressed. But I guess this problem only matters when a large number of genes are differentially expressed.

    Yes, you are right, the numbers are as you said. Only Gene C in condition 2 changes:

    Translated to RPKM, since they have the same length, it should be something like:
                Condition 1    Condition 2
    Gene A      333,000,000    500,000,000
    Gene B      333,000,000    500,000,000
    Gene C      333,000,000    0

    As for whether the differences in genes A and B are significant: I didn't generate any noise for these data. This is a theoretical example, so the numbers are not estimates but the real values of the variable. So if they are different, the method will declare them differentially expressed.

    Anyway, if you want to see a bigger difference, just run an example with more genes that are downregulated in condition 2. You will see that the difference for genes A and B increases.
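
The experiment suggested in this post can be sketched in a few lines of Python. This is only an illustration under the thread's own simplifications (equal gene lengths, equal sequencing depth in both lanes, no noise); the function name `apparent_fold_change` and the ten-gene genome are assumptions made for the example.

```python
def apparent_fold_change(n_genes, n_silenced, depth=9e5):
    """Apparent condition 2 / condition 1 ratio for an unchanged gene.

    All genes have the same length and the same true expression in
    condition 1; `n_silenced` of them drop to zero in condition 2.
    Both lanes yield `depth` mapped reads in total.
    """
    if not 0 <= n_silenced < n_genes:
        raise ValueError("need 0 <= n_silenced < n_genes")
    reads_cond1 = depth / n_genes                 # reads per gene, condition 1
    reads_cond2 = depth / (n_genes - n_silenced)  # reads per surviving gene
    # With equal lengths and equal depth, the RPKM ratio equals the read ratio.
    return reads_cond2 / reads_cond1


# Hypothetical 10-gene genome with a growing number of silenced genes.
for silenced in (0, 1, 3, 5, 8):
    print(silenced, round(apparent_fold_change(10, silenced), 2))
```

With three genes and one silenced, the function returns 1.5, which is the same jump from 333,333 to 500,000 as in the three-gene example. The ratio grows as more genes are silenced, even though the expression of the remaining genes never changes.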


  • vruotti
    replied
    RPKM concern?

    Hi Chema,

    We have used Wold's (RPKM) method and got very good results. Are you saying that if a particular gene has less coverage (coverage = number of mapped reads), this gene will affect the accuracy of detecting the expression of the other genes? I think this is the whole point of normalizing. The caveat is that the overall expression should not be too different between conditions. Also, can you please check your numbers for us? Are they really 333,333, or 333,000,000, for genes A, B and C in condition 1 after converting to RPKM?

    Maybe I'm doing this wrong. The formula I see in their paper is:

    RPKM = 10^9 x C / (N x L), which for genes of equal length is just C/N times a constant

    C = the number of mappable reads that fell onto the gene's exons
    N = the total number of mappable reads in the experiment
    L = the sum of the exon lengths in base pairs

    So, let's plug in your numbers.
                Condition 1    Condition 2
    Gene A      3*10^5         4.5*10^5
    Gene B      3*10^5         4.5*10^5
    Gene C      3*10^5         0
    Total       9*10^5         9*10^5

    Translated to RPKM, since they have the same length, it should be something like:
                Condition 1    Condition 2
    Gene A      333,000,000    500,000,000
    Gene B      333,000,000    500,000,000
    Gene C      333,000,000    500,000,000

    Looking at these numbers, you could argue about whether the two conditions are really differentially expressed; the values are not that far apart. Sorry, did I miss your point? Can you explain your concern again?

    Victor
    Last edited by vruotti; 10-08-2008, 06:38 AM.
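
One way to check the scale question raised in this post is to evaluate the quoted formula directly. The thread never states a gene length, so the two lengths used below (1 bp and 1000 bp) are assumptions chosen only to show how the result scales; `rpkm` is an illustrative helper, not code from the paper.

```python
def rpkm(count, total_mapped, length_bp):
    """RPKM = 10^9 x C / (N x L), as quoted in the post above."""
    return 1e9 * count / (total_mapped * length_bp)


count, total = 3e5, 9e5  # Gene A, condition 1, from the read table

fraction = count / total                      # plain C/N, no scaling
per_bp = rpkm(count, total, length_bp=1)      # assumed L = 1 bp
per_kb = rpkm(count, total, length_bp=1000)   # assumed L = 1000 bp

# A gene with no reads has an RPKM of zero whatever its length.
gene_c_cond2 = rpkm(0, total, length_bp=1000)

print(f"C/N        = {fraction:.3f}")
print(f"L = 1 bp   : {per_bp:,.0f}")
print(f"L = 1000 bp: {per_kb:,.0f}")
print(f"Gene C, condition 2: {gene_c_cond2:,.0f}")
```

Under these assumptions, a figure near 333,000,000 corresponds to L = 1 and a figure near 333,333 corresponds to L = 1000. The ratio between conditions is the same in both cases, so the choice of length does not affect the argument.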


  • Chema
    replied
    Hi,

    What I do not like about RPKM is that I do not feel it can handle different coverages. Perhaps I can explain it better through an example.

    Let's say that we are working with a genome with three genes, A, B and C, all of the same length (not very realistic, I know, but it is just an example), and we want to study their expression in two conditions, 1 and 2.

    The real expression of the genes is:
                Condition 1    Condition 2
    Gene A      1              1
    Gene B      1              1
    Gene C      1              0

    We run an RNA-seq experiment and get the following numbers of reads:
                Condition 1    Condition 2
    Gene A      3*10^5         4.5*10^5
    Gene B      3*10^5         4.5*10^5
    Gene C      3*10^5         0
    Total       9*10^5         9*10^5

    Translated to RPKM, since they have the same length, it should be something like:
                Condition 1    Condition 2
    Gene A      333333         5*10^5
    Gene B      333333         5*10^5
    Gene C      333333         0

    As you can see, it seems that genes A and B are also differentially expressed. This is because the expression of gene C is lower in condition 2 than in condition 1, so more reads are available to increase the coverage of the other genes.

    Anyway, I think it is always good to normalize the data in some way, especially when you are working with such a low number of replicates.
    Last edited by Chema; 09-23-2008, 04:30 AM. Reason: change format
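
The three-gene example in this post can be reproduced with a short Python sketch. It uses the RPKM formula quoted elsewhere in this thread (10^9 x C / (N x L)). The gene length of 1000 bp is an assumption, since the post only says the genes share one length, and the names `rpkm` and `table` are illustrative.

```python
def rpkm(count, total_mapped, length_bp):
    """Reads per kilobase of exon per million mapped reads."""
    return 1e9 * count / (total_mapped * length_bp)


GENE_LENGTH_BP = 1000  # assumed; the post does not give a length

# Read counts per gene: (condition 1, condition 2), taken from the post.
reads = {
    "Gene A": (3.0e5, 4.5e5),
    "Gene B": (3.0e5, 4.5e5),
    "Gene C": (3.0e5, 0.0),
}

# Total mapped reads per condition (9*10^5 in both lanes).
totals = [sum(counts[i] for counts in reads.values()) for i in (0, 1)]

table = {
    gene: tuple(rpkm(c, n, GENE_LENGTH_BP) for c, n in zip(counts, totals))
    for gene, counts in reads.items()
}

for gene, (cond1, cond2) in table.items():
    print(f"{gene}: {cond1:,.0f} -> {cond2:,.0f}")
```

Genes A and B have the same true expression in both conditions, yet their RPKM rises by a factor of 1.5, because the reads no longer taken by gene C are redistributed among them.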


  • ECO
    replied
    We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report refere …


    This paper introduces the concept of RPKM (reads per kilobase of exon per million mapped reads) for quantifying tags in RNA-seq. It might be worth a look if you haven't seen it already.


  • Chema
    replied
    What we are doing is applying a kind of quantile normalization. After obtaining a score for each miRNA, and since we expect most of the miRNAs to have similar expression in both tissues under study, we normalize the score values to have a similar distribution.

    The main reason for applying this kind of normalization is that we were very worried about the different coverage of the different samples under study.
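
The post does not say how the scores are computed or which variant of quantile normalization is used, so the following is only a sketch of the textbook procedure: sort each sample, average the values at each rank across samples, and give every score the average for its rank. The function name, the tissue labels and the scores are all hypothetical, and ties are broken by position for simplicity.

```python
def quantile_normalize(samples):
    """Give every sample the same score distribution.

    `samples` maps a sample name to a list of scores, with the miRNAs
    in the same order in every list.
    """
    names = list(samples)
    n = len(samples[names[0]])
    if any(len(samples[name]) != n for name in names):
        raise ValueError("every sample needs one score per miRNA")

    # Reference distribution: mean of the k-th smallest score across samples.
    sorted_cols = [sorted(samples[name]) for name in names]
    reference = [sum(col[k] for col in sorted_cols) / len(names) for k in range(n)]

    normalized = {}
    for name in names:
        col = samples[name]
        order = sorted(range(n), key=lambda i: col[i])  # indices, lowest score first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each score by its rank's mean
        normalized[name] = out
    return normalized


# Hypothetical scores for three miRNAs in two tissues with different coverage.
scores = {
    "tissue_1": [5.0, 20.0, 30.0],
    "tissue_2": [40.0, 8.0, 120.0],
}
normalized = quantile_normalize(scores)
for name, values in normalized.items():
    print(name, values)
```

After normalization both tissues contain the same set of values, so differences in overall coverage disappear and only the ranking of each miRNA within its sample is kept. This relies on the assumption stated in the post that most miRNAs have similar expression in both tissues.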


  • zee
    started a topic RNA-seq and normalization numbers

    RNA-seq and normalization numbers

    In many of the experiments our lab is doing with Illumina reads, we always seem to end up with the task of normalizing data.
    If I have 3 experimental conditions and I've sequenced a lane for each, there needs to be a way to compare, across conditions, the counts of my known RNAs in miRBase against my sequence reads mapped to the genome (with MAQ / novoalign).

    I've read about people expressing counts as reads per million and log-transforming these values to fit a Poisson distribution, and it has sparked several ideas in my mind. Would this be as simple as dividing my counts for each experiment by:

    1) 1 Million
    2) the total number of reads sequenced
    3) the total number of uniquely mapped reads


    I'm inclined to option (3) because that represents the amount of usable sequence data.

    I'm just wondering if anybody has a more intelligent way of tackling this problem with nextgen data or perhaps there's some software to help out.

    I have:

    a) Alignment locations of all reads on a ref. genome for each experiment
    b) Location of my reference RNAs on the same genome

    I am already able to count the number of overlapping locations with each reference RNA in each experiment, and that gives me raw counts.

    I have about 4 experiments, but this varies from study to study.
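
The three options listed in this post can be compared with a small Python sketch. All counts and library sizes below are hypothetical, and `per_million` is an illustrative helper. The two lanes are given the same composition at different sequencing depths so the effect of each denominator is easy to see.

```python
def per_million(raw_counts, library_size):
    """Scale raw counts to reads per million of `library_size`."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {rna: 1e6 * count / library_size for rna, count in raw_counts.items()}


# Hypothetical lanes: lane_2 is lane_1 sequenced twice as deeply.
lane_1 = {"counts": {"mir-A": 1200, "mir-B": 300},
          "sequenced": 5_000_000, "uniquely_mapped": 3_000_000}
lane_2 = {"counts": {"mir-A": 2400, "mir-B": 600},
          "sequenced": 10_000_000, "uniquely_mapped": 6_000_000}

for lane in (lane_1, lane_2):
    # Option 1: divide by a fixed 1 million (changes units only).
    lane["option_1"] = {rna: c / 1e6 for rna, c in lane["counts"].items()}
    # Option 2: scale by the total number of reads sequenced.
    lane["option_2"] = per_million(lane["counts"], lane["sequenced"])
    # Option 3: scale by the number of uniquely mapped reads.
    lane["option_3"] = per_million(lane["counts"], lane["uniquely_mapped"])

print(lane_1["option_1"], lane_2["option_1"])
print(lane_1["option_2"], lane_2["option_2"])
print(lane_1["option_3"], lane_2["option_3"])
```

Option 1 leaves the twofold depth difference between the lanes in place, while options 2 and 3 both remove it. The two differ only in which library size is treated as the usable amount of data, which matters when the fraction of uniquely mapped reads varies between lanes.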
