**Xi Wang** · 01-14-2010, 08:53 PM

Originally posted by Gamaliel

Hi everybody, I'm new to this. I'm from Mexico, and I have total mRNA data from one condition of the bacterium R. etli. The first thing I see is that some intergenic regions got reads, and that the ORFs have different numbers of reads.

My question is: how can I normalize the RNA-seq data?
Do I have to use RPKM?
Is there any tool to normalize the results from Illumina?

Remember that I only have results for one condition, and I think I have to normalize this data so that I can later compare it with another condition.

Thank you all for your time, and I'm sorry for the spelling.

I can guess what you meant to type. RPKM is recommended if you want to compare samples. A tool named "Cufflinks" can be used to do this.

**Gamaliel** · 01-18-2010, 11:40 AM

Hi Xi Wang, the problem is that I have to normalize one condition first, because I have stochastic reads. Within just one condition, I need to know which genes are really transcribed (deterministic transcription) and which reads are the product of stochastic, background-level (nonspecific) transcription. After that I will compare with the other condition.

Thanks for your time and help.

**Xi Wang** · 01-18-2010, 10:37 PM

Can I understand your question this way: because there is some background noise in RNA-seq data, some regions may have reads even though the DNA there is not really transcribed. I think many researchers are working on this or related problems, but I have not found any publication yet. A naive approach would be to filter out the regions with an RPKM value below a given cutoff (say, 1 RPKM). You can then compare the remaining regions between conditions.
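The cutoff idea could be sketched roughly as follows in Python. This is only an illustration: the region names and RPKM values are made up, the helper name `filter_by_rpkm` is hypothetical, and the cutoff of 1 RPKM is just the example value mentioned above, not a validated threshold.

```python
def filter_by_rpkm(rpkm_by_region, cutoff=1.0):
    """Keep only the regions whose RPKM is at or above the cutoff."""
    return {
        region: value
        for region, value in rpkm_by_region.items()
        if value >= cutoff
    }


# Hypothetical RPKM values, invented for illustration only.
rpkm_by_region = {
    "geneA": 4.17,
    "geneB": 6.0,
    "intergenic_1": 0.3,
    "intergenic_2": 0.05,
}

expressed = filter_by_rpkm(rpkm_by_region, cutoff=1.0)
print(sorted(expressed))
```

Regions that survive the filter would then be the ones carried forward to the comparison between conditions.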

**Gamaliel** · 01-19-2010, 10:51 AM

Hi Xi Wang, thanks again. Yes, I mean the background of my RNA-seq data. My other question is this: the genome has genes of different lengths. For example:

Suppose I have two genes: gene "a" of 1.2 kb and gene "b" of 800 bp.

gene "a" __________________________ 1.2 kb
_________
_______ ________

Imagine that I have 50 reads for gene "a",

gene "b" ________________ 800 bp
_____ ______ ____
_____ _____ ____

and 48 reads for gene "b".

My question is whether the size of the ORF matters, because gene "a" may have more reads than gene "b" just because of its size, and not because it is transcribed more than gene "b".

Thanks for all your support and help. Have a good day.

**Xi Wang** · 01-20-2010, 06:13 AM

If the reads along the transcripts follow the uniform distribution assumption, you can use the RPKM concept to calculate the relative abundance of transcribed copies for different genes. RPKM means reads per kilobase of transcript per million mapped reads. Taking your example, suppose the total number of reads in the experiment is 10 million. The RPKM for gene "a" is 50/(1.2 kb)/(10 M) = 4.17, while the RPKM for gene "b" is 48/(0.8 kb)/(10 M) = 6. So gene "b" has a higher expression level than gene "a".
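The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `rpkm` is my own, and the 10 million total reads is the assumed library size from the example, not a measured value.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    length_kb = gene_length_bp / 1_000
    reads_in_millions = total_mapped_reads / 1_000_000
    return read_count / length_kb / reads_in_millions


total_reads = 10_000_000  # assumed library size, as in the example

rpkm_a = rpkm(50, 1_200, total_reads)  # gene "a": 50 reads, 1.2 kb
rpkm_b = rpkm(48, 800, total_reads)    # gene "b": 48 reads, 800 bp

print(f"gene a: {rpkm_a:.2f} RPKM")  # gene a: 4.17 RPKM
print(f"gene b: {rpkm_b:.2f} RPKM")  # gene b: 6.00 RPKM
```

Although gene "a" has slightly more reads, dividing by length puts gene "b" ahead.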

**Gamaliel** · 01-21-2010, 11:01 AM

Hi Xi Wang, I understand. But the reads along the transcripts are not uniform; I mean that the coverage differs along the ORFs (transcripts). Thanks for your help. This is my email, [email protected], if you need some help too.

**Xi Wang** · 01-21-2010, 06:27 PM

Thanks Gamaliel.

So your difficulty is estimating gene expression levels from non-uniformly distributed reads, right? First, RNA-seq experiments that follow the random priming protocol are supposed to generate reads distributed uniformly from the 5' ends to the 3' ends of transcripts. I still saw some non-uniformity in our data, but RPKM still worked. I think RPKM still works as long as the reads in your data are not extremely non-uniformly distributed. Second, if the non-uniformity still concerns you, you can set aside the concept of ORFs and instead use sliding windows to scan the genome for read enrichment. The sliding window size is probably a key parameter.
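A rough Python sketch of the sliding-window scan might look like the following. Everything here is an assumption for illustration: reads are represented only by their 0-based start positions, the function name `window_read_counts` is hypothetical, and the window size and step are arbitrary toy values rather than recommendations.

```python
from bisect import bisect_left


def window_read_counts(read_starts, genome_length, window_size, step):
    """Count reads starting in each window [start, start + window_size)."""
    sorted_starts = sorted(read_starts)
    counts = []
    last_start = max(genome_length - window_size, 0)
    for win_start in range(0, last_start + 1, step):
        win_end = win_start + window_size
        # Number of read starts with win_start <= position < win_end.
        n_reads = bisect_left(sorted_starts, win_end) - bisect_left(
            sorted_starts, win_start
        )
        counts.append((win_start, win_end, n_reads))
    return counts


# Toy data: a 100 bp "genome" with a cluster of reads near the start.
read_starts = [5, 12, 14, 15, 18, 40, 95]
windows = window_read_counts(read_starts, genome_length=100, window_size=20, step=10)

for win_start, win_end, n_reads in windows:
    print(win_start, win_end, n_reads)
```

Windows with many more reads than the typical background count would be candidates for transcribed regions; how much "many more" is would still have to be decided for the real data.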

**Gamaliel** · 01-29-2010, 08:43 AM

Hi Xi Wang, sorry for the late reply. I'm already at home.
Thanks for your help. I'm interested in using RPKM, but I can't run my data. I think it is because my reference genome needs to be in a different file format. Is that right?

Thanks for your help.
Have a good day.

**Gamaliel** · 01-29-2010, 10:15 AM

I think it is the UCSC refFlat format. How can I create that file?

**Xi Wang** · 01-29-2010, 07:32 PM

I am also at home now. :-)

To answer your question, I need to know which tool you used to calculate the RPKM values.

Thanks!

**Xi Wang** · 01-29-2010, 07:42 PM

You can download the refFlat file from the UCSC Genome Browser. E.g., for hg18 RefSeq genes, follow the link below:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refFlat.txt.gz
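For a genome that has no ready-made file, a refFlat-style line could in principle be written by hand. The Python sketch below reflects my understanding of the layout (11 tab-separated columns: geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, with 0-based starts and comma-terminated exon lists); please check it against the UCSC table schema and against what your tool expects. The gene, transcript, and chromosome names are hypothetical, and the function assumes a single-exon gene whose coding region spans the whole transcript.

```python
def refflat_line(gene_name, transcript_id, chrom, strand, start, end):
    """Format one single-exon gene as a refFlat-style tab-separated line."""
    fields = [
        gene_name,
        transcript_id,
        chrom,
        strand,
        str(start),   # txStart (assumed 0-based)
        str(end),     # txEnd
        str(start),   # cdsStart (assumed equal to txStart here)
        str(end),     # cdsEnd (assumed equal to txEnd here)
        "1",          # exonCount: one exon per gene in this sketch
        f"{start},",  # exonStarts, comma-terminated
        f"{end},",    # exonEnds, comma-terminated
    ]
    return "\t".join(fields)


# Hypothetical gene coordinates, for illustration only.
line = refflat_line("geneA", "geneA_t1", "chromosome", "+", 1000, 2200)
print(line)
```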

**bansal_raman** · 06-13-2011, 05:25 AM

Hi Xi Wang,
I found your replies in this thread very informative. I have a further question. For differential gene expression analysis (between two samples), do we need to transform and normalize the RPKM values (obtained for the genes of each sample)? I guess RPKM values are already normalized.

**Xi Wang** · 06-13-2011, 06:13 AM

Hi,

It has been recognized that further normalization is needed if the total numbers of expressed RNA molecules differ between the two samples. See the reference:
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25. Epub 2010 Mar.

http://genomebiology.com/2010/11/3/R25
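To see why library-size scaling alone can mislead, here is a toy Python illustration with invented counts. It does not implement the method from the paper; it only shows the composition effect that motivates it. Genes g1 and g2 are assumed to be equally expressed in both samples, while sample 2 also has a highly expressed gene g3 that takes up half of the reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


# Hypothetical gene lengths and read counts; both samples have 1 million reads.
gene_length_bp = {"g1": 1_000, "g2": 1_000, "g3": 1_000}
sample_1 = {"g1": 500_000, "g2": 500_000}
sample_2 = {"g1": 250_000, "g2": 250_000, "g3": 500_000}


def rpkm_table(counts):
    """RPKM for every gene in one sample, scaled by that sample's total reads."""
    total = sum(counts.values())
    return {gene: rpkm(n, gene_length_bp[gene], total) for gene, n in counts.items()}


rpkm_1 = rpkm_table(sample_1)
rpkm_2 = rpkm_table(sample_2)

ratio_g1 = rpkm_2["g1"] / rpkm_1["g1"]
print(f"g1 RPKM ratio (sample 2 / sample 1): {ratio_g1:.2f}")  # 0.50
```

Under these assumptions g1 looks two-fold lower in sample 2 even though its expression did not change, which is the kind of artifact that an additional scaling step between samples is meant to correct.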

**bansal_raman** · 06-13-2011, 09:17 AM

Thanks Xi,
Do you think that quality-control plots such as box plots can help determine whether further normalization is required?

what tool can i use for illumina-solexa data RNA-seq