**wuj** · 04-21-2011, 07:58 AM

Originally posted by harshinamdar

I have just started exploring transcriptomics through NGS. I went through the paper “Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's disease”.
I tried to replicate the temporal lobe results reported in the paper using tophat v1.2.0 and cufflinks v0.9.3.

Normal temporal lobe sample: SRR085471.fastq
Alzheimer's disease (AD) temporal lobe sample: SRR085473.fastq

I followed these steps:

1. Filtered reads using the FASTX-Toolkit with a minimum quality score of 25, keeping reads in which at least 65% of bases meet that score.
2. Edited the reference .gtf file: removed all noncoding entries.

3. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085471.fastq (normal temporal lobe sample).

4. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_normal.bam (normal temporal lobe FPKM calculation)

5. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085473.fastq (Alzheimer temporal lobe sample).

6. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_ad.bam (Alzheimer temporal lobe FPKM calculation)

7. cuffcompare -o normal_ad_comapare -r ../human_gtf/human_protein_coding.gtf -R -s ../../human_chr/ transcripts_normal.gtf transcripts_ad.gtf

8. cuffdiff -o normal_ad_filt_cuffdiff -p 4 -N -r ../../hg19.fa ../normal_ad_comapare.combined.gtf ../accepted_hits_norm_filt.bam ../accepted_hits_ad.bam
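
The filtering rule in step 1 can be sketched in Python as follows. This is only an illustration of the rule as described (keep a read when at least 65% of its bases have a quality score of at least 25), not the FASTX-Toolkit's own code. The function names, the Phred+33 offset, and the inclusive comparisons are assumptions; check which quality encoding your FASTQ files actually use.

```python
def passes_quality_filter(quality_string, min_q=25, min_percent=65.0, phred_offset=33):
    """Return True if at least min_percent of bases have quality >= min_q."""
    if not quality_string:
        return False  # an empty read cannot satisfy the rule
    # Assumed encoding: quality = ASCII code minus phred_offset.
    scores = [ord(ch) - phred_offset for ch in quality_string]
    good = sum(1 for q in scores if q >= min_q)
    return 100.0 * good / len(scores) >= min_percent


def filter_reads(records, **kwargs):
    """Yield only the (header, sequence, quality) tuples that pass the filter."""
    for header, sequence, quality in records:
        if passes_quality_filter(quality, **kwargs):
            yield header, sequence, quality


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example_reads = [
    ("@read1", "ACGTACGTAC", "IIIIIII###"),  # 70% of bases >= Q25
    ("@read2", "ACGTACGTAC", "IIIIII####"),  # 60% of bases >= Q25
]
kept = [header for header, _, _ in filter_reads(example_reads)]
print(kept)
```

With these made-up reads only `@read1` is kept, since `@read2` falls below the 65% threshold.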

To my disappointment, I am not able to replicate the FPKM values for the top 10 up- and down-regulated genes for the temporal lobe sample.
My queries are:

1. Is the protocol I followed correct?

2. Do I use -N (quartile normalization) for assembly through cufflinks? Does it affect the final outcome?

3. I am getting very high FPKM values. What is the range within which FPKM values should fall?

4. When looking at differential gene expression, does one look at the FPKM values obtained in gene.expr through cufflinks for both samples and then calculate the fold change, or look at the gene_exp.diff output obtained through cuffdiff?

Apologies for the lengthy post..!

I assume they used older versions of these tools. Check whether there have been big changes in cufflinks/cuffdiff/tophat since then.

**honey** · 04-22-2011, 06:45 AM

Cufflinks and Cuffdiff

They certainly used an older version, but the newer releases of these tools have mainly improved efficiency and automation and will not flip-flop the results. I would email the authors, who also work in genomics cores and in bioinformatics, to get a sense of what they think.
I must say this is one of the few papers that has published RNA-seq results using Cufflinks.

**harshinamdar** · 04-26-2011, 09:45 PM

For cufflinks, if I use the -N/--quartile-normalization option, the FPKM values for a gene are very high, and if I don't use -N the values come down drastically.

Example:

| Gene | FPKM | -N option |
|------|------|-----------|
| APOE | 4085.26 | With -N |
| APOE | 743.98 | Without -N |

Using the -N option improves the robustness of differential expression calls for less abundant genes, but should the difference be so high? I am more inclined to accept the values obtained without the -N option.
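
As a quick check on the size of the gap, the two APOE values above can be compared directly. The sketch below only does the arithmetic on the numbers quoted in this post; the variable names are chosen here for illustration.

```python
import math

apoe_with_n = 4085.26     # FPKM reported with -N
apoe_without_n = 743.98   # FPKM reported without -N

# How many times larger the -N value is, on linear and log2 scales.
ratio = apoe_with_n / apoe_without_n
print(f"ratio: {ratio:.2f}")
print(f"log2 ratio: {math.log2(ratio):.2f}")
```

That is roughly a 5.5-fold difference. If other genes in the same sample show about the same ratio, it would suggest the two runs differ mainly by a constant scaling factor rather than by a change in relative expression.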

Comments are appreciated!

**dariober** · 04-27-2011, 04:52 AM

Hi harshinamdar,

Originally posted by harshinamdar

For cufflinks, if I use the -N/--quartile-normalization option, the FPKM values for a gene are very high, and if I don't use -N the values come down drastically.

I've noticed the same. I think it is because the normalization ignores the top 25% most highly expressed genes, so the raw expression of each transcript is divided by a much smaller total count than in the non-normalized case.
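
A toy sketch of that explanation, in Python. It is not Cufflinks's actual implementation: the gene names, fragment counts, and lengths are made up, and the helper names are hypothetical. It only shows how dropping the top 25% of genes from the denominator inflates every FPKM.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def trimmed_total(counts):
    """Sum of counts after dropping the top 25% most highly expressed genes."""
    ordered = sorted(counts)
    keep = len(ordered) - len(ordered) // 4
    return sum(ordered[:keep])


# Hypothetical data: gene -> (fragment count, transcript length in bp).
genes = {
    "geneA": (100, 1000),
    "geneB": (200, 2000),
    "geneC": (700, 1000),
    "geneD": (9000, 1500),  # one dominant gene, as is common in RNA-seq
}
counts = [count for count, _ in genes.values()]
full_total = sum(counts)                  # denominator without normalization
reduced_total = trimmed_total(counts)     # denominator with the top 25% dropped

plain = {g: fpkm(c, l, full_total) for g, (c, l) in genes.items()}
normalized = {g: fpkm(c, l, reduced_total) for g, (c, l) in genes.items()}

# Ratio of normalized to plain FPKM for each gene.
ratios = {g: normalized[g] / plain[g] for g in genes}
print(ratios)
```

Because only the denominator changes, every FPKM in this sketch is multiplied by the same factor (`full_total / reduced_total`), so the ranking of genes within the sample is unchanged.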

Using the -N option improves the robustness of differential expression calls for less abundant genes, but should the difference be so high? I am more inclined to accept the values obtained without the -N option.

In my datasets, normalized and non-normalized FPKMs are almost perfectly correlated, so I'm also tempted not to use -N.

All the best
Dario

transcriptome analysis using tophat and cufflinks,cuffcompare,
