I have just started exploring transcriptomics through NGS. I went through the paper “Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's Disease”.
I tried to replicate the results the paper reports for the temporal lobe, using tophat v1.2.0 and cufflinks v0.9.3.
Normal temporal lobe sample: SRR085471.fastq
Alzheimer's disease (AD) temporal lobe sample: SRR085473.fastq
I followed these steps:
1. Filtered reads using the fastx toolkit with a quality threshold of Q = 25, keeping only reads in which at least 65% of bases meet that threshold.
2. Edited the reference .gtf file; removed all noncoding entries.
3. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085471.fastq (normal temporal lobe sample).
4. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_normal.bam (normal temporal lobe FPKM calculation)
5. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085473.fastq (Alzheimer temporal lobe sample).
6. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_ad.bam (Alzheimer temporal lobe FPKM calculation)
7. cuffcompare -o normal_ad_comapare -r ../human_gtf/human_protein_coding.gtf -R -s ../../human_chr/ transcripts_normal.gtf transcripts_ad.gtf
8. cuffdiff -o normal_ad_filt_cuffdiff -p 4 -N -r ../../hg19.fa ../normal_ad_comapare.combined.gtf ../accepted_hits_norm_filt.bam ../accepted_hits_ad.bam
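To make the filtering criterion in step 1 concrete, here is a minimal Python sketch of the rule as I understand it (keep a read only if at least 65% of its bases have quality >= 25). It is not the fastx toolkit's own code. The function names and the Phred+33 offset are assumptions for illustration; the offset must match the encoding of the actual FASTQ files.

```python
def passes_quality_filter(quality_string, min_quality=25, min_percent=65,
                          phred_offset=33):
    """Return True if at least min_percent of bases have quality >= min_quality.

    phred_offset=33 is an assumption (Sanger-style encoding); use 64 for
    older Illumina encodings.
    """
    if not quality_string:
        return False
    # Decode each ASCII quality character into its numeric Phred score.
    scores = [ord(ch) - phred_offset for ch in quality_string]
    good_bases = sum(1 for score in scores if score >= min_quality)
    return 100.0 * good_bases / len(scores) >= min_percent


def filter_fastq_records(lines, **filter_options):
    """Yield 4-line FASTQ records (as tuples) whose quality line passes."""
    lines = [line.rstrip("\n") for line in lines]
    for start in range(0, len(lines) - 3, 4):
        header, sequence, separator, quality = lines[start:start + 4]
        if passes_quality_filter(quality, **filter_options):
            yield (header, sequence, separator, quality)
```

With Phred+33, the character `I` decodes to Q40 and `#` to Q2, so a 10-base read with quality string `IIIIIII###` (70% of bases at or above Q25) is kept, while `IIII######` (40%) is dropped.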
To my disappointment, I am not able to replicate the FPKM values for the top 10 up- and down-regulated genes for the temporal lobe sample.
My questions are:
1. Is the protocol I followed correct?
2. Should I use -N (quartile normalization) for assembly through cufflinks? Does it affect the final outcome?
3. I am getting very high FPKM values. What is the range within which FPKM values should fall?
4. When looking at differential gene expression, does one take the FPKM values obtained in gene.expr through cufflinks for both samples and calculate the fold change by hand, or look at the gene_exp.diff output obtained through cuffdiff?
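For reference, this is the kind of manual calculation questions 3 and 4 refer to, as a small Python sketch. It uses the conventional FPKM definition (fragments per kilobase of feature per million mapped fragments) and a log2 fold change between the two samples. The function names and the pseudocount of 1.0 are assumptions for illustration, and this is not how cuffdiff derives its own values or test statistics.

```python
import math


def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_normal, fpkm_ad, pseudocount=1.0):
    """log2(AD / normal); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((fpkm_ad + pseudocount) / (fpkm_normal + pseudocount))


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper or from my runs.
    print(fpkm(500, 2000, 10_000_000))   # 500 fragments on a 2 kb gene, 10 M mapped
    print(log2_fold_change(3.0, 7.0))    # normal FPKM 3, AD FPKM 7
```

Note that the definition itself sets no fixed upper bound on FPKM: a short, highly expressed gene can reach values in the thousands.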
Apologies for the lengthy post!