  • harshinamdar
    harry
    • Jun 2010
    • 14

    Transcriptome analysis using TopHat, Cufflinks, and Cuffcompare

    I have just started exploring transcriptomics with NGS. I went through the paper "Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's Disease" and tried to replicate its temporal lobe results using TopHat v1.2.0 and Cufflinks v0.9.3.

    Normal temporal lobe sample: SRR085471.fastq
    Alzheimer's disease (AD) temporal lobe sample: SRR085473.fastq

    I followed these steps:

    1. Filtered the reads with the FASTX-Toolkit, using a quality threshold of Q = 25 and requiring 65% of bases to meet it.
    2. Edited the reference GTF file to remove all non-coding entries.

    3. Aligned the normal temporal lobe sample:
       tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085471.fastq

    4. Calculated FPKM values for the normal temporal lobe sample:
       cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_normal.bam

    5. Aligned the Alzheimer's temporal lobe sample:
       tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085473.fastq

    6. Calculated FPKM values for the Alzheimer's temporal lobe sample:
       cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_ad.bam

    7. Ran Cuffcompare on both assemblies:
       cuffcompare -o normal_ad_comapare -r ../human_gtf/human_protein_coding.gtf -R -s ../../human_chr/ transcripts_normal.gtf transcripts_ad.gtf

    8. Ran Cuffdiff:
       cuffdiff -o normal_ad_filt_cuffdiff -p 4 -N -r ../../hg19.fa ../normal_ad_comapare.combined.gtf ../accepted_hits_norm_filt.bam ../accepted_hits_ad.bam
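Steps 1 and 2 are simple filters. The sketch below shows only their logic; it is not FASTX-Toolkit or Cufflinks code, and every function name in it is hypothetical. It keeps a read when at least 65% of its bases have a Phred quality of 25 or more (the intent of fastq_quality_filter's -q 25 -p 65), assuming Phred+33 quality encoding. It keeps a GTF line when the feature is marked protein_coding, assuming the annotation records biotype either in the source column or in a gene_biotype attribute, as some Ensembl releases do.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str, str]


def passes_quality(qual: str, min_q: int = 25, min_percent: float = 65.0,
                   offset: int = 33) -> bool:
    """True if at least `min_percent` % of bases have Phred quality >= `min_q`.

    Assumes Phred+33 encoding; pass offset=64 for older Illumina files.
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return 100.0 * good / len(qual) >= min_percent


def filter_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records whose quality line passes the filter.

    A trailing incomplete record is silently dropped by zip().
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        if passes_quality(qual.rstrip("\n")):
            yield header, seq, plus, qual


def is_protein_coding(gtf_line: str) -> bool:
    """Keep a GTF feature only if it looks protein-coding.

    Assumption: biotype is stored in the source column (column 2) or in a
    gene_biotype attribute (column 9).
    """
    if gtf_line.startswith("#") or not gtf_line.strip():
        return False
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False
    source, attributes = fields[1], fields[8]
    return (source == "protein_coding"
            or 'gene_biotype "protein_coding"' in attributes)


def filter_gtf(lines: Iterable[str]) -> List[str]:
    """Return only the protein-coding feature lines (comment lines are dropped)."""
    return [line for line in lines if is_protein_coding(line)]
```

Reading the FASTQ four lines at a time with a shared iterator keeps memory use flat, which matters for files of this size.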

    To my disappointment, I have not been able to replicate the FPKM values reported for the top 10 up- and down-regulated genes in the temporal lobe sample.
    My questions are:

    1. Is the protocol I followed correct?

    2. Should I use -N (quartile normalization) when assembling with Cufflinks? Does it affect the final outcome?

    3. I am getting very high FPKM values. What range should FPKM values fall within?

    4. When looking at differential gene expression, should one take the FPKM values in gene.expr from Cufflinks for both samples and calculate the fold change, or use the gene_exp.diff output from Cuffdiff?
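Questions 3 and 4 both hinge on how FPKM is defined. Below is a minimal sketch of the textbook formula, FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments), plus a log2 fold change between two samples. It is deliberately simplified: Cufflinks estimates fragment counts across isoforms and uses an effective transcript length, so its values will not equal this formula applied to raw counts. The gene, its length, and the counts are hypothetical.

```python
import math


def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Simplified: raw counts and plain transcript length, not Cufflinks'
    isoform-aware estimates and effective length.
    """
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_1: float, fpkm_2: float, pseudocount: float = 0.0) -> float:
    """log2(FPKM_2 / FPKM_1); a positive pseudocount avoids division by zero."""
    return math.log2((fpkm_2 + pseudocount) / (fpkm_1 + pseudocount))


# Hypothetical gene on a 2 kb transcript, 10 million mapped fragments per sample.
normal = fpkm(500, 2000, 10_000_000)   # 25.0
ad = fpkm(2000, 2000, 10_000_000)      # 100.0
print(log2_fold_change(normal, ad))    # 2.0
```

Under this formula there is no small fixed range: the ceiling is 10^9 / length, reached only if every fragment maps to that one transcript. A fold change computed by hand from two separate Cufflinks runs is only a point estimate, whereas Cuffdiff's gene_exp.diff also reports a p-value for each gene. The log base of Cuffdiff's own fold-change column has varied between releases, so check the header of your file.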

    Apologies for the lengthy post!
  • wuj
    Junior Member
    • Jul 2009
    • 8

    #2
    I assume the authors used older versions of this software. Check whether there have been major changes in Cufflinks, Cuffdiff, or TopHat since then.


    • honey
      Senior Member
      • Feb 2010
      • 151

      #3
      Cufflinks and Cuffdiff

      They certainly used an older version, but the newer releases mainly improve efficiency and automation; they should not flip the results. I would email the authors, who work in genomics core facilities and in bioinformatics, to get a sense of what they think.
      I must say this is one of the few published RNA-seq papers that use Cufflinks.


      • harshinamdar
        harry
        • Jun 2010
        • 14

        #4
        With Cufflinks, if I use the -N (quartile normalization) option, the FPKM values for a gene are very high; without -N, the values drop drastically.

        For example:

        Gene   FPKM      Cufflinks run
        APOE   4085.26   with -N
        APOE    743.98   without -N

        Using -N improves the robustness of differential expression calls for less abundant genes, but should the difference be this large? I am more inclined to accept the values obtained without -N.

        Comments are appreciated!
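A back-of-envelope check on the APOE numbers from post #4. It assumes the old -N option only shrinks the FPKM denominator (the number of mapped fragments) by leaving out the most highly expressed genes, and that everything else in the estimate is the same in both runs. Under that assumption the ratio of the two FPKMs equals N_total / N_without_top_quartile, which implies the share of fragments carried by the excluded genes.

```python
fpkm_with_n = 4085.26     # APOE, Cufflinks run with -N (from post #4)
fpkm_without_n = 743.98   # APOE, Cufflinks run without -N

# Assumption: -N changes only the denominator, so
# ratio = N_total / (N_total - fragments_in_excluded_genes).
ratio = fpkm_with_n / fpkm_without_n

# Rearranged: excluded fragments / N_total = 1 - 1 / ratio.
excluded_share = 1 - 1 / ratio

print(f"ratio = {ratio:.2f}, excluded share = {excluded_share:.1%}")
# prints: ratio = 5.49, excluded share = 81.8%
```

If the assumption holds, the roughly 5.5-fold jump means the excluded genes account for about 82% of mapped fragments in this sample.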


        • dariober
          Senior Member
          • May 2010
          • 311

          #5
          Hi harshinamdar,

          Originally posted by harshinamdar:
          > With Cufflinks, if I use the -N (quartile normalization) option, the FPKM values for a gene are very high; without -N, the values drop drastically.

          I've noticed the same. I think this is because the normalization ignores the top 25% most highly expressed genes, so each transcript's raw expression is divided by a much smaller total count than in the non-normalized case.

          > Using -N improves the robustness of differential expression calls for less abundant genes, but should the difference be this large? I am more inclined to accept the values obtained without -N.

          In my datasets, normalized and non-normalized FPKMs are almost perfectly correlated, so I'm also tempted not to use -N.
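A toy model of the explanation above, under stated assumptions: within one sample, -N is treated as leaving the top 25% of genes (ranked here by raw fragment count, which is an assumption) out of the FPKM denominator, and nothing else changes. The gene names, counts, and lengths are hypothetical, and this is not Cufflinks' actual estimator.

```python
import math


def fpkm_table(counts, lengths, quartile_norm=False):
    """Toy per-gene FPKM for one sample.

    With quartile_norm=True, fragments from the top 25% of genes (ranked by
    raw count, an assumption) are excluded from the denominator.
    """
    total = sum(counts.values())
    if quartile_norm:
        ranked = sorted(counts.values(), reverse=True)
        total -= sum(ranked[: len(ranked) // 4])
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


# Hypothetical sample: eight genes with skewed counts.
counts = {"A": 5000, "B": 1200, "C": 300, "D": 80, "E": 40, "F": 10, "G": 5, "H": 1}
lengths = {"A": 1500, "B": 3000, "C": 2000, "D": 1000,
           "E": 2500, "F": 1200, "G": 800, "H": 4000}

plain = fpkm_table(counts, lengths)
norm = fpkm_table(counts, lengths, quartile_norm=True)

# Every gene is scaled by the same factor: total / (total - top quartile).
factors = {g: norm[g] / plain[g] for g in counts}
print(f"scale factor = {factors['A']:.2f}")  # 6636 / 436, about 15.22
```

Because the factor is the same for every gene in a sample, ranks and Pearson correlation within the sample are unchanged, consistent with the near-perfect correlation described above. The factor differs between samples whenever the top quartile carries a different share of fragments, so in this model it is comparisons across samples, not within one, that -N actually changes.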

          All the best
          Dario
