  • harshinamdar
    harry
    • Jun 2010
    • 14

    Transcriptome analysis using tophat, cufflinks and cuffcompare

    I have just started exploring transcriptomics with NGS. I went through the paper "Whole Transcriptome Sequencing Reveals Gene Expression and Splicing Differences in Brain Regions Affected by Alzheimer's Disease"
    and tried to replicate the temporal lobe results from the paper using tophat v1.2.0 and cufflinks v0.9.3.

    Normal temporal lobe sample: SRR085471.fastq
    Alzheimer's disease (AD) temporal lobe sample: SRR085473.fastq

    I followed these steps:

    1. Filtered reads with the FASTX-Toolkit, requiring at least 65% of bases to have a quality value of 25 or more.
    2. Edited the reference .gtf file, removing all non-coding entries.

    3. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085471.fastq (normal temporal lobe sample).

    4. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_normal.bam (normal temporal lobe FPKM calculation)

    5. tophat -o tophat_filter_out --segment-mismatches 0 --segment-length 18 -p 4 ../hg19 SRR085473.fastq (Alzheimer temporal lobe sample).

    6. cufflinks -p 4 -G ../../../human_gtf/human_protein_coding.gtf -r ../../../hg19.fa ../accepted_hits_ad.bam (Alzheimer temporal lobe FPKM calculation)

    7. cuffcompare -o normal_ad_comapare -r ../human_gtf/human_protein_coding.gtf -R -s ../../human_chr/ transcripts_normal.gtf transcripts_ad.gtf

    8. cuffdiff -o normal_ad_filt_cuffdiff -p 4 -N -r ../../hg19.fa ../normal_ad_comapare.combined.gtf ../accepted_hits_norm_filt.bam ../accepted_hits_ad.bam
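    Step 1 describes the read filter only in words. Below is a minimal sketch of that rule (keep a read if at least 65% of its bases have Phred quality 25 or higher, the kind of cutoff FASTX-Toolkit's fastq_quality_filter applies with -q 25 -p 65). It assumes Phred+33 quality encoding; older Illumina exports used Phred+64, so check the offset before reusing it. The function names are illustrative, not part of any toolkit.

```python
# Sketch of the step-1 read filter: keep a read only if at least
# MIN_PERCENT of its bases have Phred quality >= MIN_QUAL.
# ASSUMPTION: qualities are Phred+33 encoded (older Illumina data used +64).

MIN_QUAL = 25
MIN_PERCENT = 65
PHRED_OFFSET = 33


def passes_quality_filter(quality_string: str,
                          min_qual: int = MIN_QUAL,
                          min_percent: int = MIN_PERCENT,
                          offset: int = PHRED_OFFSET) -> bool:
    """Return True if enough bases in the read meet the quality cutoff."""
    if not quality_string:
        return False
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_qual)
    # Integer comparison avoids floating-point edge cases at exactly 65%.
    return good * 100 >= min_percent * len(quality_string)


def filter_fastq(lines):
    """Yield 4-line FASTQ records whose quality line passes the filter."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if passes_quality_filter(qual):
            yield header, seq, plus, qual


# Two toy reads: r1 has 4/5 good bases (80%), r2 has 3/5 (60%).
records = ["@r1", "ACGTA", "+", "IIII#",
           "@r2", "ACGTA", "+", "III##"]
for record in filter_fastq(records):
    print(record[0])
```

    Only r1 survives, because 'I' encodes Q40 and '#' encodes Q2 under the Phred+33 assumption.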

    To my disappointment, I am not able to replicate the FPKM values for the top 10 up- and down-regulated genes in the temporal lobe sample.
    My questions are:

    1. Is the protocol I followed correct?

    2. Should I use -N (quartile normalization) for assembly with cufflinks? Does it affect the final outcome?

    3. I am getting very high FPKM values. What is the range within which FPKM values should fall?

    4. When looking at differential gene expression, should one take the FPKM values in gene.expr from cufflinks for both samples and calculate the fold change, or use the gene_exp.diff output from cuffdiff?

    Apologies for the lengthy post!
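    Questions 3 and 4 both depend on how FPKM is defined. Below is a minimal sketch of the textbook formula (fragments x 10^9 / (transcript length in bp x total mapped fragments)) plus a naive log2 fold change between two samples. It is an illustration only, not Cufflinks' estimator, which uses effective transcript lengths and a statistical model over isoforms. Two things follow from the formula: FPKM has no standard range, so values in the thousands for very abundant genes are not by themselves a sign of error, and a fold change computed by hand from two cufflinks runs carries no significance estimate, which is what cuffdiff's test statistics and p-values add.

```python
import math


def fpkm(fragment_count: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 0.0) -> float:
    """Naive log2(B / A); pass a small pseudocount if either value may be 0."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Hypothetical gene: 500 fragments on a 2,000 bp transcript,
# 20 million mapped fragments in the library.
normal = fpkm(500, 2000, 20_000_000)
print(normal)                          # 12.5
print(log2_fold_change(normal, 50.0))  # 2.0
```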
  • wuj
    Junior Member
    • Jul 2009
    • 8

    #2
    I assume they used older versions of this software. Check whether there have been major changes in cufflinks/cuffdiff/tophat since then.


    • honey
      Senior Member
      • Feb 2010
      • 151

      #3
      Cufflinks and Cuffdiff

      They certainly used an older version, but the improvements in the newer releases mainly affect efficiency and automation and will not flip the results. I would email the authors, who also work in genomics cores and in bioinformatics, to get a sense of what they think.
      I must say this is one of the few published papers that have used Cufflinks for RNA-seq.


      • harshinamdar
        harry
        • Jun 2010
        • 14

        #4
        With cufflinks, if I use -N (quartile normalization), the FPKM values for a gene are very high; if I don't use -N, the values drop drastically.

        For example:

        Gene    FPKM with -N    FPKM without -N
        APOE    4085.26         743.98

        The -N option is meant to improve the robustness of differential expression calls for less abundant genes, but should the difference be this large? I am inclined to accept the values obtained without -N.
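        A quick check on the APOE numbers above: if -N only replaces the per-sample denominator with a different constant, every gene in that sample should be inflated by roughly the same factor. The sketch below computes that factor; the helper name scaling_factors is hypothetical, and only APOE is filled in because it is the only gene reported here.

```python
def scaling_factors(fpkm_with_n, fpkm_without_n):
    """Per-gene ratio of the -N FPKM to the non-N FPKM.

    Only genes present in both tables with a non-zero non-N value are kept.
    """
    return {
        gene: fpkm_with_n[gene] / fpkm_without_n[gene]
        for gene in fpkm_with_n
        if gene in fpkm_without_n and fpkm_without_n[gene] > 0
    }


# APOE values reported in this post
with_n = {"APOE": 4085.26}
without_n = {"APOE": 743.98}

factors = scaling_factors(with_n, without_n)
print(f"APOE: {factors['APOE']:.2f}x higher with -N")  # APOE: 5.49x higher with -N
```

        If the factor turns out to be near-constant across genes in a full genes table, the two runs differ by a rescaling rather than a reordering of genes.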

        Comments are appreciated!


        • dariober
          Senior Member
          • May 2010
          • 311

          #5
          Hi harshinamdar,

          I've noticed the same. I think it is because the normalization ignores the top 25% most highly expressed genes, so the raw expression of each transcript is divided by a much smaller total count than in the non-normalized case.
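          The mechanism described above can be illustrated with toy numbers. This sketch implements the explanation as stated (sum the counts after dropping the top 25% most expressed genes and use that as the denominator). Note that the Cufflinks manual describes -N as normalizing by the upper quartile of fragments mapping to individual loci, so this shows only the direction of the effect, not Cufflinks' exact computation. Gene names, counts and lengths are made up.

```python
def fpkm_table(counts, lengths, normalizer):
    """FPKM-style value per gene: count * 1e9 / (length_bp * normalizer)."""
    return {g: counts[g] * 1e9 / (lengths[g] * normalizer) for g in counts}


def total_normalizer(counts):
    """Denominator without -N: all mapped fragments."""
    return sum(counts.values())


def trimmed_normalizer(counts, drop_fraction=0.25):
    """Denominator as described in the post: drop the most expressed
    genes (top drop_fraction) and sum the remaining counts."""
    ordered = sorted(counts.values())
    keep = len(ordered) - int(len(ordered) * drop_fraction)
    return sum(ordered[:keep])


# Hypothetical fragment counts and lengths (bp) for four genes
toy_counts = {"geneA": 10, "geneB": 50, "geneC": 200, "geneD": 5000}
toy_lengths = {"geneA": 1000, "geneB": 1000, "geneC": 1000, "geneD": 1000}

plain = fpkm_table(toy_counts, toy_lengths, total_normalizer(toy_counts))
trimmed = fpkm_table(toy_counts, toy_lengths, trimmed_normalizer(toy_counts))

for gene in toy_counts:
    print(gene, round(plain[gene], 1), round(trimmed[gene], 1))
```

          Dropping one dominant gene shrinks the denominator from 5260 to 260, so every value in the sample rises by the same factor of about 20.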

          In my datasets, normalized and non-normalized FPKMs are almost perfectly correlated, so I'm also tempted not to use -N.
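          Near-perfect correlation is what you would expect if normalization multiplies every FPKM in a sample by the same constant, because Pearson correlation is unchanged by positive rescaling. A minimal check, using hypothetical values and the APOE-derived factor of about 5.49 purely for illustration:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical non-normalized FPKMs for a handful of genes
plain = [1.5, 12.0, 80.0, 743.98]
# Same sample rescaled by one constant, as a single per-sample normalizer would do
scaled = [v * 5.49 for v in plain]

print(f"r = {pearson(plain, scaled):.6f}")
```

          Because each sample gets its own normalizer, high within-sample correlation does not by itself show that between-sample fold changes are unaffected by -N.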

          All the best
          Dario
