Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Morten
    Junior Member
    • Jun 2010
    • 9

    Cufflinks / Cuffdiff problem

    Hi all.

    I hope someone can help me with my cuffdiff problem, which I have been trying to work around for a few days now.
    I guess I should mention that I am a novice at this...

    I run cuffdiff on the output from Tophat like this:

    Code:
    cuffdiff -o ./cuffdiff -p 7 /media/bigdrive/morten/software/cufflinks-1.0.3.Linux_x86_64/Homo_sapiens_annotation_hg19-GRCh37.62.gtf T4A_1_L1-tophat/accepted_hits.bam T4C_1_L1-tophat/accepted_hits.bam
    But for some reason cuffdiff reports zero total map mass and read type as 0bp single-end

    Code:
    You are using Cufflinks v1.0.3, which is the most recent release.
    [10:51:59] Loading reference annotation.
    [10:52:07] Inspecting maps and determining fragment length distributions.
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    [10:53:16] Modeling fragment count overdispersion.
    Warning: compparcomp: perfect fit
    [10:53:16] Testing for differential expression and regulation in locus.
    > Processed 33455 loci.                        [*************************] 100%
    Performed 0 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 0 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests
    Writing isoform-level FPKM tracking
    Writing TSS group-level FPKM tracking
    Writing gene-level FPKM tracking
    Writing CDS-level FPKM tracking
    And the output files (like gene_exp.diff) just have zero for all values and fold changes

    I also tried using bam files from a direct Bowtie alignment with the same results.

    Software versions I am using:
    Bowtie v 0.12.5
    Samtools v 0.1.16 (r963:234)
    TopHat v1.3.1 (also tried v1.2)
    Cufflinks v1.0.3

    I really hope someone can help me because I am completely stuck!
    Thank you.

    -Morten
  • Morten
    Junior Member
    • Jun 2010
    • 9

    #2
    It seems I just solved my problem.
    The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
    I downloaded one originating from UCSC, now it works.

    Comment

    • pinki999
      Member
      • Oct 2010
      • 37

      #3
      Hi Morten,

      I am also facing the same problem partially. I get almost the same warning messages as above and all the output files are empty. I am using the annotation file which i downloaded from ensembl (hg18).

      If you have any idea about this, can you please help me?

      Comment

      • gavin.oliver
        Senior Member
        • Jan 2010
        • 110

        #4
        Originally posted by Morten View Post
        It seems I just solved my problem.
        The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
        I downloaded one originating from UCSC, now it works.
        Can you post the link to the hg19 gtf annotation? I've been trying to find one with no luck

        Comment

        • Dario1984
          Senior Member
          • Jun 2011
          • 166

          #5
          Hi everyone,

          The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.

          Note: If an arbitrary GTF/GFF3 file is used as input (to CuffDiff) (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

          Code:
          cuffcompare -s /path/to/genome_seqs.fa -r annotation.gtf annotation.gtf
          I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

          I agree that this is quite obscure and hard to find, especially since the argument description states
          <transcripts.(gtf/gff)> A transcript annotation file produced by cufflinks, cuffcompare, or other source.
          The other source implies that a standard GTF file from UCSC should work, but this is misleading.

          --------------------------------------
          Dario Strbenac
          Research Assistant
          Cancer Epigenetics
          Garvan Institute of Medical Research
          Darlinghurst NSW 2010
          Australia

          Comment

          • gavin.oliver
            Senior Member
            • Jan 2010
            • 110

            #6
            Does anyone actually have links to UCSC GTF files? I have been able to find none. The only ones I have located are based on Ensembl annotation and chromosome names.

            Comment

            • nsl
              Member
              • Jan 2011
              • 28

              #7
              sorry for the naive question, but does it matter whether the annotations are from UCSC or Ensemble? I used Ensemble and get "0" values as well.
              Last edited by nsl; 07-19-2011, 12:05 PM.

              Comment

              • Morten
                Junior Member
                • Jun 2010
                • 9

                #8
                I downloaded the UCSC annotation file here:

                Comment

                • fabrice
                  Member
                  • Oct 2009
                  • 86

                  #9
                  I have the same problem when run cufflinks.

                  I run cufflinks 1.0.3 with command:

                  bin/rnaseq/cufflinks-1.0.3.Linux_x86_64/cufflinks -p 6 -I 5000000
                  --upper-quartile-norm
                  -G ~/bin/genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf
                  --output-dir mapping/7124 mapping/7124/accepted_hits.bam

                  I got these output.

                  Warning: Using default Gaussian distribution due to insufficient
                  paired-end reads in open ranges. It is recommended that correct
                  paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
                  > Map Properties:
                  > Upper Quartile: 366.70
                  > Read Type: 108bp x 102bp
                  > Fragment Length Distribution: Truncated Gaussian (default)
                  > Default Mean: 200
                  > Default Std Dev: 80

                  My data is Illumina pair-end RNA-seq 2*101bp which is mapped by
                  Tophat. Before mapping the data, I have used cutadat to trim the
                  adaptor, So read1 and read2 will not the same length after trimed.

                  1, Do you have some suggestion why cufflinks cannot learns the
                  fragment length mean from my data?
                  2, Why it output so strange Read Type: 108bp x 102bp?

                  Originally posted by Dario1984 View Post
                  Hi everyone,

                  The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.



                  I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

                  I agree that this is quite obscure and hard to find, especially since the argument description states The other source implies that a standard GTF file from UCSC should work, but this is misleading.

                  --------------------------------------
                  Dario Strbenac
                  Research Assistant
                  Cancer Epigenetics
                  Garvan Institute of Medical Research
                  Darlinghurst NSW 2010
                  Australia

                  Comment

                  • DZhang
                    Senior Member
                    • Jun 2010
                    • 177

                    #10
                    Hi fabrice,

                    I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                    Thank you,
                    Douglas

                    Comment

                    • fabrice
                      Member
                      • Oct 2009
                      • 86

                      #11
                      Hi Douglas,

                      Thank you for your suggestions. It seems that there is not plenty of properly mapped pair-end reads in my bam files.

                      I think there is a problem in the mapping. Maybe this problem caused by the parameter -r/--mate-inner-dist in tophat. Because I trimed some reads, the different sizes of forward and reverse reads will let this parameter hard to set. The inner distance between mate pairs is variation.

                      -r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.

                      Originally posted by DZhang View Post
                      Hi fabrice,

                      I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                      Thank you,
                      Douglas
                      www.contigexpress.com

                      Comment

                      • DZhang
                        Senior Member
                        • Jun 2010
                        • 177

                        #12
                        Hi fabrice,

                        Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                        Douglas

                        Comment

                        • fabrice
                          Member
                          • Oct 2009
                          • 86

                          #13
                          Hi Douglas,

                          I have try trimed and without trimed with BWA. I found quality trimming can get more properly paired mapping. About 30% in my dataset, so I think quality trimming is necessary to my dataset.

                          BWA seems no problem for different sizes. But I prefer using Tophat to output junctions.bed, insertions.bed and deletions.bed.

                          Thanks.

                          Originally posted by DZhang View Post
                          Hi fabrice,

                          Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                          Douglas
                          www.contigexpress.com

                          Comment

                          • DZhang
                            Senior Member
                            • Jun 2010
                            • 177

                            #14
                            Hi fabrice,

                            Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                            Douglas

                            Comment

                            • fabrice
                              Member
                              • Oct 2009
                              • 86

                              #15
                              tophat-1.3.1.Linux_x86_64/tophat --mate-inner-dist 194 -p 6 --segment-mismatches 2 --segment-length 25 --mate-std-dev 25 --min-anchor 8 --splice-mismatches 0 --min-intron 50 --max-intron 5000000 --min-isoform-fraction 0.15 --max-multihits 40 --solexa1.3-quals -o mapping/7124_3 ~/bin/genome_index/tophat_indexes/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.dna.chromosome mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq


                              Here mate-inner-dist=PCR size - 2*101 = 194.

                              But read1 and read2 have been trimed.

                              Originally posted by DZhang View Post
                              Hi fabrice,

                              Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                              Douglas
                              www.contigexpress.com

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                Today, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 06:09 AM
                              0 responses
                              16 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              37 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              43 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...