Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Morten
    Junior Member
    • Jun 2010
    • 9

    Cufflinks / Cuffdiff problem

    Hi all.

    I hope someone can help me with my cuffdiff problem, which I have been trying to work around for a few days now.
    I guess I should mention that I am a novice at this...

    I run cuffdiff on the output from Tophat like this:

    Code:
    cuffdiff -o ./cuffdiff -p 7 /media/bigdrive/morten/software/cufflinks-1.0.3.Linux_x86_64/Homo_sapiens_annotation_hg19-GRCh37.62.gtf T4A_1_L1-tophat/accepted_hits.bam T4C_1_L1-tophat/accepted_hits.bam
    But for some reason cuffdiff reports zero total map mass and read type as 0bp single-end

    Code:
    You are using Cufflinks v1.0.3, which is the most recent release.
    [10:51:59] Loading reference annotation.
    [10:52:07] Inspecting maps and determining fragment length distributions.
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    [10:53:16] Modeling fragment count overdispersion.
    Warning: compparcomp: perfect fit
    [10:53:16] Testing for differential expression and regulation in locus.
    > Processed 33455 loci.                        [*************************] 100%
    Performed 0 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 0 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests
    Writing isoform-level FPKM tracking
    Writing TSS group-level FPKM tracking
    Writing gene-level FPKM tracking
    Writing CDS-level FPKM tracking
    And the output files (like gene_exp.diff) just have zero for all values and fold changes

    I also tried using bam files from a direct Bowtie alignment with the same results.

    Software versions I am using:
    Bowtie v 0.12.5
    Samtools v 0.1.16 (r963:234)
    TopHat v1.3.1 (also tried v1.2)
    Cufflinks v1.0.3

    I really hope someone can help me because I am completely stuck!
    Thank you.

    -Morten
  • Morten
    Junior Member
    • Jun 2010
    • 9

    #2
    It seems I just solved my problem.
    The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
    I downloaded one originating from UCSC, now it works.

    Comment

    • pinki999
      Member
      • Oct 2010
      • 37

      #3
      Hi Morten,

      I am also facing the same problem partially. I get almost the same warning messages as above and all the output files are empty. I am using the annotation file which i downloaded from ensembl (hg18).

      If you have any idea about this, can you please help me?

      Comment

      • gavin.oliver
        Senior Member
        • Jan 2010
        • 110

        #4
        Originally posted by Morten View Post
        It seems I just solved my problem.
        The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
        I downloaded one originating from UCSC, now it works.
        Can you post the link to the hg19 gtf annotation? I've been trying to find one with no luck

        Comment

        • Dario1984
          Senior Member
          • Jun 2011
          • 166

          #5
          Hi everyone,

          The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.

          Note: If an arbitrary GTF/GFF3 file is used as input (to CuffDiff) (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

          Code:
          cuffcompare -s /path/to/genome_seqs.fa -r annotation.gtf annotation.gtf
          I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

          I agree that this is quite obscure and hard to find, especially since the argument description states
          <transcripts.(gtf/gff)> A transcript annotation file produced by cufflinks, cuffcompare, or other source.
          The other source implies that a standard GTF file from UCSC should work, but this is misleading.

          --------------------------------------
          Dario Strbenac
          Research Assistant
          Cancer Epigenetics
          Garvan Institute of Medical Research
          Darlinghurst NSW 2010
          Australia

          Comment

          • gavin.oliver
            Senior Member
            • Jan 2010
            • 110

            #6
            Does anyone actually have links to UCSC GTF files? I have been able to find none. The only ones I have located are based on Ensembl annotation and chromosome names.

            Comment

            • nsl
              Member
              • Jan 2011
              • 28

              #7
              sorry for the naive question, but does it matter whether the annotations are from UCSC or Ensemble? I used Ensemble and get "0" values as well.
              Last edited by nsl; 07-19-2011, 12:05 PM.

              Comment

              • Morten
                Junior Member
                • Jun 2010
                • 9

                #8
                I downloaded the UCSC annotation file here:

                Comment

                • fabrice
                  Member
                  • Oct 2009
                  • 86

                  #9
                  I have the same problem when run cufflinks.

                  I run cufflinks 1.0.3 with command:

                  bin/rnaseq/cufflinks-1.0.3.Linux_x86_64/cufflinks -p 6 -I 5000000
                  --upper-quartile-norm
                  -G ~/bin/genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf
                  --output-dir mapping/7124 mapping/7124/accepted_hits.bam

                  I got these output.

                  Warning: Using default Gaussian distribution due to insufficient
                  paired-end reads in open ranges. It is recommended that correct
                  paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
                  > Map Properties:
                  > Upper Quartile: 366.70
                  > Read Type: 108bp x 102bp
                  > Fragment Length Distribution: Truncated Gaussian (default)
                  > Default Mean: 200
                  > Default Std Dev: 80

                  My data is Illumina pair-end RNA-seq 2*101bp which is mapped by
                  Tophat. Before mapping the data, I have used cutadat to trim the
                  adaptor, So read1 and read2 will not the same length after trimed.

                  1, Do you have some suggestion why cufflinks cannot learns the
                  fragment length mean from my data?
                  2, Why it output so strange Read Type: 108bp x 102bp?

                  Originally posted by Dario1984 View Post
                  Hi everyone,

                  The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.



                  I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

                  I agree that this is quite obscure and hard to find, especially since the argument description states The other source implies that a standard GTF file from UCSC should work, but this is misleading.

                  --------------------------------------
                  Dario Strbenac
                  Research Assistant
                  Cancer Epigenetics
                  Garvan Institute of Medical Research
                  Darlinghurst NSW 2010
                  Australia

                  Comment

                  • DZhang
                    Senior Member
                    • Jun 2010
                    • 177

                    #10
                    Hi fabrice,

                    I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                    Thank you,
                    Douglas

                    Comment

                    • fabrice
                      Member
                      • Oct 2009
                      • 86

                      #11
                      Hi Douglas,

                      Thank you for your suggestions. It seems that there is not plenty of properly mapped pair-end reads in my bam files.

                      I think there is a problem in the mapping. Maybe this problem caused by the parameter -r/--mate-inner-dist in tophat. Because I trimed some reads, the different sizes of forward and reverse reads will let this parameter hard to set. The inner distance between mate pairs is variation.

                      -r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.

                      Originally posted by DZhang View Post
                      Hi fabrice,

                      I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                      Thank you,
                      Douglas
                      www.contigexpress.com

                      Comment

                      • DZhang
                        Senior Member
                        • Jun 2010
                        • 177

                        #12
                        Hi fabrice,

                        Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                        Douglas

                        Comment

                        • fabrice
                          Member
                          • Oct 2009
                          • 86

                          #13
                          Hi Douglas,

                          I have try trimed and without trimed with BWA. I found quality trimming can get more properly paired mapping. About 30% in my dataset, so I think quality trimming is necessary to my dataset.

                          BWA seems no problem for different sizes. But I prefer using Tophat to output junctions.bed, insertions.bed and deletions.bed.

                          Thanks.

                          Originally posted by DZhang View Post
                          Hi fabrice,

                          Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                          Douglas
                          www.contigexpress.com

                          Comment

                          • DZhang
                            Senior Member
                            • Jun 2010
                            • 177

                            #14
                            Hi fabrice,

                            Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                            Douglas

                            Comment

                            • fabrice
                              Member
                              • Oct 2009
                              • 86

                              #15
                              tophat-1.3.1.Linux_x86_64/tophat --mate-inner-dist 194 -p 6 --segment-mismatches 2 --segment-length 25 --mate-std-dev 25 --min-anchor 8 --splice-mismatches 0 --min-intron 50 --max-intron 5000000 --min-isoform-fraction 0.15 --max-multihits 40 --solexa1.3-quals -o mapping/7124_3 ~/bin/genome_index/tophat_indexes/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.dna.chromosome mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq


                              Here mate-inner-dist=PCR size - 2*101 = 194.

                              But read1 and read2 have been trimed.

                              Originally posted by DZhang View Post
                              Hi fabrice,

                              Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                              Douglas
                              www.contigexpress.com

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                New Genomics Tools and Methods Shared at AGBT 2025
                                by seqadmin


                                This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                The Headliner
                                The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                03-03-2025, 01:39 PM
                              • seqadmin
                                Investigating the Gut Microbiome Through Diet and Spatial Biology
                                by seqadmin




                                The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                                02-24-2025, 06:31 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 05:03 AM
                              0 responses
                              16 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-19-2025, 07:27 AM
                              0 responses
                              15 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-18-2025, 12:50 PM
                              0 responses
                              16 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-03-2025, 01:15 PM
                              0 responses
                              185 views
                              0 reactions
                              Last Post seqadmin  
                              Working...