Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cufflinks / Cuffdiff problem

    Hi all.

    I hope someone can help me with my cuffdiff problem, which I have been trying to work around for a few days now.
    I guess I should mention that I am a novice at this...

    I run cuffdiff on the output from Tophat like this:

    Code:
    cuffdiff -o ./cuffdiff -p 7 /media/bigdrive/morten/software/cufflinks-1.0.3.Linux_x86_64/Homo_sapiens_annotation_hg19-GRCh37.62.gtf T4A_1_L1-tophat/accepted_hits.bam T4C_1_L1-tophat/accepted_hits.bam
    But for some reason cuffdiff reports zero total map mass and read type as 0bp single-end

    Code:
    You are using Cufflinks v1.0.3, which is the most recent release.
    [10:51:59] Loading reference annotation.
    [10:52:07] Inspecting maps and determining fragment length distributions.
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
    > Map Properties:
    >	Total Map Mass: 0.00
    >	Read Type: 0bp single-end
    >	Fragment Length Distribution: Truncated Gaussian (default)
    >	              Default Mean: 200
    >	           Default Std Dev: 80
    [10:53:16] Modeling fragment count overdispersion.
    Warning: compparcomp: perfect fit
    [10:53:16] Testing for differential expression and regulation in locus.
    > Processed 33455 loci.                        [*************************] 100%
    Performed 0 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 0 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests
    Writing isoform-level FPKM tracking
    Writing TSS group-level FPKM tracking
    Writing gene-level FPKM tracking
    Writing CDS-level FPKM tracking
    And the output files (like gene_exp.diff) just have zero for all values and fold changes

    I also tried using bam files from a direct Bowtie alignment with the same results.

    Software versions I am using:
    Bowtie v 0.12.5
    Samtools v 0.1.16 (r963:234)
    TopHat v1.3.1 (also tried v1.2)
    Cufflinks v1.0.3

    I really hope someone can help me because I am completely stuck!
    Thank you.

    -Morten

  • #2
    It seems I just solved my problem.
    The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
    I downloaded one originating from UCSC, now it works.

    Comment


    • #3
      Hi Morten,

      I am also facing the same problem partially. I get almost the same warning messages as above and all the output files are empty. I am using the annotation file which i downloaded from ensembl (hg18).

      If you have any idea about this, can you please help me?

      Comment


      • #4
        Originally posted by Morten View Post
        It seems I just solved my problem.
        The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
        I downloaded one originating from UCSC, now it works.
        Can you post the link to the hg19 gtf annotation? I've been trying to find one with no luck

        Comment


        • #5
          Hi everyone,

          The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.

          Note: If an arbitrary GTF/GFF3 file is used as input (to CuffDiff) (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

          Code:
          cuffcompare -s /path/to/genome_seqs.fa -r annotation.gtf annotation.gtf
          I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

          I agree that this is quite obscure and hard to find, especially since the argument description states
          <transcripts.(gtf/gff)> A transcript annotation file produced by cufflinks, cuffcompare, or other source.
          The other source implies that a standard GTF file from UCSC should work, but this is misleading.

          --------------------------------------
          Dario Strbenac
          Research Assistant
          Cancer Epigenetics
          Garvan Institute of Medical Research
          Darlinghurst NSW 2010
          Australia

          Comment


          • #6
            Does anyone actually have links to UCSC GTF files? I have been able to find none. The only ones I have located are based on Ensembl annotation and chromosome names.

            Comment


            • #7
              sorry for the naive question, but does it matter whether the annotations are from UCSC or Ensemble? I used Ensemble and get "0" values as well.
              Last edited by nsl; 07-19-2011, 12:05 PM.

              Comment


              • #8
                I downloaded the UCSC annotation file here:

                Comment


                • #9
                  I have the same problem when run cufflinks.

                  I run cufflinks 1.0.3 with command:

                  bin/rnaseq/cufflinks-1.0.3.Linux_x86_64/cufflinks -p 6 -I 5000000
                  --upper-quartile-norm
                  -G ~/bin/genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf
                  --output-dir mapping/7124 mapping/7124/accepted_hits.bam

                  I got these output.

                  Warning: Using default Gaussian distribution due to insufficient
                  paired-end reads in open ranges. It is recommended that correct
                  paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
                  > Map Properties:
                  > Upper Quartile: 366.70
                  > Read Type: 108bp x 102bp
                  > Fragment Length Distribution: Truncated Gaussian (default)
                  > Default Mean: 200
                  > Default Std Dev: 80

                  My data is Illumina pair-end RNA-seq 2*101bp which is mapped by
                  Tophat. Before mapping the data, I have used cutadat to trim the
                  adaptor, So read1 and read2 will not the same length after trimed.

                  1, Do you have some suggestion why cufflinks cannot learns the
                  fragment length mean from my data?
                  2, Why it output so strange Read Type: 108bp x 102bp?

                  Originally posted by Dario1984 View Post
                  Hi everyone,

                  The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.



                  I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

                  I agree that this is quite obscure and hard to find, especially since the argument description states The other source implies that a standard GTF file from UCSC should work, but this is misleading.

                  --------------------------------------
                  Dario Strbenac
                  Research Assistant
                  Cancer Epigenetics
                  Garvan Institute of Medical Research
                  Darlinghurst NSW 2010
                  Australia

                  Comment


                  • #10
                    Hi fabrice,

                    I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                    Thank you,
                    Douglas

                    Comment


                    • #11
                      Hi Douglas,

                      Thank you for your suggestions. It seems that there is not plenty of properly mapped pair-end reads in my bam files.

                      I think there is a problem in the mapping. Maybe this problem caused by the parameter -r/--mate-inner-dist in tophat. Because I trimed some reads, the different sizes of forward and reverse reads will let this parameter hard to set. The inner distance between mate pairs is variation.

                      -r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.

                      Originally posted by DZhang View Post
                      Hi fabrice,

                      I am not entirely sure whether the different sizes of forward and reverse reads caused the issue but the first thing to check is to make sure you have plenty of properly mapped pair-end reads in your bam files. (One tool for that is picard.). If you do, then look into cufflinks. If you do not, please look into the Tophat step.

                      Thank you,
                      Douglas
                      www.contigexpress.com

                      Comment


                      • #12
                        Hi fabrice,

                        Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                        Douglas

                        Comment


                        • #13
                          Hi Douglas,

                          I have try trimed and without trimed with BWA. I found quality trimming can get more properly paired mapping. About 30% in my dataset, so I think quality trimming is necessary to my dataset.

                          BWA seems no problem for different sizes. But I prefer using Tophat to output junctions.bed, insertions.bed and deletions.bed.

                          Thanks.

                          Originally posted by DZhang View Post
                          Hi fabrice,

                          Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

                          Douglas
                          www.contigexpress.com

                          Comment


                          • #14
                            Hi fabrice,

                            Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                            Douglas

                            Comment


                            • #15
                              tophat-1.3.1.Linux_x86_64/tophat --mate-inner-dist 194 -p 6 --segment-mismatches 2 --segment-length 25 --mate-std-dev 25 --min-anchor 8 --splice-mismatches 0 --min-intron 50 --max-intron 5000000 --min-isoform-fraction 0.15 --max-multihits 40 --solexa1.3-quals -o mapping/7124_3 ~/bin/genome_index/tophat_indexes/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.dna.chromosome mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq


                              Here mate-inner-dist=PCR size - 2*101 = 194.

                              But read1 and read2 have been trimed.

                              Originally posted by DZhang View Post
                              Hi fabrice,

                              Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

                              Douglas
                              www.contigexpress.com

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-27-2024, 06:37 PM
                              0 responses
                              12 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-27-2024, 06:07 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              53 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              69 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X