Seqanswers Leaderboard Ad



No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • issues found in using cufflinks/cuffcompare/cuffdiff

    I already wrote to Cole for the following questions. Post here in case others have the relevant questions (or answer).

    Hi, Cole

    THanks first for the great tools you designed.

    I have found several issues during using the tools. I'm not sure it's reported by others already or not. Just let you know in case:

    1. Cufflinks -M mask.gtf seems not really mask reads mapped to mask.gtf file, when -G is set.
    I have checked the results from the following command:
    cufflinks -o \$HOME/scratch/HD_$inputfile/cufflinks_allknown -j 0.05 -p 8 -Q 0 -G $BOWTIE_INDEXES/hg19.gtf -M $BOWTIE_INDEXES/hg19.rRNA_MtRNA.gtf -r $BOWTIE_INDEXES/hg19.fa \$HOME/scratch/HD_$inputfile/tophat.mm2.accepted_hits.hg19.sam

    In the output file ../cufflinks_allknown/transcripts.gtf, I still can see rRNA and mitochondria RNAs there. Shouldn't they be masked out from the output gtf when -M is set?

    2. What's the suggested parameters for detecting novel transcripts using cufflinks?

    I think detecting novel transcripts is one of the popular tasks for RAN-seq processing. But I found only very short "transcripts" were assembled by cufflinks if without setting option -G. I compared the output from cufflinks (without -G) with annotated transcript in UCSC, and it's obviously seen that only a small fraction of reads are used to assemble the 'transcripts' (and they are shorter than the annotated ones). OK, if I switched to use -G in cufflinks, the output will of course show FPKM for each annotated transcript in the GTF; in this case, I cannot know how many reads are used to calculated that FPKM value (I can't find way to track this. Educate me if there is a way, please) So, by comparing these two results, I have an intuitive concern: even I use -G option, it might use only a small fraction of reads for calculating the FPKM? Can the fraction of used reads be increased if I turn on the -G option? I am slightly doubting this

    3. Option of "-o <outprefix>" in Cuffcompare should not contain '.' in the outprefix path.

    For example, if I set "cuffcompare -o /home/projects/HD.neuro/results/cuffcompare ...", the output of cuffcompare will be output to

    I guess this is because you wrongly stop until the first '.' in the path.

    4. How is the Test status NOTEST decided?

    In my cuffdiff output file gene_exp.diff, I saw many comparisons with test status NOTEST. It says "not enough alignments for testing" in the manual, but in my result file, I saw some genes with comparable FPKM values in both samples, but they are marked as NOTEST; while other genes with 0 FPKM in both sample, and they are marked as OK (e.g. the 2nd record in below example)

    Another confusion for the 'yes/no' significant column, It says "depending on whether p is greater then the FDR after Benjamini-Hochberg correction for multiple-testing". However, for cases with 0 uncorrected p-value, some are marked as 'yes', some are marked as 'no'. I am confused which one I should use to select differentially expression genes. The last column ?

    $ tail gene_exp.diff
    XLOC_048056 J01415.24 chrM:8293-8364 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048057 J01415.19,J01415.2,J01415.21,J01415.23,J01415.5,J01415.8,MT-ATP6,MT-CO3,MT-ND3,MT-ND4,MT-ND4L,MT-ND5 chrM:8364-14742 HD_B4430 HD_B4189 OK 0 0 0 0 1no
    XLOC_048058 J01415.15,MT-CYB chrM:14745-15953 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048059 J01415.22 chrM:3305-4400 HD_B4430 HD_B4189 NOTEST 3695.47 2047.85 -0.590317 21.4283 0 no
    XLOC_048060 J01415.1 chrM:5585-5655 HD_B4430 HD_B4189 NOTEST 1949.56 1234.71 -0.456771 12.5587 0 no
    XLOC_048061 J01415.18 chrM:5655-5729 HD_B4430 HD_B4189 NOTEST 970.36 667.756 -0.373745 7.43324 1.05915e-13 no
    XLOC_048062 J01415.11,J01415.14 chrM:5759-5891 HD_B4430 HD_B4189 OK 7858.87 5882.96 -0.289583 16.7969 0 yes
    XLOC_048063 J01415.13 chrM:5902-7514 HD_B4430 HD_B4189 OK 14129.9 10391.4 -0.30732 23.7808 0 yes
    XLOC_048064 J01415.16,MT-ND6 chrM:8364-14742 HD_B4430 HD_B4189 OK 55722.5 25892.8 -0.766417 77.8562 0 yes
    XLOC_048065 J01415.20 chrM:15954-16023 HD_B4430 HD_B4189 OK 14668.5 5896.61 -0.911327 59.1021 0 yes

    So many questions, I know. Sorry for that.



  • #2
    I hope Cole have time for these questions


    • #3
      annotated rRNA/MtRNA .gtf file

      Greetings sterding-

      I noticed that you have an annotated human rRNA_MtRNA file. I am trying to acquire such a file so that I may use the -M option when running cufflinks. Could you please advise on how to acquire/generate the file? Any assistance you could offer would be greatly appreciated.

      Many thanks


      • #4
        I simply extracted the record with keywords 'rRNA' or 'Mt' from hg19.gtf file by using command 'grep'.


        • #5
          Many, many, thanks sterding for your quick reply! Works great!


          • #6
            Hi sterding,

            Is it better you using -g instead of -G in your situation?


            Latest Articles


            • seqadmin
              Current Approaches to Protein Sequencing
              by seqadmin

              Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
              04-04-2024, 04:25 PM
            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin

              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM





            Topics Statistics Last Post
            Started by seqadmin, 04-11-2024, 12:08 PM
            0 responses
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 10:19 PM
            0 responses
            Last Post seqadmin  
            Started by seqadmin, 04-10-2024, 09:21 AM
            0 responses
            Last Post seqadmin  
            Started by seqadmin, 04-04-2024, 09:00 AM
            0 responses
            Last Post seqadmin