Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • issues found in using cufflinks/cuffcompare/cuffdiff

    I already wrote to Cole for the following questions. Post here in case others have the relevant questions (or answer).

    -----------------------
    Hi, Cole

    THanks first for the great tools you designed.

    I have found several issues during using the tools. I'm not sure it's reported by others already or not. Just let you know in case:

    1. Cufflinks -M mask.gtf seems not really mask reads mapped to mask.gtf file, when -G is set.
    I have checked the results from the following command:
    cufflinks -o \$HOME/scratch/HD_$inputfile/cufflinks_allknown -j 0.05 -p 8 -Q 0 -G $BOWTIE_INDEXES/hg19.gtf -M $BOWTIE_INDEXES/hg19.rRNA_MtRNA.gtf -r $BOWTIE_INDEXES/hg19.fa \$HOME/scratch/HD_$inputfile/tophat.mm2.accepted_hits.hg19.sam

    In the output file ../cufflinks_allknown/transcripts.gtf, I still can see rRNA and mitochondria RNAs there. Shouldn't they be masked out from the output gtf when -M is set?

    2. What's the suggested parameters for detecting novel transcripts using cufflinks?

    I think detecting novel transcripts is one of the popular tasks for RAN-seq processing. But I found only very short "transcripts" were assembled by cufflinks if without setting option -G. I compared the output from cufflinks (without -G) with annotated transcript in UCSC, and it's obviously seen that only a small fraction of reads are used to assemble the 'transcripts' (and they are shorter than the annotated ones). OK, if I switched to use -G in cufflinks, the output will of course show FPKM for each annotated transcript in the GTF; in this case, I cannot know how many reads are used to calculated that FPKM value (I can't find way to track this. Educate me if there is a way, please) So, by comparing these two results, I have an intuitive concern: even I use -G option, it might use only a small fraction of reads for calculating the FPKM? Can the fraction of used reads be increased if I turn on the -G option? I am slightly doubting this

    3. Option of "-o <outprefix>" in Cuffcompare should not contain '.' in the outprefix path.

    For example, if I set "cuffcompare -o /home/projects/HD.neuro/results/cuffcompare ...", the output of cuffcompare will be output to
    /home/projects/HD.combined.gtf
    /home/projects/HD.loci
    /home/projects/HD.tracking

    I guess this is because you wrongly stop until the first '.' in the path.

    4. How is the Test status NOTEST decided?

    In my cuffdiff output file gene_exp.diff, I saw many comparisons with test status NOTEST. It says "not enough alignments for testing" in the manual, but in my result file, I saw some genes with comparable FPKM values in both samples, but they are marked as NOTEST; while other genes with 0 FPKM in both sample, and they are marked as OK (e.g. the 2nd record in below example)

    Another confusion for the 'yes/no' significant column, It says "depending on whether p is greater then the FDR after Benjamini-Hochberg correction for multiple-testing". However, for cases with 0 uncorrected p-value, some are marked as 'yes', some are marked as 'no'. I am confused which one I should use to select differentially expression genes. The last column ?

    $ tail gene_exp.diff
    XLOC_048056 J01415.24 chrM:8293-8364 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048057 J01415.19,J01415.2,J01415.21,J01415.23,J01415.5,J01415.8,MT-ATP6,MT-CO3,MT-ND3,MT-ND4,MT-ND4L,MT-ND5 chrM:8364-14742 HD_B4430 HD_B4189 OK 0 0 0 0 1no
    XLOC_048058 J01415.15,MT-CYB chrM:14745-15953 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
    XLOC_048059 J01415.22 chrM:3305-4400 HD_B4430 HD_B4189 NOTEST 3695.47 2047.85 -0.590317 21.4283 0 no
    XLOC_048060 J01415.1 chrM:5585-5655 HD_B4430 HD_B4189 NOTEST 1949.56 1234.71 -0.456771 12.5587 0 no
    XLOC_048061 J01415.18 chrM:5655-5729 HD_B4430 HD_B4189 NOTEST 970.36 667.756 -0.373745 7.43324 1.05915e-13 no
    XLOC_048062 J01415.11,J01415.14 chrM:5759-5891 HD_B4430 HD_B4189 OK 7858.87 5882.96 -0.289583 16.7969 0 yes
    XLOC_048063 J01415.13 chrM:5902-7514 HD_B4430 HD_B4189 OK 14129.9 10391.4 -0.30732 23.7808 0 yes
    XLOC_048064 J01415.16,MT-ND6 chrM:8364-14742 HD_B4430 HD_B4189 OK 55722.5 25892.8 -0.766417 77.8562 0 yes
    XLOC_048065 J01415.20 chrM:15954-16023 HD_B4430 HD_B4189 OK 14668.5 5896.61 -0.911327 59.1021 0 yes

    So many questions, I know. Sorry for that.

    Regards,

    Xianjun

  • #2
    I hope Cole have time for these questions

    Comment


    • #3
      annotated rRNA/MtRNA .gtf file

      Greetings sterding-

      I noticed that you have an annotated human rRNA_MtRNA file. I am trying to acquire such a file so that I may use the -M option when running cufflinks. Could you please advise on how to acquire/generate the file? Any assistance you could offer would be greatly appreciated.

      Many thanks

      Comment


      • #4
        I simply extracted the record with keywords 'rRNA' or 'Mt' from hg19.gtf file by using command 'grep'.

        Comment


        • #5
          Many, many, thanks sterding for your quick reply! Works great!

          Comment


          • #6
            Hi sterding,

            Is it better you using -g instead of -G in your situation?

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Genetic Variation in Immunogenetics and Antibody Diversity
              by seqadmin



              The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
              11-06-2024, 07:24 PM
            • seqadmin
              Choosing Between NGS and qPCR
              by seqadmin



              Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
              10-18-2024, 07:11 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Today, 11:09 AM
            0 responses
            22 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Today, 06:13 AM
            0 responses
            20 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 11-01-2024, 06:09 AM
            0 responses
            30 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 10-30-2024, 05:31 AM
            0 responses
            21 views
            0 likes
            Last Post seqadmin  
            Working...
            X