  • Cuffdiff short name

    I tried running cuffdiff on two RNA samples: Liver & Intestine.
    After aligning with tophat, I ran cuffcompare and then cuffdiff:
    Code:
    cuffcompare -o GFF -r ../hg19.RefFlat.GFF3 ../Liver/transcripts.gtf ../Lung/transcripts.gtf
    Code:
    cuffdiff GFF.combined.gtf ../Liver/accepted_hits.sam ../Lung/accepted_hits.sam
    but my genes file looks like this:
    Code:
    ref_trans_id	class_code	gene_short_name	tss_id	locus	q0_FPKM	q0_conf_lo	q0_conf_hi	q1_FPKM	q1_conf_lo	q1_conf_hi	ref_id
    XLOC_000001-[chr22:17517459-17539682]	-	-	-	chr22:17517459-17539682	0	0	0	0.56927	0	1.37434	-
    XLOC_000002-[chr22:17565848-17591387]	-	-	-	chr22:17565848-17591387	2.37959	0.696966	4.06222	15.9232	12.1168	19.7296	-
    XLOC_000003-[chr22:17956627-18033845]	-	-	-	chr22:17956627-18033845	0.540063	0	1.16367	0.413014	0	0.889921	-
    XLOC_000004-[chr22:18043182-18073647]	-	-	-	chr22:18043182-18073647	92.8549	79.4174	106.292	1.11535	0	2.40326	-
    XLOC_000005-[chr22:18121484-18211987]	-	-	-	chr22:18121484-18211987	2.41355	0.706914	4.12019	1.38433	0.254029	2.51463	-
    XLOC_000006-[chr22:18560685-18572206]	-	-	-	chr22:18560685-18572206	3.25714	0.953993	5.56028	3.11362	1.1444	5.08285	-
    XLOC_000007-[chr22:18593558-18614498]	-	-	-	chr22:18593558-18614498	3.26743	0.704014	5.83085	0.960786	0	2.17638	-
    XLOC_000008-[chr22:18632757-18660160]	-	-	-	chr22:18632757-18660160	3.48871	0.849702	6.12772	3.43404	1.14438	5.7237	-
    XLOC_000009-[chr22:18893735-18899600]	-	-	-	chr22:18893735-18899600	25.8998	16.5909	35.2086	0	0	0	-
    My transcripts* and genes files have the correct gene and transcript names:
    Code:
    gene_id	bundle_id	chr	left	right	FPKM
    SAMD11	45228	chr1	861120	879961	0.794517
    NOC2L	45228	chr1	879583	894679	25.374
    ISG15	45228	chr1	948846	949915	3.06525
    AGRN	45228	chr1	955502	991492	8.73339
    C1orf159	45228	chr1	1017197	1051736	1.44667
    SDF4	45228	chr1	1152288	1167447	89.863
    UBE2J2	45228	chr1	1189293	1209234	12.5814
    ACAP3	45228	chr1	1227763	1243269	5.66814
    PUSL1	45228	chr1	1243993	1247056	2.45665
    Any clues?
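To gauge how widespread the missing names are, a small script can tally the rows whose gene_short_name is still the "-" placeholder. This is only a minimal sketch, assuming the tab-separated layout shown in the excerpt above; the function name count_unnamed and the trimmed inline sample are illustrative and not part of Cufflinks.

```python
import csv
import io


def count_unnamed(handle):
    """Return (rows without a gene_short_name, total rows) for a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    unnamed = total = 0
    for row in reader:
        total += 1
        # Treat "-" or an empty/missing field as "no name assigned".
        if (row.get("gene_short_name") or "-").strip() == "-":
            unnamed += 1
    return unnamed, total


# Two rows copied from the excerpt above (trailing columns trimmed for brevity).
SAMPLE = (
    "ref_trans_id\tclass_code\tgene_short_name\ttss_id\tlocus\n"
    "XLOC_000001-[chr22:17517459-17539682]\t-\t-\t-\tchr22:17517459-17539682\n"
    "XLOC_000002-[chr22:17565848-17591387]\t-\t-\t-\tchr22:17565848-17591387\n"
)

if __name__ == "__main__":
    unnamed, total = count_unnamed(io.StringIO(SAMPLE))
    print(f"{unnamed} of {total} rows have no gene_short_name")
```

If every row comes back unnamed, the names were probably lost before cuffdiff ran, which would point at the reference annotation given to cuffcompare rather than at cuffdiff itself.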

  • #2
    Sorry I can't really help you with your problem, but I'm trying to use Cufflinks and I see you have made progress where I have been failing and was wondering if you could help me.

    I've been trying to supply a GFF file when I run cuffcompare, but the names are never assigned to the transcripts and they are all classified as class "u" or "."

    Code:
    brandon@brandon-desktop:~/arab/small$ cuffcompare -o 162_162E -r ~/Desktop/tair9_small_RNAs.gff 162/162.cufflinks/transcripts.gtf 162E/162E.cufflinks/transcripts.gtf 
    Warning: found 79213 transcripts with undetermined strand.
    Warning: found 64824 transcripts with undetermined strand.
    Any idea what I could be doing wrong? This problem has been holding me up for a while.


    • #3
      Originally posted by DrD2009
      Sorry I can't really help you with your problem, but I'm trying to use Cufflinks and I see you have made progress where I have been failing and was wondering if you could help me.

      I've been trying to supply a GFF file when I run cuffcompare, but the names are never assigned to the transcripts and they are all classified as class "u" or "."

      Code:
      brandon@brandon-desktop:~/arab/small$ cuffcompare -o 162_162E -r ~/Desktop/tair9_small_RNAs.gff 162/162.cufflinks/transcripts.gtf 162E/162E.cufflinks/transcripts.gtf 
      Warning: found 79213 transcripts with undetermined strand.
      Warning: found 64824 transcripts with undetermined strand.
      Any idea what I could be doing wrong? This problem has been holding me up for a while.
      You might check your reference file to see if the strand information is in the correct place (the seventh column of a GFF file).

      I really wish there were some repository where we all could download the same files and make this process a lot easier!
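The strand check suggested in this reply can be sketched in a few lines of Python. This is a rough illustration, assuming a standard nine-column, tab-separated GFF/GTF file; the function name strand_counts is made up for this example.

```python
from collections import Counter


def strand_counts(lines):
    """Count the values found in column 7 (strand) of GFF/GTF feature lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            # GFF/GTF feature lines need nine tab-separated columns.
            counts["malformed"] += 1
            continue
        counts[fields[6]] += 1
    return counts
```

Calling it as strand_counts(open(path)) on the reference file (path being whatever file was passed to -r) gives a quick summary. A reference where column 7 holds mostly values other than "+" and "-", or where many lines are counted as malformed, would be worth a closer look.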


      • #4
        Thanks. I'll double check the file.

        I agree. One day it will be done, probably after I stop working on this stuff. lol


        • #5
          I was wondering if you could post a few lines of the GFF file you used to annotate your reads for cuffcompare?

          I've tried GFF and GFF3, but the reads still came out without annotations.

          Are you able to supply cufflinks with a GFF as well to provide annotation? The manual only mentions GTF files.

          Thanks again,
          Brandon


          • #6
            Originally posted by DrD2009
            I was wondering if you could post a few lines of the GFF file you used to annotate your reads for cuffcompare?

            I've tried GFF and GFF3, but the reads still came out without annotations.

            Are you able to supply cufflinks with a GFF as well to provide annotation? The manual only mentions GTF files.

            Thanks again,
            Brandon
            They only take GTF. I can send you the GTF file if you like. This is how I run the analysis:
            tophat -p 5 - SampleName Sample.fq

            cufflinks -p 5 -L Sample1 -G hg19.Ens.GTF Sample1.sam

            cuffcompare -o Sample1 -r hg19.Ens.GTF -R Sample1.transcripts.gtf Sample2.transcripts.gtf

            cuffdiff -p 5 combined.gtf Sample1.sam Sample2.sam
            The GTF looks like this:
            chr1 protein_coding CDS 67050223 67050289 . + 1 gene_id "ENSG00000173020"; transcript_id "ENST00000308595"; exon_number "14"; gene_name "ADRBK1"; transcript_name "ADRBK1-201"; protein_id "ENSP00000312262";
            chr1 protein_coding CDS 67050223 67050289 . + 1 gene_id "ENSG00000173020"; transcript_id "ENST00000416281"; exon_number "2"; gene_name "ADRBK1"; transcript_name "ADRBK1-202"; protein_id "ENSP00000407159";
            chr1 protein_coding CDS 67050599 67050699 . + 0 gene_id "ENSG00000173020"; transcript_id "ENST00000308595"; exon_number "15"; gene_name "ADRBK1"; transcript_name "ADRBK1-201"; protein_id "ENSP00000312262";
            I can send you the GTF if you like.
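To confirm that a reference GTF actually carries gene names in the form shown above, the attribute column can be parsed directly. The following is a minimal sketch, assuming attributes written as key "value"; pairs like those in the excerpt; parse_gtf_attributes and gene_names are illustrative helper names, not Cufflinks functions.

```python
import re

# Matches pairs such as: gene_name "ADRBK1";
_ATTR = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(column9):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(_ATTR.findall(column9))


def gene_names(lines):
    """Map gene_id -> gene_name for every GTF line that carries both."""
    names = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Real GTF files are tab-separated; the excerpt above is shown with
        # spaces, so split on any whitespace for the first eight columns.
        # (This assumes none of those eight columns contains a space.)
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        attrs = parse_gtf_attributes(fields[8])
        if "gene_id" in attrs and "gene_name" in attrs:
            names[attrs["gene_id"]] = attrs["gene_name"]
    return names
```

Run over the three lines quoted above, this maps ENSG00000173020 to ADRBK1. An empty result on a reference file would suggest that it has no gene_name attributes for the downstream tools to pick up, which is one plausible reason for "-" appearing in the gene_short_name column.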


            • #7
              Ahh, then that is my problem. The organism I work on, Arabidopsis thaliana, has no published GTF files anywhere that I have been able to locate to provide annotation for Cufflinks.

              I have only GFF files. I might try creating GTFs out of my GFFs and seeing if I can provide annotation with Cufflinks that way.

              Thank you for all of your help and fast replies. I really appreciate it.
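The GFF-to-GTF conversion mentioned in this post could look roughly like the sketch below. It assumes a GFF3 file in which transcripts are mRNA (or transcript) features with ID and Parent attributes, and exon/CDS features point at their transcript through Parent. The function names are made up for illustration, and only gene_id and transcript_id are written. The Cufflinks package also includes a gffread utility whose -T option writes GTF, which may be the simpler route where it is available.

```python
def parse_gff3_attributes(column9):
    """Parse 'ID=x;Parent=y' style GFF3 attributes into a dict."""
    attrs = {}
    for item in column9.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines, transcript_types=("mRNA", "transcript")):
    """Yield GTF lines for exon/CDS features, using Parent links for the IDs."""
    lines = list(lines)
    # First pass: remember which gene each transcript belongs to.
    gene_of = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] in transcript_types:
            attrs = parse_gff3_attributes(fields[8])
            if "ID" in attrs:
                gene_of[attrs["ID"]] = attrs.get("Parent", attrs["ID"])
    # Second pass: rewrite exon and CDS lines with GTF-style attributes.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in ("exon", "CDS"):
            continue
        parents = parse_gff3_attributes(fields[8]).get("Parent", "")
        for transcript_id in filter(None, parents.split(",")):
            # Fall back to the transcript ID when no gene link was recorded.
            gene_id = gene_of.get(transcript_id, transcript_id)
            attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            yield "\t".join(fields[:8] + [attrs])
```

The output of gff3_to_gtf(open(path)) can be written line by line to a new file and then checked on a handful of known genes before it is passed to cufflinks or cuffcompare.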


              • #8
                isoforms with cufflinks and cuffdiff

                Hi everyone,
                I have questions about the rebuilt isoforms/transcripts at the different steps of the Cufflinks tools. I found that the transcripts rebuilt by cufflinks and the transcripts in isoform_exp.diff created by cuffdiff are different. My questions are:
                1) Does cuffdiff re-assign the reads, rebuild the transcripts, and re-estimate the expression levels?
                2) How can I track the structure of the transcripts in isoform_exp.diff?

                Thank you.


                • #9
                  Sorry, I posted the above in the wrong place.
