  • capricy
    Senior Member
    • Apr 2012
    • 125

    Annotate Cufflinks-assembled transcripts with a reference GTF

    Hello, there,

    1. I ran genome-guided de novo transcript assembly on my RNA-seq data using Cufflinks. The .sam file comes from STAR mapping:

    cufflinks -p 8 /mapping/mapped.sam

    2. I then merged the resulting GTF files from the same tissue into merged.gtf, without including reference.gtf:

    cuffmerge -p 8 gtf.filelist.DeNovo

    3. I then tried to find the closest reference gene ID for each de novo assembled transcript:

    cuffcompare merged.gtf -r reference.gtf

    What I've found is that none of my de novo assembled transcripts are matched to the reference GTF, even though some introns are apparently identical between merged.gtf and reference.gtf.

    for example:

    from the cufflinks merged.gtf, I have

    more XLOC_005458.gtf
    chr2 Cufflinks exon 25289899 25290661 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "1"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290738 25290883 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "2"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290976 25291190 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010739"; exon_number "3"; oId "CUFF.5451.1"; tss_id "TSS7438";
    chr2 Cufflinks exon 25289938 25290082 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "1"; oId "CUFF.5451.2"; tss_id "TSS7438";
    chr2 Cufflinks exon 25290388 25291177 . . . gene_id "XLOC_005458"; transcript_id "TCONS_00010740"; exon_number "2"; oId "CUFF.5451.2"; tss_id "TSS7438";

    from the reference.gtf, I have:
    2 ensembl_havana CDS 25289989 25290661 . + 0 ccds_id "CCDS15763"; exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";

    2 ensembl_havana CDS 25290738 25291057 . + 2 ccds_id "CCDS15763"; exon_number "2"; gene_biotype "protein_coding"; gene_id "ENSMUSG00000026961"; gene_name "Lrrc26"; gene_source "ensembl_havana"; gene_version "6"; havana_gene "OTTMUSG00000011934"; havana_gene_version "1"; havana_transcript "OTTMUST00000028197"; havana_transcript_version "1"; p_id "P45943"; protein_id "ENSMUSP00000028337"; protein_version "6"; tag "basic"; transcript_biotype "protein_coding"; transcript_id "ENSMUST00000028337"; transcript_name "Lrrc26-001"; transcript_source "ensembl_havana"; transcript_support_level "1"; transcript_version "6"; tss_id "TSS86428";

    Apparently the same intron (25290661 .. 25290738) exists in both the de novo assembled transcript and the reference. So my question is: why is XLOC_005458 from the Cufflinks output not matched to Lrrc26 in reference.gtf, even though they share the same gene region?

    Thanks for any input!

    C.
    Last edited by capricy; 02-01-2017, 08:15 AM.
  • shunyip
    Member
    • Oct 2013
    • 20

    #2
    RNA molecules can suffer from degradation. Introns, however, are identified from splice junctions, which often fall in the middle of the RNA reads, so introns are more likely to be identified correctly. If you want the assembled genes to match the reference closely, you might need higher sequencing depth and/or higher-quality data.
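
The shared-intron observation from the first post can be checked directly by deriving intron coordinates from the exon records of each transcript. The following is only a minimal sketch: `gtf_introns` is a hypothetical helper name, and it assumes a tab-separated GTF with `exon` features and a `transcript_id` attribute, as in the merged.gtf excerpt.

```shell
# Print one line per intron: chrom, intron start, intron end, transcript_id
# (1-based, inclusive coordinates, derived from consecutive exon features).
gtf_introns() {
    awk -F'\t' '
        # Return the quoted value of a GTF attribute, or "." if it is absent.
        function attr(s, key,    re, v) {
            re = "(^|; *)" key " \"[^\"]*\""
            if (!match(s, re)) return "."
            v = substr(s, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", v)
            sub(/"$/, "", v)
            return v
        }
        !/^#/ && $3 == "exon" {
            printf "%s\t%s\t%d\t%d\n", $1, attr($9, "transcript_id"), $4, $5
        }
    ' "$1" |
    sort -t "$(printf '\t')" -k2,2 -k3,3n |
    awk -F'\t' '
        # Consecutive exons of the same transcript delimit one intron.
        $2 == prev { printf "%s\t%d\t%d\t%s\n", $1, prev_end + 1, $3 - 1, $2 }
        { prev = $2; prev_end = $4 }
    '
}

# Example usage (file name taken from the thread):
# gtf_introns merged.gtf
```

Running the same helper on a reference file only makes sense for its `exon` records (the excerpt above shows `CDS` records), and the resulting intron lists are comparable only if both files use the same sequence names.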


    • capricy
      Senior Member
      • Apr 2012
      • 125

      #3
      Then what is an easy way to annotate those assembled transcripts? I mean, I would like to find the closest reference gene IDs for the transcripts.

      Thanks.

      C.


      • shunyip
        Member
        • Oct 2013
        • 20

        #4
        Hi Capricy,

        You can supply your gene annotation (reference.gtf) to Cufflinks during assembly, using the -g argument.
        Alternatively, you can use bedtools intersect to overlap and combine your merged.gtf and reference.gtf; see the bedtools intersect documentation. For this method, you need to convert the GTF files into BED files.

        I hope this helps,


        • capricy
          Senior Member
          • Apr 2012
          • 125

          #5
          According to the cuffcompare documentation, if I use -r <reference.gtf>, the output should identify the overlapping transfrags. But it did not in my case.

          I just wonder if there is something wrong with my steps?

          C.
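
One thing worth checking in these steps, judging only from the excerpts in the first post: the merged.gtf records use the sequence name `chr2`, while the reference.gtf records use `2`. Records on differently named sequences cannot overlap, so a naming mismatch alone could leave every transfrag unmatched. The sketch below is a hedged illustration, not a confirmed diagnosis; the function names are hypothetical and it assumes tab-separated GTF files.

```shell
# List the distinct sequence names (column 1) of a GTF, skipping comment lines.
gtf_seqnames() {
    awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Print sequence names present in both GTFs; empty output means none are shared.
shared_seqnames() {
    awk -F'\t' '
        /^#/ { next }
        NR == FNR { seen[$1] = 1; next }
        ($1 in seen) && !printed[$1]++ { print $1 }
    ' "$1" "$2"
}

# Strip a leading "chr" from column 1 so that UCSC-style names match
# Ensembl-style ones. Assumption: this simple rule does not cover cases
# such as chrM versus MT or differently named scaffolds.
strip_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { sub(/^chr/, "", $1); print }' "$1"
}

# Example usage (file names taken from the thread):
# shared_seqnames merged.gtf reference.gtf
# strip_chr_prefix merged.gtf > merged.nochr.gtf
# cuffcompare -r reference.gtf merged.nochr.gtf
```

If `shared_seqnames` prints nothing, harmonizing the names in one of the two files before rerunning cuffcompare would be the first thing to try.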


          • capricy
            Senior Member
            • Apr 2012
            • 125

            #6
            I didn't use -g since I would only like to see the de novo assembled transfrags.


            • shunyip
              Member
              • Oct 2013
              • 20

              #7
              Then the easiest way for you is probably to use bedtools.

              You can convert a GTF file to a BED file using:
              Code:
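              awk 'BEGIN{FS=OFS="\t"} !/^#/ {print $1, $4-1, $5, $9}' yourfile.gtf > yourfile.bed
              This writes the 1st, 4th, 5th and 9th columns of the GTF file to a new file, skipping comment lines. It also subtracts 1 from the start coordinate, because GTF starts are 1-based while BED starts are 0-based.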

              Then, you can use bedtools intersect to overlap the two files.
              It seems that the -loj and -wao arguments suit your case well. You can take a look.
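
As a rough sketch of that last step, the output of `bedtools intersect -loj` can be reduced to one line per assembled transcript and overlapping reference gene. The function names and file names below are hypothetical. The sketch assumes both inputs are 4-column BED files with the GTF attribute string in column 4, so each joined line has 8 tab-separated columns, and that the reference attributes carry `gene_name` as in the excerpt from the first post.

```shell
# Reduce bedtools intersect -loj output (read from stdin) to unique pairs of
# assembled transcript_id and reference gene_name. Features without any
# overlap are reported by -loj with "." in the reference columns, so they
# come out here with "." as the gene name.
summarize_loj() {
    awk -F'\t' '
        # Return the quoted value of a GTF attribute, or "." if it is absent.
        function attr(s, key,    re, v) {
            re = "(^|; *)" key " \"[^\"]*\""
            if (!match(s, re)) return "."
            v = substr(s, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", v)
            sub(/"$/, "", v)
            return v
        }
        {
            pair = attr($4, "transcript_id") "\t" attr($8, "gene_name")
            if (!seen[pair]++) print pair
        }
    '
}

# Hypothetical wrapper: $1 is the assembled BED file, $2 the reference BED file.
closest_reference_genes() {
    bedtools intersect -loj -a "$1" -b "$2" | summarize_loj
}

# Example usage (placeholder file names):
# closest_reference_genes merged.bed reference.bed > transcript_to_gene.tsv
```

bedtools only reports overlaps between intervals whose chromosome names are identical, so both files need the same naming (for example `2` in both, or `chr2` in both) before this gives useful results.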
