Best way to build a modified transcriptome gtf model?

travelk

Member

Join Date: Jul 2013

Posts: 20

02-25-2015, 03:23 AM

Hi Everyone,

When I align my reads with TopHat against GRCm38.fa and then use various programs (including Cufflinks) to match them to known features in the Ensembl GTF file, many of my reads (~50%) are not assigned to any known feature. Someone therefore suggested that I use the transcripts.gtf file produced by Cufflinks to build a new/modified annotation GTF that captures groups of reads that align to the genome in regions not associated with any known transcript.

So I've been trying to build a new/modified annotation GTF file for Mus musculus that is specific to my cells of interest. I've tried a few different approaches using Cufflinks:

1. I pooled all my reads, ran Cufflinks with the -g option, and used the resulting transcripts.gtf file.
Pros: the rate of reads assigned to features rises to over 80%.
Cons: the files are huge and take forever to process, and they don't contain all the transcripts in the original Ensembl.gtf.

2. I ran cuffmerge on the individual transcripts.gtf files produced for each of my samples.
Pros: the rate of reads assigned to features rises to over 60%.
Cons: cuffmerge strips much of the information in the original Ensembl.gtf file, and the result only contains transcripts expressed in my cells.

3. I ran Cufflinks without the -g option and then used cuffmerge to try to combine the Ensembl.gtf file with the transcripts.gtf files.
Pros: the rate of reads assigned to features rises to over 60%.
Cons: the output files are so different from the original input that they are hard to compare, but my original Ensembl.gtf has 400,000 more lines than the cuffmerged files. Where did all that information go?
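One way to see where the extra lines went is to tally records by feature type (column 3) in each file. Recent Ensembl GTFs typically contain gene, transcript, exon, CDS, start_codon, stop_codon and UTR records, whereas Cufflinks/cuffmerge output generally carries only transcript and/or exon records, so much of the difference may be feature types that were never carried over rather than lost transcripts. That is an assumption worth checking on your own files. A minimal sketch, assuming standard tab-separated 9-column GTF:

```python
from collections import Counter


def feature_counts(lines):
    """Tally GTF records by feature type (column 3), ignoring comment and malformed lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        counts[fields[2]] += 1
    return counts
```

For example, compare `feature_counts(open("Mus_musculus.GRCm38.gtf"))` with `feature_counts(open("merged.gtf"))`; both file names are hypothetical placeholders for your own paths.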

My two biggest concerns are, first, that my new GTF files don't contain all the transcripts in the original Ensembl file and, second, that Cufflinks removes features and relabels many of the attributes, such as gene_id.
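To check the first concern directly, you could list the Ensembl transcript_ids that no longer appear anywhere in the merged file. Because cuffmerge relabels IDs (e.g. TCONS_/XLOC_), the sketch also looks in an attribute that is assumed to carry the original or nearest reference ID; the names `oId` and `nearest_ref` are assumptions to verify against your own merged.gtf, and `merged_key` is a hypothetical parameter for choosing between them. A minimal sketch:

```python
import re

# GTF attribute pairs look like: gene_id "ENSMUSG..."; transcript_id "ENSMUST...";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def attribute_values(lines, key):
    """Return the set of values taken by one attribute (e.g. transcript_id) across a GTF."""
    values = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if key in attrs:
            values.add(attrs[key])
    return values


def missing_reference_ids(ref_lines, merged_lines, merged_key="oId"):
    """Reference transcript_ids found in the merged GTF neither as transcript_id
    nor under merged_key (an assumed attribute holding the original ID)."""
    merged_lines = list(merged_lines)  # allow two passes even if given a file handle
    ref_ids = attribute_values(ref_lines, "transcript_id")
    kept = attribute_values(merged_lines, "transcript_id") | attribute_values(merged_lines, merged_key)
    return ref_ids - kept
```

Running it once with `merged_key="oId"` and once with `merged_key="nearest_ref"` may help separate transcripts that were genuinely dropped from ones that were only renamed.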

Is there a better way to do this? Should I just concatenate the files together? I'm not sure what the best approach is.
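A middle ground between plain concatenation and a full cuffmerge is to keep every Ensembl record unchanged and append only the merged records that look novel. The sketch below assumes the merged GTF carries a `class_code` attribute (in Cufflinks class codes, "u" marks an unknown, intergenic transcript and "=" a complete match to a reference transcript) and keeps only "u" by default; `keep_codes` is a hypothetical parameter. Restricting to intergenic transcripts is meant to reduce overlap between Ensembl genes and novel records, since overlapping features may be counted as ambiguous by some read-counting tools. This is a sketch under those assumptions, not a recommended pipeline:

```python
import re

ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def novel_records(merged_lines, keep_codes=frozenset({"u"})):
    """Yield merged-GTF records whose class_code attribute is in keep_codes.

    Records without a class_code attribute are skipped.
    """
    for line in merged_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("class_code") in keep_codes:
            yield line.rstrip("\n")


def combine(ref_lines, merged_lines, keep_codes=frozenset({"u"})):
    """Keep every reference record unchanged, then append the selected novel records."""
    combined = [line.rstrip("\n") for line in ref_lines]
    combined.extend(novel_records(merged_lines, keep_codes))
    return combined
```

The result can be written out with `"\n".join(...) + "\n"`. Since cuffmerge-style gene IDs (XLOC_...) differ from Ensembl IDs (ENSMUSG...), the appended records should not collide with existing gene_id values, though that is worth confirming on real output.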

Thanks all for your help.