  • Best way to build a modified transcriptome gtf model?

    Hi Everyone,

    When I align my reads with TopHat against GRCm38.fa and then use various programs (including Cufflinks) to assign them to known features in the Ensembl GTF file, about 50% of my reads are not assigned to any known feature. Someone suggested that I use the transcripts.gtf file produced by Cufflinks to build a new/modified annotation GTF that captures groups of reads that align to the genome in regions not associated with a known transcript.

    So I've been trying to build a new/modified annotation GTF file for Mus musculus that is specific to my cells of interest. I've tried to do this a few different ways using Cufflinks.

    1. I pooled all my reads and used the transcripts.gtf file produced by Cufflinks with the -g option
    Pros: I boost the rate of alignment to features to over 80%
    Cons: the data files are huge and take forever to process, and the output doesn't contain all the original Ensembl.gtf transcripts

    2. I ran cuffmerge on the individual transcripts.gtf files produced from each of my samples
    Pros: I boost the rate of alignment to features to over 60%
    Cons: cuffmerge strips a lot of the data contained in the original Ensembl.gtf file, and the result contains only transcripts expressed by my cells.

    3. I ran Cufflinks without the -g option and then used cuffmerge to combine the Ensembl.gtf file with the transcripts.gtf file
    Pros: I boost the rate of alignment to features to over 60%
    Cons: Because the output files are so different from the original input, they are hard to compare, but there are 400,000 more lines in my original Ensembl.gtf than in the cuffmerged files. Where did all that information go?
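
One way to see where the missing lines went is to tally records per feature type (column 3) in each file and compare the tallies. The following is a minimal sketch, assuming standard tab-separated, 9-column GTF. The function name `count_feature_types` and the sample records are hypothetical and for illustration only. In practice you would pass open file handles for Ensembl.gtf and the cuffmerge output instead of the in-memory lists.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GTF records by feature type (column 3), skipping comments and blanks."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # ignore malformed records
            counts[fields[2]] += 1
    return counts


# Hypothetical sample records standing in for the two real files.
REFERENCE_SAMPLE = [
    "#!genome-build GRCm38\n",
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tensembl\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n',
    '1\tensembl\tstart_codon\t120\t122\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n',
    '1\tensembl\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
MERGED_SAMPLE = [
    '1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1";\n',
    '1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1";\n',
]

reference_counts = count_feature_types(REFERENCE_SAMPLE)
merged_counts = count_feature_types(MERGED_SAMPLE)

# Counter subtraction keeps only feature types with more lines in the reference.
lost_lines = reference_counts - merged_counts
print(dict(lost_lines))
```

If most of the difference falls on non-exon feature types such as CDS or start_codon, the gap reflects the records the merged file does not carry rather than lost transcripts.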


    My two biggest concerns are that, one, my new GTF files don't contain all the transcripts in the original Ensembl file, and two, Cufflinks removes features and relabels a lot of the attributes, like gene_id.
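
To check the first concern directly, the transcript IDs in the Ensembl file can be compared with the original IDs recorded in the merged file. This is a sketch under one assumption: that the merged GTF keeps each input transcript's original ID in an attribute named `oId`. Check your own output for the actual attribute name and pass it as `merged_key` if it differs. The helper names and sample records are hypothetical.

```python
import re

# Matches quoted GTF attributes of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def attribute_values(lines, key):
    """Collect the distinct values of one attribute key across all GTF records."""
    values = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        for name, value in ATTRIBUTE_RE.findall(fields[8]):
            if name == key:
                values.add(value)
    return values


def missing_reference_transcripts(reference_lines, merged_lines, merged_key="oId"):
    """Return reference transcript_ids that never appear as merged_key in the merged file."""
    reference_ids = attribute_values(reference_lines, "transcript_id")
    carried_over = attribute_values(merged_lines, merged_key)
    return reference_ids - carried_over


# Hypothetical sample records standing in for the two real files.
ENSEMBL_LINES = [
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "ENSMUSG01"; transcript_id "ENSMUST01";\n',
    '1\tensembl\texon\t900\t950\t.\t-\t.\tgene_id "ENSMUSG02"; transcript_id "ENSMUST02";\n',
]
MERGED_LINES = [
    '1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; oId "ENSMUST01";\n',
    '1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "XLOC_2"; transcript_id "TCONS_2"; oId "CUFF.1.1";\n',
]

missing = missing_reference_transcripts(ENSEMBL_LINES, MERGED_LINES)
print(sorted(missing))
```

The resulting set lists the reference transcripts that are absent from the merged annotation, which gives a concrete number to compare across the three approaches.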

    Is there a better way to do this? Should I just concatenate the files together? I'm not sure what the best approach is.

    Thanks all for your help.
