**dpryan** · 11-14-2013, 01:33 AM

Cufflinks outputs a GTF assembly that annotates each of the loci it calls. If you want a multi-FASTA file of those transcripts, just use gtf_to_fasta (it probably comes with TopHat, but if not you can search around for it).

**sphil** · 11-14-2013, 02:27 AM

Be aware of where Cufflinks gets the sequences from. Maybe it uses the provided FASTA and just extracts the sequences at the coordinates given by the GTF, which means you end up with your 'non-variety' grape sequences.
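
To make the concern concrete, here is a minimal Python sketch of what coordinate-based extraction does. It is an illustration under stated assumptions, not the actual gtf_to_fasta code: the function names and the toy genome are hypothetical, and the GTF is assumed to carry `exon` features with a `transcript_id` attribute. Every base in the output is copied from the supplied genome, so the reads from the sample never influence the sequence.

```python
from collections import defaultdict

# Translation table for reverse-complementing minus-strand transcripts.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def transcripts_from_gtf(gtf_lines, genome):
    """Splice exon sequences out of `genome` (dict: chrom -> sequence).

    Hypothetical helper: only 'exon' rows are used, and GTF coordinates
    are treated as 1-based and inclusive.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        strand, attrs = fields[6], fields[8]
        tid = attrs.split('transcript_id "')[1].split('"')[0]
        exons[tid].append((chrom, start, end, strand))

    sequences = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        chrom, strand = parts[0][0], parts[0][3]
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        sequences[tid] = revcomp(seq) if strand == "-" else seq
    return sequences


# Toy data (made up for illustration).
toy_genome = {"chr1": "AAACCCGGGTTT"}
toy_gtf = [
    'chr1\tCufflinks\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\tCufflinks\texon\t10\t12\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
toy_transcripts = transcripts_from_gtf(toy_gtf, toy_genome)
```

Whatever differences the variety carries relative to the reference are invisible here, because only the annotation coordinates and the reference FASTA are read.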

**lzu** · 11-14-2013, 02:50 AM

Then I will end up with a FASTA file of transcript sequences extracted from the reference genome, which is not what I want. I want the transcript sequences from my sample (a grape 'variety').

**lzu** · 11-14-2013, 02:52 AM

You are right, you've got my point. I still don't know how to extract sequences from the grape 'variety' RNA-seq data. Maybe it is hard. Or should I assemble the RNA-seq reads de novo using Trinity?

**sphil** · 11-14-2013, 03:26 AM

Yep, maybe that's the better way of doing it. Assemble the transcripts de novo and map those transcripts to the reference genome.

**lzu** · 11-14-2013, 03:29 AM

Do you know of any paper(s) that "first assemble RNA-seq de novo, then map to the ref genome"?

**sphil** · 11-14-2013, 03:49 AM

Sorry, I can't think of one off the top of my head, but the 'normal' mapping procedure after de novo assembly of transcripts should do the job pretty well. Just account for the divergence between your variety and the reference when you choose the mapping parameters. Use loose mapping criteria after your assembly and it should be fine. If, however, this doesn't give you the desired results, what I normally do is BLAST the transcripts against an in-house database. This is even looser than what most mappers allow. Also, if the transcripts become too long, this should be the way to go.

Hope that helps.

FWIW, here are some papers on assembly and mapping which might be helpful anyway:
Garber et al.
Trinity used to assemble transcripts
Oases assembler
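
As a rough illustration of the BLAST fallback described above, the following Python sketch keeps the best-scoring hit per assembled transcript from BLAST tabular output while applying deliberately loose cutoffs. It assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and the threshold values are hypothetical placeholders and would need tuning for how far the variety has diverged from the reference.

```python
def best_loose_hits(blast_tab_lines, min_identity=80.0, min_aln_len=100):
    """Return {query: (subject, pident, aln_len, bitscore)} for the
    highest-bitscore hit per query that passes the loose cutoffs.

    Assumes default BLAST+ tabular output (-outfmt 6, 12 columns).
    The cutoffs are illustrative defaults, not recommendations.
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])
        aln_len = int(fields[3])
        bitscore = float(fields[11])
        # Loose filter: tolerate divergence, but drop very short
        # or very dissimilar alignments.
        if pident < min_identity or aln_len < min_aln_len:
            continue
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, pident, aln_len, bitscore)
    return best


# Toy hits (made up for illustration).
toy_hits = [
    "t1\tchr1\t85.0\t500\t75\t0\t1\t500\t1000\t1499\t1e-100\t400",
    "t1\tchr2\t99.0\t50\t0\t0\t1\t50\t1\t50\t1e-10\t90",
    "t1\tchr3\t90.0\t450\t45\t0\t1\t450\t200\t649\t1e-120\t600",
    "t2\tchr1\t70.0\t800\t240\t0\t1\t800\t1\t800\t1e-50\t300",
]
toy_best = best_loose_hits(toy_hits)
```

Transcripts with no hit surviving the filter are simply absent from the result, which makes them easy to list as candidates for sequence missing from the reference.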

**lzu** · 11-14-2013, 05:25 PM

Thanks for the suggestion. I have read some papers that use a model reference genome to predict the alternative splicing diversity of a subspecies or a species 'variety' from RNA-seq data. There might be errors in the results if some exons or introns are physically missing from the subspecies/variety genome because of the genetic diversity among different groups/populations...

**Jeremy** · 11-14-2013, 10:48 PM

It would probably not be too difficult to get a list of variants between your sample and the reference, substitute the variant bases into the reference genome, and then run gtf_to_fasta on the modified genome to get the variant transcripts. I have done something similar in R using the seqinr package.
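
A minimal Python sketch of the substitution step, written here in Python rather than the R/seqinr approach mentioned above. The function name is hypothetical, and variants are assumed to be VCF-style `(pos, ref, alt)` tuples with a 1-based position on a single sequence. Variants are applied from right to left so that an indel does not shift the coordinates of variants still to be applied.

```python
def apply_variants(ref_seq, variants):
    """Return `ref_seq` with VCF-style variants substituted in.

    variants: iterable of (pos, ref, alt), pos is 1-based.
    Hypothetical helper: one alternate allele per site, no overlaps.
    """
    seq = ref_seq
    prev_start = None  # start of the variant applied just to the right
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        # Sanity check: the REF allele must match the reference bases.
        if seq[start:end].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        if prev_start is not None and end > prev_start:
            raise ValueError(f"overlapping variants near position {pos}")
        seq = seq[:start] + alt + seq[end:]
        prev_start = start
    return seq


# Toy example: one SNP, one 1-bp deletion, one 2-bp insertion.
toy_ref = "ACGTACGT"
toy_variants = [(2, "C", "T"), (4, "TA", "T"), (7, "G", "GAA")]
toy_variety = apply_variants(toy_ref, toy_variants)
```

One caveat: with SNPs only, the modified genome keeps the reference coordinates, so the existing GTF still applies. Once indels are substituted, downstream positions shift and the GTF coordinates would have to be adjusted before extracting transcripts. Existing tools such as `bcftools consensus` build a modified FASTA from a VCF and are likely a safer choice than hand-rolled code for real data.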

**sindrle** · 02-24-2014, 01:56 AM

Let's say you have called indels and SNPs with GATK. Would that work, or can you please share some more details?

I have never done this before.

How to extract assembled transcript sequence from RNA-seq data instead of ref genome?
