
**sdarko** · 01-18-2010, 07:37 AM

Never mind, I figured it out.

**lmf_bill** · 01-20-2010, 12:49 AM

I think you are doing it the right way. Presently, there is no convenient way to output only the novel junctions.

**jiwu2573** · 01-24-2010, 09:17 PM

Hi,

From what you said, it sounds like the approach is to compare the junctions output by TopHat against a reference GFF3 file; those that do not appear in the GFF3 are potential novel splice forms.

Can you explain in more detail:
1. What is a GFF3 file, and how can I get one for the human genome?
2. Do I need to convert junctions.bed to GFF3?
3. Which TopHat command, or which other software, can compare my sample junctions with the reference GFF3 file?

Thanks a lot

**lmf_bill** · 01-24-2010, 10:14 PM

RE:
1. For more information on GFF3, you can search on Google. Getting a GFF3 file for the human genome is somewhat complex: first get the GTF from Ensembl, then convert it to GFF3 with a Perl script (gtf2gff3.pl).
2. No.
3. You will need to write a program yourself.

You are welcome.
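For point 3, here is a minimal Python sketch of such a comparison. It assumes you work directly from an Ensembl GTF (so no GFF3 conversion is needed for this step) and from TopHat's BED12 junctions.bed, where column 5 holds the number of spanning reads and the two blocks are the read anchors on either side of the intron. Annotated introns are inferred from consecutive exons of each transcript_id; strand is ignored, and chromosome names must match between the two files (Ensembl uses `1`, UCSC uses `chr1`). The function names are illustrative, not part of TopHat.

```python
import re
import sys
from collections import defaultdict


def known_introns_from_gtf(gtf_path):
    """Collect annotated introns as (chrom, start, end), 0-based half-open.

    Introns are inferred from consecutive 'exon' lines of each transcript_id."""
    exons = defaultdict(list)  # (chrom, transcript_id) -> [(start, end), ...]
    with open(gtf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "exon":
                continue
            match = re.search(r'transcript_id "([^"]+)"', fields[8])
            if match:
                exons[(fields[0], match.group(1))].append((int(fields[3]), int(fields[4])))
    introns = set()
    for (chrom, _tid), blocks in exons.items():
        blocks.sort()
        for (_, end1), (start2, _) in zip(blocks, blocks[1:]):
            # GTF is 1-based inclusive, so the intron is end1+1 .. start2-1,
            # i.e. [end1, start2 - 1) in 0-based half-open coordinates.
            introns.add((chrom, end1, start2 - 1))
    return introns


def junctions_from_bed(bed_path):
    """Yield (chrom, intron_start, intron_end, read_count) from a TopHat junctions.bed.

    The intron lies between the end of the first block and the start of the second."""
    with open(bed_path) as fh:
        for line in fh:
            if line.startswith("track") or not line.strip():
                continue
            f = line.rstrip("\n").split("\t")
            chrom_start = int(f[1])
            sizes = [int(x) for x in f[10].rstrip(",").split(",")]
            offsets = [int(x) for x in f[11].rstrip(",").split(",")]
            yield f[0], chrom_start + sizes[0], chrom_start + offsets[1], int(f[4])


def novel_junctions(bed_path, gtf_path):
    """Junctions from junctions.bed whose intron is not in the annotation."""
    known = known_introns_from_gtf(gtf_path)
    return [j for j in junctions_from_bed(bed_path) if j[:3] not in known]


if __name__ == "__main__" and len(sys.argv) == 3:
    for chrom, start, end, count in novel_junctions(sys.argv[1], sys.argv[2]):
        print(f"{chrom}\t{start}\t{end}\t{count}")
```

Run it as `python novel_juncs.py junctions.bed annotation.gtf` (script and file names are placeholders). Comparing intron coordinates rather than whole lines avoids false differences from anchor lengths, which vary between junctions.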

**jiwu2573** · 01-25-2010, 12:37 PM

Originally posted by sdarko

Okay, so I've finally got my bearings with TopHat (I think).

Then I was finally able to work around the issues that I was having with getting a good gff3 file (thanks to the forum members who helped).

Would you kindly share the good GFF3 file for the human genome?
How big is the file?

After I get TopHat running with this GFF3 file, I can write a program to compare the outputs with and without the --no-novel-juncs parameter and share it with you.

In addition, have you ever thought about differential splicing events between groups (two conditions)? Maybe the program could count the percentage of a particular novel splice form and then compute some statistics between the two groups. Can you think of any other approach? Let me know so I can implement that in the program too.

Looking forward to your reply!
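One possible way to do the statistics, sketched in Python under my own assumptions (neither the metric nor the test comes from TopHat): for a given junction, count the reads spanning it and the total junction reads at the same locus in each condition (how you define the "locus" is left to you), then test whether the proportion differs with a two-sided Fisher's exact test. Pooling counts ignores replicate-to-replicate variability, so treat the p-value as a rough screen rather than a final answer.

```python
from math import comb


def junction_fraction(junction_reads, total_reads):
    """Fraction of reads at a locus that use this junction (0.0 if there are no reads)."""
    return junction_reads / total_reads if total_reads else 0.0


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def compare_conditions(junc_a, total_a, junc_b, total_b):
    """Compare usage of one junction between conditions A and B.

    junc_*: reads spanning the junction; total_*: all junction reads at the locus.
    Returns (fraction_a, fraction_b, p_value)."""
    p = fisher_exact_two_sided(junc_a, total_a - junc_a, junc_b, total_b - junc_b)
    return junction_fraction(junc_a, total_a), junction_fraction(junc_b, total_b), p
```

For example, `compare_conditions(6, 8, 1, 5)` compares a junction used by 6 of 8 reads in one group and 1 of 5 in the other. With many junctions you would also want a multiple-testing correction.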

**jiwu2573** · 01-25-2010, 12:44 PM

Originally posted by sdarko

I generated a list of junctions with and without the --no-novel-juncs parameter so I know that I have about a thousand novel junctions by comparing the two output junction files.

May I just confirm with you:
First, use the command: tophat -G <GFF3 file> --no-novel-juncs
Second, for the same dataset, use the command: tophat -G <GFF3 file>
Finally, compare the two junctions.bed files and pick out the differences.

By the way, how are you going to deal with those 1000 novel junctions?

Thanks!
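For the last step, a minimal Python sketch of the comparison. It assumes both files are TopHat's BED12 junctions.bed. The name column is a per-run label and the read counts and anchor lengths can differ between the two runs, so a plain line-by-line diff would flag almost everything; the sketch keys each junction on its intron coordinates and strand instead. The function names are hypothetical.

```python
import sys


def read_introns(bed_path):
    """Map (chrom, intron_start, intron_end, strand) -> read count for a TopHat junctions.bed."""
    introns = {}
    with open(bed_path) as fh:
        for line in fh:
            if line.startswith("track") or not line.strip():
                continue
            f = line.rstrip("\n").split("\t")
            start = int(f[1])
            sizes = [int(x) for x in f[10].rstrip(",").split(",")]
            offsets = [int(x) for x in f[11].rstrip(",").split(",")]
            # Intron = gap between the end of block 1 and the start of block 2.
            introns[(f[0], start + sizes[0], start + offsets[1], f[5])] = int(f[4])
    return introns


def novel_only(all_juncs_bed, known_juncs_bed):
    """Junctions in the run without --no-novel-juncs that are absent from the run with it."""
    known = read_introns(known_juncs_bed)
    return {key: n for key, n in read_introns(all_juncs_bed).items() if key not in known}


if __name__ == "__main__" and len(sys.argv) == 3:
    for (chrom, start, end, strand), n in sorted(novel_only(sys.argv[1], sys.argv[2]).items()):
        print(f"{chrom}\t{start}\t{end}\t{strand}\t{n}")
```

Pass the junctions.bed from the second command first and the one from the first command (--no-novel-juncs) second.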

**sdarko** · 01-26-2010, 07:37 AM

Essentially, yes, I was running with those options (plus a couple of other changes, such as allowing fewer multi-hits than the default, etc.).

I was hoping to confirm some of those novel junctions by PCR. This is for my thesis project for my Master's degree in bioinformatics.

I think that I may take a slightly different approach now for novel junctions. I just built a bowtie index for mRNA, ESTs, and refmRNA from UCSC and I'm going to align my RNA-Seq tags to those with bowtie. Then I'm going to take what *doesn't* align to those and use those to search for novel junctions.

My thinking is that by first aligning to mRNA, ESTs, and refmRNA (essentially known splice junctions) and then running TopHat on the reads that don't align, I'll be enriching for novel splice junctions in the unaligned file.
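bowtie can write the reads that fail to align straight to a file with its --un <filename> option. If you already have bowtie's default (non-SAM) output instead, here is a minimal Python sketch that keeps only the reads from a FASTA file that do not appear in it. It assumes the first tab-separated column of each alignment line is the read name, and compares names only up to the first whitespace to be safe; the function names are hypothetical.

```python
def _first_token(text):
    """First whitespace-delimited token of text, or '' if there is none."""
    tokens = text.split()
    return tokens[0] if tokens else ""


def aligned_read_names(bowtie_output_path):
    """Read names from bowtie's default output (first tab-separated column)."""
    with open(bowtie_output_path) as fh:
        return {_first_token(line.split("\t", 1)[0]) for line in fh if line.strip()}


def read_fasta(fasta_path):
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header, chunks = None, []
    with open(fasta_path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_unaligned(reads_fasta, bowtie_output, out_fasta):
    """Write reads absent from the bowtie output to out_fasta; return how many were written."""
    aligned = aligned_read_names(bowtie_output)
    written = 0
    with open(out_fasta, "w") as out:
        for header, seq in read_fasta(reads_fasta):
            if _first_token(header) not in aligned:
                out.write(f">{header}\n{seq}\n")
                written += 1
    return written
```

The resulting FASTA is what you would then hand to TopHat. With several indexes (mRNA, ESTs, refmRNA), you can chain the step, feeding each round's unaligned file into the next alignment.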

**sdarko** · 01-26-2010, 07:39 AM

Also, maybe the next version of cufflinks (same developer who made bowtie and tophat) will be able to do what we want it to do.

See this post: http://seqanswers.com/forums/showthread.php?t=3754

**sdarko** · 01-26-2010, 07:55 AM

*EDIT* Whoops, that seems to be a reply in a thread you started, so I'm sure you've seen it.

**Howie Goodell** · 03-02-2010, 03:32 PM

Hi --

In the post that started this thread, Sam says,
> I was able to shoehorn my Helicos RNA-Seq data so that it can be used with TopHat (thank you for the help Cole)
I'd like to know a bit more about how you solved this, because I'm trying to do something similar. Helicos data has several distinctive characteristics. I'm specifically concerned about its 5% error rate: over half are deletions from missing the light output of the un-amplified single DNA molecule, and many of the rest are insertions (presumably stray light from nearby molecules or electrical noise, since electro-optical sensitivity will be maxed out for the same reason). Helicos claims its alignment algorithms are designed to handle these, but Bowtie isn't, since it doesn't handle indels. Did you do something really impressive, like hack TopHat to call the Helicos alignment algorithms instead of Bowtie? Or did you just make the formats compatible, as you said in your very first SeqAnswers post, and decide to put up with the errors? I'd much appreciate any suggestions, whether they are file formats, parameter choices, or black-belt Python programming gems ;-)

Thanks much!
Howie

**sdarko** · 03-02-2010, 04:49 PM

Originally posted by Howie Goodell

Did you do something really impressive, like hack Tophat to call the Helicos alignment algorithms instead of Bowtie? Or did you just make the formats compatible as you said in your very first SeqAnswers post and decide to put up with the errors? I'd much appreciate any suggestions, whether they are file formats, parameter choices, or black-belt Python programming gems ;-)

Hey there Howie. This is going to really disappoint you, but what I ended up doing probably isn't too impressive.

My first problem was that tophat requires reads of identical length (which was not documented when I started this project). As you know, Helicos reads are of variable length. So, after converting the sms file to a FASTA file, I just wrote a quick program that trims the reads down to a certain length. I know I'm losing information, but have decided to live with it. So far I've been trimming them to 25bp and splitting reads of 50+bp into two.
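A minimal Python sketch of that trimming step, under my own assumptions about the details not spelled out above: pieces are taken from the 5' end, reads shorter than 25 bp are dropped, any leftover bases are discarded, and at most two pieces are kept per read (so a 75 bp read would also yield two). The `_1`/`_2` name suffixes for split reads are hypothetical.

```python
def fixed_length_pieces(seq, length=25, max_pieces=2):
    """Cut a read into up to max_pieces non-overlapping pieces of exactly `length` bp.

    Pieces start at the 5' end; reads shorter than `length` yield no pieces."""
    n = min(len(seq) // length, max_pieces)
    return [seq[i * length:(i + 1) * length] for i in range(n)]


def trim_records(records, length=25, max_pieces=2):
    """Turn (name, sequence) pairs into equal-length reads for TopHat.

    A read giving one piece keeps its name; split reads get _1, _2 suffixes."""
    for name, seq in records:
        pieces = fixed_length_pieces(seq, length, max_pieces)
        if len(pieces) == 1:
            yield name, pieces[0]
        else:
            for i, piece in enumerate(pieces, 1):
                yield f"{name}_{i}", piece
```

You would feed it the (name, sequence) pairs from the FASTA converted from the sms file and write the results back out as FASTA.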

As far as the indel situation goes, I just decided to be very conservative in my TopHat parameters. Using the 25bp reads, I require anchor lengths of 10bp with zero mismatches. I also don't allow any segment mismatches. In addition, multihits are set to zero. When I get my BED files, I also ignore any reported splice junctions that have fewer than 3 reads aligning to them. I figure that indels may be happening, but splice junctions reported using those criteria are probably not coincidental.
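The last filter (at least 3 supporting reads) can be sketched in a few lines of Python. It assumes TopHat's junctions.bed, where column 5 (the BED score) is the number of reads spanning the junction; the function name is hypothetical.

```python
def filter_junctions(in_bed, out_bed, min_reads=3):
    """Copy junctions.bed, keeping only junctions supported by at least min_reads reads.

    Uses column 5 (score), the number of spanning reads in TopHat's junctions.bed.
    Returns the number of junctions kept."""
    kept = 0
    with open(in_bed) as src, open(out_bed, "w") as dst:
        for line in src:
            if line.startswith("track"):
                dst.write(line)  # keep the track header so the file still loads in a browser
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 5 and int(fields[4]) >= min_reads:
                dst.write(line)
                kept += 1
    return kept
```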

So far, I've had good luck. I probably (okay, certainly) don't have as many aligned reads as people who use Illumina data, but I'm okay with that. In my tests using the sample data from Helicos, I've found some very good evidence for novel spliceforms and novel transcripts.

I hope that helps and if you have any questions or need any further help, please feel free to PM me at any time.

Sam

**andrewj** · 03-04-2010, 10:56 AM

Sdarko, I'm new to the next generation sequencing community and also use the Helicos platform. Question: is there a reason why you're using Tophat instead of the Helicos pipelines?

**sdarko** · 03-04-2010, 02:59 PM

As far as I know, the Helicos software (Helisphere --> http://open.helicosbio.com/mwiki/index.php/Main_Page) won't align RNA-Seq reads across splice junctions. And I'm looking for novel transcripts and alternative splice junctions in known genes.

For transcript quantification, yes I use Helisphere and align reads to mRNA. For novel splice junctions, I use TopHat and align to the whole genome.

**carmeyeii** · 04-10-2013, 03:51 PM

Hi!

This is a somewhat old thread, but I would like to know more about the biological constraints TopHat uses to call a splice junction, and whether there is any way to override them...

Tophat: report only novel splice junctions?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News

Seqanswers Leaderboard Ad

Announcement

Tophat: report *only* novel splice junctions?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News

Tophat: report only novel splice junctions?