**ntn12** · 05-18-2014, 07:30 PM

Originally posted by ecSeq Bioinformatics

Dear ntn12,

thanks for your comments and questions.

segemehl itself is not a fusion-finder. It is a mapping tool that can detect split-reads and its resulting set of these split-reads can be used to call fusion genes. But it has to be done in a separate downstream analysis and is not included in the segemehl algorithm. I hope that makes things clearer.
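To make the "separate downstream analysis" a bit more concrete, here is a minimal sketch of one possible way to go from split-read junctions to candidate gene pairs. It is not part of segemehl and not the authors' method. The `Gene` and `Junction` records, the `min_reads` threshold, and all names and coordinates are hypothetical and only serve the illustration.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Junction(NamedTuple):
    # One split read: where each of its two parts was mapped.
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def gene_at(chrom: str, pos: int, genes: List[Gene]) -> Optional[str]:
    """Return the name of the first gene covering (chrom, pos), or None."""
    for gene in genes:
        if gene.chrom == chrom and gene.start <= pos <= gene.end:
            return gene.name
    return None


def candidate_fusions(
    junctions: List[Junction], genes: List[Gene], min_reads: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count split reads whose two ends fall in two different genes."""
    support: Counter = Counter()
    for j in junctions:
        gene_a = gene_at(j.chrom_a, j.pos_a, genes)
        gene_b = gene_at(j.chrom_b, j.pos_b, genes)
        # Skip intergenic ends and ordinary within-gene splicing.
        if gene_a is None or gene_b is None or gene_a == gene_b:
            continue
        support[(gene_a, gene_b)] += 1
    # Keep only gene pairs supported by at least min_reads split reads.
    return {pair: n for pair, n in support.items() if n >= min_reads}


# Toy data, invented for illustration only.
GENES = [
    Gene("GENE_A", "chr1", 1000, 5000),
    Gene("GENE_B", "chr2", 2000, 8000),
]
JUNCTIONS = [
    Junction("chr1", 4200, "chr2", 2500),  # GENE_A / GENE_B
    Junction("chr1", 4200, "chr2", 2500),  # second read, same junction
    Junction("chr1", 1500, "chr1", 3000),  # both ends in GENE_A
    Junction("chr1", 9000, "chr2", 2500),  # first end is intergenic
]

print(candidate_fusions(JUNCTIONS, GENES))
```

A real pipeline would additionally need a proper annotation lookup, strand handling, and filters for paralogs and read-through transcripts. The sketch only shows why a set of split reads is an input to fusion calling rather than a fusion call in itself.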

OK. I understand now that SEGEMEHL is not a fusion-gene finder and has never been used as one. It has the same potential to be used for fusion finding as BLAT/BOWTIE/BWA, for example.

I got confused because the authors of SEGEMEHL claim in the title of their paper:

Hoffmann et al. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and FUSION DETECTION, Genome Biol. 2014.

http://www.ncbi.nlm.nih.gov/pubmed/24512684

that SEGEMEHL does FUSION DETECTION when actually it does not.

**ntn12** · 05-18-2014, 07:55 PM

Originally posted by Paul Newport

Sorry, but I don't understand the list shown on the linked page.

My questions would be:

Where do these 40 fusion genes come from?
Why does only FusionCatcher find all of these?
Why is this list on the FusionCatcher website?

I do not know. We have not used FusionCatcher yet. We have been testing TopHat-fusion, FusionMap, ChimeraScan, and FusionFinder. We found it puzzling that all four give thousands of candidate fusion genes per sample (some even hundreds of thousands) when we know from the medical literature that there should not be more than 1-3 fusion genes per sample!!! Therefore more than 99% of these candidates are false positives.

UPDATE: We started testing SOAPfuse and we start to like it!
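The false-positive estimate above can be checked with a few lines of arithmetic. This is only a sketch: it takes the upper bound of 3 true fusion genes per sample from the post, and the candidate counts (1,000 up to 100,000) are round numbers chosen to match "thousands" and "hundreds of thousands", not measured results.

```python
def false_positive_fraction(n_candidates: int, n_true: int) -> float:
    """Fraction of reported candidates that cannot be true fusions."""
    if n_candidates <= 0:
        raise ValueError("n_candidates must be positive")
    # At most n_true of the candidates can be real.
    n_true = min(n_true, n_candidates)
    return (n_candidates - n_true) / n_candidates


# Hypothetical candidate counts; 3 is the upper bound quoted in the post.
for n_candidates in (1_000, 10_000, 100_000):
    fraction = false_positive_fraction(n_candidates, n_true=3)
    print(f"{n_candidates:>7} candidates: {fraction:.3%} false positives")
```

Under these assumptions even the smallest candidate list is at least 99.7% false positives, so "99%" is, if anything, a conservative figure.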

**ecSeq Bioinformatics** · 05-18-2014, 11:25 PM

Originally posted by ntn12

I got confused because the authors of SEGEMEHL claim in the title of their paper:

Hoffmann et al. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and FUSION DETECTION, Genome Biol. 2014.

http://www.ncbi.nlm.nih.gov/pubmed/24512684

that SEGEMEHL does FUSION DETECTION when actually it does not.

Dear ntn12,

please tread carefully here. The title of the paper is very clear and all claims are met. Before reading something into the title, you should actually read the paper. Everything is written in a very clear manner and all claims are confirmed by publicly available data.

Nevertheless, I do not understand your frustration here. Perhaps you should contact the developers of the algorithm directly and seek a dialogue.

**ntn12** · 05-19-2014, 05:46 AM

Originally posted by ecSeq Bioinformatics

Dear ntn12,

please tread carefully here. The title of the paper is very clear and all claims are met. Before reading something into the title, you should actually read the paper. Everything is written in a very clear manner and all claims are confirmed by publicly available data.

I am even more confused about SEGEMEHL after reading the paper.

The authors of this paper:

Hoffmann et al. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and FUSION DETECTION, Genome Biol. 2014. http://www.ncbi.nlm.nih.gov/pubmed/24512684

clearly state in the title and in three other places throughout their article that:

"Here, we present a unified unbiased algorithm to detect splicing, trans-splicing and gene fusion events from single-end read data..."

"The algorithmic strategy to identify splicing, trans-splicing or gene fusion sites is based on a greedy, score-based seed chaining followed by a Smith-Waterman-like transition alignment."

"Implemented in the segemehl mapping tool, it readily identifies conventional splice junctions, collinear and non-collinear fusion transcripts, and trans-spliced RNAs, without the need for separate post-processing or an extensive computational overhead."

Also, I did not find even one fusion gene or fusion transcript found by SEGEMEHL in that article. According to the last statement, SEGEMEHL should readily identify fusion transcripts without the need for separate post-processing.

We will use SOAPfuse for finding fusion genes because it performed really well in our tests.

**ecSeq Bioinformatics** · 05-19-2014, 06:27 AM

Dear ntn12,

I herewith take notice of your assumption that the segemehl developers wrote some statements which are confusing for you, so you will use SOAPfuse.

**ntn12** · 05-19-2014, 06:47 AM

Originally posted by ecSeq Bioinformatics

Dear ntn12,

I herewith take notice of your assumption that the segemehl developers wrote some statements which are confusing for you, so you will use SOAPfuse.

That is not an assumption. It is a fact.
Indeed the authors of "Hoffmann et al. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and FUSION DETECTION, Genome Biol. 2014. http://www.ncbi.nlm.nih.gov/pubmed/24512684"

clearly state in their article that:

"Implemented in the segemehl mapping tool, it readily identifies conventional splice junctions, collinear and non-collinear fusion transcripts, and trans-spliced RNAs, without the need for separate post-processing or an extensive computational overhead."

I did not write that. The authors wrote that! Anybody can check this! Please, check here:

http://www.ncbi.nlm.nih.gov/pubmed/24512684

Originally posted by ecSeq Bioinformatics

I herewith take notice of your assumption that the segemehl developers wrote some statements which are confusing for you, so you will use SOAPfuse.

I am not the only one who got confused about SEGEMEHL. There are at least two others who are confused about SEGEMEHL and finding fusion genes here:

https://www.biostars.org/p/45986/

**Paul Newport** · 05-19-2014, 07:34 AM

Originally posted by ntn12

I am not the only one who got confused about SEGEMEHL. There are at least two others who are confused about SEGEMEHL and finding fusion genes here:
https://www.biostars.org/p/45986/

Oh, please! Give me a break! Same statements, same time stamp! Too obvious, man!

**ntn12** · 05-19-2014, 07:40 AM

Originally posted by Paul Newport

Oh, please! Give me a break! Same statements, same time stamp! Too obvious, man!

???

**ecSeq Bioinformatics** · 05-19-2014, 11:45 PM

As already mentioned before in this thread:

If any of you are interested in learning how to use segemehl to detect fusion transcripts and/or circularized RNAs, I can recommend the following hands-on course:

Discovering standard and non-standard RNA transcripts - How to detect canonical splicing, circular RNAs, trans-splicing, and fusion transcripts

The developers of the algorithm will explain step by step how you can use segemehl to detect standard and non-standard transcripts. They will ensure that all of you understand the difference between 'fusion junctions' and 'fusion genes' and what exactly you can do with segemehl and its downstream analysis tools (such as lack or haarz). You will understand the implications of splicing or fusion events and the concept of split reads, learn how to detect splice sites using split-read information, and in the end be able to find circularized RNAs or fusion transcripts.
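As a rough illustration of the split-read concept mentioned above, the sketch below sorts a two-part split read into broad junction types based only on where its two parts map. This is a simplified heuristic written for this thread, not how segemehl, lack or haarz work internally. The `Segment` record and the category labels are hypothetical, and the two segments are assumed to be given in the 5'->3' order of the original read.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_split(first: Segment, second: Segment) -> str:
    """Classify a two-part split read; segments are in 5'->3' read order."""
    if first.chrom != second.chrom:
        # Candidate for trans-splicing or a fusion junction.
        return "inter-chromosomal"
    if first.strand != second.strand:
        return "strand-switch"
    if first.strand == "+":
        # On the plus strand, read order follows genome order.
        collinear = first.end < second.start
    else:
        # On the minus strand, read order runs against genome order.
        collinear = first.start > second.end
    # Non-collinear order on one strand is what a back-splice
    # (circular RNA candidate) would look like.
    return "collinear" if collinear else "non-collinear"


# Hypothetical coordinates, for illustration only.
print(classify_split(Segment("chr1", 100, 150, "+"),
                     Segment("chr1", 500, 550, "+")))
print(classify_split(Segment("chr1", 500, 550, "+"),
                     Segment("chr1", 100, 150, "+")))
print(classify_split(Segment("chr1", 100, 150, "+"),
                     Segment("chr7", 500, 550, "+")))
```

A junction label of this kind describes a single read. Calling a fusion gene still needs annotation, read support, and filtering on top of it, which is the 'fusion junctions' versus 'fusion genes' distinction.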

The cool thing with this course: You will not just use (and trust) a tool with pre-defined parameters (like SOAPfuse, etc.), but understand everything from scratch!

**NKAkers** · 09-11-2014, 11:16 AM

I'm interested in giving segemehl a shot, but so far it's taking prohibitively long to run. In my cluster-computer environment I reserved 60 nodes for 24 hours to run:

segemehl.x -q 8Gb_single_end.fastq -t 60 -d chromosome1.fa -i chr1.idx -S -s -o chr1.sam

This took over 24 hours without completing. No errors were reported, and it did create a SAM file, although an incomplete one. Do you have any tips to make the software run more quickly?

**ecSeq Bioinformatics** · 10-29-2014, 01:17 AM

Originally posted by NKAkers

I'm interested in giving segemehl a shot, but so far it's taking prohibitively long to run. In my cluster-computer environment I reserved 60 nodes for 24 hours to run:

segemehl.x -q 8Gb_single_end.fastq -t 60 -d chromosome1.fa -i chr1.idx -S -s -o chr1.sam

This took over 24 hours without completing. No errors were reported, and it did create a SAM file, although an incomplete one. Do you have any tips to make the software run more quickly?

This excessively long runtime of segemehl is probably due to the common mapping strategy of RNA aligners, which first attempt to map reads contiguously (i.e. without a split) and then pass the unmapped ones to a more expensive split-read mapping strategy. When you map your data to only one chromosome instead of the entire genome, most of your reads cannot be mapped contiguously, so segemehl attempts to split-map them, resulting in this huge runtime.

Thus, we would recommend using the entire genome as the database, which results in a faster runtime and, moreover, more reliable hits, since by default segemehl reports only the best ones.
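The reasoning above can be illustrated with a toy cost model. Everything numeric here is an assumption made up for the illustration: the relative cost of a split-read attempt (50 times a contiguous attempt) and the fractions of reads that map contiguously (8% against a single chromosome, 90% against the whole genome) are not measurements of segemehl.

```python
def mapping_cost(
    n_reads: int,
    frac_contiguous: float,
    cost_contiguous: float = 1.0,  # hypothetical cost unit per read
    cost_split: float = 50.0,      # hypothetical: split attempt is 50x dearer
) -> float:
    """Toy two-stage cost: every read gets a contiguous attempt,
    and each read that fails it gets a split-read attempt."""
    if not 0.0 <= frac_contiguous <= 1.0:
        raise ValueError("frac_contiguous must be between 0 and 1")
    n_split_attempts = n_reads * (1.0 - frac_contiguous)
    return n_reads * cost_contiguous + n_split_attempts * cost_split


N_READS = 1_000_000  # hypothetical library size

# Hypothetical fractions of reads that map contiguously.
one_chromosome = mapping_cost(N_READS, frac_contiguous=0.08)
whole_genome = mapping_cost(N_READS, frac_contiguous=0.90)

print(f"one chromosome: {one_chromosome:,.0f} cost units")
print(f"whole genome:   {whole_genome:,.0f} cost units")
print(f"ratio:          {one_chromosome / whole_genome:.1f}x")
```

The exact ratio depends entirely on the assumed numbers, but the direction does not: the more reads fail the cheap contiguous stage, the more work lands on the expensive split-read stage.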

**ninni** · 01-04-2016, 01:05 AM

Originally posted by ntn12

I do not know. We have not used FusionCatcher yet. We have been testing TopHat-fusion, FusionMap, ChimeraScan, and FusionFinder. We found it puzzling that all four give thousands of candidate fusion genes per sample (some even hundreds of thousands) when we know from the medical literature that there should not be more than 1-3 fusion genes per sample!!! Therefore more than 99% of these candidates are false positives.

UPDATE: We started testing SOAPfuse and we start to like it!

Hi!
Is it possible to use SOAPfuse with hg38? If so, how would I do this? I am a bit lost.

Thanks in advance!
