
**dietmar13** · 02-11-2014, 02:43 AM

circular RNA

dear segemehl programmer,

which settings are best for finding back-spliced (circular) transcripts from 50 bp PE Illumina reads?

i would run with following parameters:

Code:

./segemehl.x -t 20 -T -Y -S -i $index -d $fa -q $fq/15607.1.fq.gz -p $fq/15607.2.fq.gz | gzip > $out/CoCa_15607.sam.gz

should i change MEDAH?

how can i use Haarz to extract specifically the back-spliced reads?

dietmar

PS: the documentation is for version 0.1.3 - is there a newer one?

**luitpold** · 02-11-2014, 04:57 AM

Hi Dietmar,

build the source

>make
>make testrealign.x

to do the mapping:
>segemehl.x -q file.fq -d hg19.fa -i hg19.idx -S -s -o file.out

option -S turns on the splice feature. This includes all non-standard splicing events. The option -s shuts up the progress bar.

to call the junctions:
>testrealign.x -d hg19.fa -q file.out -n

option -n is necessary to stop the program from realigning reads, which would take much longer.

Hope that helps!

**dietmar13** · 02-11-2014, 06:27 AM

@luitpold

dear luitpold,

thank you!

but i always get this error:

Code:

testrealign.x: libs/memory.c:18: bl_realloc: Assertion `ptr != ((void *)0)' failed.
./testrealign_CoCa_CoNo.sh: line 11:  5078 Aborted                 (core dumped) ./testrealign.x -d $fa -q $out/CoCa_15607.sam -n -U $out/15607_splitfile.bed -T $out/15607_transsplit.bed

any hint what could be wrong? is the SAM file too large (42 GByte)? i have 96 GByte RAM.

dietmar

**luitpold** · 02-11-2014, 06:51 AM

Hi Dietmar,

seems to be an "out of memory" issue. You might want to test it on a smaller SAM file … otherwise contact the developers directly …
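One way to follow the "smaller SAM file" suggestion is to cut the alignment file down to a single reference sequence. This is a minimal sketch with plain awk, not part of segemehl; the function name `subset_sam` and the file names in the usage note are made up for illustration. It assumes a standard tab-separated SAM file in which header lines start with `@` and column 3 holds the reference name. Split reads whose parts map to different chromosomes will lose their partners in such a subset, so this is only meant for checking whether the crash depends on input size.

```shell
#!/bin/sh
# subset_sam FILE CHR
# Print the SAM header plus all alignments on reference CHR.
# Hypothetical helper for testing, not a segemehl tool.
subset_sam() {
    awk -F '\t' -v chr="$2" '/^@/ || $3 == chr' "$1"
}
```

Usage would look like `subset_sam CoCa_15607.sam chr21 > CoCa_15607.chr21.sam`, followed by the same testrealign.x call on the smaller file.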

**luitpold** · 02-11-2014, 07:20 AM

Dietmar,

one more thought … is your SAM file sorted?

**dietmar13** · 02-12-2014, 10:51 AM

thank you,

sorting solved the problem.

dietmar
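The thread does not say how the file was sorted. The sketch below assumes that sorting by reference name and then by position is what testrealign.x needs, and it uses only standard Unix tools; `is_coordinate_sorted` and `sort_sam` are illustrative names, not segemehl or samtools commands. `samtools sort` is the more common way to do this if samtools is installed.

```shell
#!/bin/sh
TAB=$(printf '\t')

# is_coordinate_sorted FILE
# Exit status 0 if the alignment lines are ordered by column 3
# (reference name) and then numerically by column 4 (position).
is_coordinate_sorted() {
    grep -v '^@' "$1" |
        LC_ALL=C sort -s -c -t "$TAB" -k3,3 -k4,4n 2>/dev/null
}

# sort_sam IN OUT
# Copy the header unchanged, then append the sorted alignment lines.
sort_sam() {
    grep '^@' "$1" > "$2"
    grep -v '^@' "$1" |
        LC_ALL=C sort -s -t "$TAB" -k3,3 -k4,4n >> "$2"
}
```

A guard such as `is_coordinate_sorted in.sam || sort_sam in.sam in.sorted.sam` before calling testrealign.x avoids re-sorting a file that is already in order. For a file of 42 GByte, `sort` spills to temporary files, so the temp directory needs enough free space.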

**mamonster** · 04-07-2014, 02:14 AM

Dear segemehl development team,

Using segemehl on the Memczak 2013 Nature data sets, I managed to get tens of thousands of circular RNA splice junctions. However, when I compared them to the published data of Memczak, I found that 61 of the ~250 circular RNAs in the HEK293 cell line were not in the result I got from segemehl, which differs from what is stated in your manuscript. Do you think adding the trimming options (-Y -T) would make a difference?

Also, I found it difficult to use testrealign.x to look for junction sites in large SAM files. I tried the -B option to split the result by chromosome, but it still does not work: the resulting BED files are empty.

Thank you
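A comparison like the one described above (published circular RNAs that are missing from one's own junction calls) can be sketched with standard tools. The assumptions here are that both lists are available as tab-separated, BED-like files with chromosome, start, and end in the first three columns, and that both use the same coordinate convention; the function names are hypothetical. An off-by-one difference between 0-based and 1-based coordinates would make every junction look "missing", so that is worth ruling out before blaming the mapper.

```shell
#!/bin/sh
# junction_keys FILE
# Turn a BED-like file into sorted, unique "chr:start-end" keys,
# skipping track, browser, and comment lines.
junction_keys() {
    awk -F '\t' '!/^(track|browser|#)/ && NF >= 3 { print $1 ":" $2 "-" $3 }' "$1" |
        LC_ALL=C sort -u
}

# missing_junctions PUBLISHED OWN
# Print the junctions that occur in PUBLISHED but not in OWN.
missing_junctions() {
    pub_keys=$(mktemp)
    own_keys=$(mktemp)
    junction_keys "$1" > "$pub_keys"
    junction_keys "$2" > "$own_keys"
    LC_ALL=C comm -23 "$pub_keys" "$own_keys"
    rm -f "$pub_keys" "$own_keys"
}
```

Counting the output with `missing_junctions published.bed own.bed | wc -l` gives the number of published junctions that were not recovered; the file names are placeholders.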

**ecSeq Bioinformatics** · 04-27-2014, 05:50 AM

If you are interested in how to use segemehl to detect fusion transcripts and/or circularized RNAs, I can recommend the following hands-on course:
Discovering standard and non-standard RNA transcripts - How to detect canonical splicing, circular RNAs, trans-splicing, and fusion transcripts

Developers of the algorithm will explain step by step how you can use segemehl to detect standard and non-standard transcripts.

**ntn12** · 05-17-2014, 10:29 PM

Is there any article, paper, or study where segemehl has been used for finding fusion genes (e.g. one that shows a fusion gene found by segemehl)? Has segemehl been compared with other gene fusion finders? On average, how many fusion genes are reported per sample? What is the wet-lab validation rate of the fusions found by segemehl?

In my case, reporting hundreds or thousands of candidate fusion genes per sample is totally useless, because according to the medical/biological literature fusion genes are very rare events (i.e. 98% of all patient samples have zero fusions). When they are indeed found, there are only a few per sample: perhaps 25 at the absolute maximum, with an average of around 1 to 3. Please note that fusion genes are not SNPs/indels/alternative-splicing events. Here the scientific "null" hypothesis is that there are on average 0-5 fusion genes per sample! This hypothesis can be rejected only with wet-lab data and NOT with in silico data! If a tool reports over 100 candidate fusion genes per sample, that tool already has a ~95% false positive rate!

I would like to use it for finding pathogenetic/somatic fusion genes, and I searched very hard but was not able to find anything that suggests segemehl has ever been used for finding pathogenetic/somatic fusion genes.
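The ~95% figure in the post above is simple arithmetic: if at most 5 of the reported candidates can be real, then at least (100 - 5) / 100 of 100 reported candidates must be false. Strictly speaking this is a lower bound on the fraction of false calls among the reported ones (a false discovery proportion) rather than a false positive rate. A small sketch, with a made-up function name, that reproduces the number for any pair of values:

```shell
#!/bin/sh
# min_false_call_pct REPORTED MAX_TRUE
# Lower bound, in percent, on the share of reported candidates that
# must be false when no more than MAX_TRUE of them can be real.
min_false_call_pct() {
    awk -v r="$1" -v t="$2" 'BEGIN {
        if (r <= 0 || t >= r) { print "0.0"; exit }
        printf "%.1f\n", 100 * (r - t) / r
    }'
}
```

`min_false_call_pct 100 5` gives the 95% from the post. The bound only holds if the assumed maximum of true fusions per sample is correct, which is the poster's premise and not something the calculation can check.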

**Paul Newport** · 05-18-2014, 01:05 AM

Aren't most of these questions answered when reading the segemehl publication? They compared their tool with 7 other state-of-the-art tools and validated their results based on available RNA seq datasets.

As far as I can judge the situation, the group that developed segemehl is a pure bioinformatics group and thus they did not perform any wet-lab validation, but implemented a tool that does what it should (compared to other algorithms). And since it was published only a few months ago, I think we have to wait until we find any article where segemehl was used to find fusion genes.

I'm curious about these future publications, since the examples shown in the paper are quite impressive. But the future will show if segemehl is really that good.

**ntn12** · 05-18-2014, 03:30 AM

Originally posted by Paul Newport:

Aren't most of these questions answered when reading the segemehl publication? They compared their tool with 7 other state-of-the-art tools and validated their results based on available RNA seq datasets.
...

Could you point to the publication where SEGEMEHL is used for finding fusion genes?

If you mean this:

http://bioinformatics.oxfordjournals.org/content/early/2014/03/13/bioinformatics.btu146.short

then SEGEMEHL is compared there to STAR, BOWTIE2, BWA-MEM, BLAT, etc., and not one of these is a gene fusion finder! The word "fusion" is not mentioned even once in the entire article (except in the references). Fusion gene finders are, for example, SOAPfuse, deFuse, FusionHunter, etc. How does SEGEMEHL compare to these? Here is a nice comparison of fusion gene finders: http://code.google.com/p/fusioncatcher/wiki/comparison

Did I miss something here?

I mean by fusion genes this:

doi:10.1530/ERC-13-0390

http://erc.endocrinology-journals.org/content/21/3/R143.full.pdf

P.S. Read splitter is not the same as finding fusion genes!

**ecSeq Bioinformatics** · 05-18-2014, 04:58 AM

Dear ntn12,

thanks for your comments and questions.

segemehl itself is not a fusion gene finder. It is a mapping tool that can detect split reads, and the resulting set of split reads can be used to call fusion genes. That calling has to be done in a separate downstream analysis and is not part of the segemehl algorithm. I hope that makes things clearer.

**Paul Newport** · 05-18-2014, 04:48 PM

Originally posted by ntn12:

Here is a nice comparison of fusion gene finders: http://code.google.com/p/fusioncatcher/wiki/comparison

Sorry, but I don't understand the list shown on the linked page.

My questions would be:

Where do these 40 fusion genes come from?
Why does only FusionCatcher find all of these?
Why is this list on the FusionCatcher website?

That looks a bit suspicious to me!

**Paul Newport** · 05-18-2014, 05:00 PM

Originally posted by Paul Newport:

Where do these 40 fusion genes come from?

I just did some research and found on the FusionCatcher website:

FusionCatcher has been used originally for finding novel and known fusion genes in breast tumor cell lines BT474, SKBR3, MCF7, KPL4 as shown in the following articles:

S. Kangaspeska, S. Hultsch, H. Edgren, D. Nicorici, A. Murumägi, O.P. Kallioniemi, Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms, PLOS One 2012. http://dx.plos.org/10.1371/journal.pone.0048745
H. Edgren, A. Murumagi, S. Kangaspeska, D. Nicorici, V. Hongisto, K. Kleivi, I.H. Rye, S. Nyberg, M. Wolf, A.L. Borresen-Dale, O.P. Kallioniemi, Identification of fusion genes in breast cancer by paired-end RNA-sequencing, Genome Biology 2011, Vol. 12. http://genomebiology.com/2011/12/1/R6

These are the same two publications shown on the "comparison" page. So the 40 genes were predicted using FusionCatcher? Honestly?


A multi-split mapping algorithm for circular RNA, splicing, trans-splicing, and ...
