**adameur** · 04-29-2010, 11:07 AM

Hi Uwe,

You could try SplitSeek, a method we originally developed for junction mapping in RNA-seq data (http://genomebiology.com/2010/11/3/R34). I think it should work also in your case, at least in theory...

Adam

**Uwe Appelt** · 04-29-2010, 09:41 PM

Hi adameur,

Thanks for your reply. According to figure 1 (and the first paragraph of the results), SplitSeek splits every read into two related subreads and also leaves a gap between the two. I'm afraid this will not work for color-space data because of the recursive nature of the color-space encoding. At the very least, the gap between the subreads would break the recursion, wouldn't it?
Do you think that SplitSeek can (besides the color-space limitation) detect junctions that arose from recombinations, insertions or the like? I perceive SplitSeek as a highly specialized exon-boundary finder. Do you see a way to set it up to look for insertions of certain "contigs/exons" in a huge genome (human, mouse)?

Uwe

**adameur** · 04-29-2010, 11:50 PM

Hi Uwe,

In fact it's quite the opposite. The split read mapping is done using the AB WT pipeline, which was developed specifically for SOLiD. So it works for color space, while normal base space is more problematic.

We have seen that SplitSeek can find small insertions (one or a few bp) and deletions (of varying lengths). I can't see why it shouldn't be possible to also find other types of rearrangements like inversions, translocations and so on... But you'll need sufficient coverage. Also, repeat regions where the split reads can't be mapped uniquely might be a problem.

You'll find the code at the SOLiD tools webpage if you want to give it a try (http://solidsoftwaretools.com/gf/project/splitseek/). We have tested it on human and mouse so I don't think the genome size will be a problem.

If you'd like to run this on genomic reads (and not RNA-seq data), I suggest that you first remove the reads that were aligned full-length to the genome with some other mapper (like corona lite). In that way you'll reduce the number of reads in the input file and speed up the program.
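The pre-filtering step could be sketched roughly as follows. This is only an illustration, not part of SplitSeek or the WT pipeline: the function name `filter_csfasta` is hypothetical, and it assumes you have already collected the IDs of the full-length-mapped reads from your other mapper's output.

```python
def filter_csfasta(lines, mapped_ids):
    """Yield csfasta lines for reads whose ID is NOT in mapped_ids.

    Hypothetical helper: mapped_ids is assumed to be a set of read IDs
    that some other mapper already aligned full-length.
    """
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line  # comment lines pass through unchanged
        elif line.startswith(">"):
            # Decide per read whether to keep the header and its sequence.
            keep = line[1:].split()[0] not in mapped_ids
            if keep:
                yield line
        elif keep:
            yield line


example = [
    "# Title: demo",
    ">1_1_1_F3",
    "T0120",
    ">1_1_2_F3",
    "T3321",
]
unmapped = list(filter_csfasta(example, {"1_1_1_F3"}))
print("\n".join(unmapped))
```

For a real file you would pass an open file handle instead of the `example` list and write the yielded lines to a new csfasta file.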

Adam

**fennan** · 05-27-2010, 08:09 AM

@adameur,

I am new to SOLiD data and I am thinking of using your SplitSeek program, since I think it is the one that fits my needs best. It would be very nice if you could help me with these questions:

1) Is it mandatory to use the "split_read_mapper" from the SOLiD WT Analysis Pipeline before using your program? Could I use other mappers (e.g. BFAST)?

2) I got the data in a SRF file... How could I obtain the csfasta from it? I'm thinking of using the staden program srf2fastq and later obtain the csfasta from the fastq... Is it the proper way to do it?

Thanks in advance for any help

**adameur** · 05-28-2010, 12:44 AM

Hi fennan,

Some quick answers:

1) Currently the "split_read_mapper" is the only aligner that is directly supported. You could try some other mapping tool, but then you'll have to do some processing of the output files. It's important to note that the aligner should map sub-parts of each read independently, as the "split_read_mapper" does.

2) You could try the SRF_Reader in the solid2srf package (http://solidsoftwaretools.com/gf/project/srf/).

Hope this helps!

Adam

**fennan** · 05-28-2010, 07:07 AM

Hi Adam,

It does help. Thank you for your quick response! I'll be using your program and I will report my experience here.

By the way, in case it is useful for someone: I had some problems compiling the SRF_Reader source code (which is sadly not uncommon). It seems to be designed for 32-bit machines (mine is 64-bit). I had to manually include some header files (mainly cstdlib.h and string.h) and then it worked. After that I found the package for 64-bit (http://yum.biopackages.net/biopackag...os4.x86_64.rpm). It worked fine for me too...

**fennan** · 06-14-2010, 05:33 AM

Hi Adam,

I've been trying to use SplitSeek but I am having a lot of problems with the "split_read_mapper" program. I have reported my problems to SOLiD support but they have not been very helpful so far. I saw that I am dealing with the same RNA-seq data that you used in your paper (GSE14605), so I thought you could give me some hints so that I can apply your SplitSeek program.

The thing is that five days ago I launched "split_read_mapper" for one csfasta file (~600MB) and the mapper.log file says that the program is still "Waiting for mapping jobs to finish...". Three days ago I also launched "split_read_mapper" for a small csfasta file (5000 reads) with the mm9 whole genome as the reference but it is also at the same point ("Waiting for mapping jobs to finish..."). My questions are:

1) Is this normal? How long did it take for you?

2) What queue system did you use? I am using SGE (Sun Grid Engine) but I am not sure if it might not be properly supported by this program... Any idea about where the problem could be?

Thanks

**adameur** · 06-15-2010, 01:00 AM

Hi fennan,

The mapping jobs should only take a few hours, so I think something went wrong. My guess is that it might be a memory issue. Can you try again with increased memory? I'm using the PBS system.

Adam

**cczhong** · 06-28-2010, 01:15 PM

Hi adameur,

I have a similar question to Uwe's. As far as I know, a color-space read depends on its first nucleotide (perhaps from the primer), and the rest of the nucleotides are resolved recursively,
e.g. T1021301230123123012301

As a result, there would be a serious problem when an error occurs in the middle of a read. I know the authors of BFAST have developed a specific alignment algorithm to deal with this problem.

I wonder how you split the reads while avoiding this problem. Do you first translate the numbers to nucleotides and then do the split? Or did you use some smart idea to handle this?

Bests.
-Cuncong

**Uwe Appelt** · 06-29-2010, 01:00 PM

Hi cczhong,

The smart idea that practically every color-space aligner is built on is not to translate the color-space reads into base space for mapping, but to translate the reference genome into color space instead. This way there is no recursion to resolve, because color-space aligners always know how to translate the aligned portions of the reference genome back into base space. Using Bowtie, for example (http://bowtie-bio.sourceforge.net/index.shtml), you can build color-space indices of reference genomes and then do the mapping, even truncating reads at both the 5' and 3' ends (see the options '--trim5' and '--trim3' on the manual page) prior to mapping them. However, if you need to search for recombinations, translocations or similar events, SplitSeek can be your best friend.
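To make the reference-translation idea concrete, here is a minimal sketch of the standard SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function name `to_color_space` is my own for illustration and is not taken from Bowtie or any other aligner.

```python
# 2-bit codes for the standard SOLiD color encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Encode a base-space sequence as a string of colors (0-3).

    Each color depends only on two adjacent bases, so the result has
    one character fewer than the input.
    """
    codes = [BASE_CODE[base] for base in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


reference = "ATGGCA"
ref_colors = to_color_space(reference)
print(ref_colors)                # 31031
# The colors of any substring are a substring of the reference colors,
# so a trimmed read or a sub-read can still be matched directly.
print(to_color_space("GGCA"))    # 031
```

Because every color is local to a pair of neighbouring bases, a piece cut out of the middle of a read can be compared with the color-space reference without knowing the bases that came before it.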

For the sake of completeness, it should be mentioned that it is of course possible to translate color-space reads into base space first and then do the whole alignment in base space. It is, however, a pretty lossy process; that is, you will end up with many reads that simply don't align anywhere. This is mainly because of the 'blind' interpretation of the recursion (blind in the sense of not comparing against any reference that provides hints as to where sequencing errors or SNPs are located). I once used Bowtie and Blat (the latter being a pure base-space aligner) to quantify the loss of alignable reads. Of all reads that could be successfully mapped by Bowtie (in color space), about one third couldn't be aligned using Blat (in base space) after translation into base space. Certainly, this will vary from sample to sample. But it at least led me to the assumption that leaving color-space aligners is usually the last choice.
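A small sketch of why the 'blind' translation is so lossy, assuming the standard SOLiD encoding (each color is the XOR of the 2-bit codes of adjacent bases, A=0, C=1, G=2, T=3). The function name `decode` and the example reads are made up for illustration.

```python
BASES = "ACGT"  # index in this string is the 2-bit code of the base


def decode(read):
    """Translate a color-space read (leading base + colors) into bases.

    The leading base only seeds the recursion and is not emitted.
    """
    current = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        current ^= int(color)  # each base depends on the previous one
        decoded.append(BASES[current])
    return "".join(decoded)


good = "A31031"
bad = "A31131"  # same read with a single miscalled color (third color)

print(decode(good))  # TGGCA
print(decode(bad))   # TGTAC
mismatches = sum(a != b for a, b in zip(decode(good), decode(bad)))
print(mismatches)    # 3
```

A single wrong color changes every base downstream of it, so a read with one color error in the middle can end up with many base-space mismatches and fail to align.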

Best Uwe

**mmartin** · 09-17-2010, 01:11 AM

Has anyone succeeded in using SplitSeek without the SOLiD WT Pipeline?

We're trying to analyze some SOLiD transcriptome data, but we want to use an aligner that knows SAM/BAM format since we're more familiar with that.

**zee** · 09-17-2010, 01:48 AM

Try NovoalignCS (www.novocraft.com), as it supports direct SAM output. It may also help in aligning the full-length colorspace reads. Subtracting these aligned reads yields the unmapped reads, which could then be passed to the splitseq mapper.
It takes about 5 minutes to build a full colorspace index for human, mouse, etc. using novoindex, and you need a minimum of 8-9 GB of RAM per server. Multithreading, polyclonal filter, CSFAST/CSQUAL and MPI are supported.

What percentage of reads in the run are expected to contain junctions?

**mmartin** · 09-17-2010, 02:10 AM

Hi, thanks for the reply, NovoalignCS looks quite good. It's not an option for me, however, as I prefer Open Source tools such as BWA, Bowtie and BFAST, which also support color space input and SAM output.

This isn't exactly what I meant, though. I want to convert the SAM/BAM output of any aligner to the input required by SplitSeek (which seems to be a file in BEDPE format). I guess it isn't difficult to write a script for that, but there may be some pitfalls I don't know about yet.
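A rough sketch of what such a conversion script might look like. It makes several assumptions that are not confirmed anywhere in this thread: that the two halves of each read were aligned as separate SAM records whose names end in the hypothetical suffixes `_L` and `_R`, and that the target is the generic ten-column BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2). The exact columns SplitSeek expects would need to be checked against its documentation.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(pos, cigar):
    """Return the 0-based, half-open (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos - 1, pos - 1 + length  # SAM POS is 1-based


def sam_to_bedpe(sam_lines, left_suffix="_L", right_suffix="_R"):
    """Pair up the two halves of each read and yield BEDPE lines.

    The suffixes are hypothetical naming conventions for the sub-reads.
    """
    halves = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4:
            continue  # unmapped record
        for side, suffix in (("L", left_suffix), ("R", right_suffix)):
            if name.endswith(suffix):
                start, end = ref_span(pos, cigar)
                strand = "-" if flag & 0x10 else "+"
                read = name[: -len(suffix)]
                halves.setdefault(read, {})[side] = (chrom, start, end, strand)
    for read, h in halves.items():
        if "L" in h and "R" in h:  # only reads with both halves mapped
            (c1, s1, e1, st1), (c2, s2, e2, st2) = h["L"], h["R"]
            # Score column is a placeholder (0) in this sketch.
            yield "\t".join(map(str, (c1, s1, e1, c2, s2, e2, read, 0, st1, st2)))


sam = [
    "@HD\tVN:1.0",
    "r1_L\t0\tchr1\t101\t255\t25M\t*\t0\t0\t*\t*",
    "r1_R\t0\tchr1\t501\t255\t25M\t*\t0\t0\t*\t*",
    "r2_L\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
rows = list(sam_to_bedpe(sam))
print(rows[0])
```

One pitfall this does not address is multi-mapping: a sub-read that aligns to several places would need a rule for which record to keep.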

**adameur** · 09-17-2010, 03:32 AM

Hi mmartin,

The AB WT pipeline performs a split read mapping where the two ends of the read are independently aligned to the reference. This type of split read alignment is essential if you want to run SplitSeek, otherwise you risk missing a lot of junctions. As far as I know there are currently no good alternatives to the WT pipeline for running SplitSeek.

About converting the SAM/BAM to BEDPE. I suppose this could be done quite easily. For me, the main concern is whether or not most split read alignments are included in the SAM/BAM alignment results. And that will depend on which mapping algorithm was used.

By the way, does anyone know how a split read alignment is represented in SAM?

/Adam

junction mapping in color space
