
**colindaven** · 05-13-2011, 05:36 AM

Another reasonable choice might be hmmSplicer, at least for comparison. I've had what look to be reasonable results from it in the past. I take it you're working in sequence space, not colour space?

**Hobbe** · 05-13-2011, 06:15 AM

Originally posted by colindaven

Another reasonable choice might be hmmSplicer, at least for comparison. I've had what look to be reasonable results from it in the past. I take it you're working in sequence space, not colour space ?

Thanks for the reply. No, we are working in color space. Reads converted to sequence space go wrong too easily: a single error in the original color-space read corrupts every base after it. However, if you or anyone else has had good success with converting to sequence space, I would love to hear about it. The general recommendation seems to be to map in color space.
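To illustrate why a naive conversion is fragile, here is a minimal Python sketch of two-base decoding. The function name and the example reads are made up for illustration; the only thing taken as given is the standard SOLiD colour code, where each colour is the XOR of the two adjacent bases with A=0, C=1, G=2, T=3.

```python
BASES = "ACGT"

def decode_colorspace(read):
    """Naively translate a colour-space read (primer base + colours 0-3) to bases.

    The primer base itself is not part of the output.
    """
    prev = BASES.index(read[0])
    decoded = []
    for colour in read[1:]:
        # Each colour encodes the transition from the previous base to the next.
        prev ^= int(colour)
        decoded.append(BASES[prev])
    return "".join(decoded)

good = "T0120312"
bad = "T0130312"  # hypothetical read with a single wrong colour call (third colour)

print(decode_colorspace(good))  # TGAATGA
print(decode_colorspace(bad))   # TGCCGTC
```

The two decoded reads agree only up to the miscalled colour; every base after it differs. A colour-space aware mapper can instead treat the error as a single colour mismatch.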

**darked89** · 08-15-2011, 06:16 AM

Originally posted by Hobbe

Hi all

We are having problems predicting splice sites from our SOLiD RNA-seq data. We have a draft genome (125 Mb, a eukaryote) assembled from 454 data and are now trying to map our SOLiD reads to this genome to predict splice sites. The idea is to use these predicted splice sites to make intron hints for the gene finder Augustus to create correct gene models.

Augustus can cope with "hints" created by mapping Illumina reads (converted to fasta) with splice-agnostic blat. So as long as you have some gene models for training, unspliced mappings should work, I hope.

Originally posted by Hobbe

We are currently trying Bowtie/TopHat, but get weird results. For example, when working with a subset of our reads we find some splice sites, but these are not found when we add more data. Also, we have earlier tried Corona Light together with SplitSeek, and Bowtie/TopHat does not find sites that were found with Corona Light/SplitSeek. On the other hand, Corona Light/SplitSeek is time-consuming and awkward to run and often reports splice sites that are a few bp off, so that is not an ideal choice either.

This cannot be an uncommon situation, so what are the rest of you doing in these situations? No closely related genomes have been sequenced.

I got strange results comparing TopHat and Bowtie when mapping SOLiD reads without a GFF gene-model guide (draft+ mammalian genome): Bowtie in colorspace mapped _more_ reads than TopHat. I used the latest versions (TopHat 1.3.1 and Bowtie 0.12.7).

**Hobbe** · 08-15-2011, 10:46 PM

Originally posted by darked89

Augustus can cope with "hints" created by mapping Illumina reads (converted to fasta) with splice-agnostic blat. So as long as you have some gene models for training, unspliced mappings should work, I hope.

Blat is the preferred program to use for spliced mapping (see the Augustus Rnaseq instructions). You really need those intron hints to get correct gene models. Blat doesn't work on Solid data though.

Of biggest importance in our case was to have Augustus trained on the actual organism. We did this using our 454 cDNA data, and using this training the number of correctly found genes in our small set (14) of known test genes increased from 6 to 9 (compared to using the training files for distantly related organisms that came with Augustus). Adding intron hints we are now up to 11 out of 14 genes, but this is only with a small part of our Solid rnaseq data, and we are now working on adding more hints. The only solution we have just now is using the old Corona Light pipeline together with Splitseek by Adam Ameur. Slow, but seems to work.
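For readers who have junction calls from some other tool, a rough sketch of turning them into intron hints could look like the following. This is not our actual pipeline. The input tuples, the function name and the `b2h` source label are placeholders, and the attribute keys (`mult`, `pri`, `src=E`) follow the hints format as described in the Augustus documentation to the best of my understanding, so check them against your Augustus version and extrinsic config before use.

```python
def junctions_to_hints(junctions, source="b2h", priority=4):
    """Format junctions as GFF intron hint lines.

    Each junction is (seqname, intron_start, intron_end, strand, read_count),
    with 1-based inclusive intron coordinates. All values here are hypothetical.
    """
    lines = []
    for seqname, start, end, strand, count in sorted(junctions):
        # mult = number of supporting reads, src=E marks the hint as EST/RNA-seq evidence
        attrs = f"mult={count};pri={priority};src=E"
        fields = [seqname, source, "intron", str(start), str(end),
                  "0", strand, ".", attrs]
        lines.append("\t".join(fields))
    return lines

# Made-up junctions on a made-up scaffold
example = [
    ("scaffold_1", 1201, 1350, "+", 7),
    ("scaffold_1", 2010, 2490, "+", 3),
]
for line in junctions_to_hints(example):
    print(line)
```

Junctions that are a few bp off would end up as wrong hints, so it is worth filtering on read count before writing the file.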

IMO, there is still a great need for a good spliced mapper for Solid data.

**darked89** · 08-16-2011, 05:18 AM

Originally posted by Hobbe

Blat is the preferred program to use for spliced mapping (see the Augustus Rnaseq instructions). You really need those intron hints to get correct gene models. Blat doesn't work on Solid data though.

The same goes for FASTQ input. Maybe there is something to be gained from converting color space to FASTA and mapping with Blat.

Originally posted by Hobbe

Of biggest importance in our case was to have Augustus trained on the actual organism. We did this using our 454 cDNA data, and using this training the number of correctly found genes in our small set (14) of known test genes increased from 6 to 9 (compared to using the training files for distantly related organisms that came with Augustus). Adding intron hints we are now up to 11 out of 14 genes, but this is only with a small part of our Solid rnaseq data, and we are now working on adding more hints.

You may also try CEGMA (http://korflab.ucdavis.edu/Datasets/cegma/) to produce yet another training or testing set. At times there is no way out except starting semi-manual annotation, again for either the training or the testing set. Blastp your Augustus predictions: genes with high conservation and 100% coverage in other species are likely to be real.
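A minimal sketch of that Blastp filter, assuming the search was run with a custom tabular format such as `-outfmt "6 qseqid sseqid pident qstart qend qlen"`. The function name, the cutoffs and the example rows are arbitrary placeholders, and coverage is computed per hit (single HSP), which is a simplification.

```python
import csv
import io

def well_supported_queries(tabular, min_identity=70.0, min_coverage=0.95):
    """Return IDs of predicted proteins with at least one conserved, near full-length hit.

    Expected tab-separated columns: qseqid sseqid pident qstart qend qlen
    """
    keep = set()
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, pident, qstart, qend, qlen = row[:6]
        # Fraction of the predicted protein covered by this alignment
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            keep.add(qseqid)
    return keep

# Made-up example rows
sample = (
    "g1.t1\tsp|X\t82.5\t1\t300\t300\n"
    "g2.t1\tsp|Y\t45.0\t10\t120\t400\n"
)
print(well_supported_queries(sample))  # {'g1.t1'}
```

Predictions that pass could go into the testing set, while the rest are candidates for manual inspection.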

Originally posted by Hobbe

The only solution we have just now is using the old Corona Light pipeline together with Splitseek by Adam Ameur. Slow, but seems to work.

Is that the setup currently recommended by the SplitSeek author? The SplitSeek 1.3.4 manual recommends the Whole Transcriptome Pipeline.

Originally posted by Hobbe

IMO, there is still a great need for a good spliced mapper for Solid data.

Indeed. I have found some other software (X-MATE), but it requires junction libraries and uses yet another pipeline (http://solidsoftwaretools.com/gf/project/mapreads/).
See:

http://openwetware.org/wiki/Wikiomics:RNA-Seq#SOLiD_data_only

**adameur** · 09-08-2011, 11:31 AM

Hi,

Just a few words about SplitSeek from the author. It only works with the split read mapper from the AB Whole Transcriptome Pipeline, and always has. I'm aware it is awkward, but unfortunately there are currently no good alternatives.

The good news is that AB WTP actually works fine once you get it to run. I even managed to run some 75bp reads from the SOLiD5500 through WTP and SplitSeek (using 25bp anchors in the mapping) so it might be an option also in the future.

/Adam


Splice site prediction with solid rna-seq data
