
**KevinLam** · 11-07-2010, 09:29 PM

If you can, try using Bioscope/mapreads for the mapping; it improves the odds.
About 50% is expected for whole transcriptome, so you are getting lower than average.

**repinementer** · 11-07-2010, 09:52 PM

Hmm, I already have results from Bioscope, but I want to run these with TopHat without supplying any known RefSeq annotation, in order to find new transcripts.

Does anyone have suggestions on TopHat/Bowtie?

**KevinLam** · 11-07-2010, 09:57 PM

I see. How's the mapping % with bioscope then?

**repinementer** · 11-07-2010, 10:07 PM

It is around 80%.

**adarob** · 11-08-2010, 12:12 PM

Can you try using fr-firststrand?

**repinementer** · 11-08-2010, 07:07 PM

But fr-secondstrand is meant for SOLiD data, right?

**adarob** · 11-09-2010, 12:08 PM

In general yes, but it depends on the protocol you used.

**bacdirector** · 11-09-2010, 01:04 PM

If you're using 50-base reads, try trimming them to 35 bases in colorspace (i.e., use only the first 35 bases). Most of the -1's (no-calls) and sequencing errors occur between bases 36 and 50.

We've DOUBLED our mapping using Bioscope this way (from 25 million reads to 50 million reads).

There is a separate thread on this topic, but I'd be happy to hear back from you if you try this.

Let me know if you'd like to peek at our source code for trimming in colorspace.
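
For readers who want to try this in the meantime, here is a minimal sketch of the kind of trimming described above. It is not bacdirector's code. It assumes the usual .csfasta layout (comment lines starting with `#`, headers starting with `>`, and reads made of one primer base followed by color calls), and the function name `trim_csfasta` is made up for illustration.

```python
def trim_csfasta(lines, keep=35):
    """Yield .csfasta lines with every read cut to its first `keep` colors.

    Assumption: a read line is one primer base (e.g. 'T') followed by
    color calls ('0'-'3', or '.' for a no-call).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or line.startswith(">"):
            # Blank lines, comments and read headers pass through unchanged.
            yield line
        else:
            # Keep the primer base plus the first `keep` color calls.
            yield line[: keep + 1]


if __name__ == "__main__":
    example = [
        "# Title: example",
        ">853_10_97_F3",
        "T.....023..2.10..120.3.2010.031...2.1.30.22001..00.",
    ]
    for out_line in trim_csfasta(example):
        print(out_line)
```

If a matching quality file is passed to the aligner, it would presumably need to be cut to the same number of values per read so the two files stay in step.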

**rdeborja** · 11-09-2010, 05:11 PM

I'm curious to know whether people are quoting the mapping percentage as given in the alignment.txt report or calculating it as total mapped reads as a percentage of total reads. The Bioscope alignment report shows mapped reads as 100% and bases all other figures on that.

I've been reporting total mapped reads, uniquely aligned reads, ribosomal reads and unmapped reads as a percentage of total reads. Total mapped is usually 60-70%.

As for trimming to 35bp: is it necessary, given that Bioscope does a seed-and-extend on the reads? I find my average aligned read length to be ~40bp, with the size frequency plot being bimodal at 25bp and 50bp.
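
To make the two conventions concrete, here is a small sketch of the "percent of total reads" calculation. The function name, the category names and the read counts are all hypothetical and chosen only for illustration; they are not taken from any Bioscope report.

```python
def mapping_percentages(counts, denominator):
    """Express each read count as a percentage of `denominator`.

    `counts` maps a category name to a number of reads. Pass the total
    number of reads as `denominator` to report against all reads, or the
    number of mapped reads to report relative to mapped reads instead.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return {name: 100.0 * n / denominator for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    total_reads = 1_000_000
    counts = {"mapped": 650_000, "unique": 400_000, "unmapped": 350_000}

    for name, pct in mapping_percentages(counts, total_reads).items():
        print(f"{name}: {pct:.1f}% of total reads")
```

The same counts divided by the mapped-read count instead would give figures in the style of a report that treats mapped reads as 100%, which is why it matters to say which denominator a quoted percentage uses.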

**repinementer** · 11-09-2010, 06:01 PM

@bacdirector: Yes, I would be happy to do that. Could you please provide me the code? My SOLiD .csfasta data looks like this:

# Wed Mar 10 00:25:29 2010 /share/apps/corona/bin/filter_fasta.pl --output=/data/results/sl001/SL001_R00089/RHE012_01pgx2/results.F1B1/primary.20100310065316729 --name=SL001_R00089_RHE012_01pgx2 --tag=F3 --minlength=50 --mincalls=25 --prefix=T /data/results/sl001/SL001_R00089/RHE012_01pgx2/jobs/postPrimerSetPrimary.2745/rawseq
# Cwd: /home/pipeline
# Title: SL001_R00089_RHE012_01pgx2
>853_10_97_F3
T.....023..2.10..120.3.2010.031...2.1.30.22001..00.
>853_10_111_F3
T.....113....33..003.0.010..100...2.0..2.03002..02.
>853_10_157_F3
T.....230..2.00..330.2.1313.231...3.1.10.02031..10.
>853_10_194_F3
T.....031....32..323.0.322..100...3.0..1.30313..23.
>853_10_221_F3

**plabaj** · 11-24-2010, 02:30 AM

Problem with TopHat and ABI SOLiD

I don't know if you are aware, but the current version of TopHat uses a different algorithm from the one described in the 2009 TopHat paper. The current algorithm is described in the supplement to the Cufflinks paper (Trapnell 2010).

The most important change is that each read is split into segments, and splice junction discovery is based on where these segments align. The new TopHat is optimised for reads of >=75bp; in that case each read is divided into 3 segments of 25bp each.

You ran TopHat with the segment length set to 50 (--segment-length 50), which means there will be just ONE segment, so such a setting cannot discover any splice junctions.

And here is the problem with the current TopHat: it seems NOT to be designed for ABI SOLiD reads. You have two options for 50bp reads:
- use the default settings, and be aware that not all splice junctions will be discovered
- set --segment-length 16, but then the 16bp segments will align everywhere and in many cases be discarded, so again many splice junctions will be missed and many false positives will be found

For 36bp reads the situation is even worse.
Old versions of TopHat supported such short reads but didn't support colorspace reads.

In my opinion there is, so far, no proper software for discovering new transcripts, or even for properly assembling existing ones, from ABI SOLiD data. If you know of any, please let me know.
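
The segment arithmetic behind the points above can be sketched as follows. This only mirrors the description in this post (a read is cut into whole segments of the given length, 25bp in the 75bp example); the integer division and the treatment of leftover bases are assumptions, not TopHat's actual implementation, and `segment_count` is a made-up helper name.

```python
def segment_count(read_length, segment_length=25):
    """Number of whole segments a read yields at a given segment length.

    Assumption: leftover bases shorter than a full segment do not form an
    extra segment, and a read shorter than one segment counts as one.
    """
    if read_length <= 0 or segment_length <= 0:
        raise ValueError("lengths must be positive")
    return max(1, read_length // segment_length)


if __name__ == "__main__":
    # (read length, segment length) pairs discussed in this thread.
    cases = [(75, 25), (50, 50), (50, 25), (50, 16), (36, 25)]
    for read_len, seg_len in cases:
        n = segment_count(read_len, seg_len)
        print(f"{read_len}bp read, {seg_len}bp segments: {n} segment(s)")
```

Under these assumptions a 50bp read with --segment-length 50 gives a single segment, and a 36bp read at 25bp segments also gives one, which is the situation described as unable to reveal a junction.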

**tsucheta** · 12-15-2010, 08:30 PM

@plabaj:
Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat on single-end reads reports only 24% alignment. My read size is 50 bases. The junctions.bed file it produces is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds; the rest go unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?

**plabaj** · 12-16-2010, 12:02 AM

@tsucheta

As I wrote before, the new algorithm for finding splice junctions implemented in TopHat is responsible for that.
It only starts working properly for reads of length 75bp.

If your reads are not ABI SOLiD, try installing an older version of TopHat (you will have to work out which one by going through the version changes).
Maybe it will help.

Or, if you are interested only in transcript expression, use Bowtie against transcript sequences (not the genomic sequence as with TopHat).

**xinwu** · 12-16-2010, 01:19 AM

Originally posted by tsucheta:

@plabaj:
Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat on single-end reads reports only 24% alignment. My read size is 50 bases. The junctions.bed file it produces is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds; the rest go unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?

What's your segment size?

ABI-SOLID data with Bowtie-0.12.7 and TopHat-1.1.2
