
**GKM** · 09-01-2010, 04:40 PM

What are you trying to do with those replicates? If you are looking for replicate support of novel transcripts, you can probably do it with cuffcompare and then set whatever prevalence thresholds you feel comfortable with. Version 0.8.3 isn't replicate-aware during the transcript reconstruction process though.

P.S. Why are you using Bowtie alignments as input? You aren't going to see novel splices without TopHat. Also, what kind of reads do you have, and what is the purpose of the analysis?

**tsucheta** · 09-01-2010, 06:58 PM

GKM, Thanks for the response!

We have colorspace data and, as far as I can see, TopHat is not compatible with colorspace reads. Maybe I have to convert them into MAQ format before I can use them with TopHat. Nevertheless, I have the SAM outputs from Bowtie, which were generated by aligning the raw reads to the genome.

Now I want to cluster the transcripts in reference to the genome, so that they can aid in gene model correction. I am not sure how the replicates will fare in this process. I would certainly try and see what 'cuffcompare' has to say with respect to the difference.

Any advice in this regard would be great.

Thanks

**GKM** · 09-01-2010, 07:05 PM

In order for running Cufflinks to make sense, you will need spliced reads. From what you describe, I get the impression that you are aligning against the genome without the junctions, so you are missing those completely.

What you should do is convert the reads to fastq (I haven't worked with SOLiD RNA-Seq data myself, so I am not familiar with the available options for doing this), then run TopHat (make sure you supply it with the correct parameters: insert length, junctions if you have short reads, etc.), then run Cufflinks.

After you run Cufflinks, you can use cuffcompare to compare the result to the existing annotation.

**tsucheta** · 09-01-2010, 08:23 PM

Thanks GKM I really appreciate it!!

Is there any other software available for RNAseq clustering? And I still am just curious to know what people do when they have experimental replicates.

Thanks

**GKM** · 09-01-2010, 08:25 PM

I am pretty sure replicate support (i.e. running Cufflinks on replicates) is being worked on by the Cufflinks developers, but I have no idea how soon it will be available to the wider community. I haven't used any other software.

In the meantime my advice is to run it on each replicate, then run cuffcompare and look at the tracking files for transcripts you find in all replicates.
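
A minimal Python sketch of that tracking-file filter. It assumes the cuffcompare `.tracking` file is tab-delimited with four leading columns (transfrag id, locus id, reference id, class code) followed by one column per input sample, where `-` marks a sample with no matching transcript. Check that against the output of your cuffcompare version. The function name and the two example rows are made up for illustration.

```python
def transcripts_in_all_replicates(tracking_lines, n_fixed_columns=4):
    """Return ids of transfrags that have an entry in every sample column.

    Assumption: columns after the first `n_fixed_columns` are per-sample,
    and a lone "-" means the transfrag was not found in that sample.
    """
    shared = []
    for line in tracking_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        sample_columns = fields[n_fixed_columns:]
        if sample_columns and all(col != "-" for col in sample_columns):
            shared.append(fields[0])
    return shared


# Hypothetical rows for three replicates (q1, q2, q3).
example = [
    "TCONS_00000001\tXLOC_000001\t-\tu\tq1:G1|T1\tq2:G7|T9\tq3:G4|T4",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:G2|T2\t-\tq3:G5|T5",
]
print(transcripts_in_all_replicates(example))
```

Relaxing `all(...)` to a count (for example, present in at least two of three replicates) would give the "prevalence threshold" mentioned earlier in the thread.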

**Uwe Appelt** · 09-06-2010, 12:22 AM

Hi tsucheta,

You're right, TopHat isn't able to handle color-space reads, so you definitely have to stick with Bowtie, which shouldn't be a problem at all unless your reads are very long. Unless you are interested in splice-junction tracking, a splice mapper like TopHat only makes sense for longer reads, but not necessarily for reads up to, let's say, 50 bp. This is because short reads are not expected to span exon boundaries to such a large extent that you will miss information.

Anyway, if you've aligned your reads already, there are at least three packages out there that properly handle biological and technical replicates:

EdgeR appears to be the most mature one. DESeq is very similar to EdgeR and appears to be more powerful in calling differential expression. DEGSeq follows a different statistical approach (controversially discussed), produces nice pictures by default and is very easy to use (although the others are as well).

Uwe

**Thomas Doktor** · 09-06-2010, 05:36 AM

I would say a splice-aware mapper like TopHat always makes sense when aligning RNA-seq reads, since you always lose information if you don't use one, regardless of the length of the reads. It is not more difficult to run TopHat than Bowtie, and it only takes a little longer. After aligning the reads, you can always choose to extract only the reads that did not span a splice junction, if that is what you wish.
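
As a rough illustration of that last step, here is a minimal Python sketch that separates spliced from unspliced alignments in SAM text. It relies on the SAM convention that the CIGAR string (column 6) uses the `N` operation for a region skipped on the reference, which is how splice-aware mappers report introns. The function name and the example records are hypothetical.

```python
def split_by_splicing(sam_lines):
    """Split SAM alignment lines into (spliced, unspliced) lists.

    A record counts as spliced when its CIGAR contains an 'N' operation.
    Header lines (starting with '@') and records without a CIGAR are skipped.
    """
    spliced, unspliced = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        cigar = fields[5]
        if cigar == "*":  # no alignment reported for this read
            continue
        if "N" in cigar:
            spliced.append(line)
        else:
            unspliced.append(line)
    return spliced, unspliced


# Hypothetical records: one read spanning an intron, one contiguous read.
records = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t255\t20M500N30M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t900\t255\t50M\t*\t0\t0\t*\t*",
]
spliced, unspliced = split_by_splicing(records)
print(len(spliced), len(unspliced))
```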

**tsucheta** · 09-08-2010, 02:52 PM

Thanks for your posts! They have been really useful. While I have not yet tried the following software
1. EdgeR
2. DESeq
3. DEGSeq
I am still stuck with TopHat. I was able to convert the sequences to fastq, but running TopHat ends with the following errors:

--------
[Wed Sep 8 18:38:48 2010] Beginning TopHat run (v1.0.14)
-----------------------------------------------
[Wed Sep 8 18:38:48 2010] Preparing output location ./tophat_out/
[Wed Sep 8 18:38:48 2010] Checking for Bowtie index files
[Wed Sep 8 18:38:48 2010] Checking for reference FASTA file
Warning: Could not find FASTA file /home/data/bowtie-0.12.5/index/sojaeV1.fa
[Wed Sep 8 18:38:48 2010] Reconstituting reference FASTA file from Bowtie index

[Wed Sep 8 18:39:07 2010] Checking for Bowtie
Bowtie version: 0.12.5.0
[Wed Sep 8 18:39:07 2010] Checking reads
seed length: 50bp
format: fastq
quality scale: phred33 (default)
[Wed Sep 8 18:43:08 2010] Reading known junctions from GFF file
Warning: TopHat did not find any junctions in GFF file
[Wed Sep 8 18:44:05 2010] Mapping reads against sojaeV1 with Bowtie
[Wed Sep 8 18:44:05 2010] Joining segment hits
Traceback (most recent call last):
File "/home/data/tophat-1.0.14.Linux_x86_64/tophat", line 1854, in <module>
sys.exit(main())
File "/home/data/tophat-1.0.14.Linux_x86_64/tophat", line 1814, in main
user_supplied_juncs)
File "/home/data/tophat-1.0.14.Linux_x86_64/tophat", line 1562, in spliced_alignment
segment_len)
File "/home/data/tophat-1.0.14.Linux_x86_64/tophat", line 1229, in split_reads
reads_file = open(reads_filename)
IOError: [Errno 2] No such file or directory: './tophat_out/tmp//left_kept_reads_missing.fq'

--
I am running the binary tophat distribution.

Many thanks

**Uwe Appelt** · 09-08-2010, 11:44 PM

Hi tsucheta,

I'm not really sure what you fed into TopHat. Fastq, but in what encoding? TopHat will not work with color-space reads, as I mentioned. So I assume you translated them into base-space fastq with a script and ran TopHat on that?

But that's exactly what I was referring to. You will certainly not miss much information by sticking with Bowtie instead of TopHat for 50 bp reads. However, translating color-space reads (whatever the format) to base-space reads (whatever the format) introduces a vast number of nucleotide misinterpretations unless the reads are properly decoded by a color-space aligner. Just imagine you have a color-space read that aligns perfectly to the reference genome: translating it to base space and then aligning in base space is no problem. Now imagine you have a color-space read that aligns but has a single SNP (or sequencing error) near its 5' end. This read could still be aligned well in color space. But translating it into base space without the knowledge that there is indeed a SNP leads to an almost completely different read in base space and therefore results in no alignment at all any more! This has been discussed several times.
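
To make the error propagation concrete, here is a small Python sketch of naive color-space decoding. It assumes the usual SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3) and decoding starts from a known primer base. The read below is invented for illustration; the sketch shows the single-miscalled-color case, where every base after the miscall comes out different.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(primer_base, colors):
    """Naively translate a list of colors (0-3) into bases.

    Each decoded base depends on the previous one, so one wrong color
    shifts every later base.
    """
    code = BASES.index(primer_base)
    decoded = []
    for color in colors:
        code ^= color  # next base code = previous base code XOR color
        decoded.append(BASES[code])
    return "".join(decoded)


# Hypothetical read: same colors, except one miscall at the second position.
clean = [3, 2, 0, 1, 1, 3, 0, 2]
miscalled = [3, 1, 0, 1, 1, 3, 0, 2]

good = decode_colorspace("T", clean)
bad = decode_colorspace("T", miscalled)
mismatches = sum(1 for a, b in zip(good, bad) if a != b)
print(good, bad, mismatches)
```

A color-space aligner avoids this by comparing colors against a color-space reference, where the same miscall costs only one mismatch.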

After all, I cannot tell for sure what problem TopHat actually has. The error message means that not a single one of your "left reads" has been properly aligned, because the file that should contain that information doesn't even exist. So the question still is: what exactly did you feed into TopHat? And by the way, do you really have paired-end data? I thought it wouldn't be available before SOLiD 4. And even if it is, as far as I know paired-end sequencing with SOLiD 4 produces asymmetric read lengths, which (according to the TopHat manual) is again not supported (all reads need to have the same length!).

Could you please explain in more detail what exactly you did? And please also provide the command-line used to invoke Tophat.

Best,
Uwe

P.S. I just noticed the warnings TopHat printed.

Code:

Warning: Could not find FASTA file /home/data/bowtie-0.12.5/index/sojaeV1.fa

I dimly remember that TopHat needs the FASTA file of the reference genome in addition to the Bowtie index. I don't really know whether this is still the case, but at least there is the warning, and given that you ended up with no reads aligned, it's worth a try, isn't it?
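
A small pre-flight check along those lines, sketched in Python. It assumes a Bowtie 1 index, which consists of six `.ebwt` files sharing a prefix, and it looks for a `<prefix>.fa` file because that is the path TopHat reports in the warning above. Note that the log also shows TopHat rebuilding the FASTA from the index, so a missing `.fa` on its own may not be fatal. The function name is hypothetical.

```python
import os

# File suffixes of a Bowtie 1 index (assumption: not a Bowtie 2 or large index).
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_reference_files(index_prefix):
    """Return the expected index and FASTA paths that do not exist on disk."""
    expected = [index_prefix + suffix for suffix in BOWTIE1_SUFFIXES]
    expected.append(index_prefix + ".fa")  # the path named in TopHat's warning
    return [path for path in expected if not os.path.isfile(path)]


# Usage: pass the same prefix you give to TopHat, e.g. the sojaeV1 prefix
# from the log. An empty list means everything was found.
for path in missing_reference_files("/home/data/bowtie-0.12.5/index/sojaeV1"):
    print("missing:", path)
```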

**tsucheta** · 09-09-2010, 06:08 AM

Thanks Uwe for the responses! I have figured out the reason why tophat was exiting. I was trying to run it with colorspace indexed reference. That being fixed, it runs fine.
Coming to alignment sensitivity, I could not agree with you more about the lost alignments via tophat route.
While translating the colorspace data into fastq format, I lost a number of reads because they contained either a "." character or a number >3.
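
For reference, the filter described here can be sketched in a few lines of Python. It assumes each color-space read is a string made of one primer base followed by color digits, as in csfasta files, and it treats any "." or any digit outside 0-3 as untranslatable. The function name and example reads are hypothetical.

```python
def is_translatable(cs_read):
    """True if the read is a primer base followed only by colors 0-3."""
    if len(cs_read) < 2:
        return False
    primer, colors = cs_read[0], cs_read[1:]
    return primer in "ACGT" and all(color in "0123" for color in colors)


# Hypothetical reads: a clean one, one with a missing call, one with a bad digit.
reads = ["T0120312", "T01.0312", "T0140312"]
kept = [read for read in reads if is_translatable(read)]
print(len(kept), "of", len(reads), "reads kept")
```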
I have not investigated the alignment qualities or the number of reads that failed to align through TopHat, but judging from the SAM output files alone, TopHat produces SAM files about a quarter the size of the Bowtie SAM output. So there may be some information loss there.

In the coming days, I will compare the Bowtie -> Cufflinks output with the TopHat -> Cufflinks output to see if there is a major difference.
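
File size is only a rough proxy, since the two programs may write different numbers of records per read. A more direct comparison is to count distinct aligned read names in each SAM file. The Python sketch below relies on the SAM convention that bit 0x4 of the FLAG field (column 2) marks an unmapped read. The function name and the example records are hypothetical, and for paired-end data both mates share one name, so this counts fragments.

```python
def count_aligned_reads(sam_lines):
    """Count distinct read names with at least one mapped SAM record."""
    aligned = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            aligned.add(fields[0])
    return len(aligned)


# Hypothetical records: read1 has two hits, read2 is unmapped, read3 has one hit.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t10\t255\t50M\t*\t0\t0\t*\t*",
    "read1\t16\tchr1\t400\t255\t50M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t0\tchr1\t700\t255\t50M\t*\t0\t0\t*\t*",
]
print(count_aligned_reads(records))
```

To use it on a real file, pass an open file handle, for example `count_aligned_reads(open(path))`, once for each SAM file being compared.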


Help with cufflink
