
**nilshomer** · 05-11-2011, 08:34 AM

This post is a duplicate of one in the Bioinformatics forum. Please do not duplicate your question in multiple forums.

**pbluescript** · 05-11-2011, 10:20 AM

First, you should get the newest version of Cufflinks (1.0.1).
I don't know what your coverage is like, but with human RNA, you should supply Cufflinks with a GTF file containing known transcripts. If you don't, and your coverage is not sufficient to reconstruct transcripts completely, Cufflinks might not link all the reads mapping to a gene together when there is no evidence supporting a link (split reads, pairs mapping to neighboring exons). This might explain the higher number of transcripts you are seeing, since one gene can end up split into multiple smaller transcripts that do not represent reality.
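
As a rough illustration of supplying known transcripts, here is a minimal sketch that only assembles the argument list for a Cufflinks run. It assumes the `-G` (reference GTF) and `-p` (threads) options as I understand them from the Cufflinks manual; the file names `known_genes.gtf` and `accepted_hits.bam` are hypothetical placeholders, and nothing is actually executed.

```python
def cufflinks_command(alignments, annotation_gtf=None, threads=4):
    """Build (but do not run) a Cufflinks argument list.

    annotation_gtf is an optional GTF of known transcripts, passed with -G.
    All file names used with this helper are hypothetical placeholders.
    """
    cmd = ["cufflinks", "-p", str(threads)]
    if annotation_gtf is not None:
        # Assumption: -G points Cufflinks at a reference annotation.
        cmd += ["-G", annotation_gtf]
    cmd.append(alignments)
    return cmd


with_annotation = cufflinks_command("accepted_hits.bam", "known_genes.gtf")
without_annotation = cufflinks_command("accepted_hits.bam")
print(" ".join(with_annotation))
print(" ".join(without_annotation))
```

Check the manual of the version you have installed before relying on the exact option names.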

**edge** · 05-11-2011, 04:51 PM

Thanks for the reminder, nilshomer.
I will take note of it.
Sorry for my mistake.

**edge** · 05-11-2011, 05:07 PM

Hi pbluescript,

Thanks for your advice.
I just downloaded the latest version of Cufflinks (1.0.1) and re-ran it.
Do you know the main difference between ff-unstranded and fr-unstranded?
If my input is non-strand-specific RNA-seq, which library type should I choose in Cufflinks and TopHat?
Is it correct to set it to "fr-unstranded" in both Cufflinks and TopHat?
Thanks in advance for your advice.

**pbluescript** · 05-11-2011, 05:28 PM

Because of how I have to prepare my libraries, I don't use strand-specific libraries for my RNA-Seq, so I haven't played around with those settings. Perhaps someone else can offer some input?
The library-type setting isn't required by TopHat, so you can do the mapping without specifying one.

**edge** · 05-11-2011, 05:37 PM

According to the TopHat manual (http://tophat.cbcb.umd.edu/manual.html), TopHat will treat the reads as strand-specific.
My raw input, however, is non-strand-specific RNA-seq.
I noticed that TopHat has a "--library-type" option, but I'm not sure which value I should use for non-strand-specific RNA-seq.
These are the values for "--library-type" in TopHat:
fr-unstranded; fr-firststrand; fr-secondstrand; ff-unstranded; ff-firststrand; ff-secondstrand

**edge** · 05-11-2011, 05:43 PM

Hi pbluescript,
Have you run Cufflinks and TopHat before?
Are you familiar with them?
Do you know which Cufflinks output file contains the transcripts assembled by Cufflinks?
I'm interested in extracting the transcript sequences in FASTA format for downstream analysis.
Apart from that, my only sample is heart tissue, so I will end up with only one SAM file.
Does that mean I cannot run Cuffdiff on my sample set?
"Cuffdiff takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM files containing the fragment alignments for two or more samples."

**Wei-HD** · 05-17-2011, 05:19 AM

I asked one of the developers of TopHat about this library-type option:

"If you don't specify --library-type, TopHat just treats your reads as unstranded. (The default is *unstranded*). Actually, library-type is only intended for paired-end, so if you specify the library-type fr-unstranded option, should be the same as non-specified."

HTH.

**pbluescript** · 05-17-2011, 06:48 AM

Sorry, I missed your post.
I'm not entirely clear on your question, but the output of Cufflinks includes tables with transcript information and expression values, all of which you can read about on the Cufflinks manual page.
You can get all the sequences using the UCSC table browser.
Cuffdiff looks for differential expression between multiple samples, so if you don't have at least two samples to compare, there is no point in running Cuffdiff.
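
For anyone who would rather build the transcript sequences locally than go through the table browser, here is a minimal sketch of the general idea, not what Cufflinks or UCSC do internally. It assumes exon records with 1-based inclusive coordinates (as in GTF) and a genome already loaded into a dictionary; the transcript IDs and the toy `chr1` sequence are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def transcript_sequences(exons, genome):
    """Join exon sequences into one spliced sequence per transcript.

    exons:  iterable of (transcript_id, chrom, start, end, strand),
            with 1-based inclusive coordinates as in GTF.
    genome: dict mapping chromosome name to its sequence string.
    """
    by_transcript = {}
    for tx, chrom, start, end, strand in exons:
        by_transcript.setdefault(tx, []).append((chrom, start, end, strand))

    sequences = {}
    for tx, parts in by_transcript.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        chrom, strand = parts[0][0], parts[0][3]
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        # Minus-strand transcripts are reported 5'->3' on their own strand.
        sequences[tx] = reverse_complement(seq) if strand == "-" else seq
    return sequences


# Toy data, purely hypothetical.
genome = {"chr1": "AAACCCGGGTTTAAACCC"}
exons = [
    ("tx_plus", "chr1", 1, 3, "+"),
    ("tx_plus", "chr1", 7, 9, "+"),
    ("tx_minus", "chr1", 4, 6, "-"),
    ("tx_minus", "chr1", 13, 15, "-"),
]
seqs = transcript_sequences(exons, genome)
for name, seq in seqs.items():
    print(">" + name)
    print(seq)
```

Transcripts with an unknown strand (".") are treated as plus-strand here, which is a simplification.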

**edge** · 05-17-2011, 06:43 PM

Hi Wei-HD,

Thanks a lot for checking.

That means the command I typed for library-type is correct ^^

**edge** · 05-17-2011, 07:03 PM

Hi pbluescript,

Thanks for your reply.
I understand Cuffdiff better now.

Apart from that, would you mind explaining the main difference between a gene and a transcript in general?
I'm a bit confused because some journals mention gene/transcript together while others mention gene and transcript separately.

**edge** · 05-17-2011, 08:20 PM

Hi Wei-HD,

Are you familiar with TopHat?
I'm quite confused about the "-r" option in TopHat.
My input is Illumina paired-end reads, 2×80 bp, with a library insert size of 300.
Should I set -r to 300 or 140 (300 - 2×80)?
Which of the following commands is correct?

Command 1:
/tophat-1.2.0.Linux_x86_64/tophat -r 300 -p 4 --solexa1.3-quals --library-type fr-unstranded human_ref_genome s_1_1.fq s_1_2.fq

or

Command 2:
/tophat-1.2.0.Linux_x86_64/tophat -r 140 -p 4 --solexa1.3-quals --library-type fr-unstranded human_ref_genome s_1_1.fq s_1_2.fq

Thanks.

**kmcarr** · 05-19-2011, 04:28 AM

In your example the proper setting for -r is 140.
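
The reasoning, as I read the TopHat manual, is that -r is the expected inner distance between the two mates, so both read lengths are subtracted from the fragment size. A minimal sketch of that arithmetic, assuming the 300 quoted above is the fragment size without adapters; it only builds the argument list from the flags already shown in this thread and does not run TopHat:

```python
def mate_inner_distance(fragment_size, read_length):
    """Expected gap between the mates: fragment size minus both reads."""
    return fragment_size - 2 * read_length


def tophat_command(fragment_size, read_length, index, reads_1, reads_2, threads=4):
    """Assemble a TopHat argument list with -r derived from the library."""
    inner = mate_inner_distance(fragment_size, read_length)
    return [
        "tophat",
        "-r", str(inner),
        "-p", str(threads),
        "--solexa1.3-quals",
        "--library-type", "fr-unstranded",
        index, reads_1, reads_2,
    ]


cmd = tophat_command(300, 80, "human_ref_genome", "s_1_1.fq", "s_1_2.fq")
print(" ".join(cmd))
```

With a 300 bp fragment and 2×80 bp reads this gives -r 140, matching Command 2.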

**pbluescript** · 05-19-2011, 04:38 AM

Generally, one gene can have multiple transcripts due to alternative start sites, alternatively spliced exons, etc. Assigning reads to specific known transcripts can be a tricky process, especially when an exon is shared across multiple transcripts. There is a good description of how Cufflinks does this here:

http://cufflinks.cbcb.umd.edu/howitworks.html
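
To make the gene/transcript relationship concrete, here is a minimal sketch that groups transcript IDs by gene from GTF lines. It relies only on the standard GTF layout (nine tab-separated columns, with `gene_id` and `transcript_id` in the attribute column); the records and IDs below are invented for illustration and are not real Cufflinks output.

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: gene_id "G1";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids seen for it."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return genes


# Hypothetical records: gene G1 has two isoforms sharing the first exon.
example = [
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.2";',
    'chr1\texample\texon\t500\t600\t.\t+\t.\tgene_id "G1"; transcript_id "G1.2";',
    'chr2\texample\texon\t50\t150\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";',
]
genes = transcripts_per_gene(example)
for gene_id in sorted(genes):
    print(gene_id, sorted(genes[gene_id]))
```

A read falling in the shared first exon of G1 is compatible with both isoforms, which is the ambiguity described above.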

RNA-seq analysis by Tophat, Cufflink Problem facing