**DZhang** · 07-24-2011, 10:55 AM

Hi fabrice,

Here are a few things to consider when troubleshooting:

1) Use FastQC to check the quality of your reads.
2) Use the FASTX-Toolkit to check the length distribution of your trimmed reads. One thing to find out is whether most of the trimming is at the 3' end. If it is, you could trim every read by a fixed number of nucleotides instead, so that the read length stays uniform.
3) BWA alignment results are useful, but I would align the reads with Bowtie and check the mapping statistics: TopHat uses Bowtie for mapping, so those results are more relevant.
4) Start with TopHat's default and mandatory parameters to see if you can get decent results.
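
Point 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not what the FASTX-Toolkit does internally; the function names and the tiny in-memory FASTQ are made up for the example.

```python
from collections import Counter
from io import StringIO


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def length_distribution(records):
    """Count how many reads have each length."""
    return Counter(len(seq) for _, seq, _ in records)


def trim_3prime(records, keep):
    """Cut every read to its first `keep` bases; drop reads that are shorter."""
    for header, seq, qual in records:
        if len(seq) >= keep:
            yield header, seq[:keep], qual[:keep]


# Hypothetical toy data standing in for a quality-trimmed FASTQ file.
toy = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nACGT\n+\nIIII\n"
records = list(parse_fastq(StringIO(toy)))
print(length_distribution(records))

# Fixed-length trimming: every surviving read ends up with the same length.
uniform = list(trim_3prime(records, keep=6))
print([len(seq) for _, seq, _ in uniform])
```

If the length distribution is spread out after trimming, a fixed `keep` value like this gives uniform reads at the cost of dropping the ones that are already too short.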

Hope this helps.

Douglas

https://www.contigexpress.com

**fabrice** · 07-24-2011, 11:26 AM

Hi Douglas,

Thanks for your helpful suggestions.

1) use fastqc to check the quality of your reads
Yes, I have done this. I trimmed the reads where the quality was below 10. After trimming, the base quality looks good: all above 10.

2) use fastx toolkit to check the length distribution of your trimmed reads. One thing to find out is whether most of the trimming is from 3'-end. If it is, you may trim the reads by fixed number of nts so you have uniform read length.

After trimming, the read lengths are not uniform. I did not trim the reads by a fixed number of nucleotides. I also trimmed the adaptor sequences and removed Ns at both ends.

3) BWA alignment results are useful but I would use Bowtie to align the reads and check the mapping statistics, as Tophat uses Bowtie to map so the results are more relevant

If I take the BAM file produced by BWA and feed it into Cufflinks, do you think that could cause problems? I think BWA does better than Bowtie here (in my data, BWA gives more properly paired reads).
I would then use TopHat only to output junctions.bed, insertions.bed and deletions.bed.
In other words: BWA -> Cufflinks to get the expression values, and TopHat to estimate the junctions.

I hope there is no potential problem with this, or maybe you have a better suggestion.

4) Start with Tophat default and mandatory parameters to see if you can get decent results.

When I do not trim the reads and run TopHat, it works. So I think the problem is caused by trimming the reads, which makes the -r/--mate-inner-dist parameter incorrect.
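
To make the concern concrete: the TopHat manual describes -r/--mate-inner-dist as the expected inner distance between the mates, for example 200 for 300 bp fragments with 50 bp reads at each end. The small Python sketch below only does that arithmetic; the trimmed read lengths are hypothetical numbers chosen for illustration.

```python
def mate_inner_distance(fragment_length, read1_length, read2_length):
    """Inner distance = fragment length minus the lengths of both mates."""
    return fragment_length - read1_length - read2_length


def mean_inner_distance(fragment_length, read_length_pairs):
    """Average inner distance over (mate1_length, mate2_length) pairs."""
    distances = [
        mate_inner_distance(fragment_length, len1, len2)
        for len1, len2 in read_length_pairs
    ]
    return sum(distances) / len(distances)


# Untrimmed case from the TopHat manual: 300 bp fragments, 50 bp reads.
print(mate_inner_distance(300, 50, 50))

# Hypothetical trimmed pairs: the same fragments, but mates of uneven length.
print(mean_inner_distance(300, [(50, 50), (40, 35), (45, 30)]))
```

With variable-length reads the inner distance differs from pair to pair, so a single -r value computed for the untrimmed read length no longer describes the data well.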

Thank you very much for your time.

**DZhang** · 07-24-2011, 04:03 PM

Hi fabrice,

1) In my experience, given the same reference sequence(s) and the same set of reads, BWA generally maps more reads than Bowtie. I probably should have mentioned this earlier.

2) I do not believe Tophat will take BWA-produced BAM.

3) My suggestion: if a large portion of the reads have low quality scores at the 3' end, I would trim the 3' end by a fixed number of nucleotides so that the read length stays uniform. Or go ahead with the untrimmed reads and see whether the results make sense.

Douglas

https://www.contigexpress.com

**pinki999** · 07-24-2011, 11:51 PM

Hi,

I am working with SOLiD data, and it is not paired-end. I still get this warning message:
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges. It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
> Upper Quartile: 242.20
> Read Type: 50bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80

Is it all right to ignore this?

Pinki

**fabrice** · 07-25-2011, 01:25 AM

Douglas,

Why do you think that Cufflinks cannot take a BWA-produced BAM? The Cufflinks website says that Cufflinks can take BAM files from other mappers. At the moment, I have just run into another problem, with sorting the BAM file:

http://seqanswers.com/forums/showthread.php?t=12939

**fabrice** · 07-25-2011, 01:26 AM

Pinki,

I think you need to set these parameters, because you used single-end reads.

-m/--frag-len-mean <int> This is the expected (mean) fragment length. The default is 200bp.
Note: Cufflinks now learns the fragment length mean for each SAM file, so using this option is no longer recommended with paired-end reads.
-s/--frag-len-std-dev <int> The standard deviation for the distribution on fragment lengths. The default is 80bp.
Note: Cufflinks now learns the fragment length standard deviation for each SAM file, so using this option is no longer recommended with paired-end reads.

**DZhang** · 07-25-2011, 07:43 AM

Hi,

I should have been more careful in my wording. You are right that Cufflinks accepts SAM/BAM files generated by other programs. Both BWA and Tophat are mappers. Tophat considers splicing in its mapping and BWA does not. If you use BWA-generated BAM for differential expression analysis, you basically throw out the reads that overlap with the splicing junctions and I do not think it is a good idea.
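
One way to see the difference in your own data is to count how many alignments actually span an intron. In the SAM format, a spliced alignment carries an `N` operation (skipped reference region) in its CIGAR string. The sketch below is a minimal illustration under that assumption; the function names and the toy SAM records are made up, and it does not try to reproduce what TopHat or BWA do internally.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the CIGAR string contains an 'N' (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_spliced(sam_lines):
    """Return (spliced, mapped) counts for SAM text lines, skipping headers."""
    spliced = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read
        mapped += 1
        if is_spliced(cigar):
            spliced += 1
    return spliced, mapped


# Hypothetical toy records: one contiguous, one spliced, one unmapped.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t50\t20M500N30M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_spliced(toy_sam))
```

An aligner that does not model splicing would not be expected to produce the `N` alignments at all, which is the set of reads at stake here.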

Douglas

https://www.contigexpress.com

**fabrice** · 07-25-2011, 08:05 AM

Douglas,

Thank you for your clarification and helpful suggestions.

There is something I am still confused about in my RNA-seq data analysis.

1) TopHat considers splicing in its mapping and BWA does not, so we would expect TopHat to get more properly paired reads. But in fact BWA gets more. In my analysis, I just want a quantitative expression value for each gene in each sample. I agree that it is generally better to consider splicing in mapping, and you said that this matters for differential expression analysis. Is there any case (or is mine one) where BWA mapping is acceptable? Or is BWA simply not suitable for mapping RNA-seq reads to a genome?

2) If novel junction analysis is not the goal, is it better to map RNA-seq reads to the transcriptome rather than the genome, e.g. for differential expression analysis? Mapping to the transcriptome also has problems, because one gene has several isoforms, which gives the reads multiple hits.

Thank you very much for your time.

**fangquan** · 08-16-2011, 07:55 PM

About the Cufflinks / Cuffdiff problem

Hi all,

I was able to run:

cuffdiff -o ./cuffdiff refGene_chr1.GTF B6341/hg19_chr1_seg/accepted_hits.bam 4242/hg19_chr1_seg/accepted_hits.bam

refGene_chr1.GTF was downloaded from UCSC with these selections:
Group: Gene and Gene Prediction Tracks
Track: RefSeq Genes
Table: refGene

I selected only chr1, and the cuffdiff output looks like this:

Performed 3204 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 3179 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking

I'm not sure if this makes any sense.
=====================================================
So the overall question is: should we run cuffdiff directly, or run cuffcompare first and then cuffdiff?

I welcome any further discussion.

fangquan

**fangquan** · 08-16-2011, 10:33 PM

Hi Dario,

You are right. But if you skip the cuffcompare step, you still get some results from cuffdiff, like this:

Performed 3204 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 3179 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests

It's no surprise that some of the results are zero, because "Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use."
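
A quick way to check whether a GTF carries those attributes is to scan its ninth column. As far as I understand the Cufflinks manual, the attributes in question are `tss_id` and `p_id`, which cuffcompare adds to its combined GTF. The Python sketch below is only an illustration: the function names are made up and the two GTF records are hypothetical toy lines, not real annotation.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(column9):
    """Turn the GTF attribute column into a dict."""
    return dict(ATTR.findall(column9))


def count_missing(gtf_lines, required=("tss_id", "p_id")):
    """Count records that lack each required attribute."""
    missing = {name: 0 for name in required}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        attrs = parse_attributes(line.rstrip("\n").split("\t")[8])
        for name in required:
            if name not in attrs:
                missing[name] += 1
    return missing


# Hypothetical records: a plain one, and one with the extra attributes added.
toy_gtf = [
    'chr1\trefGene\texon\t11874\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tcuffcmp\texon\t11874\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tss_id "TSS1"; p_id "P1";',
]
print(count_missing(toy_gtf))
```

If every record is missing `tss_id` and `p_id`, zero TSS-level, CDS-level, splicing and promoter tests would be the expected outcome rather than a sign of a failed run.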

fangquan

Originally posted by Dario1984 View Post

Hi everyone,

The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.

I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

I agree that this is quite obscure and hard to find, especially since the argument description states The other source implies that a standard GTF file from UCSC should work, but this is misleading.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia

**Dario1984** · 08-16-2011, 11:00 PM

The GTF file from UCSC browser is not compatible with cuffdiff. You must download the annotations from the iGenomes project.

**fangquan** · 08-16-2011, 11:32 PM

Hi,

Sorry, but I don't understand what you mean by "incompatible". I used the annotation from UCSC; cuffdiff does give me some differential test results, and no major error is reported.

On the other hand, the Galaxy exercise examples also use the UCSC annotation GTF file, though they run cuffcompare first.

Thanks for your information. I will try the iGenomes annotation soon.

Let's keep the discussion going.

fangquan

**Balat** · 09-02-2011, 12:00 AM

Hi fabrice,
I think I have figured out the problem and a solution for the warning you get when you run Cufflinks with trimmed paired-end reads.

(Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges. It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.)

I think that when you trim paired-end reads for adapters and low quality, both mates of each pair need to stay in the same order in the two files after trimming. Cutadapt can't re-order the reads after trimming. You need a tool that trims adapters from paired-end reads and keeps the mates in order afterwards. I used a program called 'flexible adapter remover' (FAR) to trim the adapters and keep the pairs in order. Similarly, I used the script 'trim-fastq.pl' from 'Popoolation' to trim the paired-end reads for low quality; this script also corrects the order between the paired reads after trimming.
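
The re-ordering step itself is simple, and a rough Python sketch may make it clearer. This is not how FAR or trim-fastq.pl are implemented; the function names are made up, the records are hypothetical, and it assumes mates are named with /1 and /2 suffixes.

```python
def read_id(header):
    """Strip the '@' prefix, any trailing comment, and a /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def resync_pairs(mate1_records, mate2_records):
    """Keep only reads whose mate survived trimming, in mate-1 file order."""
    mate2_by_id = {read_id(rec[0]): rec for rec in mate2_records}
    paired = []
    for rec in mate1_records:
        partner = mate2_by_id.get(read_id(rec[0]))
        if partner is not None:
            paired.append((rec, partner))
    return paired


# Hypothetical trimmed output: r2 lost its second mate, and the
# second file is no longer in the same order as the first.
mate1 = [("@r1/1", "ACGT", "IIII"), ("@r2/1", "ACG", "III"), ("@r3/1", "ACGTA", "IIIII")]
mate2 = [("@r3/2", "TTGCA", "IIIII"), ("@r1/2", "TTG", "III")]

for first, second in resync_pairs(mate1, mate2):
    print(first[0], second[0])
```

Writing the first elements to one FASTQ file and the second elements to another gives two files where line N of each belongs to the same pair, which is what the downstream tools assume.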

I used the correctly ordered, trimmed FASTQ files in TopHat to produce a BAM file. When I used this BAM file in Cufflinks, I did not get the warning message, and it used the estimated fragment length mean in the analysis.

I hope this helps others dealing with this issue.

**krespim** · 08-08-2012, 02:05 PM

is cuffmerge necessary?

Hi Dario,

Just a quick question, if I understood this correctly:

2) <outprefix>.combined.gtf

Cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined gtf.

One runs cufflinks, then cuffcompare, and the output already contains a reference GTF file to use with cuffdiff, so cuffmerge is redundant in this case, right?

Cheers.

**Dario1984** · 08-08-2012, 08:00 PM

It is redundant to merge the transcripts if you use a reference GTF, but the command adds extra columns that are needed to run the isoform estimation step in cuffdiff, so it's also necessary.
