
**Morten** · 06-29-2011, 05:12 AM

It seems I just solved my problem.
The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
I downloaded one originating from UCSC, and now it works.

**pinki999** · 07-03-2011, 11:55 PM

Hi Morten,

I am facing part of the same problem. I get almost the same warning messages as above, and all the output files are empty. I am using the annotation file I downloaded from Ensembl (hg18).

If you have any idea about this, can you please help me?

**gavin.oliver** · 07-04-2011, 06:42 AM

Originally posted by Morten:

It seems I just solved my problem.
The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
I downloaded one originating from UCSC, and now it works.

Can you post the link to the hg19 GTF annotation? I've been trying to find one with no luck.

**Dario1984** · 07-04-2011, 04:00 PM

Hi everyone,

The answer is in the Cufflinks documentation. You need to run cuffcompare even if you are using a known annotation, because cuffcompare adds a couple of attributes (in the GTF attribute column) that cuffdiff critically depends on.

Note: If an arbitrary GTF/GFF3 file is used as input to Cuffdiff (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

Code:

cuffcompare -s /path/to/genome_seqs.fa -r annotation.gtf annotation.gtf

I ran it without the -s option because I didn't want any genes filtered, so -s is optional if you don't need it.

I agree that this is quite obscure and hard to find, especially since the argument description states:

<transcripts.(gtf/gff)> A transcript annotation file produced by cufflinks, cuffcompare, or other source.

The phrase "or other source" implies that a standard GTF file from UCSC should work as-is, but this is misleading.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
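Before passing an annotation straight to Cuffdiff, it may help to check whether it already carries the attributes Cuffcompare adds. Below is a minimal POSIX shell sketch. The helper name check_gtf_attrs is hypothetical, and the attribute names tss_id and p_id come from my reading of the Cufflinks manual, not from the post above, so treat them as an assumption.

```shell
#!/bin/sh
# Sketch: report whether a GTF already contains the tss_id and p_id
# attributes (assumed here to be the ones Cuffcompare adds for Cuffdiff).
# check_gtf_attrs is a hypothetical helper, not part of Cufflinks.
check_gtf_attrs() {
  gtf="$1"
  for attr in tss_id p_id; do
    # GTF attributes look like: key "value"; so match the key at the start
    # of a field (after a tab, space or semicolon) followed by a quote.
    if grep -v '^#' "$gtf" | grep -Eq "(^|[[:space:];])${attr} \""; then
      echo "$attr: present"
    else
      echo "$attr: missing"
    fi
  done
}
```

For example, `check_gtf_attrs annotation.gtf` on a plain Ensembl or UCSC GTF would be expected to print "missing" for both, which is the case where the cuffcompare -r annotation.gtf annotation.gtf step above applies.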

**gavin.oliver** · 07-04-2011, 11:49 PM

Does anyone actually have links to UCSC GTF files? I have not been able to find any. The only ones I have located are based on Ensembl annotation and chromosome names.

**nsl** · 07-19-2011, 11:58 AM

Sorry for the naive question, but does it matter whether the annotations are from UCSC or Ensembl? I used Ensembl and get "0" values as well.
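One visible difference between the two sources is chromosome naming: Ensembl GTFs use 1, 2, ..., X, Y, MT, while UCSC hg19 uses chr1, chr2, ..., chrX, chrY, chrM. Whether that is what causes the zero values here is an assumption, not something confirmed in this thread; the point is only that the names in the annotation should match the genome the reads were aligned to (for example, as listed in the BAM header by samtools view -H). A minimal shell sketch with two hypothetical helpers:

```shell
#!/bin/sh
# Sketch: list the distinct sequence names (GTF column 1) used by an annotation.
list_gtf_contigs() {
  grep -v '^#' "$1" | cut -f1 | sort -u
}

# Sketch: rename Ensembl-style primary chromosomes (1..22, X, Y, MT) to
# UCSC-style names (chr1..chr22, chrX, chrY, chrM). Other contigs, such as
# GL000192.1, are left unchanged because their UCSC names need a lookup table.
ensembl_to_ucsc() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       $1 == "MT" { $1 = "chrM"; print; next }
       $1 ~ /^([0-9]+|X|Y)$/ { $1 = "chr" $1 }
       { print }' "$1"
}
```

Usage might look like `list_gtf_contigs Homo_sapiens.GRCh37.63.gtf` to inspect the names, and `ensembl_to_ucsc in.gtf > out.gtf` only if the reads were aligned against a chr-prefixed genome.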

**Morten** · 07-19-2011, 12:53 PM

I downloaded the UCSC annotation file here:


http://www.nimblegen.com/downloads/annotation/hg19/

**fabrice** · 07-24-2011, 04:52 AM

I have the same problem when running Cufflinks.

I ran Cufflinks 1.0.3 with this command:

Code:

bin/rnaseq/cufflinks-1.0.3.Linux_x86_64/cufflinks -p 6 -I 5000000 \
  --upper-quartile-norm \
  -G ~/bin/genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf \
  --output-dir mapping/7124 mapping/7124/accepted_hits.bam

I got this output:

Warning: Using default Gaussian distribution due to insufficient
paired-end reads in open ranges. It is recommended that correct
paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
> Upper Quartile: 366.70
> Read Type: 108bp x 102bp
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80

My data are Illumina paired-end RNA-seq reads (2 x 101 bp) mapped with TopHat. Before mapping, I used cutadapt to trim the adapters, so read 1 and read 2 are no longer the same length after trimming.

1. Do you have any idea why Cufflinks cannot learn the mean fragment length from my data?
2. Why does it report such a strange Read Type: 108bp x 102bp?

**DZhang** · 07-24-2011, 05:48 AM

Hi fabrice,

I am not entirely sure whether the different lengths of the forward and reverse reads caused the issue, but the first thing to check is that you have plenty of properly mapped paired-end reads in your BAM files (Picard is one tool for that). If you do, then look into Cufflinks. If you do not, look into the TopHat step.

Thank you,
Douglas

https://www.contigexpress.com
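As a quick alternative to Picard (a swapped-in approach, not the tool suggested above), the proportion of properly paired alignments can be read straight from the SAM FLAG field, where bit 0x2 means "each segment properly aligned". A minimal sketch, assuming SAM text on stdin; proper_pair_fraction is a hypothetical helper, and it counts alignment records rather than unique reads:

```shell
#!/bin/sh
# Sketch: fraction of mapped, primary alignment records in SAM text (stdin)
# whose FLAG has bit 0x2 (properly paired) set.
# Bits are tested with integer division because POSIX awk has no bitwise operators.
proper_pair_fraction() {
  awk '/^@/ { next }                          # skip header lines
       { flag = $2 }
       int(flag / 4) % 2 == 1 { next }        # 0x4: segment unmapped
       int(flag / 256) % 2 == 1 { next }      # 0x100: secondary alignment
       { n++; if (int(flag / 2) % 2 == 1) p++ }
       END { if (n > 0) printf "%.2f\n", p / n; else print "0.00" }'
}
```

Usage might be `samtools view mapping/7124/accepted_hits.bam | proper_pair_fraction`. Secondary records are skipped because TopHat can report several alignments per read (see --max-multihits), which would otherwise inflate the counts.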

**fabrice** · 07-24-2011, 06:24 AM

Hi Douglas,

Thank you for your suggestions. It seems that there are not many properly mapped paired-end reads in my BAM files.

I think the problem is in the mapping. Maybe it is caused by the -r/--mate-inner-dist parameter in TopHat: because I trimmed some reads, the forward and reverse reads now have different lengths, which makes this parameter hard to set. The inner distance between mates varies.

-r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.
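The arithmetic in the -r/--mate-inner-dist description is simply fragment length minus the lengths of the two mates. A small shell sketch of that calculation; mate_inner_dist is a hypothetical helper, and the second call uses a 396 bp fragment back-calculated from the "194 = PCR size - 2*101" figure given later in this thread:

```shell
#!/bin/sh
# Sketch: inner distance between mates = fragment length - read 1 length - read 2 length.
mate_inner_dist() {
  fragment=$1; len1=$2; len2=$3
  echo $(( fragment - len1 - len2 ))
}

mate_inner_dist 300 50 50    # the manual's example
mate_inner_dist 396 101 101  # untrimmed 2 x 101 bp reads
```

Taking the two lengths separately, rather than 2 x read length, is what makes the formula still meaningful once the mates have been trimmed to different lengths.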

**DZhang** · 07-24-2011, 06:29 AM

Hi fabrice,

Can you try without trimming your reads? For most mapping applications, quality trimming is usually not necessary, as poor-quality reads are simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

Douglas


**fabrice** · 07-24-2011, 06:40 AM

Hi Douglas,

I have tried mapping both trimmed and untrimmed reads with BWA, and found that quality trimming gives more properly paired alignments (about 30% in my dataset), so I think quality trimming is necessary for my data.

BWA seems to have no problem with reads of different lengths, but I prefer TopHat because it outputs junctions.bed, insertions.bed and deletions.bed.

Thanks.

**DZhang** · 07-24-2011, 07:28 AM

Hi fabrice,

Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

Douglas


**fabrice** · 07-24-2011, 07:49 AM

Code:

tophat-1.3.1.Linux_x86_64/tophat --mate-inner-dist 194 -p 6 --segment-mismatches 2 \
  --segment-length 25 --mate-std-dev 25 --min-anchor 8 --splice-mismatches 0 \
  --min-intron 50 --max-intron 5000000 --min-isoform-fraction 0.15 --max-multihits 40 \
  --solexa1.3-quals -o mapping/7124_3 \
  ~/bin/genome_index/tophat_indexes/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.dna.chromosome \
  mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq

Here, mate-inner-dist = PCR product size - 2*101 = 194.

But read 1 and read 2 have been trimmed.
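Because the mates were trimmed, one rough way to revisit --mate-inner-dist is to subtract the mean trimmed mate lengths instead of 2*101. This is a minimal sketch under stated assumptions: plain 4-line FASTQ records, and a fragment size of 396 bp back-calculated from 194 + 2*101 above. The helpers mean_read_len and inner_dist_after_trim are hypothetical. A single mean is only an approximation: pairs that needed adapter trimming came from fragments shorter than one read, so their mates overlap and their true inner distance is negative.

```shell
#!/bin/sh
# Sketch: mean read length of a FASTQ file, rounded down.
# Sequence lines are line 2 of every 4-line record (NR % 4 == 2).
mean_read_len() {
  awk 'NR % 4 == 2 { total += length($0); n++ }
       END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$1"
}

# Hypothetical helper: inner distance = assumed fragment size
# minus the mean lengths of the trimmed read-1 and read-2 files.
inner_dist_after_trim() {
  fragment=$1
  echo $(( fragment - $(mean_read_len "$2") - $(mean_read_len "$3") ))
}
```

With the files from the command above, this would be run as `inner_dist_after_trim 396 mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq`; the result is a starting estimate for -r, with --mate-std-dev covering the spread.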

Cufflinks / Cuffdiff problem
