
**Morten** · 06-29-2011, 05:12 AM

It seems I just solved my problem.
The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
I downloaded one originating from UCSC, now it works.

**pinki999** · 07-03-2011, 11:55 PM

Hi Morten,

I am partially facing the same problem: I get almost the same warning messages as above, and all the output files are empty. I am using an annotation file that I downloaded from Ensembl (hg18).

If you have any idea about this, can you please help me?

**gavin.oliver** · 07-04-2011, 06:42 AM

Originally posted by Morten:

It seems I just solved my problem.
The problem was the annotation file (Homo_sapiens_annotation_hg19-GRCh37.62.gtf)
I downloaded one originating from UCSC, now it works.

Can you post the link to the hg19 GTF annotation? I've been trying to find one, with no luck.

**Dario1984** · 07-04-2011, 04:00 PM

Hi everyone,

The answer is in the cufflinks documentation. You need to run cuffcompare even if you are using a known annotation, because cuffcompare adds a couple of attributes that cuffdiff critically depends on.

Note: If an arbitrary GTF/GFF3 file is used as input (to CuffDiff) (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

Code:

cuffcompare -s /path/to/genome_seqs.fa -r annotation.gtf annotation.gtf

I ran this without the -s option because I didn't want any of the genes filtered, so -s is optional.

I agree that this is quite obscure and hard to find, especially since the argument description states:

<transcripts.(gtf/gff)> A transcript annotation file produced by cufflinks, cuffcompare, or other source.

The phrase "other source" implies that a standard GTF file from UCSC should work, but this is misleading.
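
To see whether a GTF already carries the attributes discussed above, a quick check along these lines may help. This is only a sketch: it assumes the attributes in question are `tss_id` and `p_id` (the names used in the Cufflinks manual, as far as I know; the thread itself does not list them), and `count_attr` is a hypothetical helper, not part of Cufflinks.

```shell
# Count GTF records whose attribute column (column 9) contains a given key.
# Assumption: attributes look like  key "value";  separated by "; ".
count_attr() {
    # $1 = attribute key (e.g. tss_id), $2 = path to the GTF file
    awk -F '\t' -v key="$1" '
        $9 ~ ("(^|[; ])" key " ") { n++ }   # key at start, or after ";" or a space
        END { print n + 0 }                 # print 0 when nothing matched
    ' "$2"
}
```

Calling `count_attr tss_id annotation.gtf` on a plain UCSC or Ensembl download would be expected to print 0, while the combined GTF written by cuffcompare should give a non-zero count if the attributes were added.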

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia

**gavin.oliver** · 07-04-2011, 11:49 PM

Does anyone actually have links to UCSC GTF files? I have been able to find none. The only ones I have located are based on Ensembl annotation and chromosome names.

**nsl** · 07-19-2011, 11:58 AM

Sorry for the naive question, but does it matter whether the annotations are from UCSC or Ensembl? I used Ensembl and get "0" values as well.
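
One thing that can be checked here is whether the chromosome names in the annotation match the names in the alignments (for example `1` in one file versus `chr1` in the other). The thread does not confirm this as the cause of the "0" values, so treat the sketch below as one possible diagnostic. It assumes the BAM header has been saved as text first, for instance with `samtools view -H accepted_hits.bam > header.sam`; `gtf_names_missing_from_header` is a hypothetical helper name.

```shell
# Print each sequence name used in the GTF (column 1) that has no
# matching @SQ SN: entry in a SAM header saved as plain text.
gtf_names_missing_from_header() {
    # $1 = GTF path, $2 = SAM header text file (must not be empty)
    awk -F '\t' '
        NR == FNR {                          # first file: the SAM header
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) seen[substr($i, 4)] = 1
            next
        }
        /^#/ { next }                        # skip GTF comment lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$2" "$1"
}
```

If this prints nothing, every name in the GTF is also present in the header; any names it does print are records that cannot overlap any alignment in that BAM file.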

**Morten** · 07-19-2011, 12:53 PM

I downloaded the UCSC annotation file here:

http://www.nimblegen.com/downloads/annotation/hg19/

**fabrice** · 07-24-2011, 04:52 AM

I have the same problem when running cufflinks.

I ran cufflinks 1.0.3 with this command:

bin/rnaseq/cufflinks-1.0.3.Linux_x86_64/cufflinks -p 6 -I 5000000 \
  --upper-quartile-norm \
  -G ~/bin/genome_index/annotation/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.gtf \
  --output-dir mapping/7124 mapping/7124/accepted_hits.bam

I got this output:

Warning: Using default Gaussian distribution due to insufficient
paired-end reads in open ranges. It is recommended that correct
paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
> Upper Quartile: 366.70
> Read Type: 108bp x 102bp
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80

My data is Illumina paired-end RNA-seq (2 x 101 bp), mapped with Tophat. Before mapping, I used cutadapt to trim the adaptors, so read 1 and read 2 are no longer the same length after trimming.

1. Do you have any suggestions as to why cufflinks cannot learn the mean fragment length from my data?
2. Why does it report such a strange Read Type (108bp x 102bp)?


**DZhang** · 07-24-2011, 05:48 AM

Hi fabrice,

I am not entirely sure whether the different sizes of the forward and reverse reads caused the issue, but the first thing to check is that you have plenty of properly mapped paired-end reads in your bam files (one tool for that is Picard). If you do, then look into cufflinks. If you do not, please look into the Tophat step.
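
As a rough alternative to a dedicated tool, the properly paired reads can be counted from the SAM FLAG field directly. The sketch below assumes SAM text on standard input, for example from `samtools view mapping/7124/accepted_hits.bam`, and uses the standard FLAG bits 0x1 (read is paired) and 0x2 (read mapped in a proper pair); `proper_pair_summary` is a hypothetical helper name.

```shell
# Summarise paired and properly paired records from SAM text on stdin.
proper_pair_summary() {
    awk -F '\t' '
        /^@/ { next }                               # skip header lines, if any
        {
            flag = $2
            if (flag % 2 == 1) paired++             # bit 0x1: read is paired
            if (int(flag / 2) % 2 == 1) proper++    # bit 0x2: mapped in proper pair
        }
        END { printf "paired=%d proper=%d\n", paired, proper }
    '
}
```

A typical call would be `samtools view accepted_hits.bam | proper_pair_summary`. If `proper` is a small fraction of `paired`, that points back at the mapping step rather than at cufflinks.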

Thank you,
Douglas

https://www.contigexpress.com

**fabrice** · 07-24-2011, 06:24 AM

Hi Douglas,

Thank you for your suggestions. It seems that there are not many properly mapped paired-end reads in my bam files.

I think there is a problem in the mapping, possibly caused by the -r/--mate-inner-dist parameter in tophat. Because I trimmed some reads, the forward and reverse reads differ in size, which makes this parameter hard to set: the inner distance between mate pairs varies.

-r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.


**DZhang** · 07-24-2011, 06:29 AM

Hi fabrice,

Can you try without trimming your reads? For most of the mapping applications, quality trimming is usually not necessary as the poor-quality reads are just simply marked as 'unmapped'. (If your dataset is unusually large, it is a different story.)

Douglas

https://www.contigexpress.com

**fabrice** · 07-24-2011, 06:40 AM

Hi Douglas,

I have tried BWA both with and without trimming. I found that quality trimming gives more properly paired mappings (about 30% in my dataset), so I think quality trimming is necessary for my dataset.

BWA seems to have no problem with reads of different sizes, but I prefer Tophat because it outputs junctions.bed, insertions.bed and deletions.bed.

Thanks.


**DZhang** · 07-24-2011, 07:28 AM

Hi fabrice,

Can you post your tophat command? Specifically I am looking for the library insert length for your reads.

Douglas

https://www.contigexpress.com

**fabrice** · 07-24-2011, 07:49 AM

tophat-1.3.1.Linux_x86_64/tophat --mate-inner-dist 194 -p 6 --segment-mismatches 2 --segment-length 25 --mate-std-dev 25 --min-anchor 8 --splice-mismatches 0 --min-intron 50 --max-intron 5000000 --min-isoform-fraction 0.15 --max-multihits 40 --solexa1.3-quals -o mapping/7124_3 ~/bin/genome_index/tophat_indexes/Homo_sapiens.GRCh37.63/Homo_sapiens.GRCh37.63.dna.chromosome mapping/7124_3/7124_3_1.fq mapping/7124_3/7124_3_2.fq

Here mate-inner-dist = PCR size - 2*101 = 194.

But read1 and read2 have been trimmed.
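
The arithmetic above can be written out to show how trimming shifts the value. This is only an illustration: the fragment size of 396 bp is inferred from 194 + 2*101, and the trimmed read lengths of 90 and 80 bp are made-up numbers, not values from the data in this thread.

```shell
# Expected inner distance between mates: fragment size minus both read lengths.
inner_dist() {
    # $1 = fragment size, $2 = read 1 length, $3 = read 2 length (all in bp)
    echo $(( $1 - $2 - $3 ))
}

inner_dist 396 101 101   # untrimmed 2 x 101 bp reads: prints 194
inner_dist 396 90 80     # hypothetical trimmed pair: prints 226
```

Since every trimmed pair can give a different result, a single -r value is at best an average over the library, which is why the parameter becomes harder to choose after trimming.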


Cufflinks / Cuffdiff problem
