I must say I'm new to Cufflinks.
I tried to use my own reference annotation in Cufflinks, but got no match between my sequences and the Ensembl GTF file:
- Used TopHat v1.0.12
- Used GRCh37 as the reference genome
- Used the command "tophat -p 4 -r 200 /databank/indices/bowtie/GRCh37/GRCh37 s_1_1_sequence.txt s_1_2_sequence.txt"
- Used Cufflinks v0.8.2
- Used the SAM file (i.e. accepted_hits.sam) generated by TopHat
- Used the GTF file (i.e. Homo_sapiens.GRCh37.57.gtf) downloaded from Ensembl
- Used the command "cufflinks -p 4 -G Homo_sapiens.GRCh37.57.gtf accepted_hits.sam"
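One way to confirm that the mismatch comes from the sequence names is to compare the reference names in the SAM file (column 3, RNAME) with the sequence names in the GTF file (column 1). The following is only a rough Python sketch: the function names and the in-memory sample lines are made up for illustration, and in practice the lines would be read from accepted_hits.sam and Homo_sapiens.GRCh37.57.gtf.

```python
def sam_reference_names(lines):
    """Collect the reference names (column 3, RNAME) from SAM alignment lines."""
    names = set()
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != "*":  # "*" marks an unmapped read
            names.add(fields[2])
    return names


def gtf_sequence_names(lines):
    """Collect the sequence names (column 1) from GTF lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():  # skip comments and blank lines
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical sample lines, only to illustrate the comparison.
sam_lines = [
    "@HD\tVN:1.0\n",
    "read1\t0\tchr1\t100\t255\t36M\t*\t0\t0\tACGT\tIIII\n",
]
gtf_lines = [
    '1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

# Names that occur in only one of the two files cannot be matched up.
only_in_sam = sam_reference_names(sam_lines) - gtf_sequence_names(gtf_lines)
only_in_gtf = gtf_sequence_names(gtf_lines) - sam_reference_names(sam_lines)
```

If both differences are non-empty, as with the sample lines here, the two files use different naming schemes.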
1) I'm aware that I need to convert the chromosome IDs in the GTF file from (1, 2, ..., X, Y, MT) to (chr1, chr2, ..., chrX, chrY, chrM). However, some IDs do not occur in the (1, 2, ..., X, Y, MT) list, e.g. HSCHR17_1, HSCHR6_MHC_COX, HSCHR6_MHC_SSTO, etc. What shall I do with them?
2) Some people say that I also have to subtract 1 from the start coordinate in the GTF file (http://seqanswers.com/forums/showthread.php?t=3972), others say it depends on the reference genome used (http://seqanswers.com/forums/showthread.php?t=3582), and some don't mention this at all (http://seqanswers.com/forums/showthread.php?t=3967). What is the correct way of doing this? If I do have to subtract 1 from the start coordinate, do I have to subtract 1 from the end coordinate as well?
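For reference, the renaming described in 1) could be sketched in Python roughly as follows. The function names are hypothetical, and leaving the other IDs (HSCHR17_1, HSCHR6_MHC_COX, ...) unchanged is only a placeholder assumption, not a recommendation. The coordinates are not touched at all.

```python
# Chromosome IDs that are simply prefixed with "chr" (assumed: 1-22, X, Y).
PREFIXED = {str(i) for i in range(1, 23)} | {"X", "Y"}


def rename_seqname(name):
    """Map an Ensembl-style chromosome ID to the chr-prefixed style."""
    if name == "MT":
        return "chrM"
    if name in PREFIXED:
        return "chr" + name
    # Assumption: any other ID (e.g. HSCHR6_MHC_COX) is passed through as-is.
    return name


def rename_gtf_line(line):
    """Rename the first column of one GTF line; all other columns stay as they are."""
    if line.startswith("#") or not line.strip():  # comments and blank lines
        return line
    seqname, sep, rest = line.partition("\t")
    return rename_seqname(seqname) + sep + rest


# Hypothetical GTF line, only to illustrate the call.
example = 'MT\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'
converted = rename_gtf_line(example)
```

Applying `rename_gtf_line` to every line of the GTF file and writing the results to a new file would produce the converted annotation.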
Thanks very much for your time and help.