Hi,
I have a question about the coordinates in a GFF3 file supplied to TopHat for an RNA-seq experiment.
I am planning to convert the Ensembl GTF file to a GFF3 file. For example, one of the lines in the GTF file looks like this (a real example based on GRCh37):
11 protein_coding exon 207303 207383 . - . gene_id "ENSG00000177951"; transcript_id "ENST00000410108"; exon_number "1"; gene_name "BET1L"; transcript_name "BET1L-005";
I would like to know whether these coordinates can be supplied directly, without any change. It seems to me (and a check in the UCSC browser seems to agree) that the start position "207303" is a 1-based coordinate while the end position "207383" is a 0-based coordinate.
In a simplified example, the coordinates of "TCG" in the following sequence would be start = 2 and end = 4 in the GTF file:
|A|T|C|G|A|T|
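The toy example can be checked with a short Python sketch. It assumes the convention stated in the GTF and GFF3 format descriptions, where both start and end are 1-based and inclusive; the helper names `gtf_to_slice` and `feature_length` are hypothetical and only for illustration, not part of TopHat or Bowtie.

```python
def gtf_to_slice(start, end):
    """Turn a 1-based, inclusive (start, end) pair into a
    0-based, half-open Python slice covering the same bases."""
    return slice(start - 1, end)


def feature_length(start, end):
    """Length of a feature whose start and end are both inclusive."""
    return end - start + 1


seq = "ATCGAT"

# start = 2, end = 4 should pick out the three bases T, C, G
print(seq[gtf_to_slice(2, 4)])

# Length of the exon from the Ensembl line above under the same assumption
print(feature_length(207303, 207383))
```

Under that assumption, start = 2 and end = 4 select "TCG", and the exon in the Ensembl line spans 81 bases.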
The TopHat and Bowtie manuals do not seem to give specific instructions on this issue. Should I subtract 1 from the start position, add 1 to the end position, or keep the coordinates as they are?
Any suggestions would be appreciated!