Hi, I apologize in advance if this has been asked already but I could not find any information on it.
I am generating gene-level counts with HTSeq count, using a GTF that I produced with cuffmerge. My question is: does a GTF file from cufflinks work well with HTSeq count? The original GFF file contained a single line for each gene, but the newly merged GTF only contains lines for exons, and if a gene has multiple isoforms, the same exon may be repeated on multiple lines. Will HTSeq count know that these lines belong to the same gene based on the gene name, or will it think the read mapped to two separate genes?
For example, if my original GFF contained:
... ... Gene 0 100 gene1
... ... Exon 0 50 gene1-ra
... ... Exon 0 50 gene1-rb
... ... Exon 50 100 gene1-rb
I would tell HTSeq count to look at lines with Gene in column 3.
My new GTF would have something like:
... ... Exon 0 50 gene1-ra; gene1
... ... Exon 0 50 gene1-rb; gene1
... ... Exon 50 100 gene1-rb; gene1
So if a read mapped between 0 and 50 and HTSeq count saw two lines covering that position, would it properly count that as one gene, or would it think the read mapped to multiple locations?
Also, I do not want to proceed with cuffdiff; we want to use HTSeq count.
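To make the question concrete, here is a minimal Python sketch of the behavior I am hoping for: exon lines are grouped by their gene ID, so a read that overlaps two exon lines from the same gene is counted once for that gene. This does not use HTSeq itself. The toy records, the function names, and the half-open coordinates are my own assumptions for illustration; the `__ambiguous` and `__no_feature` labels are borrowed from the special counters that htseq-count reports.

```python
from collections import defaultdict

# Toy exon records mirroring the example above:
# (start, end, transcript_id, gene_id). Hypothetical data, not real GTF parsing.
EXONS = [
    (0, 50, "gene1-ra", "gene1"),
    (0, 50, "gene1-rb", "gene1"),
    (50, 100, "gene1-rb", "gene1"),
]

def genes_overlapping(read_start, read_end, exons):
    """Return the set of gene IDs whose exons overlap the read (half-open intervals)."""
    return {
        gene_id
        for start, end, _transcript_id, gene_id in exons
        if start < read_end and read_start < end
    }

def count_reads(reads, exons):
    """Count each read once per gene; duplicate exon lines collapse via the set."""
    counts = defaultdict(int)
    for read_start, read_end in reads:
        genes = genes_overlapping(read_start, read_end, exons)
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        elif len(genes) > 1:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return dict(counts)

# Two reads inside gene1 (one hits the repeated 0-50 exon), one read outside it.
counts = count_reads([(10, 40), (45, 60), (200, 230)], EXONS)
print(counts)
```

In this sketch the read at 10-40 overlaps both copies of the 0-50 exon, but because both carry the same gene ID it adds only one count to gene1. What I would like to confirm is that HTSeq count treats the cuffmerge GTF the same way.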
Thanks