Dear all,
We have recently been working with an E. coli plasmid and are trying to summarize gene counts from our RNA-Seq samples.
The short reads were mapped to the E. coli plasmid using TopHat, which generated the corresponding BAM files.
However, we were unable to obtain a GFF3 version of our target plasmid genome; the best we could get was a GFF2 file.
I then tried to use htseq-count to summarize the gene counts based on the GFF2 file, which failed because the 9th column does not contain the required entry, i.e. gene_id=XXXX. Instead, the entries look like gene_id "XXXX", which is not recognized.
I then wrote a short script to change the 9th field, transforming gene_id "XXX" into gene_id="XXX".
An example of the transformed GFF2 file is shown below.
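For reference, a conversion of this kind could be sketched in Python roughly as follows. This is not the exact script I ran; the function names are made up for illustration, and unlike the transformation described above it drops the quotes and writes plain key=value pairs, on the assumption that this is the form the counting step needs.

```python
import re

# Matches GTF/GFF2-style attribute pairs such as: gene_id "XXXX";
GTF_PAIR = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;?')


def gtf_attrs_to_key_value(attr_field):
    """Rewrite 'key "value"; key2 "value2";' as 'key=value;key2=value2'.

    Fields that contain no quoted pairs are returned unchanged.
    """
    pairs = GTF_PAIR.findall(attr_field)
    if not pairs:
        return attr_field
    return ";".join(f"{key}={value}" for key, value in pairs)


def convert_line(line):
    """Convert column 9 of one tab-separated annotation line."""
    # Comment lines and blank lines are passed through as they are.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # leave lines without exactly 9 columns untouched
    fields[8] = gtf_attrs_to_key_value(fields[8])
    return "\t".join(fields) + "\n"
```

Applying convert_line to every line of the annotation file and writing the results to a new file would give the transformed reference.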
This time I sorted the reads and reran htseq-count; however, I got a different error:
Can anyone tell me what is going wrong with this procedure, and whether there is a way to run htseq-count with a GFF2 reference file?
Example line from the transformed GFF2 file:
gi|221630420|ref|NC_011917.1| RefSeq gene 1576 1788 . - . locus_tag=plLF82_002;db_xref=GeneID:7547171
Error message from htseq-count:
Error: The attribute string seems to contain mismatched quotes.
[Exception type: ValueError, raised in __init__.py:168]
Error occured in line 29 of file /media/max2/RNAseq-coloncancer/Microbiome/LF82_plasmid_new2.gff.
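To narrow this down, one could scan the reference file for lines whose 9th column has an odd number of double quotes. The sketch below is only a heuristic: I am assuming that "mismatched quotes" means an unpaired double quote and that the reported line number counts every line in the file, comments included. The function name is hypothetical, and this is not how HTSeq itself performs the check.

```python
def find_unbalanced_quotes(lines):
    """Return (line_number, text) for lines whose 9th column
    contains an odd number of double quotes."""
    suspects = []
    # Line numbers start at 1 and count every line, including comments.
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full annotation record
        if fields[8].count('"') % 2 == 1:
            suspects.append((number, line.rstrip("\n")))
    return suspects
```

Passing an open file handle for the GFF file to find_unbalanced_quotes and printing the result should show whether line 29 is among the suspects.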
Thanks a lot