Hello,
Apologies for cross-posting this on another site, but I'm under pressure to sort it out.
I used HTSeq-count to extract read counts per transcript (for DE analysis) from a SAM file of reads mapped to de novo assembled transcripts. The SAM file was generated with Bowtie2, and only uniquely aligned reads were kept. I made a GTF file from the FASTA file of assembled transcripts with a Perl script. Here are a few lines of my GTF file:
Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 AssembledTranscriptome exon 1 1485 . + . gene_id "AssemTrans1"; transcript_id "Locus_47_Transcript_16/31_Confidence_0.158_Length_1485";
Locus_58_Transcript_85/85_Confidence_0.017_Length_650 AssembledTranscriptome exon 1 650 . + . gene_id "AssemTrans1"; transcript_id "Locus_58_Transcript_85/85_Confidence_0.017_Length_650";
For every transcript the start is 1, the end is the transcript length, and the strand is +.
It looks like it works, but I'm not sure this is the right way to do it. I don't know whether I need to worry about what Simon Anders has mentioned: "If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene."
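For illustration, the GTF construction described above can be sketched roughly as follows. This is a Python sketch, not the original Perl script. The function names (`read_fasta`, `locus_of`, `gtf_line`, `fasta_to_gtf`) are hypothetical, and deriving `gene_id` from the `Locus_N` prefix of the transcript name is an assumption made here; the example lines above use the value "AssemTrans1" instead.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence name, sequence length) for each FASTA record."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token as the sequence name.
            name, length = line[1:].split()[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def locus_of(transcript_id: str) -> str:
    """Assumption: transcripts sharing a 'Locus_N' prefix belong to one gene."""
    match = re.match(r"(Locus_\d+)_", transcript_id)
    return match.group(1) if match else transcript_id


def gtf_line(transcript_id: str, length: int,
             source: str = "AssembledTranscriptome") -> str:
    """Build one tab-separated GTF 'exon' line spanning the whole transcript."""
    attributes = (f'gene_id "{locus_of(transcript_id)}"; '
                  f'transcript_id "{transcript_id}";')
    # Start is always 1, end is the transcript length, strand is always '+'.
    fields = [transcript_id, source, "exon", "1", str(length),
              ".", "+", ".", attributes]
    return "\t".join(fields)


def fasta_to_gtf(lines: Iterable[str]) -> List[str]:
    """Convert FASTA lines into one GTF line per transcript."""
    return [gtf_line(name, length) for name, length in read_fasta(lines)]
```

Calling `fasta_to_gtf(open("transcripts.fa"))` (the file name is a placeholder) would return one line per transcript. Grouping by locus gives HTSeq-count a `gene_id` to aggregate on, but whether a locus in this assembly really corresponds to one gene is something to check for the assembler used.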
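To make the quoted advice concrete, here is a minimal Python sketch of counting per gene while dropping reads that hit transcripts from more than one gene. It is not how HTSeq-count works internally. The helper names (`gene_of`, `count_reads_per_gene`) are hypothetical, the input is assumed to be already parsed into (read ID, transcript ID) pairs, and treating the `Locus_N` prefix as the gene is an assumption.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def gene_of(transcript_id: str) -> str:
    """Assumption: the 'Locus_N' prefix of a transcript name identifies its gene."""
    match = re.match(r"(Locus_\d+)_", transcript_id)
    return match.group(1) if match else transcript_id


def count_reads_per_gene(alignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count each read once for its gene; skip reads that hit several genes."""
    genes_by_read = defaultdict(set)
    for read_id, transcript_id in alignments:
        genes_by_read[read_id].add(gene_of(transcript_id))

    counts = Counter()
    for genes in genes_by_read.values():
        # A read hitting several transcripts of the same gene is still counted
        # once; a read hitting transcripts of different genes is discarded.
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
    return dict(counts)
```

With only uniquely aligned reads, each read has a single alignment, so the multi-gene filter never triggers and the sketch reduces to summing transcript counts per gene.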
Any thoughts/comments/suggestions are much appreciated.
Thanks,
Alan