Dear all,
I am building a pipeline for 100bp paired-end sequencing and I have a question regarding how htseq-count deals with the output from TopHat and how to best map my reads to features prior to using DESeq.
For each sample I have two BAM files: one containing alignments of paired-end reads and another containing single reads whose mates are missing. Originally, I merged the two BAM files, sorted the result, converted it to SAM, and ran htseq-count to get gene counts, but I got an "expecting mate pair" error. This made sense, since the single reads in the merged file lack the mates that htseq-count expects in a sorted paired-end SAM file.
My question is: does running htseq-count separately on these two files and then combining their outputs for each sample bias my analysis, or does it give the same results as if htseq-count accepted files containing both paired-end and single reads?
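To make the "combining" step concrete, here is a minimal sketch of what I have in mind, assuming each htseq-count run writes the usual two-column, tab-separated table (feature ID, then count). The function names and the file names in the usage note are hypothetical. It only shows the mechanics of summing the two tables per sample; it does not say anything about whether the summed counts are equivalent to a single combined run.

```python
import csv


def read_counts(path):
    """Read one htseq-count output table into a dict of feature -> count."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            counts[row[0]] = counts.get(row[0], 0) + int(row[1])
    return counts


def combine_counts(*paths):
    """Sum counts per feature across several htseq-count tables.

    Rows with a count of zero are kept, and the summary rows that
    htseq-count appends (e.g. no_feature, ambiguous) are summed like
    any other row.
    """
    total = {}
    for path in paths:
        for feature, n in read_counts(path).items():
            total[feature] = total.get(feature, 0) + n
    return total
```

For one sample this would be called as `combine_counts("sample1_paired.counts", "sample1_single.counts")`, and the summed table would then be used as that sample's column in the DESeq count matrix.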
I apologize for the long explanation, and thanks for any help you can give me.