Hi all,
I have a quick question about HTSeq and DESeq.
I am working with bacterial RNA-seq data (control + test, 2 biological replicates each). The data are Illumina single-end 50 bp reads.
I was able to map these reads to my reference genome and generate SAM and BAM files for the alignments. Now I would like to run some statistics on them to get a list of genes that are significantly up- or down-regulated between my control and test conditions. As I saw here, I should be able to do that with the DESeq package. To use DESeq, I need a well-organized table of gene counts, and I thought about using HTSeq to produce such an input table. So I installed HTSeq and ran my first SAM file with the following command line:
htseq-count -m intersection-nonempty -s no -t gene -i locus_tag alignWT1.sam NC_013316.gff -o WT1_out
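A likely explanation for the extra content in WT1_out: in htseq-count, `-o` (long form `--samout`) does not write the count table. It writes a copy of the alignments as SAM, with each read's feature assignment stored in an `XF` tag. The count table itself (genes plus the special counters) goes to standard output, which is why it appeared in the terminal. A minimal sketch that captures the counts by redirecting stdout instead; `count_sample` and the `HTSEQ_COUNT` override are hypothetical names, and the options are copied from the command above:

```shell
# count_sample: run htseq-count on one alignment file and save its count table.
# Hypothetical helper; the options are the same as in the command above.
# The count table is printed to stdout, so it is redirected with ">"
# instead of being passed to -o (which writes an annotated SAM file).
count_sample() {
    local sam="$1"   # alignment file, e.g. alignWT1.sam
    local gff="$2"   # annotation, e.g. NC_013316.gff
    local out="$3"   # count table to create, e.g. WT1.counts.txt
    # HTSEQ_COUNT is a hypothetical override for the executable;
    # it defaults to htseq-count on your PATH.
    "${HTSEQ_COUNT:-htseq-count}" -m intersection-nonempty -s no -t gene -i locus_tag \
        "$sam" "$gff" > "$out"
}
```

For all four samples this could be run in a loop, e.g. `for s in WT1 WT2 TEST1 TEST2; do count_sample "align${s}.sam" NC_013316.gff "${s}.counts.txt"; done` (the sample names are assumptions).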
When the run finished, the command line showed each gene's locus_tag with its count, and also:
no feature: 9034352
ambiguous: 958299
not aligned: 2097930
That seems okay (at least I think so), since I had more than 37 million reads for this replicate.
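As a rough check with the numbers above: 9034352 "no feature" reads out of about 37 million is roughly 24% (9034352 / 37000000 ≈ 0.244), and slightly less if the total is higher. A minimal sketch that computes these fractions directly from a counts file, assuming the tab-separated `id<TAB>count` output of htseq-count, with special counters either prefixed by `__` (newer HTSeq) or named `no_feature`, `ambiguous`, `too_low_aQual`, `not_aligned`, `alignment_not_unique` (older HTSeq). `summarize_counts` is a hypothetical helper, and the total it divides by is the sum of all rows, which should approximate the number of reads processed:

```shell
# summarize_counts: report reads assigned to genes versus each special counter,
# as counts and as a percentage of the sum of all rows in the file.
# Hypothetical helper; assumes htseq-count's "id<TAB>count" output.
summarize_counts() {
    awk -F'\t' '
        # Special counters: "__"-prefixed (newer HTSeq) or bare names (older HTSeq).
        $1 ~ /^__/ || $1 ~ /^(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ {
            special[$1] = $2; total += $2; next
        }
        # Every other row is a gene.
        { assigned += $2; total += $2 }
        END {
            printf "assigned\t%d\t%.1f%%\n", assigned, (total ? 100 * assigned / total : 0)
            for (k in special)
                printf "%s\t%d\t%.1f%%\n", k, special[k], (total ? 100 * special[k] / total : 0)
        }
    ' "$1"
}
```

Usage: `summarize_counts WT1.counts.txt` (the file name is an assumption). The order of the special-counter lines is not guaranteed, since awk does not order array iteration.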
Now the problem is what to do next! I looked at the header of my WT1_out file, and it contains far more information than a table for DESeq should. So, here are my questions:
- How do I sort the "out" file to obtain the kind of table DESeq needs? I read in another thread that I could use:
sort -g -r -k 2 <outfile>
Is that a good way to proceed?
- Even if I get a nicely organized table from HTSeq, it would cover only one replicate. How can I get a single table containing all of my samples?
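On the sorting question: `sort -g -r -k 2` orders each file by count, so the gene order would differ from sample to sample and the rows would no longer line up. A DESeq count table needs one row per gene and one column per sample, with the special counter rows removed because they are not genes. A minimal sketch that does both steps, assuming one htseq-count stdout file per sample named like `WT1.counts.txt` (hypothetical naming; the sample name is taken from the file name). `merge_counts` is a hypothetical helper, and a gene missing from one file is filled with 0, which is an assumption (files made with the same GFF should all list the same genes):

```shell
# merge_counts: combine several htseq-count outputs into one tab-separated table
# with a header row, one row per gene (sorted by gene ID) and one column per sample.
# Hypothetical helper; special counters are dropped, missing genes become 0.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {
            # First line of a new file: derive the sample name from the file name.
            n++
            name = FILENAME
            sub(/.*\//, "", name)             # strip directory
            sub(/\.counts\.txt$/, "", name)   # strip the assumed suffix
            sample[n] = name
        }
        # Skip special counters ("__"-prefixed in newer HTSeq, bare names in older).
        $1 ~ /^__/ || $1 ~ /^(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ { next }
        {
            if (!($1 in seen)) { seen[$1] = 1; genes[++g] = $1 }
            count[$1, n] = $2
        }
        END {
            header = "gene"
            for (i = 1; i <= n; i++) header = header OFS sample[i]
            print header
            for (j = 1; j <= g; j++) {
                line = genes[j]
                for (i = 1; i <= n; i++)
                    line = line OFS (((genes[j], i) in count) ? count[genes[j], i] : 0)
                print line
            }
        }
    ' "$@" | {
        # Keep the header first, then sort the gene rows by gene ID.
        IFS= read -r header
        printf '%s\n' "$header"
        LC_ALL=C sort -k1,1
    }
}
```

Usage: `merge_counts WT1.counts.txt WT2.counts.txt TEST1.counts.txt TEST2.counts.txt > counts_table.txt`. In R, such a table can be read with `read.table("counts_table.txt", header = TRUE, row.names = 1)` so that gene IDs become row names and each column is one sample.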
Any help would be more than welcome. Thanks in advance, guys!