So I have some RNA-seq data for prokaryotes. Let's say I have strain A and strain B, and for each I have two replicates and two conditions. I want to do differential expression analysis on these.
Now, if I align my reads from strain A to the reference genome for strain A with bowtie2, fine. Then I align the same strain A reads to the reference genome for strain B. Still mostly good.
Here's the thing. This is the exact same read set, just aligned to two slightly different reference genomes. The counts should be exactly the same for orthologous genes, or at least really close. But some genes show clearly different counts between the two alignments.
Let's say gene X is from strain A. Gene X has an ortholog, gene Y, in strain B. When you BLAST these sequences against each other, there are no mutations: 100% identity over about 600 bases. Let's say the read counts for the two genes and their replicates are as follows.
- Gene X-1: 585
- Gene X-2: 528
- Gene Y-1: 372
- Gene Y-2: 325
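To put a number on the gap, here is a quick sketch that computes, per replicate, what fraction of the gene X reads is missing from gene Y. It uses only the four counts listed above; the helper name `fraction_missing` is just for illustration.

```python
# Read counts from the list above: two replicates per gene.
counts = {"X": [585, 528], "Y": [372, 325]}

def fraction_missing(ref_counts, other_counts):
    """Per-replicate fraction of reads lost relative to ref_counts."""
    return [1 - other / ref for ref, other in zip(ref_counts, other_counts)]

missing = fraction_missing(counts["X"], counts["Y"])
for rep, frac in enumerate(missing, start=1):
    print(f"replicate {rep}: {frac:.1%} fewer reads on gene Y than on gene X")
# replicate 1: 36.4% fewer reads on gene Y than on gene X
# replicate 2: 38.4% fewer reads on gene Y than on gene X
```

The drop is nearly the same in both replicates (roughly 36-38%), which is why it looks systematic rather than random.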
So my question is, how can this happen? It must have happened at the bowtie2 alignment step, but why? If a read has multiple possible matching locations, wouldn't there be another gene out there holding the missing reads that I could find? Also, if bowtie2 uses random seeding for alignment, shouldn't the two replicate runs differ from each other if that were the cause? What could cause this sort of thing? I would be happy to hear any thoughts. Thanks!
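One check for the multi-mapping idea would be to tally the mapping qualities (MAPQ) of the reads overlapping the gene in each alignment. Below is a minimal sketch, assuming the alignments are available as SAM text (for example from `samtools view`) and the gene coordinates are known. The function name `mapq_tally`, the contig name `chrB`, the coordinates, and the example records are all made up for illustration. As far as I understand, bowtie2 gives a low MAPQ (0 or 1) to reads that align equally well in more than one place, so a larger low-MAPQ share on gene Y than on gene X would point to a repeated region in genome B.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def mapq_tally(sam_lines, contig, start, end, low_mapq=1):
    """Count reads overlapping contig:start-end (1-based, inclusive),
    split into 'low_mapq' (MAPQ <= low_mapq) and 'unique' (everything else)."""
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        mapq, cigar = int(fields[4]), fields[5]
        if flag & 4 or rname != contig or cigar == "*":  # unmapped or other contig
            continue
        ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
        read_end = pos + ref_len - 1
        if read_end < start or pos > end:  # no overlap with the gene
            continue
        tally["low_mapq" if mapq <= low_mapq else "unique"] += 1
    return tally

# Made-up SAM records, only to show the call; real input would come from the alignment.
example_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchrB\t100\t42\t50M\t*\t0\t0\tACGT\tIIII",     # overlaps, confident placement
    "r2\t0\tchrB\t130\t1\t50M\t*\t0\t0\tACGT\tIIII",      # overlaps, ambiguous placement
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",             # unmapped
    "r4\t16\tchrB\t900\t42\t50M\t*\t0\t0\tACGT\tIIII",    # outside the gene
    "r5\t0\tchrB\t80\t0\t30M20S\t*\t0\t0\tACGT\tIIII",    # ends before the gene starts
]
example_tally = mapq_tally(example_sam, "chrB", 120, 700)
print(dict(example_tally))
```

Running the same tally over gene X in the genome A alignment and gene Y in the genome B alignment would show whether the missing reads are still there but flagged as ambiguous, in which case a MAPQ filter in the counting step could be dropping them.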