Hi,
I am helping a colleague with a differential expression analysis of RNA-seq data, but I have some concerns about the expression levels the analysis reports. Based on the design of the experiment, my colleague states that protein A controls the stability of protein B: when prot A is reduced, prot B increases.
At the bench, my colleague used shRNAs against prot A and prot B (independently) and saw a significant reduction in expression of each by western blot. As I understand it, an shRNA acts on the mRNA of the protein it targets. So the bench work seems to be validated.
She then repeated the same experiment and sent the samples for RNA sequencing. The samples were rRNA-depleted before library preparation. When the datasets came back, I aligned the reads with STAR and processed them with Rsubread and DESeq2, using the padj values to judge significance. I found two strange results: (1) shRNA A significantly reduced prot A, but prot B was also slightly reduced (though not significantly), and (2) shRNA B did not significantly reduce prot B.
I checked the PCA plots and they looked fine: the patterns were consistent and the samples separated clearly by batch and by treatment. I also checked the counts with counts['gene',] and saw very similar numbers.
Here are my questions:
(1) Is it common for an shRNA to give a significant reduction at the bench while the RNA-seq data fail to detect the difference?
(2) Is it acceptable to take the results as they are and use them for publication? Our concern is that reviewers will ask why they should accept the data when we used an shRNA and did not see a significant reduction in the RNA-seq datasets.
(3) Is it now mandatory for us to repeat the experiment to get proper readouts?
(4) Is there a way to check in a genome browser (or any other program) where the RNA-seq datasets have gone wrong?
(5) Sequencing companies usually provide 30 million reads. Would 30 million reads be sufficient to cover the whole library?
(6) Does the number of reads on a gene equal the number of counts, or is there an algorithm that converts reads to counts? If they are equivalent, wouldn't requesting more reads (e.g. going from 10 million to 30 million reads) inflate each gene's counts?
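To make the last question concrete, here is my current understanding, which I would like confirmed: raw counts do grow with sequencing depth, and tools such as DESeq2 correct for this with per-sample size factors (median of ratios) before comparing samples. Below is a minimal sketch of that idea in plain Python. It is not DESeq2's code; the function names `size_factors` and `normalize`, the sample names, and the toy counts are all made up for illustration. The last gene stands in for an shRNA target that is truly halved, in a sample sequenced three times deeper.

```python
import math

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    This is an illustrative re-implementation, not DESeq2 itself.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Only genes with a non-zero count in every sample have a usable log.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the geometric mean across samples, per usable gene.
    log_geo = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    factors = {}
    for s in samples:
        # Log ratio of this sample's count to the per-gene geometric mean.
        log_ratios = sorted(math.log(counts[s][g]) - log_geo[g] for g in usable)
        mid = len(log_ratios) // 2
        if len(log_ratios) % 2:
            median = log_ratios[mid]
        else:
            median = (log_ratios[mid - 1] + log_ratios[mid]) / 2
        factors[s] = math.exp(median)
    return factors

def normalize(counts):
    """Divide each sample's raw counts by its size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}

# Hypothetical toy data: four unchanged genes plus one knocked-down target
# (last position). The knockdown sample was sequenced 3x deeper.
raw = {
    "control":  [100, 200, 50, 400, 80],
    "shA_deep": [300, 600, 150, 1200, 120],
}
norm = normalize(raw)

# Raw counts of the target go UP (80 -> 120) purely because of depth,
# while the normalized counts show the drop.
print("raw target:       ", raw["control"][4], raw["shA_deep"][4])
print("normalized target:", norm["control"][4], norm["shA_deep"][4])
```

If this picture is right, then comparing raw values from counts['gene',] across samples could be misleading, and the normalized counts would be the ones to inspect.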