The Nanostring Data analysis user manual describes options for determining presence or absence of target transcripts:
a. Using the Background Threshold
b. Statistical Tests
1) Using a one-tailed, heteroscedastic t-test, calculate a p-value comparing the replicate gene counts to the background counts.
2) Calculate the average of the replicate gene counts.
3) If the average replicate gene counts are greater than the background threshold, and the p-value is less than your acceptable threshold confidence level, then the gene is detectable in the samples.
However, I question the validity of the second approach: the data are discrete counts, and genes that are obviously present, with high counts, are returned as non-significant.
Any comments on this? What would be a good approach for calling expressed genes?
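For reference, the statistical test in option b can be sketched as below. This is only an illustration under stated assumptions, not the manual's implementation: all counts are made up, the function names (`t_sf`, `welch_one_tailed`, `is_detected`) are hypothetical, and the background threshold is assumed here to be the mean of the background counts plus two standard deviations, since its definition is not given above. The one-tailed heteroscedastic t-test is taken to be Welch's t-test, with the upper-tail p-value computed by numerical integration so the sketch needs only the standard library.

```python
import math
from statistics import mean, stdev, variance

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) of Student's t (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def welch_one_tailed(gene, background):
    """Step 1: one-tailed Welch (unequal-variance) t-test, gene > background.

    Fails with a division by zero if both groups have zero variance.
    """
    v1 = variance(gene) / len(gene)
    v2 = variance(background) / len(background)
    t = (mean(gene) - mean(background)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(gene) - 1)
                           + v2 ** 2 / (len(background) - 1))
    return t, df, t_sf(t, df)

def is_detected(gene, background, threshold, alpha=0.05):
    """Steps 2 and 3: mean above threshold AND p-value below alpha."""
    _, _, p = welch_one_tailed(gene, background)
    return mean(gene) > threshold and p < alpha

# Made-up counts, for illustration only.
background = [5, 8, 6, 9, 7, 4, 6, 8]                    # negative-control counts
threshold = mean(background) + 2 * stdev(background)     # ASSUMED definition
stable_gene = [400, 420, 390]   # high counts, consistent replicates
noisy_gene = [50, 400, 2000]    # high counts, variable replicates

for name, counts in [("stable", stable_gene), ("noisy", noisy_gene)]:
    t, df, p = welch_one_tailed(counts, background)
    print(name, round(mean(counts), 1), round(p, 4),
          is_detected(counts, background, threshold))
```

With only three replicates the test has roughly two degrees of freedom, so a gene whose replicates all sit far above the background can still fail the p-value criterion when the replicates vary a lot. That is one possible way for obviously present genes to be returned as non-significant; whether it explains any particular dataset would need checking against the real counts.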