This is a purely academic question, since these transcripts are not of particular interest to us (and they're only bad in one lane out of 71), but I'm curious what the theory might be for what happened to these reads.
The setup is that we've run 71 Arabidopsis samples, one per lane, at Otogenetics, and they sent the data over to DNANexus for read mapping against TAIR9. I've downloaded the mapped read counts and used them in DESeq, edgeR, etc. The data is really good, quite consistent across the lanes for almost all transcripts.
The exception is lane #30, which has six huge outliers in neighboring transcripts on chromosome 4:
at4g12470 799 bp from Chr4:7,401,109..7,401,907
at4g12480 833 bp from Chr4:7,406,105..7,406,937
at4g12490 786 bp from Chr4:7,409,621..7,410,406
at4g12500 778 bp from Chr4:7,414,150..7,414,927
at4g12510 568 bp from Chr4:7,417,236..7,417,803
at4g12520 683 bp from Chr4:7,421,056..7,421,738
I've attached a plot below (SVG) that shows these six plus the two neighboring transcripts, which look perfectly normal. If this were a microarray assay I'd say there was a scratch on the plate, but with RNA-Seq, is there any simple explanation?
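For anyone who wants to check their own count table for this kind of lane-specific outlier, here is a minimal sketch of one way to do it. It is not what DESeq or edgeR do internally. It simply takes log2 counts per transcript and flags lanes that sit many robust z-scores (median/MAD) from the transcript's median across lanes. The function name `lane_outliers`, the threshold of 5, and the toy counts are all assumptions for illustration, not my real data, and the counts are assumed to be comparable across lanes already (e.g. normalized for library size).

```python
import math
import statistics

def lane_outliers(counts, threshold=5.0):
    """Flag lanes whose log2 count is far from the transcript's median.

    counts: dict mapping transcript ID -> list of read counts, one per lane.
    Returns a dict mapping transcript ID -> list of 0-based lane indices whose
    robust z-score (based on median and MAD) exceeds `threshold`.
    """
    flagged = {}
    for transcript, lane_counts in counts.items():
        # log2(count + 1) so zero counts are handled and fold changes are symmetric
        logs = [math.log2(c + 1) for c in lane_counts]
        med = statistics.median(logs)
        mad = statistics.median(abs(x - med) for x in logs)
        if mad == 0:
            continue  # no spread across lanes, cannot compute a robust z-score
        scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
        hits = [i for i, x in enumerate(logs) if abs(x - med) / scale > threshold]
        if hits:
            flagged[transcript] = hits
    return flagged

# Synthetic toy counts (NOT the real data): 8 lanes, lane index 3 inflated for geneA.
toy = {
    "geneA": [100, 110, 95, 5000, 105, 98, 102, 107],
    "geneB": [200, 210, 190, 205, 195, 202, 198, 207],
}
flagged = lane_outliers(toy)
print(flagged)
```

Note the lane indices returned are 0-based, so lane #30 would show up as index 29. If several adjacent transcripts are flagged in the same lane and nowhere else, that is the pattern described above.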