Hello,
Our core recently acquired an Illumina system and we've been working through the bench and computational workflows of genome, exome, ChIP, and RNA-sequencing.
This question is about RNA-seq. My boss came up with a modified random primer (not actually a hexamer) so that we can preserve strand information. The sequence is NNNNN[AGCT]NNNNNN, where [AGCT] stands for a defined 12-base sequence sitting between the random bases. The point was to see which end of the read this sequence shows up on, which would tell us which strand the read came from.
Unfortunately, when he came up with the design he didn't count on the fact that reads containing this sequence (half of the reads) would fail to map. I can easily trim the primer sequence with the usual tools, but he wants to retain it so that we know which end of the read it was on. Essentially, we just want the aligner to ignore it.
I have not found software that will soft clip reads based on adapter, vector, or other contaminating sequences; they all shorten the reads. The only workaround I've thought of is to identify this primer sequence in the reads and lower the Q-scores of those bases so that they can be masked out and soft clipped in the next step. This would keep the sequence around but not allow that portion of the read to contribute to alignment. That's the idea, anyway.
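To make the idea concrete, here is a rough sketch of the Q-score workaround in plain Python. It is only an illustration under several assumptions: the `CORE` sequence is a made-up placeholder (not our real 12-base sequence), qualities are assumed to be Phred+33 encoded, matching is exact and forward-strand only, and the function name `mask_primer` is hypothetical. Whether the downstream aligner actually soft clips the low-quality bases is a separate question that this does not answer.

```python
PHRED33_MIN = "!"  # Phred+33 character for Q0 (assumes Phred+33 encoding)

# Hypothetical placeholder: stands in for the real defined 12-base sequence.
CORE = "ACGTACGTACGT"


def mask_primer(seq, qual, core=CORE, left_flank=5, right_flank=6):
    """Lower the qualities over the primer and report which end it is on.

    Returns (new_qual, end_label). end_label is "5prime" or "3prime"
    depending on which half of the read the primer falls in, or None
    when the core sequence is not found (qualities are then unchanged).
    The flank sizes mirror the NNNNN / NNNNNN random bases of the primer.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    pos = seq.find(core)  # exact match only; no mismatches or reverse complement
    if pos == -1:
        return qual, None

    # Extend over the random flanks, clipped to the read boundaries.
    start = max(0, pos - left_flank)
    stop = min(len(seq), pos + len(core) + right_flank)

    # The bases are kept as-is; only their qualities are lowered.
    new_qual = qual[:start] + PHRED33_MIN * (stop - start) + qual[stop:]

    midpoint = (start + stop) / 2
    end_label = "5prime" if midpoint < len(seq) / 2 else "3prime"
    return new_qual, end_label
```

The read sequence itself is never modified, so the primer stays in the record, and the returned label could be written into the read name to carry the strand information through alignment.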
But I'm hoping there is a more straightforward way to do this, as it's quite a hack of the data. Does anyone have any ideas they care to share?
Thanks!