Hi All,
We recently started doing PE RNA-seq and have run into a problem whose cause has eluded us so far. In summary, after cycle 30 we see a rapid increase in alignment errors. The per-cycle error rate rises to as much as 20%, and the percentage of reads with one or more mismatches reaches 35-50%.
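For anyone who wants to compare numbers with ours, the two metrics above can be tabulated roughly like this. This is a minimal sketch, not what our pipeline or eland/bowtie actually report. It assumes you have already extracted, for each aligned read, the read sequence and the reference sequence it aligned to, as ungapped, equal-length strings in sequencing orientation, so index i corresponds to cycle i+1. The function name `per_cycle_mismatch` is hypothetical.

```python
from typing import List, Tuple


def per_cycle_mismatch(pairs: List[Tuple[str, str]]) -> Tuple[List[float], float]:
    """Return (per-cycle mismatch rate, % of reads with >= 1 mismatch).

    Assumption: each pair is (read, reference) from an ungapped alignment,
    both strings the same length and in sequencing orientation.
    """
    if not pairs:
        return [], 0.0
    max_len = max(len(read) for read, _ in pairs)
    mismatches = [0] * max_len  # mismatch count at each cycle
    covered = [0] * max_len     # number of reads that reached each cycle
    reads_with_mm = 0
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must be the same length")
        has_mm = False
        for i, (a, b) in enumerate(zip(read.upper(), ref.upper())):
            covered[i] += 1
            if a != b:
                mismatches[i] += 1
                has_mm = True
        reads_with_mm += has_mm
    rates = [m / c if c else 0.0 for m, c in zip(mismatches, covered)]
    return rates, 100.0 * reads_with_mm / len(pairs)


rates, pct = per_cycle_mismatch([("ACGT", "ACGT"), ("ACGA", "ACGT")])
print(rates, pct)  # [0.0, 0.0, 0.0, 0.5] 50.0
```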
What we have found is:
1) The error is aligner-independent (we tested eland, bowtie and even commercial software).
2) The error is not due to contamination with amplified adapters or low quality scores for the bases in the second half of the amplicons.
3) The problem is RNA-specific. Both PhiX and genomic DNA samples had normal error and alignment plots.
4) The problem seems to depend on both the length of the insert and the length of the run. During the same run, shorter insert libraries have more misaligned reads than libraries consisting of longer fragments. A given insert size library will have more misaligned reads during a longer run.
5) Nothing is obviously wrong with the IVC plots of the affected samples. The only striking observation is that both the % basecall and (even more strongly) the % all intensities from the affected samples form a kind of ‘fork’ structure in the later cycles of each read.
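To put point 4 in numbers: two simple quantities grow both when the insert gets shorter and when the run gets longer. These are the number of cycles in a read that extend past the end of the insert, and the overlap between the two mates. The sketch below just does that arithmetic. It assumes "insert size" means the fragment length between the adapters and that both mates have the same read length. The function names are hypothetical, and this is a way to tabulate our libraries rather than a claim about the cause.

```python
def cycles_past_insert(insert_size: int, read_len: int) -> int:
    """Cycles of one read that run beyond the end of the insert."""
    return max(0, read_len - insert_size)


def mate_overlap(insert_size: int, read_len: int) -> int:
    """Bases of the insert covered by both mates.

    If the insert is shorter than one read, each mate spans the whole
    insert; otherwise the overlap is 2 * read_len - insert_size (or 0).
    """
    return max(0, min(2 * read_len - insert_size, insert_size))


# Tabulate overlap and read-through for a few insert sizes and run lengths.
for read_len in (36, 76, 100):
    for insert in (80, 150, 200, 300):
        print(read_len, insert,
              mate_overlap(insert, read_len),
              cycles_past_insert(insert, read_len))
```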
Has anybody heard of or encountered such a problem? Any suggestions are welcome.
We know that the problem is related to the sample prep and that using longer inserts (>200 bp) helps alleviate it, but we wonder what causes it.