I have a problem with TopHat v2.1 reporting reads whose lengths are extremely large, which causes problems downstream. I extracted the read lengths from the SAM file and plotted a histogram: the majority of reads are of normal size, but a small fraction (< 1%) are 10,000-100,000 bp long. As a result, my data contain long stretches of a chromosome with a constant signal over 100,000 bp (basically a straight line across the entire region).
Does TopHat have a way to filter out these erroneous reads? I have written a Python script to filter them out, but I would rather do it in one step than go through multiple steps.
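For reference, the filtering I do looks roughly like the sketch below. It is a minimal version, not my exact script, and it rests on some assumptions: "read length" is taken to mean the span an alignment covers on the reference, computed from the CIGAR string (so skipped regions, the `N` operations, count toward it); the 10,000 bp cutoff is just the lower edge of the outliers in my histogram; and the names `reference_span`, `filter_sam`, and `max_span` are only illustrative.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Return the number of reference bases an alignment covers.

    Skipped regions (N) are included, so a spliced alignment can span
    far more bases than the read itself contains.
    An unavailable CIGAR ("*") gives a span of 0.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def filter_sam(lines, max_span=10000):
    """Yield SAM lines, dropping alignments that span more than max_span bases.

    max_span is an assumed cutoff; adjust it to the data.
    Header lines (starting with "@") are passed through unchanged.
    """
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) of a SAM record is the CIGAR string.
        if len(fields) < 6 or reference_span(fields[5]) <= max_span:
            yield line
```

To run it as a stream, something like `sys.stdout.writelines(filter_sam(sys.stdin))` (after `import sys`) should work, so the SAM text can be piped through it. Note that for paired-end data this drops individual alignments, so a mate may be left without its partner.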
Thank you!