I'm looking for a way to detect positions in the genome where there is a pileup of soft-clipped reads. I've attached an image with an example of what this situation looks like in IGV. Essentially, I'm looking for a tool similar to samtools mpileup, but with the important difference that I want to see soft-clipped reads. I think the issue is that soft-clipped bases are not technically aligned at these positions, so they are left out of the pileup. I know that I could write a script to parse the CIGAR string for each read and detect locations like these, but I'm wondering: is there a tool that can quickly report the locations where reads start getting soft-clipped?
I'm imagining a version of samtools mpileup that would report something like this:
10 141352 N 105 a$A$A$a$aSASASaSaaaaAAaAAaAaAaaaAaAaAAAaAAaaAAaAAaaAaAaAAaAAAaAAaAaAAAAAAaAAaaAaaaaaaaaaAaaaAaAAaAaaAaaAAaaaaAaA^]a @<@?;?>A@@=@????A>@@B@@?A???@@@?>@:>?@@?A@?A?@@>A?@@@?AB??@?@?@?@@?>A@?@>>@>?@@@?A>@>?A>A@A=?>=??>=?=C>9>
where the "$", as usual, means that reads are ending at this position, whereas the "S" would mean that bases are "aligned" and soft-clipped at this position.
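For reference, the script-based fallback I mentioned could look roughly like the sketch below. It is only a minimal illustration, not a replacement for a proper tool: it reads plain SAM text lines, parses each CIGAR string, and counts how many reads have a soft-clip boundary at each reference position. The function name `softclip_breakpoints` and the choice to report the first or last aligned base next to the clip are my own assumptions.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def softclip_breakpoints(sam_lines):
    """Count soft-clip boundaries per (chrom, 1-based position).

    A leading soft clip is reported at the first aligned base (POS);
    a trailing soft clip is reported at the last aligned base.
    This convention is an assumption made for illustration.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, chrom = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no CIGAR
            continue
        # Hard clips can sit outside soft clips (e.g. 5H10S90M); drop them.
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
        if not ops:
            continue
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        if ops[0][1] == "S":
            counts[(chrom, pos)] += 1
        if len(ops) > 1 and ops[-1][1] == "S":
            counts[(chrom, pos + ref_len - 1)] += 1
    return counts
```

The input could be the text output of samtools view, and positions with a high count would be the candidate pileups of soft-clipped reads. Streaming a whole BAM through Python this way is slow, which is why I'm hoping an existing tool already does this.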