Hi everyone,
I have 1,377,239 paired reads of 250 bp each, and 82,486 of them are soft-clipped by more than 200 bp (e.g. a read with a CIGAR string like 150S30M70S). Is this normal?
These data come from a small targeted sequencing project (~160 kb region). I checked my exome data set, and none of its 50 million reads shows >200 bp of soft clipping. My guess is that this could be due to contamination, but I'm not sure, since I'm not the one who prepared the library.
Command for counting soft-clipped reads:
Code:
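# Under -n, $_ still ends in its newline, so print it unchanged; appending "\n" adds a blank line per match and doubles the wc -l count.
perl -ne '@f=split /\t/; @a= $f[5]=~/(\d+)S/g; print if $a[0]+$a[1]>200' 1.1.sort.sam | wc -l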
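For anyone who wants to cross-check the count, here is a minimal Python sketch of the same idea: sum the soft-clipped bases in the CIGAR field (column 6) of each SAM record and count the records above a threshold. The function names `soft_clipped_bases` and `count_heavily_clipped` are made up for this illustration and are not part of any tool. It counts SAM records, not reads, so a read reported with more than one alignment record contributes once per record.

```python
import re

# Matches one soft-clip operation in a CIGAR string, e.g. "150S".
SOFT_CLIP = re.compile(r"(\d+)S")


def soft_clipped_bases(cigar):
    """Return the total number of soft-clipped bases in a CIGAR string."""
    return sum(int(n) for n in SOFT_CLIP.findall(cigar))


def count_heavily_clipped(sam_lines, threshold=200):
    """Count SAM records whose soft-clipped bases exceed `threshold`.

    `sam_lines` is any iterable of SAM text lines (an open file works).
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        if soft_clipped_bases(fields[5]) > threshold:
            count += 1
    return count
```

To run it on the file from the command above, something like `with open("1.1.sort.sam") as sam: print(count_heavily_clipped(sam))` should work.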
aligner:
BWA-MEM 0.7.4-r385
sequencer:
MiSeq
Thanks a lot!