I am using soapaligner and soapsnp to find SNVs in our paired-end DNA data. I have read the online manual, and the tools have been running smoothly.
However, there are three things I am not sure about.
First, how should I tell soapaligner to report repetitive hits for SNV discovery? When a read aligns equally well to several locations, should it report none of them, one chosen at random, or all of the best hits?
Second, I understand that the input for soapsnp should be sorted by chromosome and then by coordinate. The soapaligner output is organized so that each line corresponds to one end of a read pair. After sorting, the lines for the two ends of the same pair end up far apart, with the alignments of many other reads between them. Is this the correct way to sort the soapaligner result?
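To make the second question concrete, the sort I have in mind is roughly the sketch below. It assumes the soapaligner output is tab-separated, with the chromosome name in column 8 and the position in column 9. Those column numbers and the function name `sort_alignments` are my assumptions for illustration, not something I have confirmed against the tools, and I am not sure whether soapsnp expects the chromosomes in plain name order as done here.

```python
def sort_alignments(lines, chrom_col=7, pos_col=8):
    """Sort tab-separated alignment lines by chromosome, then by position.

    chrom_col and pos_col are 0-based column indices. The defaults
    (columns 8 and 9 when counting from 1) are an assumption about the
    soapaligner output layout.
    """
    def sort_key(line):
        fields = line.rstrip("\n").split("\t")
        # Chromosome names compare as text, positions as integers.
        return (fields[chrom_col], int(fields[pos_col]))

    return sorted(lines, key=sort_key)


if __name__ == "__main__":
    # Hypothetical records: two ends of pair "r1" and one end of pair "r2".
    example = [
        "r1\tACGT\tIIII\t1\ta\t4\t+\tchr2\t500\t0",
        "r2\tACGT\tIIII\t1\ta\t4\t+\tchr1\t900\t0",
        "r1\tACGT\tIIII\t1\tb\t4\t-\tchr2\t100\t0",
    ]
    for record in sort_alignments(example):
        print(record)
```

With a sort like this, the two ends of a pair are ordered only by where they align, so they are no longer on adjacent lines.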
Third, the minimal insert size of soapaligner is 400, while the insert size of our data is only 35 bp, so the two ends of a pair can overlap each other. If there is an SNV within the overlapping part, the two ends may or may not agree on the base and quality score at that position. How does the program handle this situation? I read the soapsnp paper, but I couldn't find an answer.
Thank you for your help.
Update: the read length in our data is 35 bp.