Hi all,
I am currently analysing captured sequence data (DNA sequenced on the Illumina platform), with the aim of finding somatic mutations.
Could you please advise on the following issues:
1. Should I align the reads to the captured gene sequences only, or to the whole genome (and then select the reads that fall within the regions of interest)?
2. Is it important to remove duplicate reads before SNP calling? Duplicate reads can be PCR artifacts, but since I am working with captured data, maybe there is a higher chance of getting duplicate reads?
For example, for one SNP I get 515 reads covering that region before removing duplicates, but after removing duplicates this number drops to only 4 reads.
3. Would you recommend filtering out SNPs that are supported by reads from only one strand?
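To make question 3 concrete, here is a minimal sketch of the kind of check I have in mind. It assumes I already have, for each candidate SNP, the number of variant-supporting reads on the forward and on the reverse strand. The function name, the SNP names and the counts are all made up for illustration and do not come from any particular variant caller.

```python
def single_strand_only(forward_alt_reads: int, reverse_alt_reads: int) -> bool:
    """Return True when the variant-supporting reads all come from one strand."""
    if forward_alt_reads < 0 or reverse_alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    if forward_alt_reads + reverse_alt_reads == 0:
        return False  # no supporting reads at all, so there is nothing to flag
    return forward_alt_reads == 0 or reverse_alt_reads == 0


# Hypothetical candidates: (name, forward-strand support, reverse-strand support)
candidates = [("snp_a", 4, 0), ("snp_b", 3, 2), ("snp_c", 0, 6)]

# Names of the SNPs that would be filtered out by a strict one-strand rule
flagged = [name for name, fwd, rev in candidates if single_strand_only(fwd, rev)]
print(flagged)
```

With a strict rule like this, any SNP seen on one strand only is dropped regardless of depth, which is what I am unsure about when only a handful of reads remain after duplicate removal.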
Thanks
Mali