Dear group:
I need some help understanding the concept of insert size.
I have paired-end targeted exome sequencing data with 76 bp reads. There are lots of duplicates in the SAM file, and I understand that I should run rmdup with the insert size specified correctly.
I was told by the technician that the insert size is between 150 and 300 bp.
When I look at the 9th field of the SAM file, which is the inferred insert size, I see a lot of values ranging from 0 to over 100,000.
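To make the spread concrete, here is a minimal sketch of how the 9th column could be tallied. It assumes a plain-text, tab-delimited SAM file; the function name `tlen_summary` and the bucket labels are hypothetical, and the 150-300 bp bounds are just the range quoted by the technician.

```python
from collections import Counter

def tlen_summary(sam_lines, lo=150, hi=300):
    """Bucket absolute inferred insert sizes (SAM column 9) per read pair.

    Only reads flagged as first-in-pair (FLAG bit 0x40) are counted,
    so that each pair contributes a single value.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & 0x40:               # keep first-in-pair reads only
            continue
        tlen = abs(int(fields[8]))        # column 9, sign dropped
        if tlen == 0:
            counts["zero"] += 1
        elif tlen < lo:
            counts["below"] += 1
        elif tlen <= hi:
            counts["expected"] += 1
        else:
            counts["above"] += 1
    return counts
```

It could be called on an open file handle, for example `tlen_summary(open("aln.sam"))`, where `aln.sam` is a placeholder file name.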
Since the experiment was done with an insert size of 150-300 bp, but the insert sizes inferred by BWA span a much wider range, what number should I use with rmdup? Heng Li recommends always using the correct insert size. Given the 150-300 bp range from the technician and the wide spread of inferred sizes in the SAM file, which insert size should I select to remove duplicates and call SNPs?
Or should I split the reads into sets that fall into certain insert-size ranges and call SNPs in each bin?
Also, what does an inferred insert size of 0 mean, and what does a value like 345,039 mean?
Thanks,
Adrian