Hi All,
I've mapped Illumina paired-end reads to a genome using BWA. From this I've calculated i) the mean fold (x) coverage and ii) the fraction of the reference sequence covered to a depth of at least one read. My supervisor now wants coverage metrics that account for regions with no read coverage because they are repetitive elements. In essence, he just wants coverage stats for the 'mappable' region.
So, for example, the genome I'm using is c. 2.3 Gb in length and c. 50% of it is composed of repeats, which reads are unlikely to map to. This will deflate the coverage estimates. Say c. 50% of the reference sequence is covered to a depth of at least one read: if I subtract the 50% of the genome that is repeats from the denominator (and assume all of the covered bases lie outside repeats), this rises to 100%. What I'm trying to figure out is this: if I have the annotation info for the repetitive elements, can I calculate this from my existing .bam file, or will I need to remap to a hard-masked genome, or remove the repetitive elements somehow, and then calculate it?
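As a rough sketch of the arithmetic above, assuming the covered regions and the repeat annotations are both available as half-open (start, end) intervals on a single sequence. The function names are hypothetical and not taken from any tool, and the numbers are a toy example rather than real data:

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def subtract(intervals, mask):
    """Return the parts of `intervals` that do not overlap `mask`."""
    mask = merge(mask)
    result = []
    for start, end in merge(intervals):
        pos = start
        for m_start, m_end in mask:
            if m_end <= pos:
                continue  # mask interval lies entirely before the current position
            if m_start >= end:
                break  # mask interval lies entirely after this interval
            if m_start > pos:
                result.append((pos, m_start))  # unmasked stretch before the repeat
            pos = max(pos, m_end)
            if pos >= end:
                break
        if pos < end:
            result.append((pos, end))
    return result


def total_length(intervals):
    """Sum the lengths of a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def mappable_breadth(genome_length, covered, repeats):
    """Fraction of the non-repeat ('mappable') bases covered by >= 1 read."""
    mappable_length = genome_length - total_length(merge(repeats))
    covered_mappable = total_length(subtract(covered, repeats))
    return covered_mappable / mappable_length


# Toy example: a 100 bp "genome" whose second half is annotated as repeats.
genome_length = 100
repeats = [(50, 100)]
covered = [(0, 50)]  # positions with depth >= 1

whole_genome_breadth = total_length(merge(covered)) / genome_length
print(whole_genome_breadth)                               # 0.5
print(mappable_breadth(genome_length, covered, repeats))  # 1.0
```

Note that the sketch removes repeat bases from both the numerator and the denominator, so any reads that did map inside repeats do not inflate the 'mappable' figure.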
I really hope this makes sense (I suspect not!).
Thanks