Hi everyone!
I’m currently analyzing NGS data and am looking for advice on the best approach to calculate gene coverage on a base-by-base level using .bam files.
Initially, I merged my .bam files and used samtools depth to assess coverage. Then, I mapped the coverage data to specific genes using a .bed file with the relevant coordinates. However, I'm now unsure whether merging all samples like this is reliable, even though they were all aligned to the same reference.
To check this, I also calculated coverage separately for each sample and then averaged the per-base coverage across samples for each gene. However, this gave unexpectedly lower and quite different values compared to the merged approach.
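To make the comparison concrete, here is a minimal Python sketch of the two calculations. It assumes the per-sample depths come from something like `samtools depth -a -b genes.bed s1.bam s2.bam s3.bam` (one depth column per BAM), and that the depth at a position in a merged BAM is roughly the sum of the per-sample depths there. The function names, the `min_depth` threshold of 20, the gene name and the toy numbers are all hypothetical, made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def parse_depth(lines):
    """Parse `samtools depth` output: chrom, 1-based pos, one depth column per BAM."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, *depths = line.rstrip("\n").split("\t")
        rows.append((chrom, int(pos), [int(d) for d in depths]))
    return rows


def parse_bed(lines):
    """Parse BED lines: chrom, 0-based start, exclusive end, gene name."""
    genes = []
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        genes.append((chrom, int(start), int(end), name))
    return genes


def summarize(rows, genes, min_depth=20):
    """Per gene: mean summed depth, mean per-sample depth, and low-coverage positions."""
    summed = defaultdict(list)
    averaged = defaultdict(list)
    low = defaultdict(list)
    for chrom, pos, depths in rows:
        for g_chrom, start, end, name in genes:
            # BED is 0-based half-open; samtools depth positions are 1-based.
            if chrom == g_chrom and start < pos <= end:
                summed[name].append(sum(depths))     # what a merged BAM would roughly show
                averaged[name].append(mean(depths))  # per-base average across samples
                if min(depths) < min_depth:          # any single sample below threshold
                    low[name].append(pos)
    return {
        name: {
            "merged_like_mean": mean(summed[name]),
            "per_sample_mean": mean(averaged[name]),
            "low_positions": low[name],
        }
        for name in summed
    }


# Toy data (hypothetical): three samples, one gene covering positions 101-104.
depth_lines = [
    "chr1\t101\t30\t28\t32\n",
    "chr1\t102\t30\t10\t35\n",
    "chr1\t103\t40\t42\t38\n",
    "chr1\t104\t20\t22\t18\n",
]
bed_lines = ["chr1\t100\t104\tGENE_A\n"]

result = summarize(parse_depth(depth_lines), parse_bed(bed_lines))
print(result)
```

With these toy numbers the merged-like mean is 86.25 and the per-sample mean is 28.75, exactly a factor of three (the number of samples) apart. The summed view also hides that one sample has only 10x at position 102.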
My main goal is to assess the gene performance/coverage as a whole, rather than focusing solely on individual sample coverage. I need the base-by-base details to highlight any regions that may lack adequate coverage. Given this, I would like to know:
- Is merging .bam files and using samtools depth a valid approach for obtaining gene-level coverage?
- Does averaging base coverage across samples accurately reflect gene coverage, and what factors could cause discrepancies between these two methods?
- Any recommended best practices for calculating detailed, base-level coverage for specific genes?
Any guidance or shared experience would be really helpful. Thanks in advance!