I don't see why indel-calling needs 4x the coverage of SNP-calling; 20x per ploidy seems fine to me for indel-calling, as it does for SNP-calling. In fact, I suggest you mention somewhere on the page that the recommendations are for diploid genomes. Currently you state:
"The coverage values below apply to most organisms while the read recommendations are for mammalian species with genome sizes of ~3Gb"
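To make the "per ploidy" point concrete, here is a minimal Python sketch. It reads "20x per ploidy" as 20x per chromosome copy (my interpretation), and it assumes reads are drawn evenly from every copy with no sequencing errors or reference bias. The `min_reads` threshold is a hypothetical calling cutoff, not something the page specifies.

```python
from math import comb


def per_copy_depth(total_depth: float, ploidy: int) -> float:
    """Mean depth contributed by each chromosome copy, assuming reads are
    drawn evenly from all copies."""
    return total_depth / ploidy


def prob_allele_supported(total_depth: int, ploidy: int, min_reads: int) -> float:
    """Probability that an allele carried on a single chromosome copy is seen
    in at least `min_reads` reads, modelling its read count as
    Binomial(total_depth, 1/ploidy). Ignores errors and mapping bias."""
    p = 1.0 / ploidy
    below = sum(
        comb(total_depth, i) * p**i * (1 - p) ** (total_depth - i)
        for i in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Same total depth (40x), hypothetical threshold of 5 supporting reads.
    for ploidy in (2, 4, 6):
        print(ploidy, per_copy_depth(40, ploidy),
              round(prob_allele_supported(40, ploidy, 5), 4))
```

At a fixed total depth, each extra chromosome copy dilutes the reads supporting a single-copy allele, which is why the page should either quote coverage per copy or say explicitly that its numbers assume diploidy.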
For CNVs... "1-8x" coverage seems really low to me. I would reject any data that calls virtually anything at 1x. It's important to mention the difference between amplified and unamplified libraries. I don't think amplified libraries are reliable for CNVs, due to amplification bias and randomness. Most of the time, you will probably see a 2x jump in coverage over a duplicated region using highly amplified, 8-fold coverage data... but I would not stake someone's life on that. The bias is reduced as you decrease the number of amplification cycles, but I don't know of a specific study that has analyzed this effect.
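Even before amplification bias enters, low coverage is noisy for CNVs. The sampling-noise part can be sketched in Python under stated assumptions: read counts per window are Poisson, a duplicated region shows the clean 2x jump described above, and the caller thresholds halfway between the two expected counts. The 1 kb window and 100 bp read length are hypothetical. The model deliberately ignores amplification bias, so real amplified libraries would likely do worse than it suggests.

```python
from math import exp, floor, lgamma, log


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    return sum(exp(-mean + i * log(mean) - lgamma(i + 1)) for i in range(k + 1))


def expected_reads(depth: float, window_bp: int, read_len: int) -> float:
    """Mean number of reads starting in a window at a given per-base depth."""
    return depth * window_bp / read_len


def doubling_error_rate(mean_reads: float) -> float:
    """Average misclassification rate when separating a normal window
    (mean_reads) from a duplicated one (2 * mean_reads) with a threshold at
    1.5 * mean_reads. Pure Poisson sampling noise, no amplification bias."""
    threshold = floor(1.5 * mean_reads)
    false_gain = 1.0 - poisson_cdf(threshold, mean_reads)  # normal window called duplicated
    missed_gain = poisson_cdf(threshold, 2 * mean_reads)   # duplicated window called normal
    return (false_gain + missed_gain) / 2


if __name__ == "__main__":
    # Hypothetical 1 kb windows and 100 bp reads, for illustration only.
    for depth in (1, 8, 30):
        m = expected_reads(depth, window_bp=1000, read_len=100)
        print(depth, m, round(doubling_error_rate(m), 4))
```

Larger windows buy back some power at low depth, but only at the cost of resolution on small CNVs.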
Whole-exome sequencing:
Calling a SNP homozygous at 3x coverage will be wrong (purely in terms of hom/het) roughly 1/8 of the time: a heterozygous site has a (1/2)^3 = 1/8 chance of showing only the alternate allele in three reads. I can hardly recommend a process that is wrong 1/8 of the time, though I should mention that when I wrote a variant caller, I got the best results when calling variants at coverage as low as 3x. But I still don't recommend it as a guideline for planning, particularly for exome capture, which has an inherent reference bias.
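The 1/8 figure is a simple binomial calculation, sketched here in Python. It assumes no sequencing errors; `alt_fraction` (the share of reads carrying the alternate allele at a true het) defaults to 1/2, and lowering it is a crude stand-in for the reference bias mentioned above.

```python
def prob_het_shows_only_alt(depth: int, alt_fraction: float = 0.5) -> float:
    """Chance that every read at a heterozygous site carries the alternate
    allele, so the site looks homozygous-alternate."""
    return alt_fraction ** depth


def prob_het_shows_only_ref(depth: int, alt_fraction: float = 0.5) -> float:
    """Chance that every read carries the reference allele, so the het is
    missed entirely."""
    return (1.0 - alt_fraction) ** depth


def prob_het_shows_one_allele(depth: int, alt_fraction: float = 0.5) -> float:
    """Chance that a heterozygous site shows only one allele (either one)."""
    return (prob_het_shows_only_alt(depth, alt_fraction)
            + prob_het_shows_only_ref(depth, alt_fraction))


if __name__ == "__main__":
    for depth in (3, 5, 10, 20):
        print(depth, prob_het_shows_only_alt(depth), prob_het_shows_one_allele(depth))
```

At 3x this gives 0.125 for an all-alternate pileup (the 1/8 above) and 0.25 for either single allele; at 10x the all-alternate chance drops below 0.001. With reference bias, the chance of missing the het entirely rises.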
I had very good luck calling indels from exome-capture data (consistent calls in trio studies, etc.), but I suspect this is highly dependent on the bait system. I only know about the indels that were called successfully, not what was missed, and I assume the reference bias from baits is much more severe for indels than for SNPs. So the recommendation against choosing exome capture with the intention of looking for indels seems appropriate. But I would still strongly encourage people who already have exome-capture data to look for indels.
Transcriptome Sequencing/RNA-seq:
If people are interested in differential splicing, you should encourage them to use the longest possible reads (and paired reads). Also, the recommendations you have there are given as a number of reads, but what matters is transcriptome coverage, which varies with genome size and the percentage of the genome that is coding. I suggest you make your recommendations in terms of transcriptome coverage rather than a set number of reads (which does not account for read length, genome size, or transcriptome size).
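A minimal Python sketch of the conversion between reads and transcriptome coverage, assuming reads are spread evenly across the transcriptome. Real expression levels are not even, so treat the result as a mean, not a per-gene guarantee. The 3 Gb genome and 2% coding fraction in the example are hypothetical placeholders, and for paired-end data each mate is counted as a read.

```python
import math


def approx_transcriptome_size(genome_size_bp: float, coding_fraction: float) -> float:
    """Rough transcriptome size from genome size and the fraction that is coding."""
    return genome_size_bp * coding_fraction


def mean_transcriptome_coverage(n_reads: int, read_len: int, transcriptome_bp: float) -> float:
    """Mean coverage = total sequenced bases / transcriptome size."""
    return n_reads * read_len / transcriptome_bp


def reads_needed(target_coverage: float, read_len: int, transcriptome_bp: float) -> int:
    """Reads required to reach a target mean coverage, rounded up."""
    return math.ceil(target_coverage * transcriptome_bp / read_len)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    t_size = approx_transcriptome_size(3_000_000_000, 0.02)
    for read_len in (50, 100, 150):
        print(read_len, reads_needed(30, read_len, t_size))
```

The same target coverage needs fewer reads as reads get longer, and more as the transcriptome grows, which a fixed read count cannot express.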
I have not directly used the other categories so I'll defer to those who have.