Hello,
I used BBMap to find the coverage of a draft genome I have with this command:
Code:
bbmap.sh in1=reads1.fq in2=reads2.fq ref=scaffolds.fasta covstats=covstats.txt
Now I'd like to use the coverage on each scaffold (reported in the output covstats.txt) to identify scaffolds that might be repeats.
The way I want to do this is to take the average coverage across all scaffolds (which I get from the stdout of BBMap), for example 70x, and see which scaffolds have double that coverage (140x), triple (210x), and so on. That would imply those scaffolds are present in two and three copies (i.e. repeated once and twice), respectively. Do you think this is a reasonable approach to identifying repeat scaffolds in an assembly?
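To make the idea concrete, here is a rough sketch of the calculation I have in mind, not something I have validated. It assumes covstats.txt is tab-delimited with a header line containing columns named ID (written as #ID), Avg_fold and Length; please check those names against your own file. The function names and the min_copies parameter are just my own placeholders. If no baseline is given, it uses the length-weighted mean coverage as a stand-in for the average BBMap prints to stdout.

```python
import csv


def read_covstats(path):
    """Read (scaffold_id, avg_fold, length) tuples from a covstats file.

    Assumes a tab-delimited file whose header line has columns named
    ID (possibly written as #ID), Avg_fold and Length.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = [name.lstrip("#") for name in next(reader)]
        id_col = header.index("ID")
        cov_col = header.index("Avg_fold")
        len_col = header.index("Length")
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            rows.append(
                (fields[id_col], float(fields[cov_col]), int(fields[len_col]))
            )
    return rows


def estimate_copy_numbers(rows, baseline=None, min_copies=2):
    """Flag scaffolds whose coverage is about 2x, 3x, ... the baseline.

    baseline: expected single-copy coverage (e.g. 70.0). If None, the
    length-weighted mean coverage over all scaffolds is used instead.
    Returns a list of (scaffold_id, avg_fold, estimated_copies).
    """
    if baseline is None:
        total_length = sum(length for _, _, length in rows)
        baseline = sum(cov * length for _, cov, length in rows) / total_length
    flagged = []
    for scaffold_id, cov, _length in rows:
        copies = round(cov / baseline)  # nearest whole multiple of baseline
        if copies >= min_copies:
            flagged.append((scaffold_id, cov, copies))
    return flagged
```

With a 70x baseline I would call it as estimate_copy_numbers(read_covstats("covstats.txt"), baseline=70.0), and each flagged scaffold would come back with its coverage and the nearest whole-number multiple of the baseline.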
Please let me know what you think.