I aligned my shotgun metagenomic reads against the NCBI eukaryotic reference database with blastn to assess diet from fecal samples of black bears. The blastn output is in tabular format (outfmt 6). I am now trying to determine whether PCR bias or PCR duplicates are influencing our results. Specifically, I want to see whether the ratio of unique subjects to unique queries differs between enriched and non-enriched samples (a difference might indicate that the enrichment process, rather than the PCR, is changing the results). To do this, I extracted the numbers of unique queries and unique subject sequences with the following commands:
for f in blastn_out_nt/*; do cut -f 1 "$f" | sort -u | wc -l >> query; done
for f in blastn_out_nt/*; do cut -f 2,9,10 "$f" | sort -u | wc -l >> unique_subjects; done
Note that in the blastn output the first column is the query ID, the second column is the subject ID, and the 9th and 10th columns are the start and end of the alignment on the subject. I would like to verify that this approach is error-free and to understand what explains the pattern.
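As a cross-check, both counts and their ratio can be tabulated per sample in a single pass. This is only a minimal sketch, assuming tab-separated outfmt 6 files sitting directly under blastn_out_nt/; the function name count_ratio is illustrative and not part of BLAST. It deduplicates query IDs and (subject, start, end) combinations over the whole file, regardless of line order.

```shell
# count_ratio FILE
# Prints: file name, unique query IDs, unique (subject, sstart, send)
# combinations, and the ratio subjects/queries, tab-separated.
# Assumes standard outfmt 6 column order (qseqid=1, sseqid=2, sstart=9, send=10).
count_ratio() {
  awk -F'\t' -v name="$(basename "$1")" '
    !($1 in q) { q[$1] = 1; nq++ }          # first time this query ID is seen
    {
      key = $2 FS $9 FS $10                 # subject ID + alignment start + end
      if (!(key in s)) { s[key] = 1; ns++ } # first time this subject interval is seen
    }
    END { printf "%s\t%d\t%d\t%.3f\n", name, nq, ns, (nq ? ns / nq : 0) }
  ' "$1"
}

# One summary line per BLAST output file.
for f in blastn_out_nt/*; do
  if [ -f "$f" ]; then
    count_ratio "$f"
  fi
done
```

Redirecting the loop output to a file (for example `> ratios.tsv`, a hypothetical name) gives one row per sample that can be compared between enriched and non-enriched libraries.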
Here is what I got:

Does a smaller ratio of unique subjects to unique queries potentially indicate that the input FASTA sequences were redundant (PCR duplicates), because many of them hit the same database entry? Also, how should I interpret these figures?