I am analyzing an RNA-seq data set. I screened the reads with Trimmomatic and aligned them with bwa mem. Most reads are 100 bp long, but a minority are as short as 30 bp. The reads are paired-end and supplied in two FASTQ files, each containing 61,383,869 reads.
I get the following messages in the bwa log output (printed to standard error):
[M::main_mem] read 109456 sequences (10000139 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (45, 19766, 45, 52)
Can anyone tell me precisely what the above means? Why did bwa read only 109,456 sequences from the millions in the fastq files?
As only 19,908 "unique pairs" in total were reported, does this mean that the rest of the read data were all PCR duplicates?
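For reference, the 19,908 figure is simply the sum of the four orientation counts on the mem_pestat line. Below is a minimal Python sketch of that tally, assuming the log lines have exactly the format shown above. The helper names parse_batch and parse_pairs are hypothetical and not part of bwa.

```python
import re

# Patterns written against the two log lines quoted above (an assumption:
# other bwa versions may format these messages differently).
MAIN_MEM = re.compile(r"\[M::main_mem\] read (\d+) sequences \((\d+) bp\)")
PESTAT = re.compile(
    r"\[M::mem_pestat\] # candidate unique pairs for "
    r"\(FF, FR, RF, RR\): \((\d+), (\d+), (\d+), (\d+)\)"
)


def parse_batch(line):
    """Return (n_sequences, n_bases) from a main_mem line, or None."""
    m = MAIN_MEM.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def parse_pairs(line):
    """Return a dict of orientation -> count from a mem_pestat line, or None."""
    m = PESTAT.search(line)
    if m is None:
        return None
    return dict(zip(("FF", "FR", "RF", "RR"), map(int, m.groups())))


log = [
    "[M::main_mem] read 109456 sequences (10000139 bp)...",
    "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (45, 19766, 45, 52)",
]

n_seqs, n_bases = parse_batch(log[0])
pairs = parse_pairs(log[1])
total_pairs = sum(pairs.values())

print(n_seqs, n_bases)  # sequences and bases reported for this batch
print(total_pairs)      # 45 + 19766 + 45 + 52 = 19908
```

Running the same two helpers over every line of the full log and summing the main_mem sequence counts would show how many reads bwa reported reading in total.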