Hi everyone, I mapped my single-end MeDIP-seq data from mouse liver to the mouse genome with Bowtie 2, and only about 10% of reads (8.5-11.4%, roughly 1.5 to 2.7 million reads per sample) map exactly once, so very few reads are "unique". A large proportion of reads either did not map (12-17%) or mapped more than once (72-79%). The statistics are below. My questions are:
1) What could have caused this? Is this a biological or technical artifact?
2) Is this data set still good?
3) Should I only use the unique reads for downstream analyses?
I appreciate all your help!! Thank you.
Sample    Total reads   Aligned 0 times     Aligned 1 time      Aligned >1 times
G109_M1   20872364      2912778 (13.96%)    1923779 (9.22%)     16035807 (76.83%)
G109_M2   17193030      2771061 (16.12%)    1460643 (8.50%)     12961326 (75.39%)
G109_M3   16982463      2786419 (16.41%)    1934671 (11.39%)    12261373 (72.20%)
G109_M4   18290035      2242861 (12.26%)    1584429 (8.66%)     14462745 (79.07%)
G109_M6   22931771      3452390 (15.06%)    2257734 (9.85%)     17221647 (75.10%)
G109_M7   17036045      2166101 (12.71%)    1466390 (8.61%)     13403554 (78.68%)
G109_M8   20080531      2736002 (13.63%)    1936385 (9.64%)     15408144 (76.73%)
G109_M9   24497398      4164766 (17.00%)    2681647 (10.95%)    17650985 (72.05%)
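
In case it helps anyone check the numbers, here is a small Python sketch that recomputes the percentages from the raw counts in the table. The dictionary `counts` and the function `alignment_percentages` are names I made up for this post; they are not part of Bowtie 2 or of my pipeline.

```python
# Counts copied from the table above:
# (total reads, aligned 0 times, aligned 1 time, aligned >1 times)
counts = {
    "G109_M1": (20872364, 2912778, 1923779, 16035807),
    "G109_M2": (17193030, 2771061, 1460643, 12961326),
    "G109_M3": (16982463, 2786419, 1934671, 12261373),
    "G109_M4": (18290035, 2242861, 1584429, 14462745),
    "G109_M6": (22931771, 3452390, 2257734, 17221647),
    "G109_M7": (17036045, 2166101, 1466390, 13403554),
    "G109_M8": (20080531, 2736002, 1936385, 15408144),
    "G109_M9": (24497398, 4164766, 2681647, 17650985),
}


def alignment_percentages(total, unaligned, unique, multi):
    """Return (unaligned %, unique %, multi-mapped %) rounded to 2 decimals.

    Raises ValueError if the three categories do not add up to the total,
    which would point to a copy/paste error in the table.
    """
    if unaligned + unique + multi != total:
        raise ValueError("categories do not sum to the total read count")
    return tuple(round(100 * n / total, 2) for n in (unaligned, unique, multi))


for sample, row in counts.items():
    pct_unaligned, pct_unique, pct_multi = alignment_percentages(*row)
    print(f"{sample}: unaligned {pct_unaligned}%, "
          f"unique {pct_unique}%, multi-mapped {pct_multi}%")
```

The sum check passes for every sample, so the three categories account for all reads in each library.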