I am using single-digest RAD-Seq to analyse the population structure of a protist species, with dual barcoding by i7 index barcodes and i5 inline barcodes. The first two libraries, consisting of 9 samples, were sequenced on an Illumina MiSeq (2x150 and 2x250), and the samples separated biogeographically, as expected. However, in two subsequent runs of libraries (80 samples) sequenced on an Illumina HiSeq 2500 (2x100), the samples grouped artificially by i5 barcode.

I would be very grateful for any help or hints, as I have no idea how this structure could arise. Of course, the indexes were removed prior to analysing the data. The library preparation pipeline is the same for the MiSeq and HiSeq libraries; the sole difference is the number of samples pooled together. As for the sequencing, one difference I am aware of is the number of reads per run: the MiSeq runs used three (two sequence reads, one index read), whereas the HiSeq runs used four (R1 and R4 sequence reads, R2 i7 index read, R3 i5 index read; note that we do not have any i5 index barcode, as our i5 barcodes are inline).
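
To make the trimming point checkable, here is a minimal sketch of how one could test whether any inline i5 barcode is still present at the start of the demultiplexed reads. The barcode names, barcode sequences, reads and the function name `leftover_barcode_counts` are made-up placeholders for illustration, not our real data or pipeline, and the check only looks for exact matches at the read start.

```python
from collections import Counter


def leftover_barcode_counts(reads, barcodes):
    """Count reads that still begin with one of the inline barcodes.

    reads    -- iterable of read sequences (strings) after demultiplexing
    barcodes -- dict mapping a barcode name to its sequence
    """
    counts = Counter()
    for seq in reads:
        for name, barcode in barcodes.items():
            # Exact prefix match only; mismatches are not considered here.
            if seq.startswith(barcode):
                counts[name] += 1
                break
    return counts


# Hypothetical inline barcodes and reads, for illustration only.
barcodes = {"i5_A": "ACGTAC", "i5_B": "TGCATG"}
reads = [
    "ACGTACTGCAGGTT",  # still carries barcode i5_A
    "TGCAGGATCCAA",    # trimmed
    "TGCATGTGCAGGCC",  # still carries barcode i5_B
    "TGCAGGTTAACC",    # trimmed
]

leftover = leftover_barcode_counts(reads, barcodes)
fraction = sum(leftover.values()) / len(reads)
print(dict(leftover), fraction)
```

If a noticeable fraction of real reads still started with a barcode, incomplete trimming would be one possible explanation for samples grouping by i5 barcode, although it would not rule out other causes.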

Attached is a file showing our results in detail. Thank you in advance for any feedback.

