I have one file with 1,314 sequences (file1) and another with 566 sequences (file2). Every sequence has a proper barcode. What I can't understand is the 20 letters that follow the barcode. In file1 they are (sequence, then count):
AAAGGGGCGCAGCAGGCGCG 1
AACGGGGCGACAGCAGGCGC 1
ACCGGGCGCAGCAGGCGCGA 1
ACGGGCGCAGCAGGCGCGAA 8
ACGGGCGCTAGCAGGCGCGA 1
ACGGGGACGCAGCAGGCGGA 1
ACGGGGCACAGCAGGCGCGA 1
ACGGGGCACAGTAGGCGCGA 1
ACGGGGCCAGCAGGCGCGAA 2
ACGGGGCCGCAGCAGGCGCG 2
ACGGGGCGAAAGCAGGCGCG 1
ACGGGGCGAGCAGGCGCGAA 1
ACGGGGCGCAAGCAGGCGCG 1
ACGGGGCGCACGCAGGCGCG 1
ACGGGGCGCAGCAAAGCGCG 1
ACGGGGCGCAGCAAGCGCGA 3
ACGGGGCGCAGCAAGGCGCG 3
ACGGGGCGCAGCACGCGCGA 1
ACGGGGCGCAGCACGGCGCC 1
ACGGGGCGCAGCACGGCGCG 3
ACGGGGCGCAGCACGGCGGC 4
ACGGGGCGCAGCAGACGCGA 4
ACGGGGCGCAGCAGGAGCGA 1
ACGGGGCGCAGCAGGCACGA 3
ACGGGGCGCAGCAGGCGACG 5
ACGGGGCGCAGCAGGCGCCG 1
ACGGGGCGCAGCAGGCGCGA 1219
ACGGGGCGCAGCAGGCGGCG 2
ACGGGGCGCAGCAGGCTCGA 1
ACGGGGCGCAGCATGCGCGA 1
ACGGGGCGCAGCTACGGCGC 3
ACGGGGCGCAGGCAGGCGCG 1
ACGGGGCGCAGTAGGCGCGA 1
ACGGGGCGCAGTCACGGCGC 1
ACGGGGCGCAGTCAGGCCGC 1
ACGGGGCGCAGTCAGGCGCG 5
ACGGGGCGCCGCAGGCGCGA 3
ACGGGGCGCGCAGGCGCGAA 1
ACGGGGCGCTAGTCAGGCGC 2
ACGGGGCGGCAGCAGGCGCG 3
ACGGGGGCGCAGCAGGCGCG 13
ACGGGGGGCGCAGCAGGCGC 1
ACGGGTCGCAGCAGCGCGAA 1
AGTGGGGCGCAGCAGGCGCG 1
ATGGGGCGCAGCAGGCGCGA 1
GACGGGGCGCAGCAGGCGAC 3
GACGGGGCGCAGCAGGCGCG 2
However, in file2:
AACGGGGCGTCAGCAGGGCG 1
ACGGGCCGCAGCAGGGCGCG 1
ACGGGCGCAGCAGGCGCGAA 2
ACGGGCGCAGCAGGGCGCGA 151
ACGGGCGCTAGCACGGCGGC 1
ACGGGCGCTAGCACGGGCGG 1
ACGGGCGCTAGCAGGCGCGA 1
ACGGGCGCTAGCAGGCGCGG 2
ACGGGCGCTAGCAGGGCGCG 1
ACGGGCGCTAGCAGGGCGGC 1
ACGGGCGGCAGCAGGGCGCG 1
ACGGGCGTCAGCAGGACGCG 1
ACGGGCGTCAGCAGGCGGCG 3
ACGGGCGTCAGCAGGGCGCG 3
ACGGGCGTCAGCAGGGCGGC 1
ACGGGCGTCAGCAGGGGCGC 1
ACGGGGCCAGCAGGGCGCGA 3
ACGGGGCCGCAGCAGGGCGC 2
ACGGGGCGAAGCAGGGCGCG 1
ACGGGGCGAGCAGGGCGCGA 1
ACGGGGCGCAGCAAGCGCGA 1
ACGGGGCGCAGCAGCGCGAA 1
ACGGGGCGCAGCAGGGCGAC 2
ACGGGGCGCAGCAGGGCGCG 340
ACGGGGCGCAGCAGGGCGGC 8
ACGGGGCGCAGCAGGGGCGC 1
ACGGGGCGCAGCAGGGTGCG 4
ACGGGGCGCAGCAGTCGCGA 1
ACGGGGCGCTAGCAGGGCGC 1
ACGGGGCGCTAGCAGGGCGG 1
ACGGGGCGTCAGCAGGGCGC 4
ACGGGGCGTCAGCAGGGCGG 4
ACGGGGCGTCAGCAGGGGCG 5
ACGGGGCGTCAGCTAGGGGC 1
ACGGGGCTAGCAGGCGCGGA 1
ACGGGGTCGCAGCAGGGCGC 5
ACGGGGTCGCAGCAGGGGCG 3
ACGGGTCGCAGCAGGGCGCG 1
ACGGGTGCAGCAGGGCGCGA 1
ACGGGTGTCAGCAGGGCGCG 1
ATGGGGGCGCAGCAGGGCGC 1
CCGGGGCGCAGCAGGGCGCG 1
GACGGGCGCAGCAGGGCGCG 1
GACGGGGCGCAGCAGGGCGA 1
GACGGGGCGCAGCAGGGCGC 1
The same forward primer, 'ACGGGGCGCAGCAGGCGCGA' (the most abundant sequence in file1), was used for both files, yet it does not appear even once in file2. How can this be? It's as if file2 is barcoded but the primer has been removed or changed somehow.
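One quick check I can make, offered as a rough sketch rather than a diagnosis: compare the primer position by position with the most abundant 20-mer in file2 (ACGGGGCGCAGCAGGGCGCG, count 340), and test whether that 20-mer is just the primer with one extra base inserted and the end cut off at 20 letters. The function names `first_mismatch` and `is_single_insertion` are made up for this illustration, and the check says nothing about why such a difference would arise.

```python
PRIMER = "ACGGGGCGCAGCAGGCGCGA"      # forward primer, top sequence in file1
FILE2_TOP = "ACGGGGCGCAGCAGGGCGCG"   # most abundant 20-mer in file2


def first_mismatch(a, b):
    """Return the 0-based index of the first differing position, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def is_single_insertion(primer, observed):
    """True if `observed` equals `primer` with one extra base inserted
    somewhere, truncated back to the primer's length."""
    n = len(primer)
    for i in range(n):
        for base in "ACGT":
            candidate = (primer[:i] + base + primer[i:])[:n]
            if candidate == observed:
                return True
    return False


if __name__ == "__main__":
    print("first mismatch at index:", first_mismatch(PRIMER, FILE2_TOP))
    print("explained by one inserted base:", is_single_insertion(PRIMER, FILE2_TOP))
```

If the second line prints True, the file2 reads would still contain the primer, only with a one-base difference, so an exact-match search for the primer would miss them.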