Hello,
I'm fairly new to bioinformatics and this is my first time posting, so hopefully I sound halfway intelligent.
My FASTQs contain adapter sequences in the read name. This adapter sequence includes a UMI. However, my post-processing software requires the UMI to be part of the read. Is there some way of inserting the adapter sequence from the read name at the beginning of the read? I'm guessing I'll also need to insert some bogus quality scores.
To visualize, I need to turn this:
@M01179:478:000000000-C33YY:1:1101:17276:1520 2:N:0:CTTGTATGTATG
TTGTCGTTCCTTTCTTTTTGTCTCTTTCCTGTACCTCTAG
+
11111333@B1B11AA3A3B1A3B3BFEE333130110A0
into this:
@M01179:478:000000000-C33YY:1:1101:17276:1520 2:N:0:CTTGTATGTATG
CTTGTATGTATGTTGTCGTTCCTTTCTTTTTGTCTCTTTCCTGTACCTCTAG
+
AAAAAAAAAAAA11111333@B1B11AA3A3B1A3B3BFEE333130110A0
This would need to be a scriptable solution that works on all reads from a High Output NextSeq run.
Thanks in advance.
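For reference, here is a rough Python sketch of the transformation described above. It is only an illustration under some assumptions: records are strictly four lines each, the adapter/UMI is always the last colon-separated field of the read name, and "A" (Q32 in Phred+33) is an acceptable placeholder quality, as in the example. The function names (prepend_tag, rewrite_records, convert_file) are made up for this sketch.

```python
import gzip

PLACEHOLDER_QUAL = "A"  # assumed bogus quality character, matching the example above


def prepend_tag(header, seq, qual, placeholder=PLACEHOLDER_QUAL):
    """Prepend the last colon-separated field of the read name to the read."""
    tag = header.rsplit(":", 1)[-1]
    return tag + seq, placeholder * len(tag) + qual


def rewrite_records(lines):
    """Yield rewritten FASTQ lines (no newlines) from an iterable of input lines.

    Lines are consumed four at a time; an incomplete trailing record is dropped.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        new_seq, new_qual = prepend_tag(header, seq, qual)
        yield header
        yield new_seq
        yield plus
        yield new_qual


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode, based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def convert_file(in_path, out_path):
    """Stream one FASTQ file to another, record by record, without loading it all."""
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for line in rewrite_records(src):
            dst.write(line + "\n")
```

Calling convert_file("in.fastq.gz", "out.fastq.gz") (hypothetical file names) would process a file one record at a time, so memory use should stay flat even for a large run.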