Adding barcode indexes back into FASTQ headers?
-
Hi,
I am in almost the same boat. I have two FASTQ files, one for each index (i7 and i5), and two paired-end raw read files (R1 and R2). Last time, when both barcodes (i7+i5) were already in the header, I used BBMap to demultiplex them conveniently. But I am not sure how to add these two barcode indexes to the headers of the raw sequence files so that I can demultiplex my data as before. Is there any way to use BBMap to demultiplex my data when the barcodes are in separate files?
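One possible approach, offered as a sketch rather than a BBMap feature: the paste idea used elsewhere in this thread extends to three files. The function name add_dual_index and the file names in the usage line are hypothetical. It assumes uncompressed 4-line FASTQ records in the same order in every file, no tab characters in the header lines, and a header that ends in ":" so that the "i7+i5" string can be appended directly.

```shell
#!/usr/bin/env bash
# Sketch: append "i7+i5" to every header line of a read file.
# Arguments (hypothetical names): reads FASTQ, i7 index FASTQ, i5 index FASTQ.
# Assumes 4-line records, identical read order, and no tabs in the headers.
add_dual_index() {
    paste "$1" "$2" "$3" | awk -F'\t' '
        NR % 4 == 1 { header = $1; next }        # hold the read header until the barcodes are known
        NR % 4 == 2 { print header $2 "+" $3 }   # columns 2 and 3 are the i7 and i5 sequences
        { print $1 }                             # read sequence, "+" line and quality, unchanged
    '
}
```

It would be called as `add_dual_index R1.fq I1.fq I2.fq > WithBarcode_R1.fq`, and again with R2.fq for the reverse reads. Please check a few records by eye before demultiplexing.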
-
Thanks so much to the two of you. This looks to be exactly what I want. I will be sure to verify the output. This will open up a whole new swath of tools for me to use.
Thanks
-
Slightly modifying code that @gringer supplied.
Disclaimer: Please verify that the results look right with your data.
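Code:
# Append the sequence line of each R2.fq record (the barcode) to the matching R1.fq header.
# '~' is the paste delimiter, so it must not occur in the headers or quality strings.
paste -d '~' <(cat R1.fq) <(cat R2.fq) |
  perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= "$F[1]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' > WithBarcode_R1.fq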
R1.fq
Code:
@FCID:1:1101:15473:1334 1:N:0:
AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
+
AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
@FCID:1:1101:15528:1336 1:N:0:
GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
+
DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
R2.fq
Code:
@FCID:1:1101:15473:1334 2:N:0:
TATTTGCGACAA
+
#>>>ABFFBBBBG
@FCID:1:1101:15528:1336 2:N:0:
GCGGGAAAAAAA
+
#############
WithBarcode_R1.fq
Code:
@FCID:1:1101:15473:1334 1:N:0:TATTTGCGACAA
AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
+
AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
@FCID:1:1101:15528:1336 1:N:0:GCGGGAAAAAAA
GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
+
DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
For gzipped files:
Code:
# Same approach, reading gzipped input and writing gzipped output.
paste -d '~' <(zcat R1.fq.gz) <(zcat R2.fq.gz) |
  perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= "$F[1]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' |
  gzip - > WithBarcode_R1.fq.gz
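As one way to act on the disclaimer above, here is a minimal sketch of a sanity check to run before pasting the files together. The function name check_fastq_sync is hypothetical. It assumes uncompressed 4-line FASTQ records, no tab characters in the header lines, and headers in which the read ID is the text before the first space (as in the examples above), and it prints the number of records whose read IDs differ between the two files.

```shell
#!/usr/bin/env bash
# Sketch: count records whose read IDs differ between two FASTQ files.
# Arguments (hypothetical names): read FASTQ, index FASTQ.
# Assumes 4-line records, no tabs in headers, read ID = header text before the first space.
check_fastq_sync() {
    paste "$1" "$2" | awk -F'\t' '
        NR % 4 == 1 {
            split($1, a, " "); split($2, b, " ")   # first whitespace-separated token is the read ID
            if (a[1] != b[1]) mismatches++
        }
        END { print mismatches + 0 }               # prints 0 when every header pair agrees
    '
}
```

For example, `check_fastq_sync R1.fq R2.fq` should print 0 if the two files list the same reads in the same order. Any other value suggests the files are out of sync and the barcodes would be attached to the wrong reads.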
Last edited by GenoMax; 03-03-2017, 12:12 PM.
-
Ah, right, that makes more sense. I wonder if 'paste' would help you here. Here's an example of what it can do:
Code:
$ paste -d '|' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16)
@ERR063640.4 HS13_6902:1:1101:1393:2125/1|@ERR063640.4 HS13_6902:1:1101:1393:2125/2
TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC|TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA
+|+
B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF|:DDEFH>BIIJGJKLJHKDCJHJJKLHKFB!!!!!!H@H8GKEKKGE8I-FM
@ERR063640.5 HS13_6902:1:1101:1453:2128/1|@ERR063640.5 HS13_6902:1:1101:1453:2128/2
TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT|TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC
+|+
CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>|:DBEFHBHIIJJHILJHEKJKKI@KLHDKIJ!!!J!LFAJGLLKLIK8EKEF9
@ERR063640.7 HS13_6902:1:1101:1378:2148/1|@ERR063640.7 HS13_6902:1:1101:1378:2148/2
GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA|CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA
+|+
D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<|:DFGFHGILJJLKILJKKKJIDIJLLHKKIOL!G!!!JKKHKEKKIKH9KLFI
@ERR063640.8 HS13_6902:1:1101:1462:2157/1|@ERR063640.8 HS13_6902:1:1101:1462:2157/2
ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA|TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA
+|+
CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A|9DDEFHGHIJJGHILJKKKJLMHJLLHKJNMHM>JKNOGKILIKKIEKIFLFIKJHLNJKJLH
Code:
$ paste -d '~' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= " [barcode ".$F[1]."]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;'
@ERR063640.4 HS13_6902:1:1101:1393:2125/1 [barcode TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA]
TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC
+
B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF
@ERR063640.5 HS13_6902:1:1101:1453:2128/1 [barcode TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC]
TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT
+
CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>
@ERR063640.7 HS13_6902:1:1101:1378:2148/1 [barcode CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA]
GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA
+
D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<
@ERR063640.8 HS13_6902:1:1101:1462:2157/1 [barcode TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA]
ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA
+
CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A
Last edited by gringer; 03-03-2017, 11:24 AM.
-
Not back into the read, back into the header line
Hi, my goal is not to put them back into the sequences themselves but to put them at the end of the header line for each sequence in the fastq. Let me know if this is not clear and I can cut and paste some examples.
Thanks
-
Why do you want to put barcodes back into sequences? They are usually stripped out of the reads when the demultiplexing is carried out.
-
Adding barcode indexes back into FASTQ headers?
Hi,
I have a set of FASTQ files from a fungal ITS2 amplicon run on a MiSeq. There are 3 corresponding FASTQ files: Read 1, Read 2 and an Index file. In the past, I have worked with FASTQ files where the barcode indexes sit at the end of the header lines in the Read 1 and Read 2 files themselves (or in a single interleaved file); BBMap/BBDuk worked great for processing these files (e.g., adaptor removal, merging reads). However, in my current situation the barcodes are in their own FASTQ file, and I can't seem to find a script in BBMap/BBDuk that accommodates this.
My questions are:
Am I missing an argument or script in BBMap that will accommodate my situation?
Or, does anyone know of a function or script outside of BBMap that will take the barcode indexes in the index file and put them back at the end of the header lines in the Read 1 and Read 2 files? I have found scripts that go the opposite way (e.g., Qiime's extract_barcodes.py script).
Thanks for the assistance