  • Adding barcode indexes back into FASTQ headers?

    Hi,

    I have a set of fastq files from a fungal ITS2 amplicon run on a MiSeq. There are 3 corresponding fastq files: Read1, Read2 and an Index file. In the past, I have worked with fastq files in which the Read 1 and Read 2 files (or a single interleaved file) carry the barcode index at the end of each header line. BBMap/BBDuk worked great for processing these files (e.g., adaptor removal, merging reads). This time, however, the barcodes are in their own fastq file, and I can't find a BBMap/BBDuk script that handles this situation.

    My questions are:

    Am I missing an argument or script in BBMap that will accommodate my situation?

    Or, does anyone know of a function or script outside of BBMap that will take the barcode indexes from the index file and put them back at the end of the header lines in the Read 1 and Read 2 files? I have found scripts that go the opposite way (e.g., QIIME's extract_barcodes.py script).

    Thanks for the assistance

  • #2
    Why do you want to put barcodes back into sequences? They are usually stripped out of the reads when the demultiplexing is carried out.



    • #3
      Not back into the read, back into the header line

      Hi, my goal is not to put them back into the sequences themselves but to put them at the end of the header line for each sequence in the fastq. Let me know if this is not clear and I can cut and paste some examples.

      Thanks



      • #4
        Ah, right, that makes more sense. I wonder if 'paste' would help you here. Here's an example of what it can do:

        Code:
        $ paste -d '|' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16)
        @ERR063640.4 HS13_6902:1:1101:1393:2125/1|@ERR063640.4 HS13_6902:1:1101:1393:2125/2
        TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC|TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA
        +|+
        B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF|:DDEFH>BIIJGJKLJHKDCJHJJKLHKFB!!!!!!H@H8GKEKKGE8I-FM
        @ERR063640.5 HS13_6902:1:1101:1453:2128/1|@ERR063640.5 HS13_6902:1:1101:1453:2128/2
        TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT|TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC
        +|+
        CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>|:DBEFHBHIIJJHILJHEKJKKI@KLHDKIJ!!!J!LFAJGLLKLIK8EKEF9
        @ERR063640.7 HS13_6902:1:1101:1378:2148/1|@ERR063640.7 HS13_6902:1:1101:1378:2148/2
        GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA|CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA
        +|+
        D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<|:DFGFHGILJJLKILJKKKJIDIJLLHKKIOL!G!!!JKKHKEKKIKH9KLFI
        @ERR063640.8 HS13_6902:1:1101:1462:2157/1|@ERR063640.8 HS13_6902:1:1101:1462:2157/2
        ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA|TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA
        +|+
        CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A|9DDEFHGHIJJGHILJKKKJLMHJLLHKJNMHM>JKNOGKILIKKIEKIFLFIKJHLNJKJLH
        This could be followed up with code that reads the pasted output four lines at a time, prints the left-hand column for those four lines, and appends the right-hand column of the second line (the sequence) to the end of the first line (the header). Not sure if that's exactly what you want to do, but here's an example:

        Code:
        paste -d '~' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= " [barcode ".$F[1]."]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;'
        @ERR063640.4 HS13_6902:1:1101:1393:2125/1 [barcode TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA]
        TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC
        +
        B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF
        @ERR063640.5 HS13_6902:1:1101:1453:2128/1 [barcode TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC]
        TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT
        +
        CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>
        @ERR063640.7 HS13_6902:1:1101:1378:2148/1 [barcode CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA]
        GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA
        +
        D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<
        @ERR063640.8 HS13_6902:1:1101:1462:2157/1 [barcode TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA]
        ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA
        +
        CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A
        Last edited by gringer; 03-03-2017, 11:24 AM.



        • #5
          Slightly modifying code that @gringer supplied.

          Disclaimer: Please verify that the results look right with your data.

          Code:
          paste -d '~' R1.fq R2.fq | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= $F[1]}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' > WithBarcode_R1.fq
          If your files look like this

          R1.fq

          Code:
          @FCID:1:1101:15473:1334 1:N:0:
          AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
          +
          AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
          @FCID:1:1101:15528:1336 1:N:0:
          GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
          +
          DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
          R2.fq (barcodes)
          Code:
          @FCID:1:1101:15473:1334 2:N:0:
          TATTTGCGACAA
          +
          #>>>ABFFBBBBG
          @FCID:1:1101:15528:1336 2:N:0:
          GCGGGAAAAAAA
          +
          #############
          File you want

          Code:
          @FCID:1:1101:15473:1334 1:N:0:TATTTGCGACAA
          AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
          +
          AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
          @FCID:1:1101:15528:1336 1:N:0:GCGGGAAAAAAA
          GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
          +
          DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
          If your files are compressed then use

          Code:
          paste -d '~' <(zcat R1.fq.gz) <(zcat R2.fq.gz) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= $F[1]}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' | gzip - > WithBarcode_R1.fq.gz
          Last edited by GenoMax; 03-03-2017, 12:12 PM.
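The one-liners above only add the index to R1 (run them again with R2 for the mate file), and they silently assume both files list reads in exactly the same order. As a hedged alternative, here is a minimal POSIX sh/awk sketch that reads the read file and the index file in lockstep, checks that the read names (text before the first space) agree, and stops with an error if the files drift apart or one is shorter. `add_index_to_header` is a hypothetical helper name, not part of BBMap or any other package; it assumes uncompressed FASTQ with exactly 4 lines per record and Casava 1.8+ style headers such as `@name 1:N:0:`.

```shell
#!/bin/sh
# Hypothetical helper (not from any package):
#   add_index_to_header READS.fq INDEX.fq > OUT.fq
# Appends the index-read sequence to the end of every header in READS.fq.
# Assumes uncompressed FASTQ, exactly 4 lines per record, both files in the
# same read order, and headers like "@name 1:N:0:" (name before first space).
add_index_to_header() {
    awk -v idx="$2" '
    {
        # Pull the matching line from the index file in lockstep.
        if ((getline iline < idx) <= 0) {
            print "index file ended early at line " NR > "/dev/stderr"
            bad = 1; exit 1
        }
        pos = NR % 4
        if (pos == 1) {
            # Header line: read names (text before the first space) must agree.
            split($0, rn, " "); split(iline, xn, " ")
            if (rn[1] != xn[1]) {
                print "name mismatch at line " NR ": " rn[1] " vs " xn[1] > "/dev/stderr"
                bad = 1; exit 1
            }
            hdr = $0
        } else if (pos == 2) {
            # Sequence line: iline now holds the index sequence.
            print hdr iline
            print
        } else {
            # The "+" line and the quality line pass through unchanged.
            print
        }
    }
    END {
        if (bad) exit 1
        # Leftover index lines mean the two files are out of step.
        if ((getline iline < idx) > 0) {
            print "index file has more lines than the read file" > "/dev/stderr"
            exit 1
        }
    }' "$1"
}
```

Usage: `add_index_to_header R1.fq I1.fq > R1.withindex.fq`, then the same with `R2.fq`. With the example files above, the first header should come out as `@FCID:1:1101:15473:1334 1:N:0:TATTTGCGACAA`. For gzipped input in bash, process substitution such as `add_index_to_header <(zcat R1.fq.gz) <(zcat I1.fq.gz) | gzip > R1.withindex.fq.gz` should work, since awk simply opens the `/dev/fd/...` path it is given, but check a few records of the output first.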



          • #6
            Thanks so much to the two of you. This looks to be exactly what I want. I will be sure to verify the output. This will open up a whole new swath of tools for me to use.

            Thanks



            • #7
              Hi,

              I am almost in the same boat. I have two FASTQ files for the indexes (one each for i7 and i5) and two paired-end raw read files (R1 and R2). Last time, when I had both barcodes (i7+i5) inline in the header, I used BBMap to demultiplex them conveniently. But I am not sure how to add these two barcode indexes to the headers of the raw sequence files so that I can demultiplex my data as before. Is there any way to use BBMap to demultiplex my data when the barcodes are in separate files?



              • #8
                @Luckyboy7: Use the deML package to demultiplex your data. More info here.
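For the dual-index case in #7, a similar sketch can write both index reads into each header. This one uses a different trick: `paste - - - -` turns every 4-line FASTQ record into one tab-separated line, so the read, i7 and i5 records can be pasted side by side and reassembled with awk. The suffix is written as `i7+i5`, the same `1:N:0:INDEX1+INDEX2` layout bcl2fastq uses for dual-indexed runs. `add_dual_index_to_header` is a hypothetical helper name; the sketch assumes uncompressed FASTQ, identical record order in all three files, and no tab characters inside any line.

```shell
#!/bin/sh
# Hypothetical helper (not from any package):
#   add_dual_index_to_header READS.fq I7.fq I5.fq > OUT.fq
# Appends "i7+i5" to every header in READS.fq. Assumes uncompressed FASTQ,
# 4 lines per record, identical record order in all three files, and no
# tab characters inside any line (tabs are used as the join delimiter).
add_dual_index_to_header() {
    tmpdir=$(mktemp -d) || return 1
    # paste - - - - joins each run of 4 lines (one record) with tabs.
    paste - - - - < "$1" > "$tmpdir/reads"
    paste - - - - < "$2" > "$tmpdir/i7"
    paste - - - - < "$3" > "$tmpdir/i5"
    # Joined line: fields 1-4 = read record, 5-8 = i7 record, 9-12 = i5 record.
    paste "$tmpdir/reads" "$tmpdir/i7" "$tmpdir/i5" |
        awk -F '\t' -v OFS='\n' '
        NF != 12 {
            # A short file leaves empty padding fields, so NF drops below 12.
            print "record " NR ": expected 12 fields, got " NF > "/dev/stderr"
            bad = 1; exit 1
        }
        {
            # Read names (text before the first space) must match in all three.
            split($1, rn, " "); split($5, an, " "); split($9, bn, " ")
            if (rn[1] != an[1] || rn[1] != bn[1]) {
                print "record " NR ": read names differ" > "/dev/stderr"
                bad = 1; exit 1
            }
            # Header + i7 sequence + "+" + i5 sequence, then seq, "+", quality.
            print $1 $6 "+" $10, $2, $3, $4
        }
        END { if (bad) exit 1 }'
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Running it once with R1 and once with R2 gives both files the same `i7+i5` suffix. Whether a particular demultiplexer (BBMap's tools or others) reads the barcodes from this position is worth confirming in that tool's documentation before relying on it.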
