  • GenoMax
    replied
    @Luckyboy7: Use the deML package to demultiplex your data. More info here.


  • Luckyboy7
    replied
    Hi,

    I am in almost the same boat. I have two FASTQ files for the indexes (one for i7 and one for i5) and two paired-end raw read files (R1 and R2). Last time, when the two barcodes (i7+i5) were inline in the header, I demultiplexed the data conveniently with BBMap. But I am not sure how to add these two barcode indexes to the headers of the raw sequence files so that I can demultiplex my data as before. Is there any way to use BBMap to demultiplex my data when the barcodes are in separate files?
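A minimal Python sketch of one way to do this, offered as an illustration under assumptions rather than as a BBMap feature: it assumes the read file and the two index files (called i7 and i5 here) are uncompressed, list the same reads in the same order, and that each read header already ends in the Illumina comment prefix (e.g. `1:N:0:`), as in GenoMax's example in this thread. It joins the two indexes with `+`, the separator Illumina's bcl2fastq uses for dual-index headers; check what your demultiplexer expects. The names `fastq_records`, `add_dual_index` and `add_dual_index_to_file` are hypothetical.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records with newlines stripped."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def add_dual_index(read_lines, i7_lines, i5_lines, sep="+"):
    """Yield FASTQ text for each read, with '<i7><sep><i5>' appended to its header.

    Assumes all three inputs hold the same reads in the same order.
    """
    triples = zip_longest(
        fastq_records(read_lines), fastq_records(i7_lines), fastq_records(i5_lines)
    )
    for read, i7, i5 in triples:
        if read is None or i7 is None or i5 is None:
            raise ValueError("input files have different numbers of records")
        header, seq, plus, qual = read
        # The index sequence is the second line of each index record.
        yield f"{header}{i7[1]}{sep}{i5[1]}\n{seq}\n{plus}\n{qual}\n"


def add_dual_index_to_file(reads_path, i7_path, i5_path, out_path):
    """Hypothetical wrapper: write the read file with both indexes added to each header."""
    with open(reads_path) as reads, open(i7_path) as i7, open(i5_path) as i5, \
            open(out_path, "w") as out:
        out.writelines(add_dual_index(reads, i7, i5))
```

Run it once with R1 and once with R2 as the read file. On some instruments the i5 index is read as the reverse complement of the sequence listed in the sample sheet, so check the orientation against your barcodes before demultiplexing.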


  • PeatMaster
    replied
    Thanks so much to the two of you. This looks to be exactly what I want. I will be sure to verify the output. This will open up a whole new swath of tools for me to use.

    Thanks
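One way to do that check is a short Python sketch like the following, which assumes the output layout from GenoMax's example (the barcode appended after the last `:` of each header). It flags records whose header lacks `@`, whose third line lacks `+`, whose sequence and quality lengths differ, or whose barcode is not the expected length. `check_fastq` and `barcode_length` are hypothetical names, and this is not a full FASTQ validator.

```python
from itertools import islice


def check_fastq(lines, barcode_length=None):
    """Return a list of problems found in FASTQ lines with barcodes appended to headers."""
    problems = []
    it = iter(lines)
    n = 0
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            break
        n += 1
        if len(record) < 4:
            problems.append(f"record {n}: truncated")
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {n}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {n}: sequence and quality lengths differ")
        if barcode_length is not None:
            # The barcode is assumed to be everything after the header's last ':'.
            barcode = header.rsplit(":", 1)[-1]
            if len(barcode) != barcode_length:
                problems.append(f"record {n}: barcode {barcode!r} is not {barcode_length} bases")
    return problems
```

For example, `check_fastq(open("WithBarcode_R1.fq"), barcode_length=12)` should return an empty list if every record passes; for gzipped output, pass `gzip.open(path, "rt")` instead.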


  • GenoMax
    replied
    Here is a slightly modified version of the code that @gringer supplied.

    Disclaimer: Please verify that the results look right with your data.

    Code:
    # Append the sequence line of each R2.fq (index) record to the header line of the matching R1.fq record
    paste -d '~' R1.fq R2.fq | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= $F[1]}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' > WithBarcode_R1.fq
    If your files look like this:

    R1.fq

    Code:
    @FCID:1:1101:15473:1334 1:N:0:
    AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
    +
    AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
    @FCID:1:1101:15528:1336 1:N:0:
    GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
    +
    DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
    R2.fq (barcodes)
    Code:
    @FCID:1:1101:15473:1334 2:N:0:
    TATTTGCGACAA
    +
    #>>>ABFFBBBBG
    @FCID:1:1101:15528:1336 2:N:0:
    GCGGGAAAAAAA
    +
    #############
    The file you want:

    Code:
    @FCID:1:1101:15473:1334 1:N:0:TATTTGCGACAA
    AGTGGACTAGGGGATGCCAGCCGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAACGCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGTAGTGCTTTGGAAACTGTGCAGCTCGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACT
    +
    AAAABFFFFFFCGGGGGGGGGGGGGGGGGGGGGHHHHHHGHHGGGHGHGGGGHHHGGGGGHHHHHHHHGGGGHHHGHHGGGGGGGGGGGGHHHHHHHGHGHHHHHHHHFHHHHHHGGGGHHHHGGGGGHHHHHHHHHHGHHHHHHFHHFHGGGGDFHHHHH.EGGGBFFGGGGGGEFFFGGGGFFGGGF-DFEFFFFFFA.-./FFFFBFFFBFFFFFFA?;/B?F@DCFEAAF-@FFBBBBFFEFFFB;
    @FCID:1:1101:15528:1336 1:N:0:GCGGGAAAAAAA
    GAATTGGACGAGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGGGAGCAGGCGGCAGCAAAGGTCTGTGGTGAAAGACTGAAGCTTAACTTCAGTAAGCCATAGAAACCGGGCAGCTAGAGTGCAGGAGAGGATCGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGACGATCTGGCCTGCAACTGAC
    +
    DDDDDFFFFCDCGGGGGGGGGGHGGGGGGGHHHHHHHGHHGHHHGHGGGGHHHGGGGGHHHHHHHHGGGGHHGHHGGGGHHHGGGGGGGHHHHGGHHHHHHHGHHHHHHHHHHHHGHHHGHGHHHHHHHHHHHHHHHHHHGGGGGGGHHHHHGHGHHHGGHGDHHGDFFGGGGGGGGGGFGGGFGGG9?EGFGGFFAD;EFFFFFFFFFFFFFFFDEEFFFFFFF-DE->CFFEEAFFFFFFFBFFFFF0
    If your files are compressed, use:

    Code:
    paste -d '~' <(zcat R1.fq.gz) <(zcat R2.fq.gz) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= $F[1]}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' | gzip > WithBarcode_R1.fq.gz
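For readers who prefer a script to the one-liner, here is a minimal Python sketch of the same idea. It assumes, as in the example above, that the index reads are in a separate file listed in the same order as the reads and that each read header already ends in `1:N:0:`. Unlike the one-liner, it checks that the read IDs of paired records match and that both files have the same number of records, and it opens `.gz` files transparently. The function names are hypothetical.

```python
import gzip
from itertools import islice, zip_longest


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records with newlines stripped."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def read_id(header):
    """Return the first token of a header, without '@' or a trailing /1 or /2."""
    token = header[1:].split()[0]
    return token[:-2] if token[-2:] in ("/1", "/2") else token


def append_index(read_lines, index_lines):
    """Yield FASTQ text for each read with its index sequence appended to the header."""
    for read, index in zip_longest(fastq_records(read_lines), fastq_records(index_lines)):
        if read is None or index is None:
            raise ValueError("read and index files have different numbers of records")
        header, seq, plus, qual = read
        if read_id(header) != read_id(index[0]):
            raise ValueError(f"records out of sync: {header!r} vs {index[0]!r}")
        yield f"{header}{index[1]}\n{seq}\n{plus}\n{qual}\n"


def append_index_to_file(reads_path, index_path, out_path):
    """Write a copy of reads_path with index sequences added (gzipped if out_path ends in .gz)."""
    with open_fastq(reads_path) as reads, open_fastq(index_path) as index, \
            open_fastq(out_path, "wt") as out:
        out.writelines(append_index(reads, index))
```

For example, `append_index_to_file("R1.fq.gz", "R2.fq.gz", "WithBarcode_R1.fq.gz")` corresponds to the compressed one-liner. Stopping on a read ID mismatch is a deliberate choice: a silent off-by-one between the files would attach every barcode to the wrong read.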
    Last edited by GenoMax; 03-03-2017, 12:12 PM.


  • gringer
    replied
    Ah, right, that makes more sense. I wonder if 'paste' would help you here. Here's an example of what it can do:

    Code:
    $ paste -d '|' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16)
    @ERR063640.4 HS13_6902:1:1101:1393:2125/1|@ERR063640.4 HS13_6902:1:1101:1393:2125/2
    TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC|TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA
    +|+
    B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF|:DDEFH>BIIJGJKLJHKDCJHJJKLHKFB!!!!!!H@H8GKEKKGE8I-FM
    @ERR063640.5 HS13_6902:1:1101:1453:2128/1|@ERR063640.5 HS13_6902:1:1101:1453:2128/2
    TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT|TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC
    +|+
    CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>|:DBEFHBHIIJJHILJHEKJKKI@KLHDKIJ!!!J!LFAJGLLKLIK8EKEF9
    @ERR063640.7 HS13_6902:1:1101:1378:2148/1|@ERR063640.7 HS13_6902:1:1101:1378:2148/2
    GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA|CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA
    +|+
    D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<|:DFGFHGILJJLKILJKKKJIDIJLLHKKIOL!G!!!JKKHKEKKIKH9KLFI
    @ERR063640.8 HS13_6902:1:1101:1462:2157/1|@ERR063640.8 HS13_6902:1:1101:1462:2157/2
    ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA|TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA
    +|+
    CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A|9DDEFHGHIJJGHILJKKKJLMHJLLHKJNMHM>JKNOGKILIKKIEKIFLFIKJHLNJKJLH
    This could be followed up with code that reads in four lines at a time, then prints the left-hand column for those four lines, appending the right-hand column of the second line (the other file's sequence) to the first (header) line. Not sure if that's what you want to do, but here's an example:

    Code:
    paste -d '~' <(zcat ../ERR063640_1P.fastq.gz | head -n 16) <(zcat ../ERR063640_2P.fastq.gz | head -n 16) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= " [barcode ".$F[1]."]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;'
    @ERR063640.4 HS13_6902:1:1101:1393:2125/1 [barcode TTTAGAGAGTCCAAACTGTTGTGCCCGGTANNNNNNACCTTCTTCTCGAAAA]
    TGGGAATCAGCCGATCGAGTGTACAAAATATAGTAGTTGGAAAACTAAGGCTGAGGAGCTATAAGCTGC
    +
    B>GGDFJEHBDLIIMJHIKBLLHIIGGKHI=ILHFAHJLMKKIDDKLGLIIFLKIFGKEFHH?@H>JLF
    @ERR063640.5 HS13_6902:1:1101:1453:2128/1 [barcode TCACAGATGAGACGGTAAGCTCACAATGACCNNNTNATGTACAATTGTAAATC]
    TGAGGTGTGAGATGCGTGTCAGTGCAGAGCGAGAGAAAACGGCACGAAATAAGTGCCGGTGCTTCTCATGGAGACGACGATGGTTTGCGTCGCTT
    +
    CBC;DIJ>HFDLBIJLLJGJEJKKIMLKMJ<ILEIHJNJGDLLFJKFGKEIHLJHFMJGIHJLJFEJDKHGGHKJJLEJKBEL7I7C8HJ5FGF>
    @ERR063640.7 HS13_6902:1:1101:1378:2148/1 [barcode CATATCGGCCCTATCACATATCATCCGACTCANCNNNACGAGTACAATACTAA]
    GAAATTTTCAAGAATTGACGGAAAACGAATTTCATTGCCGGGCATTCTAAGTGTTAAATTAGCATTGTACTCGTCGAGCTGAGTCGGATGATATGTGATA
    +
    D>GGILJEJKKLJIKLLLJOLGHJIMKKMKEJHGKLDNFMLKLILKLLLGJNLKJLKKJIGCHHIMILHMGDLKLJ>KJKKHG7MIKKKJFBAGEBBAA<
    @ERR063640.8 HS13_6902:1:1101:1462:2157/1 [barcode TTTCATGGTGAGAGTTATGTGCAAAAACCGGAACTCGAACGGAATTCGAAGCAAGGTATCAAA]
    ACGAATGCGTGGCGCGTCCTTCGGGGACTGATCAACGAAGACACCTTTACCTTTACCTTTTATACTAAAATTTATCTCCTGAAAGGAGAACTTGTAACA
    +
    CCGHIIJEKKKLIMJL@FJJL?BKJO=KH?;HCHNCDAK=K<LHHDJLEBGFLKHEHKLKH9H@FME9KHBJL7H?L8LE8E6KEDCG?L@BG@BAB?A
    Last edited by gringer; 03-03-2017, 11:24 AM.


  • PeatMaster
    replied
    Not back into the read, back into the header line

    Hi, my goal is not to put them back into the sequences themselves but to put them at the end of the header line for each sequence in the fastq. Let me know if this is not clear and I can cut and paste some examples.

    Thanks


  • gringer
    replied
    Why do you want to put barcodes back into sequences? They are usually stripped out of the reads when the demultiplexing is carried out.


  • PeatMaster
    started a topic Adding barcode indexes back into FASTQ headers?

    Hi,

    I have a set of fastq files from a fungal ITS2 amplicon run on a MiSeq. There are 3 corresponding fastq files: Read 1, Read 2 and an Index file. In the past, I have worked with fastq files where the Read 1 and Read 2 files (or a single interleaved file) had the barcode indexes at the end of the header lines; BBMap/BBDuk worked great for processing these files (e.g., adapter removal, read merging). However, in my current situation the barcodes are in their own fastq file, and I can't find a script in BBMap/BBDuk that accommodates this.
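To make the header layout concrete: in the Illumina 1.8+ (CASAVA) convention, the text after the space in the header is `<read number>:<is filtered>:<control number>:<index sequence>`, so the barcode is the fourth colon-separated field. A minimal Python sketch that pulls it out, using an example header from this thread; `header_barcode` is a hypothetical helper for illustration, not part of BBMap.

```python
def header_barcode(header):
    """Return the index sequence at the end of an Illumina 1.8+ FASTQ header, or None.

    The comment after the first space has the form
    <read number>:<is filtered>:<control number>:<index sequence>.
    """
    parts = header.rstrip("\n").split(" ", 1)
    if len(parts) < 2:
        return None  # no comment field at all
    fields = parts[1].split(":", 3)
    if len(fields) < 4 or not fields[3]:
        return None  # comment present but the index field is empty
    return fields[3]


print(header_barcode("@FCID:1:1101:15473:1334 1:N:0:TATTTGCGACAA"))  # TATTTGCGACAA
print(header_barcode("@FCID:1:1101:15473:1334 1:N:0:"))  # None
```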

    My questions are:

    Am I missing an argument or script in BBMap that will accommodate my situation?

    Or, does anyone know of a function or script outside of BBMap that will take the barcode indexes in the index file and put them back at the end of the header lines in the Read 1 and Read 2 files? I have found scripts that go the opposite way (e.g., QIIME's extract_barcodes.py script).

    Thanks for the assistance