**Bukowski** · 08-10-2012, 10:21 AM

I think you will find there is a reference to the barcode/sample at the end of the read name for each read. That might help.
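As a rough illustration of that suggestion, here is a minimal Python sketch that pulls the trailing field out of a read name. It assumes the CASAVA 1.8-style header layout, where the part after the space is `read:is_filtered:control_number:sample_or_index`. The function name `parse_read_name` is made up for this example, and the header is one of those posted further down in this thread.

```python
def parse_read_name(header):
    """Split a CASAVA 1.8-style FASTQ header into (read ID, trailing tag).

    Assumes the layout '@<read ID> <read>:<filtered>:<control>:<tag>'.
    Returns None for the tag when the header has no comment part.
    """
    name, _, comment = header.lstrip("@").partition(" ")
    fields = comment.split(":")
    # The last colon-separated field holds the sample number or the index
    # sequence, depending on how the run was processed.
    tag = fields[-1] if comment else None
    return name, tag


read_id, tag = parse_read_name("@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0")
print(read_id, tag)
```

If the trailing field comes back as `0` for every read, as it does for this header, the run was not assigned to samples on the instrument and the field will not help with splitting.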

**celzinga** · 08-10-2012, 10:31 AM

This thread may help:

How to Demultiplex a Nextera paired-end MiSeq run - SEQanswers

http://seqanswers.com/forums/showthread.php?t=17620

**celzinga** · 08-10-2012, 10:42 AM

also it looks like picard can do this:

http://picard.sourceforge.net/command-line-overview.shtml#ExtractIlluminaBarcodes

**Cirno** · 08-10-2012, 03:40 PM

Originally posted by celzinga:

also it looks like picard can do this:
http://picard.sourceforge.net/comman...luminaBarcodes

Um. I don't see how that tool has anything to do with this problem. I don't need to extract the barcodes at all. I have three FASTQ files. The first FASTQ already contains the barcodes, e.g.:

Code:

@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
AACCGAGA
+
?AAAAAAB
@M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
AACCGAGA
+
???A?@@B
@M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
AACCGAGA
+
A?AAAAAA
@M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
AAACATCA
+
AAAAABBB

Then there are the two files for the paired ends, e.g.:

Code:

@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
NCGGGCACGACCATCACCATCATCATACGACGAACCAACGGGCATTATTCTGGTCGTTCGTCCTGATTGCGACGTTCATGGTCGTCGAAGTCATCGGCGGATTATGGACGAACAGTTTTGCGCTCTTGTCGGACGCCGGGCATATGCTTAG
+
#5<???AADDEEEDDDGGGGGGIIIIIIIIHHHHHHIIHHHHHHIIIIIIIIHIIHHHIHHHHHIIIIHHHHHHHHFHHHHHHGGFGGGGGGGGGGEGGG'.8:C*CCCD4A''*1CE*0:8'4C.:*:?)''.'.'.''2'**0*1:?:1
@M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
NCATACGTACCACCGATGACACCACCGACAAGCGGAACCATCTTCCCAAGATTAACGACCCCCGTATTCCCGAACTTCGTCAATAAGCGGAATCCGACTTTCTGATTGATTTTTTTGATGGTCGATCCAGGAATCTTCTTAATCATATTGA
+
#5<???BBDDDDDEDDFEFFFFIIIHHHHHHHIHHEHHIHIIIIIIIIIIIIIIIIHHHHHHHHDCFHHFHHHEHFDFH?DF;DFFDFEE=EFFA?A@BAEEFFEEEF=ABA?:8>DACAECEDD8A8*?*0:CCA0*::C*:ACA*:E:*
@M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
NTCCGCGTGACGGCGATGCCAGAGCGACGGGCCGCCTCGACGTTCGAGCCGACGTAATAAAACTCACGTCCTGTCTTCGAATACGTCAAAAACAGATGCGCCCCGGCGAAGAACAGAAGCATCAAGATGGCGACGAACGGGACAGGTCCGT
+
#5<???@@DDDDDDDDEEEFFFHHIHHHHHHHHHHHHHHHHHHHHEFHHHHEFFEFFEFFEEFFFFFFFFEEFFFFFFFEFFEFFFEE8A:CEEFEFEFDEADD?DDD'8>8?C:?E:*?:CAE0?::**:2'8;>2>').?8A))1*0'*
@M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
NATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAGACAGAACGAGACAAAAGAAGCACAAATCCGTAATCGATGAGACTTAATGCGAGATCATGACACCATTGTAA
+
#5<???AAEDEDDDDDGGGGGGIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHHHIIIIIIIIHHIIIIHHHHHD4)42**,,,,,,***3*,4,,,*4,,,3,0****)0*))*)0.************)).'0*1******)*******

and the corresponding mates of all of those.

I do not want three files as they are. I know which barcodes go with which hashes.

RUN1_I1.fastq
RUN1_R1.fastq
RUN1_R2.fastq

Need to be converted into...

RUN1_R1_AACCGAGA.fastq
RUN1_R2_AACCGAGA.fastq
RUN1_R1_AAACATCA.fastq
RUN1_R2_AAACATCA.fastq

etc etc.
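The conversion described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not a tested tool: it assumes the three files list the same reads in the same order, that every record is exactly four lines, and that only exact barcode matches are wanted. The function names `read_fastq` and `demultiplex` and the `out_dir` parameter are invented for this example.

```python
import os


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                return
            yield tuple(record)


def demultiplex(prefix, barcodes, out_dir="."):
    """Split <prefix>_R1.fastq and <prefix>_R2.fastq by the barcodes in <prefix>_I1.fastq.

    Only reads whose index sequence exactly matches an entry in `barcodes`
    are written; all other reads are skipped.
    Returns a dict mapping each barcode to the number of read pairs written.
    """
    base = os.path.basename(prefix)
    handles = {}
    counts = {bc: 0 for bc in barcodes}
    try:
        for bc in barcodes:
            handles[bc] = tuple(
                open(os.path.join(out_dir, f"{base}_{mate}_{bc}.fastq"), "w")
                for mate in ("R1", "R2")
            )
        streams = (read_fastq(f"{prefix}_{tag}.fastq") for tag in ("I1", "R1", "R2"))
        for index, fwd, rev in zip(*streams):
            # Compare the read IDs (text before the first space) so that
            # files which have fallen out of sync are caught early.
            if not (index[0].split()[0] == fwd[0].split()[0] == rev[0].split()[0]):
                raise ValueError(f"read names differ at {index[0]!r}")
            bc = index[1]
            if bc not in handles:
                continue
            for out, record in zip(handles[bc], (fwd, rev)):
                out.write("\n".join(record) + "\n")
            counts[bc] += 1
    finally:
        for pair in handles.values():
            for out in pair:
                out.close()
    return counts
```

With the files named as above, a call such as `demultiplex("RUN1", ["AACCGAGA", "AAACATCA"])` would write `RUN1_R1_AACCGAGA.fastq`, `RUN1_R2_AACCGAGA.fastq` and so on into the current directory. Reads with a sequencing error in the barcode are dropped by this sketch, so a real run would likely want some mismatch tolerance.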

Personally, I am beyond flabbergasted that the output of this damnable thing is not the same as the HiSeq's. I just want the FASTQs sorted by barcode; it does nothing for me, the user, to have the barcode/hash pairs in a separate file.

**GenoMax** · 08-13-2012, 03:53 AM

Did you get this run at a core facility? I am not sure why that facility did not do the de-multiplexing for you. It should be trivial for them to do this since they would have access to the raw data folder and CASAVA pipeline.

**geertvandeweyer** · 08-13-2012, 06:44 AM

Hi,

I've attached my approach to demultiplexing the MiSeq files. Note that it uses the MiSeq-assigned sample index to name the output files, NOT the barcode. This means you get all reads for the sample, including those with a mismatch in the barcode. It outputs three files per sample: forward reads, reverse reads, and interlaced reads. We use the interlaced reads in Galaxy to start batch workflows.

For files:
RUN1_I1.fastq
RUN1_R1.fastq
RUN1_R2.fastq

Run as:
perl demultiplex_MiSeq.pl RUN1

Output will be in 'output/' folder. It will also create a file containing all barcodes used per sample, and print the read count per sample.
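For readers unfamiliar with the interlaced output mentioned above, the idea can be sketched in Python as follows. This is not taken from the attached Perl script; it is a minimal illustration that assumes the forward and reverse records arrive in the same order, and the read names and sequences are made up.

```python
def interleave(forward_records, reverse_records):
    """Yield records alternating forward, reverse, forward, reverse, ...

    Each record is a (header, sequence, plus, quality) tuple. The two
    inputs are assumed to list the same reads in the same order.
    """
    for fwd, rev in zip(forward_records, reverse_records):
        yield fwd
        yield rev


# Hypothetical two-pair example; real records would come from the R1/R2 files.
fwd = [("@read1 1:N:0:1", "ACGT", "+", "IIII"), ("@read2 1:N:0:1", "TTGA", "+", "IIII")]
rev = [("@read1 2:N:0:1", "CCAT", "+", "IIII"), ("@read2 2:N:0:1", "GGTC", "+", "IIII")]

for record in interleave(fwd, rev):
    print("\n".join(record))
```

The result is a single FASTQ stream in which each forward read is immediately followed by its mate, which is convenient for tools that accept one paired input file.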

Attached Files

demultiplex_MiSeq.pl (2.8 KB, 264 views)

**JackieBadger** · 08-13-2012, 05:04 PM

http://code.google.com/p/ea-utils/

or

Galaxy

https://main.g2.bx.psu.edu/root

Look under NGS Toolbox Beta, NGS: QC and manipulation

Barcode splitter and other FASTQ manipulations

**swNGS** · 08-16-2012, 01:51 PM

What is an interlaced read?


Help with De-Multiplexing MiSeq Data
