Originally posted by silin284

With demultiplexing, the FASTQ header looks like:
@D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1
without demultiplexing:
@D3B4KKQ1_0182:3:1101:11400:2655#0/1
You indicate that you are using TruSeq, so your index (barcode) read is separate from your sequence read(s). Is your core providing you with a FASTQ file of the index read? If they expect you to do the demultiplexing, they will have to provide one. If all you have as inputs are FASTQ files of the sequence read(s) and the index read, then no Illumina script will help: their scripts perform demultiplexing before the FASTQ files are written.

Let's assume that you have one FASTQ file with the index read and one FASTQ file with the sequence read (single read in this example). A second assumption is that the order of the reads in the two files is identical; this should be a safe assumption if the two files were produced concurrently by the Illumina software. The process is conceptually simple in this case:
1) read the first entry from the index FASTQ and compare its sequence to a list of your tags to decide which one it is,
2) read the first entry from your read FASTQ and write it out to the appropriate file depending on the barcode,
3) repeat 200 million times for each lane of HiSeq data.
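The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not an Illumina tool: the function names, the `barcodes` dictionary (tag sequence to sample name) and the output file naming are all made up for the example. It assumes uncompressed four-line FASTQ records, tags of equal length, exact matching on the first bases of the index read (no mismatches tolerated), and the same read order in both files.

```python
import os
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from an open FASTQ file."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def demultiplex(index_path, read_path, barcodes, out_dir):
    """Split read_path into one FASTQ per sample, based on the index reads.

    barcodes: dict mapping tag sequence -> sample name (hypothetical format).
    Returns a dict mapping sample name -> number of reads written.
    """
    os.makedirs(out_dir, exist_ok=True)
    tag_len = len(next(iter(barcodes)))  # assumption: all tags have equal length
    outputs = {}  # sample name -> open output file
    counts = {}
    try:
        with open(index_path) as idx, open(read_path) as reads:
            pairs = zip_longest(read_fastq(idx), read_fastq(reads))
            for idx_rec, read_rec in pairs:
                if idx_rec is None or read_rec is None:
                    raise ValueError("index and read files differ in length")
                # Step 1: decide which tag this index read carries.
                # The index read may be longer than the tag, so compare a prefix.
                tag = idx_rec[1][:tag_len]
                sample = barcodes.get(tag, "undetermined")
                # Step 2: write the matching sequence read to that sample's file.
                if sample not in outputs:
                    path = os.path.join(out_dir, sample + ".fastq")
                    outputs[sample] = open(path, "w")
                outputs[sample].write("\n".join(read_rec) + "\n")
                counts[sample] = counts.get(sample, 0) + 1
                # Step 3: the loop repeats this for every record in the lane.
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

You would call it as something like demultiplex("index.fastq", "reads.fastq", {"ATCACG": "sample1"}, "demux_out"), where the file names are placeholders. Reads whose index does not match any tag end up in undetermined.fastq. For a full HiSeq lane you would probably want to read gzipped input and allow one mismatch in the tag, neither of which this sketch does.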