**Heisman** · 10-11-2011, 06:38 PM

There is a demultiplexer script with Casava that you will probably find useful. I actually got it from a friend in a different lab so I am unsure how to find it online, but if you cannot find it I will send it to you somehow.

**kmcarr** · 10-12-2011, 05:34 AM

Silin,

I'm frankly a little appalled at the attitude displayed by your sequencing core facility. Delivering data to the client in a usable format is the job they contracted to do; if they find this job too onerous then perhaps they should get out of the business. Our core demultiplexes run data regularly, sometimes at much higher orders than 10 samples, and would never think of shifting this task to the researcher. The latest versions of the Illumina software, specifically CASAVA 1.8.x, make demultiplexing dead simple. They have to run CASAVA anyway to convert the .bcl files to FASTQ, so there's no excuse to complain about doing it.

O.K., rant over.

As Heisman stated and I mentioned above, CASAVA makes demultiplexing easy but it uses the .bcl files as input, not FASTQs. To run this yourself you would need access to the entire run directory which is unreasonable. For you to demultiplex from FASTQ files would require writing (or finding) some custom scripts to read the FASTQ of the index read, store the IDs associated with each tag and then parse through the read file(s) to sort them. Doing it this way just seems so silly to me when I know how easy it is to do with CASAVA.

You indicated that they "tire" of "sending" the large number of files to you. How are they doing this? If they were smart they would post user data to an FTP server with individual client directories and logons to download their data. This is how we do it at our core and it is no more difficult to post 20 files than it is to post 1.

**silin284** · 10-12-2011, 07:11 AM

Hi Kmcarr and Heisman

Thanks for the replies. I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...

With demultiplex, the fastq file is like:
@D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

without demultiplexing:
@D3B4KKQ1_0182:3:1101:11400:2655#0/1

The ID does not have the barcode read sequence, so I can't write a script to sort it unless they can send me the FASTQ file for the index read. Then I can paste them together and sort it.
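To make the difference between the two IDs concrete, here is a minimal Python sketch that pulls the barcode field out of an ID in the older (pre-1.8) style shown above. The field names in the pattern and the helper name `barcode_from_id` are my own labels for illustration, not anything from Illumina's software, and the layout is inferred only from the two example IDs in this post.

```python
import re

# Layout inferred from the example IDs in this post (an assumption):
#   @<instrument_run>:<lane>:<tile>:<x>:<y>#<index>/<read number>
OLD_STYLE_ID = re.compile(
    r"^@(?P<machine>[^:]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<read>\d+)$"
)


def barcode_from_id(header):
    """Return the barcode stored in the read ID, or None if the field is '0'.

    Hypothetical helper: raises ValueError when the ID does not match the
    layout assumed above.
    """
    match = OLD_STYLE_ID.match(header.strip())
    if match is None:
        raise ValueError("unrecognised read ID: %r" % header)
    index = match.group("index")
    # A bare '0' after the '#' means no barcode was written into the ID.
    return None if index == "0" else index
```

With the two IDs above, `barcode_from_id("@D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1")` gives `"ATCACG"`, while the non-demultiplexed ID gives `None`, which is why sorting on the ID alone cannot work for that file.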

I am not sure how the core deals with the raw image files and converts them to FASTQ. Is there a way for them to generate a single FASTQ (or 2 FASTQ files: read1 + index read2) that can be sorted by us?

They are "tired" of sending data because they don't use FTP! They just scp each file to us. I am pretty sure they might not know how to use scp -r as well.

**kmcarr** · 10-12-2011, 07:49 AM

Originally posted by silin284

I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...

Perhaps I was a little under-caffeinated earlier and my initial reaction reflected that.

I can certainly understand being stretched as I'm the only person to deal with the data stream from one HiSeq, one GAIIx, one 454 and soon an IonTorrent PGM.

With demultiplex, the fastq file is like:
@D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

without demultiplexing:
@D3B4KKQ1_0182:3:1101:11400:2655#0/1

It's obvious from the format of the read IDs that your core is not using the latest versions of the Illumina software; the ID format changed with CASAVA 1.8.

You indicate that you are using TruSeq, so your index (barcode) read is separate from your sequence read(s). Is your core providing you a FASTQ file of the index read? If they expect you to do the demultiplexing they will have to. If all you have as inputs are FASTQ files of the sequence read(s) and index read then no Illumina script will help. Their scripts perform demultiplexing prior to writing the FASTQ files.

Let's assume that you have one FASTQ file with the index read and one FASTQ file with the sequence read (single read in this example). A second assumption is that the order of the reads in the two files is identical; this should be a safe assumption if the two files were produced concurrently by the Illumina software. The process is conceptually simple in this case: 1) read the first entry from the index FASTQ and compare the sequence to a list of your tags to decide which one it is, 2) read the first sequence from your read FASTQ and then write it out to the appropriate file depending on the barcode, 3) repeat 200 million times for each lane of HiSeq data.
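The three steps can be sketched in Python roughly as follows. This is a rough illustration under the two assumptions stated above (one index FASTQ, one read FASTQ, identical read order), not a tested production tool. The function names, the `tags` dictionary, the `unassigned` bin and the output naming are all hypothetical. It also assumes four-line FASTQ records, exact barcode matches only (no mismatch tolerance), and that only the leading bases of the index read need to match, in case the index read is longer than the tag.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file
            return
        yield tuple(record)


def demultiplex(index_path, read_path, tags, out_prefix):
    """Split read_path into one FASTQ per sample, using index_path for barcodes.

    tags maps barcode sequence -> sample name (hypothetical structure).
    Returns a dict of sample name -> number of reads written.
    """
    tag_lengths = {len(tag) for tag in tags}
    if len(tag_lengths) != 1:
        raise ValueError("all tags must have the same length")
    tag_length = tag_lengths.pop()

    outputs = {}
    counts = {}
    try:
        with open(index_path) as index_file, open(read_path) as read_file:
            # zip() pairs records by position, so the two files must be in
            # the same order; it stops silently at the end of the shorter file.
            pairs = zip(read_fastq(index_file), read_fastq(read_file))
            for index_record, read_record in pairs:
                # Step 1: decide which tag this index read matches (exact only).
                barcode = index_record[1][:tag_length]
                sample = tags.get(barcode, "unassigned")
                # Step 2: write the sequence read to that sample's file.
                if sample not in outputs:
                    outputs[sample] = open(out_prefix + sample + ".fastq", "w")
                outputs[sample].write("\n".join(read_record) + "\n")
                counts[sample] = counts.get(sample, 0) + 1
                # Step 3: the loop repeats this for every read in the lane.
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

A call would look like `demultiplex("index.fastq", "reads.fastq", {"ATCACG": "sample1"}, "lane3_")`, with the file names being placeholders. Reads whose index does not match any tag exactly end up in the `unassigned` file, so it is worth checking the returned counts before trusting the split.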

can user do the demultiplexing (truseq)
