can user do the demultiplexing (truseq)


  • can user do the demultiplexing (truseq)

    Hi

    Our sequencing core is getting "tired" of demultiplexing the data for us since we put 20 homemade TruSeq indexes into each lane. They think it is a pain to send 7 x 20 FASTQ files to us each time.

    Could they instead send us just 2 FASTQ files, one for read 1 and one for read 2 (the index read)? If so, we could sort the reads ourselves.

    Has anyone done that before? Or is there a better way to do it?

    thanks in advance
    silin

  • #2
    There is a demultiplexer script with Casava that you will probably find useful. I actually got it from a friend in a different lab so I am unsure how to find it online, but if you cannot find it I will send it to you somehow.



    • #3
      Silin,

      I'm frankly a little appalled at the attitude displayed by your sequencing core facility. Delivering data to the client in a usable format is the job they contracted to do; if they find this job too onerous then perhaps they should get out of the business. Our core demultiplexes run data regularly, sometimes at much higher orders than 10 samples, and would never think of shifting this task to the researcher. The latest versions of the Illumina software, specifically CASAVA 1.8.x, make demultiplexing dead simple. They have to run CASAVA anyway to convert the .bcl files to FASTQ, so there's no excuse to complain about doing it.

      O.K., rant over.

      As Heisman stated and I mentioned above, CASAVA makes demultiplexing easy but it uses the .bcl files as input, not FASTQs. To run this yourself you would need access to the entire run directory which is unreasonable. For you to demultiplex from FASTQ files would require writing (or finding) some custom scripts to read the FASTQ of the index read, store the IDs associated with each tag and then parse through the read file(s) to sort them. Doing it this way just seems so silly to me when I know how easy it is to do with CASAVA.
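      For anyone who does go the custom-script route, here is a minimal sketch of the ID-lookup approach described above. It assumes plain 4-line FASTQ records, that read IDs in both files share everything before the `#`, and that the index is the first 6 bases of the index read. The function names are made up for illustration, and holding every ID in memory is only practical for small files.

```python
def read_key(header):
    """Drop the '#.../N' suffix so IDs from the two files compare equal."""
    return header.split("#", 1)[0]


def build_tag_lookup(index_lines, index_len=6):
    """Map read ID -> index sequence from the lines of the index-read FASTQ.

    index_len=6 is an assumption (a 6-base index); adjust for your kit.
    """
    lookup = {}
    it = iter(index_lines)
    # zip(it, it, it, it) walks the file four lines (one record) at a time.
    for header, seq, _plus, _qual in zip(it, it, it, it):
        lookup[read_key(header.strip())] = seq.strip()[:index_len]
    return lookup


def sort_reads(read_lines, lookup):
    """Yield (tag, record) for each read, finding its tag by read ID.

    tag is None when the read ID is missing from the index file.
    """
    it = iter(read_lines)
    for rec in zip(it, it, it, it):
        rec = tuple(line.rstrip("\n") for line in rec)
        yield lookup.get(read_key(rec[0])), rec
```

      Because the lookup is keyed on the read ID, this version does not depend on the two files being in the same order, at the cost of a lot of memory.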

      You indicated that they "tire" of "sending" the large number of files to you. How are they doing this? If they were smart they would post user data to an FTP server with individual client directories and logons to download their data. This is how we do it at our core and it is no more difficult to post 20 files than it is to post 1.



      • #4
        Hi Kmcarr and Heisman

        Thanks for the replies. I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...

        With demultiplex, the fastq file is like:
        @D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

        without demultiplexing:
        @D3B4KKQ1_0182:3:1101:11400:2655#0/1

        The ID does not have the barcode read sequence, so I can't write a script to sort it unless they send me the FASTQ file for the index read. Then I could paste the two together and sort.
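        As a rough illustration of the "paste them together" idea, the sketch below rewrites an undemultiplexed read ID so that it carries the index sequence, matching the demultiplexed format shown above. The function name `tag_header` and the 6-base index length are assumptions for the example, not part of any Illumina tool.

```python
def tag_header(read_header, index_seq, index_len=6):
    """Replace the '#0' placeholder in an older-style read ID with the index.

    index_len=6 is an assumption based on the 6-base tag in the example ID.
    Headers without a '#' are returned unchanged.
    """
    name, sep, rest = read_header.partition("#")
    if not sep:
        return read_header
    # rest looks like '0/1': keep the '/1' read number, swap the placeholder.
    _, slash, read_no = rest.partition("/")
    return f"{name}#{index_seq[:index_len]}{slash}{read_no}"


print(tag_header("@D3B4KKQ1_0182:3:1101:11400:2655#0/1", "ATCACGA"))
```

        With the tag in the ID, the reads can then be sorted with any script that already handles the demultiplexed ID format.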

        I am not sure how the core deals with the raw image files and converts them to FASTQ. Is there a way for them to generate a single FASTQ (or 2 FASTQ files: read 1 + index read 2) that can be sorted by us?

        They are "tired" of sending data because they don't use FTP! They just scp each file to us. I am pretty sure they might not know how to use scp -r either.



        • #5
          Originally posted by silin284
          I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...
          Perhaps I was a little under-caffeinated earlier and my initial reaction reflected that. I can certainly understand being stretched, as I'm the only person dealing with the data stream from one HiSeq, one GAIIx, one 454 and soon an IonTorrent PGM.
          With demultiplex, the fastq file is like:
          @D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

          without demultiplexing:
          @D3B4KKQ1_0182:3:1101:11400:2655#0/1
          It's obvious from the format of the read IDs that your core is not using the latest versions of the Illumina software; the ID format changed with CASAVA 1.8.

          You indicate that you are using TruSeq, so your index (barcode) read is separate from your sequence read(s). Is your core providing you a FASTQ file of the index read? If they expect you to do the demultiplexing they will have to. If all you have as inputs are FASTQ files of the sequence read(s) and index read then no Illumina script will help. Their scripts perform demultiplexing prior to writing the FASTQ files.

          Let's assume that you have one FASTQ file with the index read and one FASTQ file with the sequence read (single read in this example). A second assumption is that the order of the reads in the two files is identical; this should be a safe assumption if the two files were produced concurrently by the Illumina software. The process is conceptually simple in this case: 1) read the first entry from the index FASTQ and compare the sequence to a list of your tags to decide which one it is, 2) read the first sequence from your read FASTQ and then write it out to the appropriate file depending on the barcode, 3) repeat 200 million times for each lane of HiSeq data.
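          The three steps above can be sketched in Python roughly as follows. This is only a sketch under the same assumptions: plain 4-line FASTQ records, identical read order in both files, and exact matching on the first 6 bases of the index read (no mismatch tolerance). The function names, the `unassigned` bin and the output file naming are made up for the example.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of file
            return
        yield tuple(record)


def assign_reads(index_handle, read_handle, barcodes, index_len=6):
    """Walk both files in lockstep and yield (sample, read_record).

    barcodes maps index sequence -> sample name. Reads whose index is not
    in barcodes go to the hypothetical 'unassigned' bin.
    """
    pairs = zip(read_fastq(index_handle), read_fastq(read_handle))
    for index_rec, read_rec in pairs:
        tag = index_rec[1][:index_len]  # step 1: identify the tag
        yield barcodes.get(tag, "unassigned"), read_rec


def write_demultiplexed(index_path, read_path, barcodes, prefix):
    """Step 2 and 3: write every read to '<prefix>_<sample>.fastq'."""
    outputs = {}
    try:
        with open(index_path) as index_handle, open(read_path) as read_handle:
            for sample, rec in assign_reads(index_handle, read_handle, barcodes):
                if sample not in outputs:
                    outputs[sample] = open(f"{prefix}_{sample}.fastq", "w")
                outputs[sample].write("\n".join(rec) + "\n")
    finally:
        for handle in outputs.values():
            handle.close()
```

          Streaming both files this way keeps memory use flat regardless of lane size; with 20 tags it keeps at most 21 output files open at once.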
