  • Cirno
    Junior Member
    • Jun 2011
    • 6

    Help with De-Multiplexing MiSeq Data

    Hello,

    I cannot find a decent program or script to demultiplex data where I have three FASTQ files: XXX_R1.fastq, XXX_R2.fastq, and XXX_I.fastq. The index file has the same structure as a FASTQ file and shares all the read hashes (read names), but contains only the barcode. I want to split the R1 and R2 files based on this barcode.

    Any Suggestions?

    Thanks.
  • Bukowski
    Senior Member
    • Jan 2010
    • 388

    #2
    I think you will find there is a reference to the barcode/sample at the end of the read name for each read. That might help.
    Last edited by Bukowski; 08-10-2012, 10:24 AM.
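For reference, the "end of the read name" mentioned here is the comment field of the Illumina 1.8+ FASTQ header, `<read>:<is filtered>:<control number>:<index or sample>`. On the MiSeq the last field is typically a sample number from the sample sheet rather than the index sequence, and in the records quoted in post #5 it is 0. A minimal Python sketch for pulling those fields apart; `IlluminaHeader` and `parse_header` are hypothetical names, not part of any existing tool:

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    read_id: str          # instrument:run:flowcell:lane:tile:x:y, the part before the space
    read_number: int      # first comment field, e.g. 1 for R1
    is_filtered: bool     # True when the second field is 'Y' (read failed the filter)
    control_number: int   # 0 when the read is not a control
    sample_field: str     # last field: index sequence or, on the MiSeq, a sample number


def parse_header(header: str) -> IlluminaHeader:
    """Split an Illumina 1.8+ FASTQ header line into its read name and comment fields."""
    header = header.rstrip("\r\n")
    if header.startswith("@"):
        header = header[1:]
    name, _, comment = header.partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        raise ValueError(f"not an Illumina 1.8+ header: {header!r}")
    read_number, filtered, control, sample = fields
    return IlluminaHeader(name, int(read_number), filtered == "Y", int(control), sample)
```

Applied to the first index record in post #5, this gives `sample_field == "0"`, so in that run the header alone does not identify the barcode.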


    • celzinga
      Junior Member
      • Nov 2011
      • 2

      #3
      This thread may help:
      Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)


      • celzinga
        Junior Member
        • Nov 2011
        • 2

        #4
        also it looks like picard can do this:


        • Cirno
          Junior Member
          • Jun 2011
          • 6

          #5
          Originally posted by celzinga
          also it looks like picard can do this:
          http://picard.sourceforge.net/comman...luminaBarcodes

          Um. I don't see how that tool has anything to do with this problem. I don't need to extract the barcodes at all. I have three FASTQ files. The first one already contains the barcodes, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          AACCGAGA
          +
          ?AAAAAAB
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          AACCGAGA
          +
          ???A?@@B
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          AACCGAGA
          +
          A?AAAAAA
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          AAACATCA
          +
          AAAAABBB
          Then there are the two files for the paired ends, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          NCGGGCACGACCATCACCATCATCATACGACGAACCAACGGGCATTATTCTGGTCGTTCGTCCTGATTGCGACGTTCATGGTCGTCGAAGTCATCGGCGGATTATGGACGAACAGTTTTGCGCTCTTGTCGGACGCCGGGCATATGCTTAG
          +
          #5<???AADDEEEDDDGGGGGGIIIIIIIIHHHHHHIIHHHHHHIIIIIIIIHIIHHHIHHHHHIIIIHHHHHHHHFHHHHHHGGFGGGGGGGGGGEGGG'.8:C*CCCD4A''*1CE*0:8'4C.:*:?)''.'.'.''2'**0*1:?:1
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          NCATACGTACCACCGATGACACCACCGACAAGCGGAACCATCTTCCCAAGATTAACGACCCCCGTATTCCCGAACTTCGTCAATAAGCGGAATCCGACTTTCTGATTGATTTTTTTGATGGTCGATCCAGGAATCTTCTTAATCATATTGA
          +
          #5<???BBDDDDDEDDFEFFFFIIIHHHHHHHIHHEHHIHIIIIIIIIIIIIIIIIHHHHHHHHDCFHHFHHHEHFDFH?DF;DFFDFEE=EFFA?A@BAEEFFEEEF=ABA?:8>DACAECEDD8A8*?*0:CCA0*::C*:ACA*:E:*
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          NTCCGCGTGACGGCGATGCCAGAGCGACGGGCCGCCTCGACGTTCGAGCCGACGTAATAAAACTCACGTCCTGTCTTCGAATACGTCAAAAACAGATGCGCCCCGGCGAAGAACAGAAGCATCAAGATGGCGACGAACGGGACAGGTCCGT
          +
          #5<???@@DDDDDDDDEEEFFFHHIHHHHHHHHHHHHHHHHHHHHEFHHHHEFFEFFEFFEEFFFFFFFFEEFFFFFFFEFFEFFFEE8A:CEEFEFEFDEADD?DDD'8>8?C:?E:*?:CAE0?::**:2'8;>2>').?8A))1*0'*
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          NATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAGACAGAACGAGACAAAAGAAGCACAAATCCGTAATCGATGAGACTTAATGCGAGATCATGACACCATTGTAA
          +
          #5<???AAEDEDDDDDGGGGGGIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHHHIIIIIIIIHHIIIIHHHHHD4)42**,,,,,,***3*,4,,,*4,,,3,0****)0*))*)0.************)).'0*1******)*******
          and the corresponding R2 mates of all of those.

          I do not want three files as they are. I know which barcodes go with which hashes.

          RUN1_I1.fastq
          RUN1_R1.fastq
          RUN1_R2.fastq

          Need to be converted into...

          RUN1_R1_AACCGAGA.fastq
          RUN1_R2_AACCGAGA.fastq
          RUN1_R1_AAACATCA.fastq
          RUN1_R2_AAACATCA.fastq

          etc etc.

          Personally, I am beyond flabbergasted that the output of this damnable thing is not the same as the HiSeq's. I just want the FASTQs sorted by barcode; having the barcode/hash pairs in a separate file does nothing for me as the user.
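A minimal Python sketch of the split described above, assuming unwrapped four-line FASTQ records, the `RUN1_I1/R1/R2.fastq` naming from this post, and exact barcode matches only (no mismatches tolerated). It walks the three files in lockstep and checks that the read names stay in sync. The function name `demultiplex`, the `output` directory and the `undetermined` catch-all file are illustrative choices, not part of any existing tool:

```python
import os
from collections import Counter
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List


def read_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield unwrapped 4-line FASTQ records as [header, sequence, '+', quality]."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header.strip()!r}")
        yield [line.rstrip("\r\n") for line in [header] + rest]


def demultiplex(prefix: str, barcodes: Iterable[str], outdir: str = "output") -> Dict[str, int]:
    """Split {prefix}_R1.fastq and {prefix}_R2.fastq by the index read in {prefix}_I1.fastq.

    Pairs whose index sequence is not an exact match to one of `barcodes` go to an
    'undetermined' file pair. Returns the number of read pairs written per label.
    """
    known = set(barcodes)
    base = os.path.basename(prefix)
    os.makedirs(outdir, exist_ok=True)
    handles = {}  # label -> (R1 output handle, R2 output handle)
    counts = Counter()
    with open(f"{prefix}_I1.fastq") as i1, open(f"{prefix}_R1.fastq") as r1, \
            open(f"{prefix}_R2.fastq") as r2:
        try:
            for idx, fwd, rev in zip_longest(read_fastq(i1), read_fastq(r1), read_fastq(r2)):
                if idx is None or fwd is None or rev is None:
                    raise ValueError("I1, R1 and R2 contain different numbers of records")
                # Read names must agree up to the first space; the comments may differ.
                if not (idx[0].split()[0] == fwd[0].split()[0] == rev[0].split()[0]):
                    raise ValueError(f"read names out of sync at {fwd[0]!r}")
                label = idx[1] if idx[1] in known else "undetermined"
                if label not in handles:
                    handles[label] = tuple(
                        open(os.path.join(outdir, f"{base}_{read}_{label}.fastq"), "w")
                        for read in ("R1", "R2")
                    )
                out1, out2 = handles[label]
                out1.write("\n".join(fwd) + "\n")
                out2.write("\n".join(rev) + "\n")
                counts[label] += 1
        finally:
            for pair in handles.values():
                for handle in pair:
                    handle.close()
    return dict(counts)
```

For example, `demultiplex("RUN1", ["AACCGAGA", "AAACATCA"])` would write `output/RUN1_R1_AACCGAGA.fastq`, `output/RUN1_R2_AACCGAGA.fastq`, and so on, and return the number of read pairs per file. Sending every unlisted index sequence to a single `undetermined` pair keeps the number of open files bounded even when sequencing errors produce many distinct index reads; tolerating one mismatch would be a natural extension but is not attempted here.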


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Did you get this run at a core facility? I am not sure why that facility did not do the de-multiplexing for you. It should be trivial for them to do this since they would have access to the raw data folder and CASAVA pipeline.


            • geertvandeweyer
              Member
              • Jan 2011
              • 14

              #7
              Hi,

              I've attached my approach to demultiplexing the MiSeq files. Note that it uses the MiSeq-assigned sample index, NOT the barcode, to name the output files. This means you get all reads for the sample, including those with a mismatch in the barcode. It outputs three files per sample: forward reads, reverse reads, and interlaced reads. We use the interlaced reads in Galaxy to start batch workflows.
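"Interlaced" here presumably means what is more often called an interleaved FASTQ: a single file in which each forward read is immediately followed by its reverse mate. The attached Perl script is not reproduced in this thread, so the following is only a Python sketch of that layout, not geertvandeweyer's code. `fastq_records` and `interleave` are hypothetical names, and unwrapped four-line records in the same read order in R1 and R2 are assumed:

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield unwrapped 4-line FASTQ records as [header, sequence, '+', quality]."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header.strip()!r}")
        yield [line.rstrip("\r\n") for line in [header] + rest]


def interleave(r1: Iterable[str], r2: Iterable[str]) -> Iterator[str]:
    """Yield the lines of an interlaced FASTQ: each R1 record directly followed by its R2 mate."""
    for fwd, rev in zip_longest(fastq_records(r1), fastq_records(r2)):
        if fwd is None or rev is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        # Mates share the read name up to the first space.
        if fwd[0].split()[0] != rev[0].split()[0]:
            raise ValueError(f"mates out of sync: {fwd[0]!r} vs {rev[0]!r}")
        for line in fwd + rev:
            yield line + "\n"
```

To write a file (output name hypothetical): `with open("RUN1_R1.fastq") as r1, open("RUN1_R2.fastq") as r2, open("RUN1_interlaced.fastq", "w") as out: out.writelines(interleave(r1, r2))`.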

              For files:
              RUN1_I1.fastq
              RUN1_R1.fastq
              RUN1_R2.fastq

              Run as:
              perl demultiplex_miseq.pl RUN1

              Output will be in the 'output/' folder. It will also create a file listing all barcodes used for each sample, and print the read count per sample.
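The per-sample barcode list and read counts described above can be approximated from the index file alone, because each I1 record stands for one read pair. A Python sketch, not the attached script: `sample_summary` is a hypothetical name, and it assumes the sample index is the last colon-separated field of the header comment. It tallies the index sequences seen for each sample, so barcodes read with a mismatch show up as extra entries under the same sample:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def sample_summary(i1_lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Return ({sample: read pairs}, {sample: {index sequence: count}}) from an I1 FASTQ."""
    reads: Counter = Counter()
    barcodes: Dict[str, Counter] = defaultdict(Counter)
    it = iter(i1_lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        rest = [next(it, None) for _ in range(3)]  # sequence, '+', quality
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header.strip()!r}")
        parts = header.split()
        if len(parts) < 2:
            raise ValueError(f"header has no comment field: {header.strip()!r}")
        sample = parts[1].split(":")[-1]  # assumed: last field is the sample index
        reads[sample] += 1
        barcodes[sample][rest[0].strip()] += 1
    return dict(reads), {s: dict(c) for s, c in barcodes.items()}
```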


              • JackieBadger
                Senior Member
                • Mar 2009
                • 385

                #8


                or

                Galaxy is a community-driven web-based analysis platform for life science research.

                Look under NGS Toolbox Beta, NGS: QC and manipulation

                Barcode splitter and other FASTQ manipulations


                • swNGS
                  Member
                  • Nov 2011
                  • 83

                  #9
                  What is an interlaced read?
