  • Help with De-Multiplexing MiSeq Data

    Hello,

    I cannot find a decent program or script to de-multiplex data where I have three FASTQ files: XXX_R1.fastq, XXX_R2.fastq, and XXX_I.fastq. The index file has the same structure as a FASTQ file and shares all the read hashes (read names), but it contains only the barcode. I want to split the R1 and R2 files based on this barcode.

    Any suggestions?

    Thanks.

  • #2
    I think you will find there is a reference to the barcode/sample at the end of the read name for each read. That might help.
    Last edited by Bukowski; 08-10-2012, 10:24 AM.
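Post #2 points at the tail of each read name. The following is a minimal sketch that splits a header into named fields. It assumes the CASAVA 1.8-style layout visible in the snippets in this thread (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:sample`). The function `parse_illumina_header` and its field names are illustrative, not from any library. In the MiSeq files quoted in this thread, the last field appears to be a sample number (it is 0 for every read shown) rather than the barcode sequence itself.

```python
def parse_illumina_header(header: str) -> dict:
    """Split a CASAVA 1.8-style FASTQ header line into named fields.

    Assumed layout (field names are hypothetical):
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:sample
    """
    read_id, comment = header.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, sample = comment.split(":")
    return {
        "read_id": read_id,      # identical for R1, R2 and the index read
        "instrument": instrument,
        "run": run,
        "flowcell": flowcell,
        "lane": lane,
        "tile": tile,
        "x": x,
        "y": y,
        "read": read,            # 1 or 2 for the paired reads
        "filtered": filtered,    # Y if the read was filtered out, N otherwise
        "control": control,
        "sample": sample,        # last field: sample number or index sequence
    }


fields = parse_illumina_header("@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0")
print(fields["read_id"], fields["sample"])
# prints: M00511:27:000000000-A1F08:1:1:17545:1321 0
```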

    • #3
      This thread may help:
      Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)

      • #4
        Also, it looks like Picard can do this:
        http://picard.sourceforge.net/comman...luminaBarcodes

        • #5
          Originally posted by celzinga
          also it looks like picard can do this:
          http://picard.sourceforge.net/comman...luminaBarcodes

          Um. I don't see how that tool has anything to do with this problem. I don't need to extract the barcodes at all. I have three FASTQ files, and the first one already contains the barcodes, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          AACCGAGA
          +
          ?AAAAAAB
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          AACCGAGA
          +
          ???A?@@B
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          AACCGAGA
          +
          A?AAAAAA
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          AAACATCA
          +
          AAAAABBB
          Then there are the two files for the paired ends, i.e.:

          Code:
          @M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0
          NCGGGCACGACCATCACCATCATCATACGACGAACCAACGGGCATTATTCTGGTCGTTCGTCCTGATTGCGACGTTCATGGTCGTCGAAGTCATCGGCGGATTATGGACGAACAGTTTTGCGCTCTTGTCGGACGCCGGGCATATGCTTAG
          +
          #5<???AADDEEEDDDGGGGGGIIIIIIIIHHHHHHIIHHHHHHIIIIIIIIHIIHHHIHHHHHIIIIHHHHHHHHFHHHHHHGGFGGGGGGGGGGEGGG'.8:C*CCCD4A''*1CE*0:8'4C.:*:?)''.'.'.''2'**0*1:?:1
          @M00511:27:000000000-A1F08:1:1:16720:1322 1:N:0:0
          NCATACGTACCACCGATGACACCACCGACAAGCGGAACCATCTTCCCAAGATTAACGACCCCCGTATTCCCGAACTTCGTCAATAAGCGGAATCCGACTTTCTGATTGATTTTTTTGATGGTCGATCCAGGAATCTTCTTAATCATATTGA
          +
          #5<???BBDDDDDEDDFEFFFFIIIHHHHHHHIHHEHHIHIIIIIIIIIIIIIIIIHHHHHHHHDCFHHFHHHEHFDFH?DF;DFFDFEE=EFFA?A@BAEEFFEEEF=ABA?:8>DACAECEDD8A8*?*0:CCA0*::C*:ACA*:E:*
          @M00511:27:000000000-A1F08:1:1:17118:1322 1:N:0:0
          NTCCGCGTGACGGCGATGCCAGAGCGACGGGCCGCCTCGACGTTCGAGCCGACGTAATAAAACTCACGTCCTGTCTTCGAATACGTCAAAAACAGATGCGCCCCGGCGAAGAACAGAAGCATCAAGATGGCGACGAACGGGACAGGTCCGT
          +
          #5<???@@DDDDDDDDEEEFFFHHIHHHHHHHHHHHHHHHHHHHHEFHHHHEFFEFFEFFEEFFFFFFFFEEFFFFFFFEFFEFFFEE8A:CEEFEFEFDEADD?DDD'8>8?C:?E:*?:CAE0?::**:2'8;>2>').?8A))1*0'*
          @M00511:27:000000000-A1F08:1:1:17183:1322 1:N:0:0
          NATCGGAAGAGCACACGTCTGAACTCCAGTCACAAACATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAGACAGAACGAGACAAAAGAAGCACAAATCCGTAATCGATGAGACTTAATGCGAGATCATGACACCATTGTAA
          +
          #5<???AAEDEDDDDDGGGGGGIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHHHIIIIIIIIHHIIIIHHHHHD4)42**,,,,,,***3*,4,,,*4,,,3,0****)0*))*)0.************)).'0*1******)*******
          and the corresponding mates of all of those.

          I do not want the three files as they are. I already know which barcode goes with which read hash.

          RUN1_I1.fastq
          RUN1_R1.fastq
          RUN1_R2.fastq

          These need to be converted into:

          RUN1_R1_AACCGAGA.fastq
          RUN1_R2_AACCGAGA.fastq
          RUN1_R1_AAACATCA.fastq
          RUN1_R2_AAACATCA.fastq

          etc.

          Personally, I am beyond flabbergasted that the output of this damnable thing is not the same as the HiSeq's. I just want the FASTQs sorted by barcode; having the barcode/hash pairs in a separate file does nothing for me as the user.
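The split requested in post #5 can be done by reading the index, R1 and R2 files in lockstep, because each file lists the same clusters. The following is a minimal sketch. It assumes the three files are in the same read order and are named `<prefix>_I1.fastq`, `<prefix>_R1.fastq` and `<prefix>_R2.fastq`, as in the RUN1 example. `demultiplex_by_index` and `read_fastq` are hypothetical helpers, not an existing tool. The sketch checks that the read IDs agree and writes `<prefix>_R1_<barcode>.fastq` and `<prefix>_R2_<barcode>.fastq`.

```python
import itertools
from pathlib import Path


def read_fastq(path):
    """Yield 4-line FASTQ records as (header, seq, plus, qual) tuples."""
    with open(path) as fh:
        while True:
            record = tuple(fh.readline().rstrip("\n") for _ in range(4))
            if not record[0]:
                return  # end of file
            yield record


def demultiplex_by_index(prefix, out_dir="."):
    """Split <prefix>_R1/_R2.fastq by the barcode in <prefix>_I1.fastq.

    Returns a dict mapping each barcode to its number of read pairs.
    """
    stem = Path(prefix).name
    handles, counts = {}, {}
    records = itertools.zip_longest(
        read_fastq(f"{prefix}_I1.fastq"),
        read_fastq(f"{prefix}_R1.fastq"),
        read_fastq(f"{prefix}_R2.fastq"),
    )
    try:
        for idx, r1, r2 in records:
            if None in (idx, r1, r2):
                raise ValueError("I1, R1 and R2 have different read counts")
            # The read ID (text before the first space) must match in all three.
            if len({rec[0].split()[0] for rec in (idx, r1, r2)}) != 1:
                raise ValueError(f"read IDs out of step at {idx[0]}")
            barcode = idx[1]  # the index read's sequence line
            if barcode not in handles:
                handles[barcode] = tuple(
                    open(Path(out_dir) / f"{stem}_{mate}_{barcode}.fastq", "w")
                    for mate in ("R1", "R2")
                )
            for handle, rec in zip(handles[barcode], (r1, r2)):
                handle.write("\n".join(rec) + "\n")
            counts[barcode] = counts.get(barcode, 0) + 1
    finally:
        for pair in handles.values():
            for handle in pair:
                handle.close()
    return counts
```

For example, `demultiplex_by_index("RUN1")` would write RUN1_R1_AACCGAGA.fastq, RUN1_R2_AACCGAGA.fastq, and so on. In this sketch, every distinct index sequence gets its own pair of files, including sequences that contain read errors. On a real run you would probably restrict the output to the barcodes in your sample sheet and send everything else to a single "undetermined" pair.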

          • #6
            Did you get this run at a core facility? I am not sure why that facility did not do the de-multiplexing for you. It should be trivial for them to do this since they would have access to the raw data folder and CASAVA pipeline.

            • #7
              Hi,

              I've attached my approach to demultiplexing the MiSeq files. Note that it uses the MiSeq-assigned sample index, NOT the barcode, to name the output files. This means you get all reads for the sample, including those with a mismatch in the barcode. It outputs three files per sample: forward reads, reverse reads, and interlaced reads. We use the interlaced reads in Galaxy to start batch workflows.

              For files:
              RUN1_I1.fastq
              RUN1_R1.fastq
              RUN1_R2.fastq

              Run as:
              perl demultiplex_miseq.pl RUN1

              Output will be in 'output/' folder. It will also create a file containing all barcodes used per sample, and print the read count per sample.
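The script from post #7 was attached to the original post and is not reproduced here. The following is a separate, minimal sketch of the bookkeeping that post describes: grouping reads by the MiSeq-assigned sample number instead of the barcode, listing the barcodes seen for each sample, and counting reads per sample. It assumes the sample number is the last colon-separated field of each index-read header. `barcodes_per_sample` is a hypothetical name.

```python
from collections import Counter, defaultdict


def barcodes_per_sample(index_lines):
    """Tally index sequences per sample number from an index FASTQ.

    Assumes 4-line records whose header ends in ':<sample number>',
    e.g. '@M00511:27:000000000-A1F08:1:1:17545:1321 1:N:0:0' -> '0'.
    """
    per_sample = defaultdict(Counter)
    sample = None
    for line_no, line in enumerate(index_lines):
        line = line.rstrip("\n")
        if line_no % 4 == 0:
            sample = line.rsplit(":", 1)[1]   # header line -> sample number
        elif line_no % 4 == 1:
            per_sample[sample][line] += 1     # sequence line -> barcode
    return per_sample
```

Usage would be along the lines of `with open("RUN1_I1.fastq") as fh: tallies = barcodes_per_sample(fh)`. After that, `sum(tallies[s].values())` gives the read count for sample `s`, and `tallies[s].most_common()` lists its barcodes, including any mismatched variants.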
              • #8


                Or use Galaxy, a community-driven, web-based analysis platform for life science research.

                Look under NGS Toolbox Beta, NGS: QC and manipulation, for the Barcode splitter and other FASTQ manipulation tools.
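A barcode splitter assigns each read to one of a list of known barcodes. Post #7 notes that some reads carry a barcode with a mismatch. The following is a minimal sketch of mismatch-tolerant assignment by Hamming distance. This is a generic technique chosen for illustration, not the Galaxy tool's implementation. `assign_barcode` and its `max_mismatches` parameter are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, known, max_mismatches=1):
    """Return the known barcode closest to `observed`, or None.

    None is returned when no barcode is within `max_mismatches`, or when
    two barcodes are equally close (an ambiguous read).
    """
    hits = sorted(
        (hamming(observed, bc), bc)
        for bc in known
        if len(bc) == len(observed)
    )
    hits = [h for h in hits if h[0] <= max_mismatches]
    if not hits:
        return None
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # tie: refuse to guess
    return hits[0][1]


known = ["AACCGAGA", "AAACATCA"]
print(assign_barcode("AACCGAGT", known))  # prints: AACCGAGA (one mismatch)
```

Refusing ties is a conservative choice. Set `max_mismatches=0` if you only want exact matches.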

                • #9
                  What is an interlaced read?
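Assuming "interlaced" in post #7 means what is more often called an interleaved FASTQ file, it is a single file in which each R1 record is immediately followed by its R2 mate. The following is a minimal sketch of how to produce one. `interleave` is a hypothetical helper, and it assumes both inputs are 4-line FASTQ records in the same read order.

```python
import itertools


def interleave(r1_lines, r2_lines):
    """Yield FASTQ lines with every R1 record followed by its R2 mate."""
    r1, r2 = iter(r1_lines), iter(r2_lines)
    while True:
        rec1 = list(itertools.islice(r1, 4))
        rec2 = list(itertools.islice(r2, 4))
        if not rec1 and not rec2:
            return  # both inputs exhausted together
        if len(rec1) != 4 or len(rec2) != 4:
            raise ValueError("R1 and R2 have different numbers of records")
        yield from rec1
        yield from rec2


r1 = ["@read1 1:N:0:0\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@read1 2:N:0:0\n", "TTTT\n", "+\n", "IIII\n"]
print("".join(interleave(r1, r2)), end="")
```

Because it works on any iterable of lines, the same function can stream two open files without loading either into memory.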
