  • Demultiplex Illumina reads

    Hi Everyone,
    I am kind of stuck with my Illumina data: I want to remove the barcodes from my reads. My read file looks like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:
    NACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    #1=DDDFFCADHHIIIIIIIIGIIIIIIIGIIIIIIIFHIIIIICFHH################################
    #######################################################################
    and my barcode files look like this:
    @HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 2:N:0:
    NNNNNANACACA
    +
    ############
    When I use the FASTX-Toolkit to trim the barcodes, I get an error.
    The command I am using is:
    cat lane5_NoIndex_L005_R1_001.fastq | /u2/software/fastx/fastx_toolkit-0.0.13.2/bin/fastx_barcode_splitter.pl --bcfile lane5_NoIndex_L005_R2_001.fastq --bol --prefix x --suffix ".fastq"
    The error I am getting is:
    Error: bad barcode value (2:N:0 at barcode file (lane5_NoIndex_L005_R2_001.fastq) line 1
    I think the reason is the 2:N:0 in the barcode header versus the 1:N:0 in the read header.
    I am not sure how to fix this. If anyone has any idea, could you please help me?

    Thanks!!!!
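
For what it is worth, the error above looks consistent with fastx_barcode_splitter.pl expecting --bcfile to be a plain-text table of identifier/barcode pairs rather than a FASTQ file, so the 1:N:0 / 2:N:0 difference is probably not the cause. Since the barcodes here sit in a separate index-read file, one alternative is to pair the two files record by record. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes standard four-line FASTQ records in the same order in both files and exact barcode matches only; the function names and the barcode-to-sample table are hypothetical.

```python
from itertools import zip_longest


def parse_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    # zip(it, it, it, it) consumes four consecutive lines per record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip(), seq.strip(), qual.strip()


def read_id(header):
    """Drop the ' 1:N:0:' / ' 2:N:0:' part so read and index headers compare equal."""
    return header.split()[0]


def demultiplex(read_lines, index_lines, barcodes):
    """Bin reads by the sequence of their paired index read.

    barcodes: dict mapping barcode sequence -> sample name (hypothetical table).
    Returns a dict mapping sample name -> list of (header, sequence, quality).
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unmatched"] = []
    pairs = zip_longest(parse_fastq(read_lines), parse_fastq(index_lines))
    for read, index in pairs:
        if read is None or index is None:
            raise ValueError("read and index files differ in record count")
        if read_id(read[0]) != read_id(index[0]):
            raise ValueError(f"header mismatch: {read[0]} vs {index[0]}")
        # Exact match only; anything else (including N-rich barcodes) is unmatched.
        bins[barcodes.get(index[1], "unmatched")].append(read)
    return bins


# Usage sketch with the file names from the post (the barcode table is made up):
# with open("lane5_NoIndex_L005_R1_001.fastq") as r1, \
#         open("lane5_NoIndex_L005_R2_001.fastq") as r2:
#     bins = demultiplex(r1, r2, {"ACGTACACACA": "sample1"})
```

Because the barcode is not part of the read itself in this layout, nothing needs to be trimmed from the reads; they only need to be sorted into per-sample bins.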

  • #2
    Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.

    Comment


    • #3
      Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

      All our customers are quite happy that this work has already been done when they get their data :-)

      Sven

      Comment


      • #4
        BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
        I would do everything with a program where you set the parameters and know what is going in and what should come out.
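
As one way of seeing what is going in before any demultiplexing, here is a small Python sketch that tallies the most frequent sequences in an index-read file. It assumes a standard four-line FASTQ layout; the function name is hypothetical.

```python
from collections import Counter


def tally_barcodes(index_lines, top=10):
    """Return the `top` most common index-read sequences as (sequence, count) pairs."""
    it = iter(index_lines)
    # Each FASTQ record is four lines; the second line holds the sequence.
    counts = Counter(seq.strip() for _header, seq, _plus, _qual in zip(it, it, it, it))
    return counts.most_common(top)


# Usage sketch with the index file name from the first post:
# with open("lane5_NoIndex_L005_R2_001.fastq") as fh:
#     for barcode, count in tally_barcodes(fh):
#         print(barcode, count)
```

If the expected barcodes dominate the tally, the index read is usable; a list dominated by N-rich sequences would suggest a problem with the index read itself.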

        Comment


        • #5
          Originally posted by JackieBadger View Post
          Use Trimmomatic... much more versatile and mate-pair aware. Or just use the trim function in text manipulation in Galaxy.
          Thanks JackieBadger,
          I will try it!

          Comment


          • #6
            Originally posted by JackieBadger View Post
            BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
            I would do everything with a program where you set the parameters and know what is going in and what should come out.
            Thanks JackieBadger,
            I am planning to use QIIME, so hopefully I will not encounter such issues.



            Thanks for the help!!!

            Comment


            • #7
              Originally posted by sklages View Post
              Why don't you ask your sequencing provider to demultiplex the data for you? It is not more work for them, as it is part of the fastq processing; you just need to provide some kind of sample description, a samplesheet.

              All our customers are quite happy that this work has already been done when they get their data :-)

              Sven
              Thanks Sven,
              It would be good if they demultiplexed the data before sending it, but in my case they did not.

              Comment


              • #8
                Originally posted by JackieBadger View Post
                BEWARE! You must always be able to check any processing a provider does for you. For example, the trimming script in the MiSeq software should be avoided, as it is VERY promiscuous, even in the "remedied" latest update. Also, I have come across significant errors in the MiSeq demultiplexing.
                I would do everything with a program where you set the parameters and know what is going in and what should come out.
                Hmm, it's pretty easy to check what the provider does if you also have the "Undetermined_indices" data files. MiSeq is another thing: the trimming issue is known, and the trimming should not be used (currently). You could also ask for some demultiplexing stats to see whether the results are "good" or as expected.

                If you don't trust your sequencing provider at all, you should look for another one ;-)

                What "significant errors" did you encounter in the MiSeq demultiplexing?
                We are not multiplexing MiSeq libraries, so I am just curious :-)

                Sven

                Comment


                • #9
                  Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is if an index read wasn't done.

                  Comment


                  • #10
                    Originally posted by NextGenSeq View Post
                    Are you sure they did an index read? The names of your files say NoIndex. The only time we ever get data named like this is if an index read wasn't done.
                    No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                    You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

                    Sven

                    Comment


                    • #11
                      Originally posted by sklages View Post
                      No, you'll always get that naming when no index is specified in the samplesheet for that run (irrespective of whether an index read was run).
                      You cannot safely deduce from the naming whether an index read was run or not (at least for the "_NoIndex_" case).

                      Sven
                      Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.

                      Comment


                      • #12
                        Originally posted by GenoMax View Post
                        Let us hope that is the case. If the facility did not run this as a multiplex sample then OP is out of luck. This run will have to be repeated.
                        Sure, you are absolutely right.

                        This problem might arise if the customer doesn't mention any indices that need to be demultiplexed in their "order" (whatever this order looks like), maybe assuming that this is not relevant for the sequencing run itself but only for the post-processing.
                        ... and the sequencing core organizes their FCs with respect to read length and MP/no MP ...

                        We had a similar post a while ago, where the OP had hand-written a little note on the "order sheet" and, as a result, the sequencing facility didn't recognize it as "please do an index read, as my libraries have indices" ...

                        Sven

                        Comment


                        • #13
                          Originally posted by newBioinfo View Post
                          Thanks JackieBadger,
                          I am planning to use QIIME, so hopefully I will not encounter such issues.



                          Thanks for the help!!!
                          If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.

                          Comment


                          • #14
                            Originally posted by AKrohn View Post
                            If you are using QIIME, then you will have the option to remove unwanted sequences (such as barcodes) during the split_libraries_fastq.py step. See http://qiime.org/scripts/split_libraries_fastq.html for more information.
                            Oops, for this application you want the split_libraries.py script instead.

                            Comment
