
  • Retrieve MiSeq Data still containing index primers etc

    Hi all

    We have a pipeline that we have developed that currently works on both 454 and Ion Torrent data.

    The pipeline is always run on multiplexed data. The sequence input is currently a fastq file that contains all of the information needed for demultiplexing, and all subsequent analysis is run on the data from each MID separately.

    Collaborators have now generated similar data using an Illumina MiSeq. However, when they sent us the data, we saw that it had already been demultiplexed, with tags etc. stripped.

    What I want to know is: is there any way to create a single fastq/sff/etc. file from the MiSeq output data (or during data generation) that still contains the MIDs etc. and has all the data together in one file?

    I've done extensive reading on this and it seems that the best way to do this is to convert the multiple .bcl files to fastq?

    Is there a better/easier way to do this?


  • #2
    One can potentially create a single file from a multiplexed run by running the CASAVA pipeline with a single barcode like (NNNNNN-NNNNN). This way all the data ends up in the "undetermined" file, with all tag information intact in the header. You would need to write a script to de-multiplex this data or reformat it in the way your pipeline expects. This could potentially work on the MiSeq itself (for analysis), though we have not tried it.
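A minimal sketch of such a de-multiplexing script, assuming the tag is the last colon-separated field of each read header (which requires offline CASAVA processing, as noted later in this thread). The function names and the barcodes in the example are hypothetical and only illustrate the grouping step.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def index_from_header(header):
    """Return the last colon-separated field of the header, assumed to be the tag."""
    return header.rsplit(":", 1)[1]


def demultiplex(handle):
    """Group FASTQ records by the tag found in their headers."""
    groups = defaultdict(list)
    for header, seq, qual in read_fastq(handle):
        groups[index_from_header(header)].append((header, seq, qual))
    return dict(groups)


# Made-up records: the barcodes ACGTAC and TGCATG are illustrative only.
example = io.StringIO(
    "@M02143:21:000000000-A8YDD:1:1101:16271:1876 1:N:0:ACGTAC\n"
    "ACGT\n+\nIIII\n"
    "@M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:TGCATG\n"
    "GGCC\n+\nIIII\n"
    "@M02143:21:000000000-A8YDD:1:1101:17000:1900 1:N:0:ACGTAC\n"
    "TTAA\n+\nIIII\n"
)
groups = demultiplex(example)
for tag, records in sorted(groups.items()):
    print(tag, len(records))
```

For a real run, each group would be written to its own file (or reformatted to whatever the downstream pipeline expects) rather than held in memory.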

    If you are looking to get the tag reads in a separate file (e.g. for Qiime), then one can reanalyze the data using MiSeq Reporter after making a change to the "MiSeqReporterConfig" file. Unless you own the MiSeq (or have direct access to it), this may not be an option for you.
    Last edited by GenoMax; 06-05-2014, 05:21 AM.


    • #3
      Thanks for your reply, very helpful!

      I hadn't realised that it was possible to output the index information in the header. In the data I've just received the last element of the descriptor is a number as opposed to the index sequence. This may be related to the fact that both the i5 index and i7 index were used?

      It appears to link to the order in which the sequences are listed in the SampleSheet.csv file:

      1st @M02143:21:000000000-A8YDD:1:1101:16271:1876 2:N:0:1 used N701 and S501
      7th @M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:7 used N701 and S502

      So that would mean that we could use the SampleSheet.csv file to 'demultiplex' the data if people are uploading all the data in a single fastq file (or two in the case of paired-end sequencing).
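The lookup described above could be sketched roughly as follows, assuming the trailing number in the header is the 1-based position of the sample in the [Data] section of SampleSheet.csv. The sheet below is a simplified, made-up stand-in with only three columns: rows 1 and 7 use the index pairs quoted in this post, while rows 2 to 6 are placeholders. The function names are hypothetical.

```python
import csv
import io


def read_sample_rows(handle):
    """Return the rows listed under the [Data] section of a sample sheet as dicts."""
    lines = handle.read().splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip().startswith("[Data]"))
    # The line after [Data] is the column header; the rest are sample rows.
    return list(csv.DictReader(lines[start + 1:]))


def sample_number(header):
    """Return the trailing number of a read header, e.g. 7 for '... 1:N:0:7'."""
    return int(header.rsplit(":", 1)[1])


def lookup_sample(header, rows):
    """Map a read header to its sample row (assumes 1-based sheet order)."""
    return rows[sample_number(header) - 1]


# Simplified, made-up sheet; only rows 1 and 7 reflect values from the thread.
sheet_text = "\n".join(
    ["[Header]", "Workflow,GenerateFASTQ", "[Data]",
     "Sample_ID,I7_Index_ID,I5_Index_ID", "sample_1,N701,S501"]
    + [f"sample_{n},unknown,unknown" for n in range(2, 7)]
    + ["sample_7,N701,S502"]
)
rows = read_sample_rows(io.StringIO(sheet_text))

read_header = "@M02143:21:000000000-A8YDD:1:1101:18009:1813 1:N:0:7"
sample = lookup_sample(read_header, rows)
print(sample["Sample_ID"], sample["I7_Index_ID"], sample["I5_Index_ID"])
```

Grouping reads by `sample_number` and then labelling each group from the sheet would give the same per-MID split the pipeline already expects.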

      Thanks again, you helped me find the missing clue to solve the puzzle!!


      • #4
        I always forget that the actual sequence tag information is included in the header ONLY if the data was processed offline by CASAVA (i.e. not on the MiSeq). So for the first option I mentioned, the analysis will have to be done offline (it would not work on the MiSeq) to get the tag information in the read IDs.
        Last edited by GenoMax; 06-05-2014, 05:24 AM.

