  • Discrepancies in Demultiplexing programs - help request

    I have Illumina PE data from a MiSeq run and am finding large discrepancies in the number of reads recovered while demultiplexing.
    Originally, the FASTQ files provided with the run had very few reads per individual and a large (>4 GB) Undetermined file. I investigated the DemultiplexSummary provided with the run and found that the top 30 indexes found were in fact my indexes, but for some reason (??) they were not demultiplexed correctly.
    Using the FASTX barcode splitter (Hannon Lab), allowing a 1-nucleotide mismatch, I was able to recover a small number of additional reads from the Undetermined file, but not nearly the number stated in the summary.
    I then imported the raw data into Geneious and, after trimming the adapters with BBDuk, demultiplexed and found significantly more reads, but still only approximately half of the number that should be present.

    Some Numbers:
    Barcode CAAAAG/CTTTTG
    DemultiplexSummary: 1,119,171 reads
    Fastq file unaltered from run: 1,428 reads
    Fastx Barcode Splitter (on undetermined file): 118,031 reads
    Geneious: 574,498 reads

    My questions are:

    1 - Why are the different programs giving such disparate results?

    2 - Am I misunderstanding the orientation of the barcodes in the reads, and thus perhaps searching for them incorrectly? It is my understanding that, with the "adapter - barcode - read" order, the barcode should be the first 6 bases of the R1 read after adapter trimming (in my example CAAAAG). R2 should not have(?) the barcode, or at least I should not have to search for a barcode in R2, since I have paired the data? I recovered the reads in Geneious using CTTTTG, as that was the index listed in the DemultiplexSummary, but the barcode listed in my primer order was CAAAAG, so I am concerned that I am misunderstanding a fundamental piece of the puzzle here.

    3 - Most importantly - how do I recover the 1 million+ reads?!!?
    Attached Files

  • #2
    BD,

    First things first: assuming your libraries are of standard Illumina design, your barcodes are NOT part of R1. In the standard Illumina library design and run configuration, the index read is completely separate from R1 and R2. Unless your sequence provider also gave you the index read file (it would have something like "I1" in place of R1 or R2 in the file name), you don't have the information needed to demultiplex your reads.

    Can you provide a copy of the SampleSheet.csv file that was used for the MiSeq run? That will show the indexes actually used by the MiSeq to demultiplex your data. It sounds very likely that the SampleSheet.csv had the indexes entered in the wrong orientation.

    To determine the proper orientation we would need to see an example of your completed adapter sequences, or know what kit was used to construct your libraries.


    • #3
      Thank you so much for your reply. I've attached both my sample sheet and my Illumina adapter list with the barcodes. I didn't prepare the libraries with a kit, but it was a standard prep.
      Could you please explain how the reads and barcodes are oriented? I have searched the literature for this but am clearly misunderstanding something.
      Attached Files


      • #4
        As I suspected, your SampleSheet has the indexes written in the wrong orientation. Here is a partial schematic of how the index read would be performed on libraries made with the Index 1 primer in your protocol.

        Code:
        Index 1 p7 end primer with i7 index read primer annealed
        
                                          Index (i7) Read Sequencing Primer
                                       <-CACTGACCTCAAGTCTGCACACGAGAAGGCTAG-5'
        5'-CAAGCAGAAGACGGCATACGAGATatcacgGTGACTGGAGTTCAGACGTGT-3'
        
        Index 1 will be read as CGTGAT
        
        (Obviously your actual library molecules would extend out from the 3' end of
        the Index 1 primer, further complementary to the i7 index read sequencing
        primer.)
        To fix your problem you will need to ask your sequencing provider to repeat the original BCL to FastQ conversion with demultiplexing after reverse complementing all of the indexes in the SampleSheet.csv file.
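
The reverse complementing step itself can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_complement` is a hypothetical helper, and the input is the index as written (5'->3') in the Index 1 primer from the schematic above.

```python
# Complement table for DNA bases (upper and lower case); N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Index as it appears in the Index 1 primer sequence (lowercase "atcacg" above).
primer_index = "ATCACG"

# The index read runs along the opposite strand, so the sequencer reports
# the reverse complement of what is written in the primer.
print(reverse_complement(primer_index))  # prints CGTGAT
```

The same call on the CAAAAG barcode from the first post gives CTTTTG, which is the form listed in the DemultiplexSummary.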


        • #5
          It looks like the index sequences in your sample sheet are in the wrong orientation. The demux report shows that the reverse complement of your sample sheet index sequences is what was found in your run.
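
A quick way to run this kind of check yourself is sketched below. It assumes you have copied the index sequences out of the sample sheet and out of the demux report into two plain lists; `classify_indexes` is a hypothetical helper, not part of any Illumina tool, and the only sequences taken from this thread are CAAAAG and CTTTTG.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_indexes(sheet_indexes, observed_indexes):
    """For each sample sheet index, report how it shows up among the observed indexes.

    Returns a dict mapping each sample sheet index to one of
    "as_written", "reverse_complement" or "not_found".
    """
    observed = {seq.upper() for seq in observed_indexes}
    result = {}
    for index in sheet_indexes:
        index = index.upper()
        if index in observed:
            result[index] = "as_written"
        elif reverse_complement(index) in observed:
            result[index] = "reverse_complement"
        else:
            result[index] = "not_found"
    return result


# CAAAAG is the barcode from the primer order; CTTTTG is what the
# DemultiplexSummary reported (both taken from the first post).
print(classify_indexes(["CAAAAG"], ["CTTTTG"]))
```

If most entries come back as "reverse_complement", the sample sheet orientation is the likely culprit.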


          Here is a link to a useful document that shows all of the reads used by Illumina sequencers.
          Josh Kinman


          • #6
            As suspected, the index sequences in the SampleSheet are in the wrong orientation.

            Code:
            Index1 p7 end primer showing index read (i7) primer annealed 
            
                                              Index (i7) Read Sequencing Primer
                                           <-CACTGACCTCAAGTCTGCACACGAGAAGGCTAG-5'
            5'-CAAGCAGAAGACGGCATACGAGATatcacgGTGACTGGAGTTCAGACGTGT-3'
            
            (Obviously your full library fragment will extend beyond the 3' end of 
            Index1 primer. Truncated for clarity.)
            
            Index1 will be read as CGTGAT
            To fix this problem you will need to correct the SampleSheet.csv file by reverse complementing the indexes. Then repeat the Bcl2Fastq conversion with demultiplexing. I imagine you will need to get your sequencing provider involved since they have the original base call (.bcl) files with the index read data.
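
Correcting the sample sheet by hand is error-prone with many samples, so here is a rough Python sketch of the edit. It assumes a typical single-index SampleSheet.csv layout, with a `[Data]` section whose header row contains a column named `index`; the attached sample sheet is not reproduced in this thread, so the layout, the column name, and the function name `fix_sample_sheet` are all assumptions.

```python
import csv
import io

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fix_sample_sheet(text: str, column: str = "index") -> str:
    """Return sample sheet text with the given [Data] column reverse complemented.

    Everything outside the [Data] section is written back unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    in_data = False   # True once we are inside the [Data] section
    col_idx = None    # position of the index column, set from the header row
    for row in csv.reader(io.StringIO(text)):
        first = row[0].strip() if row else ""
        if first.startswith("["):
            # Section marker such as [Header], [Reads], [Data]
            in_data = first.lower() == "[data]"
            col_idx = None
        elif in_data and col_idx is None and any(c.strip() for c in row):
            # First non-empty row after [Data] is the column header row
            names = [c.strip().lower() for c in row]
            if column.lower() not in names:
                raise ValueError(f"no '{column}' column in [Data] section")
            col_idx = names.index(column.lower())
        elif in_data and col_idx is not None and len(row) > col_idx:
            # Sample row: flip the index, leave the other fields alone
            row = list(row)
            row[col_idx] = reverse_complement(row[col_idx].strip())
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-section sample sheet using the barcode from the first post.
sheet = "[Header]\nIEMFileVersion,4\n\n[Data]\nSample_ID,Sample_Name,index\nS1,sampleA,CAAAAG\n"
print(fix_sample_sheet(sheet))
```

The corrected file still has to go back to whoever holds the .bcl files, since the conversion and demultiplexing must be rerun from the base calls.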


            • #7
              Thank you so much for your help, I will do this!
