
  • Adapters and multiplexing

    I am trying to trim adapters and there is something that doesn't quite make sense to me. I've simplified the description here to focus on the issue. Thanks in advance for any help on the matter.


    I've had two samples sequenced on an Illumina HiSeq 2000, 50 PE, in the same lane.

    Sample 1 has indexes (barcodes)
    Code:
    TAAGGCGA and TAGATCGC
    Sample 2 has indexes
    Code:
    GCTACGCT and TAGATCGC
    Note that both samples have the same index 2.


    When I try to trim the adapters in mate 1 of sample 1, I sometimes find index 1 of sample 2 (GCTACGCT) inside the read, flanked by adapter sequence. See the example below.

    Code:
    Sample 1, mate 1

    @HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC
    TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
    +
    D@DDBCEHHIIIIHIIIIIIIIIHIDHE?HHHIEEGHHHEHCHFEFG@CHH

    As far as I understand, this should not happen.
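The mismatch in the example above can be checked mechanically by comparing the index in the FASTQ header with the index embedded in the read itself. Here is a minimal Python sketch, assuming 8-base indexes and using the flanking sequences visible in the read above. The function names and the regular expression are illustrative only and do not come from any existing tool.

```python
import re

# Flanks copied from the example read: the 8 bases between them are the
# index that was physically part of the sequenced molecule (assumption:
# exact matches only, no sequencing errors in the flanks).
IN_READ_INDEX = re.compile(r"CGAGAC([ACGTN]{8})ATCTCGTATGCC")


def header_index1(header: str, index_len: int = 8) -> str:
    """Return index 1 from the last colon-separated field of a FASTQ header."""
    return header.rsplit(":", 1)[1][:index_len]


def in_read_index(seq: str):
    """Return the index found inside the read, or None without read-through."""
    match = IN_READ_INDEX.search(seq)
    return match.group(1) if match else None


header = "@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC"
seq = "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA"

assigned = header_index1(header)
observed = in_read_index(seq)
print(assigned, observed, "MISMATCH" if observed and observed != assigned else "ok")
```

For the read shown, the header assigns TAAGGCGA while the read itself carries GCTACGCT, so the sketch flags it as a mismatch.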

    I've tried to read up on how this works as well as I can, but cannot find a good explanation for this phenomenon. Is there a rational explanation for this? What am I missing?

    Thanks,

  • #2
    That's quite interesting! You're right, it should not happen.

    But there are a few possibilities. One is chimeric molecules that have an adapter sequence internally in addition to the adapters on the ends. Perhaps that's more common with Nextera...

    Another is that the index and read cycles occur at different times, and not necessarily on the same physical cluster (due to cluster regeneration). IIRC the HiSeq reads both barcodes from the same cluster but I can't remember if it's from the initial cluster or regenerated cluster. Anyway, it's possible that there are two clusters very close together, and one gets assigned the other's index... or something like that. We've been trying to determine exactly what causes cross-contamination (multiplexed samples getting assigned the wrong barcode) for a year without anything absolutely conclusive, but this is a neat piece of evidence.

    I'm going to guess chimerism, in this case. How often does this happen, relative to "correct" adapters? Also, with normal adapters, the position of the adapter in read 1 is the same as the position of the adapter in read 2. Are you seeing that in this case? Would you mind posting this read's mate?
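The position check described here could be sketched roughly as follows in Python. The read 2 adapter is not given anywhere in this thread, so `ADAPTER_R2` below is only a stand-in (it reuses the read 1 tail) and would need to be replaced with the kit's real read 2 adapter. The helper names are made up for illustration.

```python
# Adapter tail seen in read 1 of the example (from the first post).
ADAPTER_R1 = "ATCTCGTATGCCGTCTTCTGCTTG"
# Stand-in only: the real read 2 adapter is NOT given in the thread.
ADAPTER_R2 = ADAPTER_R1


def adapter_offset(seq: str, adapter: str, min_len: int = 12) -> int:
    """0-based offset of the first min_len adapter bases in seq, or -1."""
    return seq.find(adapter[:min_len])


def compare_offsets(read1: str, read2: str):
    """Return (offset in read 1, offset in read 2, whether they agree)."""
    o1 = adapter_offset(read1, ADAPTER_R1)
    o2 = adapter_offset(read2, ADAPTER_R2)
    return o1, o2, (o1 >= 0 and o1 == o2)


# Pair 1 as posted later in the thread.
r1 = "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA"
r2 = "AGTGAGAGCAGAGATTACAGGACATTGCGAGCAGATTGCGTAGGGACTCTC"
print(compare_offsets(r1, r2))
```

With an ordinary short insert, both offsets would be expected to equal the insert length. A hit in read 1 with no hit in read 2 would argue against simple read-through.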



    • #3
      You need to check with the sequencing centre, in case there was some mixup when demultiplexing the reads.

      How many of the sample1 reads which read far enough into the adapter sequences have the GCTACGCT sequence, and what barcode do you see in the sample 2 reads which have read into the adapter sequences?



      • #4
        Thanks for the reply.

        I haven’t done the wet part, but I am a bit doubtful regarding chimeric molecules. When the tubes are attached the molecules are in different tubes so I don’t see how index 1 would end up in tube 2, which is essential for having two indexes in the same molecule.

        I don’t know if the fact that both samples have the same index 2 has anything to do with it.

        I had the same thought regarding the second mates. I’ve extracted them for a few of the reads and included them at the bottom of the post. They don't seem to have any adapter sequences, so the second option sounds more reasonable to me.

        Also, just doing a line count I get the following results:

        Code:
        GCTACGCTATCTCGTATGCCGTCTTCTGCTTG    161
        TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    725
        That is more than 20%, which is rather concerning. Makes me wonder how many of the mates actually correspond to each other.
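A line count like the one above can be reproduced with a few lines of Python. This is only a sketch of exact-match counting, with illustrative names. The two records are mate 1 and mate 2 of pair 1 from this post, with placeholder quality strings.

```python
from collections import Counter

ADAPTER_TAIL = "ATCTCGTATGCCGTCTTCTGCTTG"
INDEXES = ["GCTACGCT", "TAAGGCGA"]


def count_index_hits(fastq_lines, indexes=INDEXES):
    """Count sequence lines containing index + adapter tail (exact match)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # only the 2nd line of each 4-line record is sequence
            continue
        for idx in indexes:
            if idx + ADAPTER_TAIL in line:
                counts[idx] += 1
    return counts


records = [
    "@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC",
    "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA",
    "+",
    "I" * 51,  # placeholder quality string
    "@HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 2:N:0:TAAGGCGATAGATCGC",
    "AGTGAGAGCAGAGATTACAGGACATTGCGAGCAGATTGCGTAGGGACTCTC",
    "+",
    "I" * 51,  # placeholder quality string
]
print(count_index_hits(records))

# Ratio of foreign to expected hits for the counts quoted above.
print(f"{161 / 725:.1%}")
```

Note that 161/725 is about 22% as a ratio of foreign to expected hits. As a share of all exact hits (161 of 886) it is about 18%.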


        Here are the first four hits

        Pair 1
        Code:
        @HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 1:N:0:TAAGGCGATAGATCGC
        TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
        +
        D@DDBCEHHIIIIHIIIIIIIIIHIDHE?HHHIEEGHHHEHCHFEFG@CHH
        
        @HISEQ2:697:H2NFYBCXX:1:1101:11477:57300 2:N:0:TAAGGCGATAGATCGC
        AGTGAGAGCAGAGATTACAGGACATTGCGAGCAGATTGCGTAGGGACTCTC
        +
        B@0B@GEHGHEE?@G@EEGH@<CGCC?1@/</?FC?1110<D<011<11C@
        Pair 2
        Code:
        @HISEQ2:697:H2NFYBCXX:1:1101:9774:77082 1:N:0:TAAGGCGATAGATCGC
        TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
        +
        DDDDDIIIIIIIIIIIIIIIIIIIIII<<FHIIIIIIIIIIIHIIIIIIII
        
        @HISEQ2:697:H2NFYBCXX:1:1101:9774:77082 2:N:0:TAAGGCGATAGATCGC
        AAAGGAAAAGAGCAACTGCTGTGTTGTCCCCACACACACCTGCTCACCTCT
        +
        ###################################################
        Pair 3
        Code:
        @HISEQ2:697:H2NFYBCXX:1:1104:9306:37068 1:N:0:TAAGGCGATAGATCGC
        TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA
        +
        DDDBDI?EFCHIIIIIIIIHIIIHCEHCCFFHHGHHHIHGHECC<CHHHIH
        
        @HISEQ2:697:H2NFYBCXX:1:1104:9306:37068 2:N:0:TAAGGCGATAGATCGC
        AGAATGCACTATGCTTAAGCTCTGACGATTCTTCCGTGCAGCAAGGAGGTC
        +
        0<00<<1<@1D1<D<<11<1D<@1<10<01<<D110D101111<<1<1<01
        Pair 4
        Code:
        @HISEQ2:697:H2NFYBCXX:1:1105:6252:85097 1:N:0:TAAGGCGATAGATCGC
        TCTCCGAGCCCACCGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTG
        +
        BBDD@HHH<EHHHIHIIHHIHEHIGHE=0DCFHIHIEHCECHEE<CHCE?G
        
        @HISEQ2:697:H2NFYBCXX:1:1105:6252:85097 2:N:0:TAAGGCGATAGATCGC
        GACTTAAACTACTGAAGGAAAACCTATACCAGCTGCCCAATCTCTGTTACA
        +
        00000111<<111111111<1110<11<<11<1111<111<<1111<<11<



        • #5
          Originally posted by mastal
          You need to check with the sequencing centre, in case there was some mixup when demultiplexing the reads.

          How many of the sample1 reads which read far enough into the adapter sequences have the GCTACGCT sequence, and what barcode do you see in the sample 2 reads which have read into the adapter sequences?
          Obviously, I have more than two samples and I feel it is going to be a nightmare investigating all the barcodes in all the samples =S

          Just comparing the first mate of these two samples and two barcodes, I get following results:

          Sample 1, TAAGGCGA TAGATCGC
          Code:
          GCTACGCTATCTCGTATGCCGTCTTCTGCTTG    161
          TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    725

          Sample 2, GCTACGCT TAGATCGC
          Code:
          GCTACGCTATCTCGTATGCCGTCTTCTGCTTG     10
          TAAGGCGAATCTCGTATGCCGTCTTCTGCTTG    513

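To put the two tables side by side, the share of exact hits that carry each sample's own index 1 can be worked out as below. This is just arithmetic on the counts above, and the dictionary layout is an arbitrary choice for illustration.

```python
# Exact-match line counts copied from the two tables above.
counts = {
    "sample_1": {"own": "TAAGGCGA", "hits": {"GCTACGCT": 161, "TAAGGCGA": 725}},
    "sample_2": {"own": "GCTACGCT", "hits": {"GCTACGCT": 10, "TAAGGCGA": 513}},
}


def own_index_fraction(entry) -> float:
    """Fraction of adapter-containing hits that carry the expected index 1."""
    total = sum(entry["hits"].values())
    return entry["hits"][entry["own"]] / total


for name, entry in counts.items():
    print(f"{name}: {own_index_fraction(entry):.1%} of exact hits carry the expected index")
```

On these numbers roughly 82% of the hits in sample 1 carry the expected index, but only about 2% in sample 2. Since these are exact-match line counts on a small number of reads, the percentages should be taken as rough.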


          • #6
            How many samples were run in the same lane, and did all the samples have the same barcode for read2?

            How many reads do you have for samples 1 and 2, what percentage of the total reads do the counts you showed represent?

            Reagents can always become contaminated, but it seems a bit strange that read1 and read2 would show a different sequence than the runs where they read the barcodes.



            • #7
              Originally posted by mastal
              How many samples were run in the same lane, and did all the samples have the same barcode for read2?
              I have a total of 17 samples, 12 different index 1 and 8 different index 2. Samples are all mixed and run in 4 lanes.

              Originally posted by mastal
              How many reads do you have for samples 1 and 2, what percentage of the total reads do the counts you showed represent?
              The numbers I gave are just line counts. To get a proper representation, I need to use cutadapt or something similar that allows for mismatches, truncations, etc. However, I don't believe that this is the issue at the moment...
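For reference, the kind of matching meant here (mismatches plus adapters truncated by the read end) can be approximated in plain Python. This is a simplified sketch and not how cutadapt itself is implemented: it only handles substitutions, not indels, and the parameter names `max_error_rate` and `min_overlap` are my own.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(seq: str, adapter: str,
                 max_error_rate: float = 0.1, min_overlap: int = 5) -> int:
    """Leftmost offset where the adapter, possibly cut off by the read end,
    matches within the allowed mismatch rate. Returns -1 if there is none."""
    for start in range(len(seq) - min_overlap + 1):
        window = seq[start:start + len(adapter)]
        allowed = int(max_error_rate * len(window))
        if hamming(window, adapter[:len(window)]) <= allowed:
            return start
    return -1


ADAPTER_TAIL = "ATCTCGTATGCCGTCTTCTGCTTG"
read = "TCTCCGAGCCCACGAGACGCTACGCTATCTCGTATGCCGTCTTCTGCTTGA"
print(find_adapter(read, ADAPTER_TAIL))

# The same read with one substitution inside the adapter is still found.
mutated = read[:30] + "A" + read[31:]
print(find_adapter(mutated, ADAPTER_TAIL))
```

The index sits in the 8 bases immediately before the returned offset, so counts based on this would also pick up reads that the exact line count misses.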


              Originally posted by mastal
              Reagents can always become contaminated, but it seems a bit strange that read1 and read2 would show a different sequence than the runs where they read the barcodes.
              Currently running through all of the samples. So far it appears the contamination is consistent between samples that share one of the barcodes...



              • #8
                Processing all the data, it seems the contamination only exists among samples that have one of the two barcodes the same.

                For instance, samples 1 and 2 both have the same index 2, but different index 1. Both these two indexes are found in sample 1 and 13. This is true for every sample I have tested. Likewise, when the same index 2 is shared between 3 samples, you will find the three different index 1 sequences in all three samples.

                I am starting to fear this could be quite an issue. Do people usually multiplex their samples in a similar way? What is your experience?
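The pattern described in this post can be turned into a prediction from the sample sheet: for each sample, list the foreign index 1 sequences expected if contamination only occurs between samples sharing index 2. A rough Python sketch follows. Samples 1 and 2 are from the thread, while `sample_X` and its indexes are hypothetical, added only to show a sample that shares nothing.

```python
from collections import defaultdict

# (index 1, index 2) per sample. sample_X is hypothetical.
sample_sheet = {
    "sample_1": ("TAAGGCGA", "TAGATCGC"),
    "sample_2": ("GCTACGCT", "TAGATCGC"),
    "sample_X": ("CGTACTAG", "CTCTCTAT"),  # hypothetical, shares no index
}


def expected_foreign_index1(sheet):
    """For each sample, index 1 sequences of other samples sharing its index 2."""
    by_index2 = defaultdict(set)
    for index1, index2 in sheet.values():
        by_index2[index2].add(index1)
    return {
        name: sorted(by_index2[index2] - {index1})
        for name, (index1, index2) in sheet.items()
    }


for name, foreign in expected_foreign_index1(sample_sheet).items():
    print(name, foreign)
```

Comparing this prediction with the index 1 sequences actually observed in each sample would show whether sharing an index really explains all of the contamination. The same grouping could be done on index 1 to cover samples that share that one instead.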



                • #9
                  There is nothing wrong with the way the samples have been multiplexed. With 12 barcodes for the first index and 8 barcodes for the second index you could multiplex up to 96 different samples.
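The 96 comes from pairing every first index with every second index. A quick Python illustration, with placeholder labels standing in for the real index sequences:

```python
from itertools import product

# Placeholder labels, not real index sequences.
index1 = [f"i1_{n:02d}" for n in range(1, 13)]  # 12 first indexes
index2 = [f"i2_{n:02d}" for n in range(1, 9)]   # 8 second indexes

combinations = list(product(index1, index2))
print(len(combinations))
```

The 17 samples in this run use only a small part of that space, so several of them necessarily share one of the two indexes with another sample.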



                  • #10
                    The adapter always appears in the same position in read 1, and does not appear at all in read 2. This is not simple misassignment or an error in demultiplexing. It looks like you had free unligated adapters from sample 1 floating around that attached themselves to fully-ligated reads from sample 2. But, I don't think Nextera works that way, so I don't really know what's going on. Maybe Illumina would have an idea, if you contacted them.



                    • #11
                      @arash82: Probably time to go back to the experimental people with your observations. No logical informatics based explanation seems to apply here. These libraries may need to be re-made from scratch, provided starting material is of good quality.



                      • #12
                        Brian, I was thinking the same thing. The Nextera mate pair protocol actually uses tagmentation followed by end repair and ligation of TruSeq adapters.

                        When were the samples pooled in the protocol?
                        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



                        • #13
                          @SNPsaurus: All the libraries are made individually. Then the samples are pooled and sequenced.

                          @GenoMax: Core facility cannot figure it out either. Very unlikely we would do this from scratch. There is no RNA left, and getting new samples is too much time and resources. We do have indexed libraries though, so pooling in a different way is possible.

                          I have a feeling Illumina is very defensive in their communication. Their last input on the matter is "material handling error during some point of library construction". I was thinking this could happen if you use the same pipette tips when adding the adapters/indexes.

                          Doubt it though, as the core facility is doing this routinely. Also, that would create contamination in one direction only (i.e. tip into sample 1 and then sample 2 would create contamination only in sample 2), but I see contamination consistently in all, and only, samples that share an index.

                          Sh*t happens and sometimes you cannot explain it. What is puzzling though is the fact that others have seen similar things. Something is fishy and I need to find out what.
