The insert-size in paired-end data

  • The insert-size in paired-end data

    Hi,

    I have a question about the term "insert size".


    Suppose the library looks like this:

    |-----75----|------------------------100-----------------|-----75-----|

    and the paired-end reads are both 75-mers. In this example, is the insert size 100 or 250?

    I am confused about how it differs from the fragment size.

    Thanks

    Best regards!
    Last edited by louis7781x; 08-01-2011, 12:51 AM.

  • #2
    The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).
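
    The arithmetic above can be written out as a minimal sketch. The 120 bp adapter total is only the approximate figure quoted in this post, and the function and variable names are made up for illustration.

```python
# Worked arithmetic for the example in this thread.
# ADAPTER_TOTAL is the approximate figure quoted above, not an exact value.
READ_LEN = 75        # length of each paired-end read (bp)
INNER_GAP = 100      # unsequenced middle piece (bp)
ADAPTER_TOTAL = 120  # approximate combined length of both adapters (bp)


def insert_size(read_len: int, inner_gap: int) -> int:
    """Sequence between the adapters: both reads plus the unsequenced gap."""
    return 2 * read_len + inner_gap


def fragment_size(insert: int, adapter_total: int) -> int:
    """Full library molecule: insert plus both adapters."""
    return insert + adapter_total


insert = insert_size(READ_LEN, INNER_GAP)
fragment = fragment_size(insert, ADAPTER_TOTAL)
print(f"insert size:   {insert} bp")
print(f"fragment size: {fragment} bp")
```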


    • #3
      Originally posted by fkrueger
      The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).
      Hi, is the adapter sequenced too? I mean, does the raw data contain adapter sequence?


      • #4
        Normally sequencing starts right after the adapter but does not include adapter sequence.


        • #5
          Originally posted by louis7781x
          Hi,

          I have a question about the term "insert size".


          Suppose the library looks like this:

          |-----75----|------------------------100-----------------|-----75-----|

          and the paired-end reads are both 75-mers. In this example, is the insert size 100 or 250?

          I am confused about how it differs from the fragment size.

          Thanks

          Best regards!
          In your example I would say the insert size is 250 bp. But as fkrueger noted above, there is more than one way to describe things. When the wet lab sends data to me, they report the library fragment size, which includes the ligated Illumina adapters; continuing with your example, the fragment size in this case would have been 320 bp. Certain software may use different measurements. For example, TopHat requests the mate inner distance, the length between the two sequence reads, which in your example is 100 bp.

          The lesson is to be very clear about what is being asked or reported.
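
          As a hedged illustration of keeping the three numbers apart, the sketch below stores the insert size once and derives the others from it. The class and field names are hypothetical, and `adapter_total=70` is simply what the 320 bp figure in this post implies; the real value depends on the adapters used. The derived inner distance is the kind of value TopHat's `-r`/`--mate-inner-dist` option expects.

```python
from dataclasses import dataclass


@dataclass
class PairedEndLibrary:
    """Hypothetical container for the sizes discussed in this thread."""
    read_len: int       # length of each read (bp)
    insert_size: int    # sequence between the adapters (bp)
    adapter_total: int  # combined adapter length (bp); kit-dependent assumption

    @property
    def mate_inner_distance(self) -> int:
        # Gap between the two reads; negative when the reads overlap.
        return self.insert_size - 2 * self.read_len

    @property
    def library_fragment_size(self) -> int:
        # What the wet lab reports: insert plus ligated adapters.
        return self.insert_size + self.adapter_total


lib = PairedEndLibrary(read_len=75, insert_size=250, adapter_total=70)
print("mate inner distance:", lib.mate_inner_distance)
print("library fragment size:", lib.library_fragment_size)
```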


          • #6
            Originally posted by louis7781x
            Hi,

            I have a question about the term "insert size".


            Suppose the library looks like this:

            |-----75----|------------------------100-----------------|-----75-----|

            and the paired-end reads are both 75-mers. In this example, is the insert size 100 or 250?

            I am confused about how it differs from the fragment size.

            Thanks

            Best regards!
            As far as I know, it is 250.

            (150 was a typo, sorry.)
            Last edited by arkilis; 09-24-2013, 03:44 PM.


            • #7
              As noted, when most analysis programs ask for an insert size, they are referring to the size of your fragment with the adapters excluded, 250 bp in your case. However, some programs use the term insert size to mean the gap between the 3' ends of the two reads (assuming standard forward/reverse orientation), which in your case is 100 bp. Most programs are documented well enough to state which version they mean when they say insert size, but you shouldn't assume that the two are interchangeable. The term pair-distance is also used and, just like insert size, has been taken to mean either the size of the fragment minus the adapters (250 bp) or the gap distance (100 bp).

              For assemblies, interchanging the two values won't cause huge problems, but for read mapping methods where you want to look for insertions/deletions or splice variation, inputting the correct value can be very important.
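
              To make the two conventions concrete, here is a small sketch that derives both numbers from mapped read coordinates. The coordinates are invented for illustration, the names are hypothetical, and the code assumes the standard forward/reverse orientation with 1-based inclusive positions.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    start: int  # 1-based leftmost reference position, inclusive
    end: int    # 1-based rightmost reference position, inclusive


def outer_distance(fwd: Alignment, rev: Alignment) -> int:
    """Fragment minus adapters: leftmost base of the forward read
    through the rightmost base of the reverse read."""
    return rev.end - fwd.start + 1


def inner_distance(fwd: Alignment, rev: Alignment) -> int:
    """Gap between the 3' ends of the two reads (negative if they overlap)."""
    return rev.start - fwd.end - 1


# Invented coordinates matching the 75 / 100 / 75 example in this thread.
forward = Alignment(start=1001, end=1075)
reverse = Alignment(start=1176, end=1250)
print("outer (insert size):", outer_distance(forward, reverse))
print("inner (gap):", inner_distance(forward, reverse))
```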


              • #8
                Expression quintiles

                Sorry, I have recently been studying transcript assembly. Can you tell me the meaning of "expression quintiles"? Thank you very much.


                • #9
                  Originally posted by Yue Xu
                  Sorry, I have recently been studying transcript assembly. Can you tell me the meaning of "expression quintiles"? Thank you very much.
                  Oh, sorry, I posted this in the wrong place.


                  • #10
                    P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


                    The Insert/TargetFragment region needs to be shorter than the total read length of the sequencing kit you're using. For example, if you use a 2 x 100 PE kit and you require at least 20 bases of overlap between Read 1 and Read 2, your insert fragments cannot be longer than 180 bases.

                    As stated above, the P5 --- Index/Barcode1 --- Read 1 Primer, and Read 2 Primer --- Index/Barcode2 --- P7 add about 120-130 bases of length onto your insert fragment (depending on the size of the index barcodes and type of read 1 and read 2 primers you have chosen).

                    Keep in mind that others, myself included, have observed that the efficiency and success rate of the clustering step are significantly reduced when the final library template molecule is <250 or >800 bases. Thus, make sure the total length falls within this range if possible. Quantitation of deletion/insertion alleles that straddle these upper and lower limits cannot be trusted to be reproducible between different library preps (just an FYI, personal observation).

                    -Tom
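
                    The numbers in this post can be checked with a short sketch. The function names are hypothetical, the 125 bp adapter total is just a value picked from the 120-130 range mentioned above, and the 250-800 window is the personal observation reported here, not a specification.

```python
def max_insert_for_overlap(read_len: int, min_overlap: int) -> int:
    """Longest insert for which Read 1 and Read 2 still overlap
    by at least min_overlap bases."""
    return 2 * read_len - min_overlap


def within_clustering_window(insert: int, adapter_total: int,
                             low: int = 250, high: int = 800) -> bool:
    """True if insert plus adapters falls inside the size window
    reported in this post (an observation, not a hard limit)."""
    total = insert + adapter_total
    return low <= total <= high


limit = max_insert_for_overlap(read_len=100, min_overlap=20)
print("max insert with 20 bases of overlap:", limit)
print("180 bp insert OK:", within_clustering_window(limit, adapter_total=125))
print("100 bp insert OK:", within_clustering_window(100, adapter_total=125))
```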


                    • #11
                      Originally posted by thomasblomquist
                      P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


                      The Insert/TargetFragment region needs to be shorter than the total read length of the sequencing kit you're using. For example, if you use a 2 x 100 PE kit and you require at least 20 bases of overlap between Read 1 and Read 2, your insert fragments cannot be longer than 180 bases.
                      It is not required that the two reads overlap. For most applications you do not, in fact, want them to overlap and thus want an insert size larger than 2x read length.

                      Keep in mind that others, myself included, have observed that the efficiency and success rate of the clustering step are significantly reduced when the final library template molecule is <250 or >800 bases. Thus, make sure the total length falls within this range if possible.
                      -Tom
                      Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library, I would say that fragments ≤ 150 bp cluster and amplify efficiently as hell.


                      • #12
                        Originally posted by kmcarr
                        It is not required that the two reads overlap. For most applications you do not, in fact, want them to overlap and thus want an insert size larger than 2x read length.
                        Correct, I left out the "if you need overlap" qualifier.

                        Originally posted by kmcarr
                        Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library, I would say that fragments ≤ 150 bp cluster and amplify efficiently as hell.
                        LMAO. Yes, they do indeed cluster. I think, and I'm just surmising here, that the adapter dimers (ssDNA) heterodimerize with actual target template (dsDNA). My evidence for this is that in my amplicon libraries, when I stop the library-prep PCR in early cycles (when the target-size peak is just starting to appear on the Bioanalyzer DNA chip electropherogram) and then size-extract that target peak, I get virtually no primer/adapter dimers sequenced. However, as the target peak begins to reach plateau in PCR, the dimer peak starts to diminish a bit, and my thought is that the adapter dimer is annealing non-specifically to other target-specific templates. These run on the Bioanalyzer at or around the target-specific size and, in a non-denaturing size-based extraction, will be pulled into the final library. In these latter cases, where the PCR-based library prep overshoots the cycle number, I see a ton of adapter or read 1/2 dimer products formed.

                        As for the ligation-type approach, my assumption is that it is probably fairly easy to accidentally denature and reanneal a complex library afterwards, so that the adapter/read-primer dimers get heterodimerized with other large complexes.

                        The key, then, is to pull out the ssDNA that is the target length. PAGE purification? But yield tends to be too low.

                        Thus, I tend to aim for a minimal number of PCR cycles and keep the prepped library cool to minimize this issue.

                        Good point to bring up! :-)

                        -Tom


                        • #13
                          Hi,
                          I am a newbie in metagenomics. I just sequenced my soil DNA samples on an Illumina HiSeq 2000 (2x151 bp). Now I need to assemble my sequences, and for that I need the insert size, that is, the minimum and maximum distance between the paired reads. I asked the sequencing facility about this, but they sent me the Bioanalyzer result, which looks complicated to me. I attached the Bioanalyzer result here. I would appreciate it if anyone could explain this Bioanalyzer result.

                          Thanks


                          • #14
                            Is this a case of running 2x250 on a MiSeq but getting 150-350 bp PE reads?

                            http://www.ncbi.nlm.nih.gov/sra/?term=SRR1145846


                            • #15
                              Originally posted by mohiuddinbdfh
                              Hi,
                              I am a newbie in metagenomics. I just sequenced my soil DNA samples on an Illumina HiSeq 2000 (2x151 bp). Now I need to assemble my sequences, and for that I need the insert size, that is, the minimum and maximum distance between the paired reads. I asked the sequencing facility about this, but they sent me the Bioanalyzer result, which looks complicated to me. I attached the Bioanalyzer result here. I would appreciate it if anyone could explain this Bioanalyzer result.

                              Thanks
                              It may show the size of your DNA before sequencing. By looking at the DNA sizes, we can check for contaminants. The facility then fragments the DNA and sequences it, and later you get the sequencing data.
                              By the way, your data looks strange.
                              Also, this result has nothing to do with the Illumina library insert size.
                              Generally, the insert size can be 180-350 bp.
                              You had better BLAST both reads that share the same ID and check the insert size manually.
                              Last edited by sunguk; 02-06-2017, 02:15 AM.
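
                              An alternative to checking pairs by hand, sketched under assumptions: if the reads have already been mapped with a paired-end aligner that writes SAM, the observed template length (TLEN, column 9) of properly paired reads gives an empirical minimum, median, and maximum. The helper names and the example records are made up for illustration, and TLEN corresponds to the adapter-free insert size only when the reads contain no adapter sequence.

```python
import statistics
from typing import Dict, Iterable, List

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


def template_lengths(sam_lines: Iterable[str]) -> List[int]:
    """Collect positive TLEN values from properly paired SAM records.

    Each pair carries +TLEN on the leftmost read and -TLEN on its mate,
    so keeping only positive values counts every pair once.
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):       # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:           # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & PROPER_PAIR and tlen > 0:
            lengths.append(tlen)
    return lengths


def summarize(lengths: List[int]) -> Dict[str, float]:
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


# Invented records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t1001\t60\t75M\t=\t1176\t250\t*\t*",
    "r1\t147\tchr1\t1176\t60\t75M\t=\t1001\t-250\t*\t*",
    "r2\t99\tchr1\t5000\t60\t75M\t=\t5225\t300\t*\t*",
    "r2\t147\tchr1\t5225\t60\t75M\t=\t5000\t-300\t*\t*",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
stats = summarize(template_lengths(example))
print(stats)
```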
