  • louis7781x
    Member
    • Oct 2010
    • 74

    The insert-size in paired-end data

    Hi,

    I have a question about the term "insert size".


    If |-----75----|------------------------100-----------------|-----75-----|

    and the paired-end reads are both 75-mers, is the insert size in this example 100 or 250?

    I am confused about how this differs from the fragment size.

    Thanks

    Best regards!
    Last edited by louis7781x; 08-01-2011, 12:51 AM.
  • fkrueger
    Senior Member
    • Sep 2009
    • 627

    #2
    The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).

    • louis7781x
      Member
      • Oct 2010
      • 74

      #3
      Originally posted by fkrueger View Post
      The insert is normally the stretch of sequence between the paired-end adapters, so in your case the insert size would be 250 bp (2x75 bp reads + 100 bp unsequenced middle piece). The fragment size (which you need to select for during a gel purification for example) would be the insert size + length of both adapters (around 120 bp extra for both Illumina adapters).
      Hi, does the adapter get sequenced too? I mean, does the raw data contain adapter sequence?

      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        #4
        Normally sequencing starts right after the adapter but does not include adapter sequence.

        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          Originally posted by louis7781x View Post
          Hi,

          I have a question about the term "insert size".


          If |-----75----|------------------------100-----------------|-----75-----|

          and the paired-end reads are both 75-mers, is the insert size in this example 100 or 250?

          I am confused about how this differs from the fragment size.

          Thanks

          Best regards!
          In your example I would say the insert size is 250bp. But as fkrueger noted above, there is more than one way to describe things. When the wet lab sends data to me they report the library fragment size, which includes the ligated Illumina adapters; continuing with your example, and taking fkrueger's figure of around 120bp for both adapters, the fragment size would have been about 370bp. Certain software may use different measurements. For example, TopHat requests the mate inner distance, the length between the two sequence reads, which in your example is 100bp.

          The lesson is to be very clear about what is being asked or reported.
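
To make the three definitions concrete, here is a minimal Python sketch of the arithmetic for the read pair in this thread. The adapter total of 120 bp is an assumption taken from the "around 120 bp" estimate given earlier; a different adapter length changes the library fragment size accordingly. The function names are made up for illustration and do not come from any tool.

```python
# Numbers from the example in this thread.
READ_LENGTH = 75        # each read of the pair
INNER_DISTANCE = 100    # unsequenced stretch between the two reads
ADAPTER_TOTAL = 120     # ASSUMPTION: approximate combined length of both adapters


def insert_size(read_length, inner_distance):
    """Sequence between the adapters: both reads plus the unsequenced middle."""
    return 2 * read_length + inner_distance


def fragment_size(insert, adapter_total):
    """Library fragment as seen on a gel or Bioanalyzer: insert plus adapters."""
    return insert + adapter_total


def mate_inner_distance(insert, read_length):
    """Gap between the reads; negative when the reads overlap."""
    return insert - 2 * read_length


insert = insert_size(READ_LENGTH, INNER_DISTANCE)
print(insert)                                     # 250
print(fragment_size(insert, ADAPTER_TOTAL))       # 370
print(mate_inner_distance(insert, READ_LENGTH))   # 100
```

Whichever of these numbers a program asks for, its documentation should say which definition it uses.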

          • arkilis
            Senior Member
            • Jul 2013
            • 119

            #6
            Originally posted by louis7781x View Post
            Hi,

            I have a question about the term "insert size".


            If |-----75----|------------------------100-----------------|-----75-----|

            and the paired-end reads are both 75-mers, is the insert size in this example 100 or 250?

            I am confused about how this differs from the fragment size.

            Thanks

            Best regards!
            As far as I know, it is 250.

            150 was a typo... sorry
            Last edited by arkilis; 09-24-2013, 03:44 PM.

            • mcnelson.phd
              Senior Member
              • Jul 2011
              • 162

              #7
              As noted, when most analysis programs ask for an insert size, they are referring to the size of your fragment with the adapters excluded, 250bp in your case. However, some programs use the term insert size to mean the gap distance between the 3' ends of the two reads (assuming standard forward/reverse orientation), which in your case is 100bp. Most programs are documented well enough to state which version they mean when they say insert size, but you shouldn't assume that the two are interchangeable. The term pair-distance is also used and, just like insert size, has been taken to mean either the size of the fragment minus the adapters (250bp) or the gap distance (100bp).

              For assemblies, interchanging the two values won't cause huge problems, but for read mapping methods where you want to look for insertions/deletions or splice variation, inputting the correct value can be very important.

              • Yue Xu
                Member
                • Jun 2013
                • 16

                #8
                Expression quintiles

                Sorry, I have recently been studying transcript assembly. Can you tell me the meaning of "expression quintiles"? Thank you very much.

                • Yue Xu
                  Member
                  • Jun 2013
                  • 16

                  #9
                  Originally posted by Yue Xu View Post
                  Sorry, I have recently been studying transcript assembly. Can you tell me the meaning of "expression quintiles"? Thank you very much.
                  Oh, sorry, I posted this in the wrong place.

                  • thomasblomquist
                    Member
                    • Jul 2012
                    • 68

                    #10
                    P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


                    The Insert/TargetFragment region needs to be shorter than the combined read length of the sequencing kit you're using. For example, if you use a 2 x 100 PE kit and require at least 20 bases of overlap between Read 1 and Read 2, your insert fragments cannot be longer than 180 bases.

                    As stated above, the P5 --- Index/Barcode1 --- Read 1 Primer and Read 2 Primer --- Index/Barcode2 --- P7 segments add about 120-130 bases of length to your insert fragment (depending on the size of the index barcodes and the type of read 1 and read 2 primers you have chosen).

                    Keep in mind that others, myself included, have reported/observed that the efficiency and success rate of the clustering step is significantly reduced when a final library template molecule is <250 or >800 bases. Thus, make sure the sum of the lengths falls within this range if possible. Quantitation between deletion/insertion alleles that straddle these upper and lower limits cannot be trusted for reproducibility between different library preps (just an FYI, personal observation).

                    -Tom
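
The overlap arithmetic and the length check above can be sketched in a few lines of Python. This is only an illustration: the 250 and 800 base limits are the personal observation reported in this post, not an Illumina specification, and the 125 base adapter total is an assumed midpoint of the 120-130 range. All function names are hypothetical.

```python
def max_insert_for_overlap(read_length, min_overlap):
    """Longest insert that still leaves `min_overlap` bases shared by both reads."""
    return 2 * read_length - min_overlap


def library_length(insert, adapter_total):
    """Final library molecule: insert plus adapter/primer/index sequence."""
    return insert + adapter_total


def in_reported_clustering_range(length, low=250, high=800):
    """True if the length lies within the range reported to cluster well.

    ASSUMPTION: `low` and `high` are the observed limits quoted in this post.
    """
    return low <= length <= high


insert = max_insert_for_overlap(read_length=100, min_overlap=20)
print(insert)                                 # 180

total = library_length(insert, adapter_total=125)  # 125 is an assumed value
print(total)                                  # 305
print(in_reported_clustering_range(total))    # True
```

The overlap limit only matters when the application needs the two reads to overlap.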

                    • kmcarr
                      Senior Member
                      • May 2008
                      • 1181

                      #11
                      Originally posted by thomasblomquist View Post
                      P5 --- Index/Barcode1 --- Read 1 Primer --- Insert/TargetFragment --- Read 2 Primer --- Index/Barcode2 --- P7


                      The Insert/TargetFragment region needs to be shorter than the combined read length of the sequencing kit you're using. For example, if you use a 2 x 100 PE kit and require at least 20 bases of overlap between Read 1 and Read 2, your insert fragments cannot be longer than 180 bases.
                      It is not required that the two reads overlap. For most applications you do not, in fact, want them to overlap and thus want an insert size larger than 2x read length.

                      Originally posted by thomasblomquist View Post
                      Keep in mind that others, myself included, have reported/observed that the efficiency and success rate of the clustering step is significantly reduced when a final library template molecule is <250 or >800 bases. Thus, make sure the sum of the lengths falls within this range if possible.
                      -Tom
                      Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library I would say that fragments ≤ 150bp cluster and amplify efficiently as hell.

                      • thomasblomquist
                        Member
                        • Jul 2012
                        • 68

                        #12
                        Originally posted by kmcarr View Post
                        It is not required that the two reads overlap. For most applications you do not, in fact, want them to overlap and thus want an insert size larger than 2x read length.
                        Correct, I did not include the "if you need overlap" qualifier.

                        Originally posted by kmcarr View Post
                        Having found over the years a metric crap-ton of adapter dimers (120 bp fragment size) in read data where none is visible in the Bioanalyzer trace of the library I would say that fragments ≤ 150bp cluster and amplify efficiently as hell.
                        LMAO. Yes, they do indeed cluster. I think, and I'm just surmising here, that the adapter dimers (ssDNA) heterodimerize with actual target template (dsDNA). My evidence for this is that in my amplicon libraries, when I stop the PCR prep in early cycles, just as the target size peak is starting to show up on the Bioanalyzer DNA chip electropherogram, and then size-extract that target peak, I get virtually no primer/adapter dimers sequenced. However, as the target peak begins to reach plateau in PCR, the dimer peak starts to diminish a bit, and my thought is that the adapter dimer is non-specifically annealing to other target-specific templates. These electrophorese on the Bioanalyzer at or around the target-specific size and, in a non-denaturing size-based extraction, will be pulled into the final library. In these latter cases, where the PCR-based library prep overshoots the cycle number, I see a ton of adapter or read1/2 dimer products formed.

                        As for ligation-type approaches, my assumption is that it is probably fairly easy to accidentally denature and reanneal a complex library afterwards, so that the adapter/read primer dimers get heterodimerized with other large complexes.

                        The key then is to pull out the ssDNA that is the target length. PAGE purification? But yield tends to be too low.

                        Thus, I tend to aim for a minimal number of PCR cycles and keep the prepped library cool to minimize this issue.

                        Good point to bring up! :-)

                        -Tom

                        • mohiuddinbdfh
                          Junior Member
                          • Jun 2013
                          • 2

                          #13
                          Hi,
                          I am a newbie in metagenomics. I just sequenced my soil DNA samples on an Illumina HiSeq 2000 (2x151 bp). Now I need to assemble my sequences, and for that I need the insert size, i.e. the minimum and maximum distance between the sequences. I asked the sequencing facility about this, but they sent me the Bioanalyzer result, which looks complicated to me. I have attached the Bioanalyzer result here. I would appreciate it if anyone could explain it.

                          Thanks

                          • ymc
                            Senior Member
                            • Mar 2010
                            • 496

                            #14
                            Is this a case of running 2x250 on a MiSeq but getting 150-350 PE reads?

                            • sunguk
                              Junior Member
                              • Nov 2016
                              • 1

                              #15
                              Originally posted by mohiuddinbdfh View Post
                              Hi,
                              I am a newbie in metagenomics. I just sequenced my soil DNA samples on an Illumina HiSeq 2000 (2x151 bp). Now I need to assemble my sequences, and for that I need the insert size, i.e. the minimum and maximum distance between the sequences. I asked the sequencing facility about this, but they sent me the Bioanalyzer result, which looks complicated to me. I have attached the Bioanalyzer result here. I would appreciate it if anyone could explain it.

                              Thanks
                              It may show the size of your DNA before sequencing. By looking at the DNA size distribution, one can check for contaminants. The facility then fragments the DNA and sequences it, and later you get the sequencing data.
                              By the way, your data looks strange.
                              Also, this result has nothing to do with the insert size of the Illumina library.
                              Generally, the insert size can be 180-350 bp.
                              You had better BLAST both reads with the same ID and check the insert size manually.
                              Last edited by sunguk; 02-06-2017, 02:15 AM.
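
As a rough illustration of the manual check suggested above, the Python sketch below computes the insert size from the reference coordinates of the two reads in a pair, then summarizes a few pairs. It assumes both reads hit the same reference sequence, that coordinates are 1-based and inclusive, and that reverse-strand hits are reported with start greater than end (as BLAST does for the subject). The coordinates are invented for the example, and the function names are hypothetical.

```python
import statistics


def insert_size_from_hits(r1_start, r1_end, r2_start, r2_end):
    """Span on the reference covered by both reads, outer end to outer end."""
    coords = (r1_start, r1_end, r2_start, r2_end)
    return max(coords) - min(coords) + 1


def summarize(sizes):
    """Minimum, median and maximum insert size over several read pairs."""
    return {
        "min": min(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }


# Made-up hits for three 2x151 bp pairs: (r1_start, r1_end, r2_start, r2_end).
pairs = [
    (1001, 1151, 1400, 1250),
    (5000, 5150, 5330, 5180),
    (200, 350, 620, 470),
]

sizes = [insert_size_from_hits(*p) for p in pairs]
print(sizes)             # [400, 331, 421]
print(summarize(sizes))  # {'min': 331, 'median': 400, 'max': 421}
```

A handful of pairs only gives a rough idea; a read mapper run on many pairs gives a proper distribution.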
