  • Graham Etherington
    Member
    • Apr 2010
    • 22

    Technically, what is 'insert-size'?

    I've become slightly confused about the use of the term 'insert size', so I was wondering whether someone could confirm that my understanding of it is correct.

    If the fragment size excised from the gel is 500 base pairs and I'm doing 76bp paired-end sequencing, is my insert size (say for use in the -a option in bwa sampe) 500 or 348 (500 - (2*76))?

    Many thanks.
  • Chipper
    Senior Member
    • Mar 2008
    • 323

    #2
    It is 500 minus the total length of adaptors, unless you size-selected your sample before doing the library prep.


    • elizzybethy
      Junior Member
      • Sep 2009
      • 8

      #3
      I think that most people define it as Chipper describes. So if you are doing Illumina PE sequencing, the insert size would be the approximate size of the band you cut out of the gel (or measured on a Bioanalyzer at the end of library prep) minus 119 bp, because the Illumina adapter on one side is 58 bp and the adapter on the other side is 61 bp (58 + 61 = 119 bp).
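The adapter arithmetic above can be sketched in Python. This is a minimal sketch assuming the 58 bp and 61 bp Illumina PE adapter lengths quoted in this post; the helper name `insert_size` is hypothetical, and other adapter or barcode sets will need different values.

```python
# Assumed adapter lengths (bp), taken from the post above; check your own kit.
ADAPTER_SIDE_A = 58
ADAPTER_SIDE_B = 61


def insert_size(library_fragment_bp: int,
                adapter_a_bp: int = ADAPTER_SIDE_A,
                adapter_b_bp: int = ADAPTER_SIDE_B) -> int:
    """Insert size = final library fragment size minus the adapters on both ends."""
    insert = library_fragment_bp - (adapter_a_bp + adapter_b_bp)
    if insert <= 0:
        raise ValueError("adapters are at least as long as the library fragment")
    return insert


print(insert_size(500))  # 381
```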

      If your sample aligns well, you don't need to specify the -a option in bwa at all. If you do specify it, keep in mind that -a is the MAXIMUM insert size, not the AVERAGE insert size, so in this example you would want it to be quite a bit greater than 500 - 119 = 381, depending on the width of your library's fragment size distribution.
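Because -a is an upper bound rather than a mean, one way to pick it is to add a margin proportional to the spread of the library. The sketch below assumes a rule of thumb of mean plus k standard deviations with k = 3; that rule, the 40 bp SD in the example, and the function name `max_insert_for_bwa_a` are illustrative assumptions, not bwa defaults or recommendations.

```python
def max_insert_for_bwa_a(mean_insert_bp: float, sd_bp: float, k: float = 3.0) -> int:
    """Candidate value for `bwa sampe -a`: mean insert plus k standard deviations.

    k = 3 is an assumed rule of thumb, not something bwa prescribes.
    """
    if sd_bp < 0:
        raise ValueError("standard deviation must be non-negative")
    return int(round(mean_insert_bp + k * sd_bp))


# Hypothetical library: mean insert 381 bp (500 - 119) with an assumed SD of 40 bp.
print(max_insert_for_bwa_a(381, 40))  # 501
```

A wider size distribution (larger SD) pushes the bound further above the mean, which is the point made in the post.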

      I think that people generally use different terms for the distance BETWEEN the sequenced ends of an insert. For instance, the -r option in TopHat is for "mate inner distance", which in this example would be, I believe, 500 - 119 - (2*76) = 229.
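Under the convention described here (subtract the adapters first, then both reads), the TopHat -r value can be computed as below. The function name `mate_inner_distance` is hypothetical, and the 119 bp default simply reuses the adapter total quoted in this post.

```python
def mate_inner_distance(library_fragment_bp: int, read_len_bp: int,
                        adapter_total_bp: int = 119) -> int:
    """Expected unsequenced gap between mates (the quantity TopHat's -r asks for),
    using the convention in this post: remove the adapters, then both reads."""
    return library_fragment_bp - adapter_total_bp - 2 * read_len_bp


print(mate_inner_distance(500, 76))  # 229
```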


      • frymor
        Senior Member
        • May 2010
        • 151

        #4
        Hi elizzybethy

        According to the TopHat manual, the -r option is described as:
        -r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.
        So if I understand it correctly, I only need to subtract the read lengths, without taking the adapters into account (i.e. in your example only 500 - (2*76), without the 119 bp adapter length).

        Is this correct?
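As a quick check on the two readings, note that the manual's own example only comes out to 200 if the 300 bp "fragment" already excludes adapters. The sketch below compares both readings, assuming the 119 bp Illumina PE adapter total mentioned earlier in the thread for the alternative; `inner_distance` is a hypothetical helper, not a TopHat function.

```python
def inner_distance(fragment_bp: int, read_len_bp: int, adapter_total_bp: int = 0) -> int:
    """Unsequenced gap between mates: fragment minus any adapters minus both reads."""
    return fragment_bp - adapter_total_bp - 2 * read_len_bp


# Reading 1: "fragments selected at 300bp" already excludes adapters.
print(inner_distance(300, 50))       # 200, matching the manual's example
# Reading 2: 300 bp is a gel band that still includes 119 bp of adapters (assumed).
print(inner_distance(300, 50, 119))  # 81
# The same two readings for the 500 bp band with 76 bp reads.
print(inner_distance(500, 76))       # 348
print(inner_distance(500, 76, 119))  # 229
```

So the two positions in this thread agree once it is clear whether the starting number is the insert or the whole library fragment including adapters.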


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          OK, I just redacted my post in case anyone saw it within the few minutes it was up. I realized that I had misread frymor's post (reading "with" where it said "without").
          Last edited by kmcarr; 01-11-2011, 06:47 AM.


          • elizzybethy
            Junior Member
            • Sep 2009
            • 8

            #6
            I subtract the adapter length and the read lengths from the final average library fragment size when I use the TopHat -r option, because that is how it makes sense to me.




            • catbus
              Member
              • Feb 2011
              • 21

              #7
              It would be nice to have an authoritative answer here, since the commentators above appear to disagree.

              > e.g. for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200.

              If you have two 50bp paired-end files, and the original fragment was 300 bp, then the result is (300 - 50 - 50 = 200).

              But: is the adapter included in the fragment size? I am getting the impression that it should NOT be included. However, I am not 100% sure.


              • Heisman
                Senior Member
                • Dec 2010
                • 534

                #8
                Insert size typically means the fragment length without any universal or barcoded adapter sequence attached. With long barcodes, where their length actually matters, the definition can get a bit hazy.


                • spapillon
                  Junior Member
                  • Nov 2011
                  • 4

                  #9
                  Make sure you adjust the standard deviation parameter accordingly when you trim adapters/barcodes and poor-quality ends from your reads.
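Trimming shortens reads, so for a fragment of fixed length the unsequenced gap between mates grows, and because each read is trimmed by a different amount the gap also varies more than the fragments do. A minimal sketch with hypothetical toy values (every fragment exactly 300 bp) shows both effects; `inner_distance_stats` is a hypothetical helper, and in practice the tuples would come from your own aligned, trimmed pairs. The resulting mean and SD are what you would feed to the aligner's inner-distance parameters (in TopHat, presumably -r and --mate-std-dev).

```python
from statistics import mean, stdev


def inner_distance_stats(pairs):
    """Mean and sample SD of the unsequenced gap for (fragment_bp, read1_bp, read2_bp) tuples."""
    gaps = [fragment - r1 - r2 for fragment, r1, r2 in pairs]
    if len(gaps) < 2:
        raise ValueError("need at least two pairs to estimate a standard deviation")
    return mean(gaps), stdev(gaps)


# Hypothetical toy data: identical 300 bp fragments, before and after uneven trimming.
untrimmed = [(300, 50, 50)] * 4
trimmed = [(300, 50, 50), (300, 40, 50), (300, 50, 35), (300, 45, 45)]

print(inner_distance_stats(untrimmed))  # mean 200, SD 0
print(inner_distance_stats(trimmed))    # larger mean, nonzero SD
```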


                  • rskr
                    Senior Member
                    • Oct 2010
                    • 249

                    #10
                    Let's all do ourselves a favor: refer to the total length of a pair from end to end as the fragment length (from the start of the first read on the reference to the end of the second read, including all bases in either read), and refer to the space in between as the "space between the pairs", i.e. every base of the fragment that wasn't sequenced. That way there will be no confusion. And ignore any attempts to define insert size, which will inevitably be misconstrued by whomever you are trying to 'splain it to.
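These two definitions map directly onto alignment coordinates. A minimal sketch, assuming both reads align to the same reference without indels or clipping (so each read spans exactly its length on the reference); the class and function names and the coordinates in the example are hypothetical.

```python
from typing import NamedTuple


class AlignedMate(NamedTuple):
    start: int       # 1-based leftmost reference position of the alignment
    ref_length: int  # reference bases spanned (assumes no indels or clipping)

    @property
    def end(self) -> int:
        # 1-based inclusive rightmost reference position
        return self.start + self.ref_length - 1


def fragment_length(a: AlignedMate, b: AlignedMate) -> int:
    """Leftmost aligned base of either mate to rightmost aligned base, inclusive."""
    return max(a.end, b.end) - min(a.start, b.start) + 1


def space_between(a: AlignedMate, b: AlignedMate) -> int:
    """Fragment bases covered by neither read; a negative value means the mates overlap."""
    return fragment_length(a, b) - a.ref_length - b.ref_length


# Hypothetical pair of 76 bp reads on the same reference.
m1 = AlignedMate(start=1001, ref_length=76)
m2 = AlignedMate(start=1230, ref_length=76)
print(fragment_length(m1, m2))  # 305
print(space_between(m1, m2))    # 153
```

A negative result from `space_between` means the mates overlap, in which case no base of the fragment is left unsequenced.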


                    • maubp
                      Peter (Biopython etc)
                      • Jul 2009
                      • 1544

                      #11
                      This was partly why the SAM/BAM spec wording was changed from insert size to fragment size.
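In the current SAM specification this quantity lives in column 9, TLEN (observed template length): its magnitude runs from the leftmost mapped base to the rightmost mapped base of the template, with a plus sign on the leftmost segment and a minus sign on the rightmost. A minimal sketch for two mates on the same reference, assuming ungapped alignments; the function name `observed_tlen` and the coordinates are hypothetical, and individual aligners may differ in edge cases, so compare with your own BAM files.

```python
from typing import Tuple


def observed_tlen(pos1: int, ref_len1: int, pos2: int, ref_len2: int) -> Tuple[int, int]:
    """Signed template length for mate 1 and mate 2 mapped to the same reference.

    pos1/pos2 are 1-based leftmost positions (as in SAM POS); ref_len1/ref_len2 are
    the reference bases spanned by each alignment (assumed ungapped here).
    """
    end1 = pos1 + ref_len1 - 1
    end2 = pos2 + ref_len2 - 1
    span = max(end1, end2) - min(pos1, pos2) + 1
    # Leftmost mate gets +span, rightmost gets -span; on a tie mate 1 is taken as leftmost.
    if pos1 <= pos2:
        return span, -span
    return -span, span


print(observed_tlen(5000, 100, 4700, 100))  # (-400, 400)
```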
