  • How to determine the insert length?

    I have some Solexa paired-end data, but my colleague forgot to tell me the insert length. How can I determine the insert length of the data? Note that I have no reference genome.

  • #2
    The easiest way would be to go back to your colleague and get the insert length. But if you really insist on doing this the hard way, then I would suggest doing a de novo assembly using the reads as if they were fragments (i.e., not as paired ends). This should give you some decent-sized contigs. Then map the paired ends onto the contigs. From this you should be able to figure out how far apart the paired ends that do map are. After you obtain the numbers that define the range, you can then do a new assembly, but this time as a 'paired end' instead of a 'fragment' assembly.
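To make the last step concrete, here is a minimal Python sketch; it is an illustration, not something taken from any particular tool. It assumes you have already exported the mapped pairs from your aligner as simple tuples (a hypothetical layout: contig name, 0-based leftmost position and read length for each mate), and it takes the insert size to be the outer distance spanned by both reads, which is only one of several definitions in use. The coordinates are made up.

```python
from statistics import mean, stdev

def insert_sizes(pairs):
    """Return insert sizes for pairs whose two reads map to the same contig.

    Each pair is (contig_a, start_a, len_a, contig_b, start_b, len_b),
    a hypothetical record layout with 0-based leftmost mapping positions.
    """
    sizes = []
    for contig_a, start_a, len_a, contig_b, start_b, len_b in pairs:
        if contig_a != contig_b:
            continue  # mates on different contigs say nothing about the insert
        left = min(start_a, start_b)
        right = max(start_a + len_a, start_b + len_b)
        sizes.append(right - left)  # outer distance covered by both reads
    return sizes

# Made-up example data: three usable pairs and one split across contigs.
pairs = [
    ("contig1", 100, 36, "contig1", 284, 36),
    ("contig1", 500, 36, "contig1", 714, 36),
    ("contig2", 40, 36, "contig2", 194, 36),
    ("contig1", 900, 36, "contig3", 10, 36),
]
sizes = insert_sizes(pairs)  # [220, 250, 190]
print(sizes, mean(sizes), stdev(sizes))
```

Pairs near contig ends are under-represented for long inserts, so on short contigs an estimate like this will tend to be biased low.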


    • #3
      Thank you, I think I know what I should do.


      • #4
        Originally posted by anyone1985:
        I have some Solexa paired-end data, but my colleague forgot to tell me the insert length. How can I determine the insert length of the data? Note that I have no reference genome.
        The summary.htm file from the Pipeline should have that info at the bottom of the file.


        • #5
          Originally posted by westerman:
          The easiest way would be to go back to your colleague and get the insert length.
          The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the actual average might be, say, 220 bp with a standard deviation of 30 bp.

          But if you really insist on doing this the hard way, then I would suggest doing a de novo assembly using the reads as if they were fragments (i.e., not as paired ends). This should give you some decent-sized contigs. Then map the paired ends onto the contigs. From this you should be able to figure out how far apart the paired ends that do map are. After you obtain the numbers that define the range, you can then do a new assembly, but this time as a 'paired end' instead of a 'fragment' assembly.
          This is good advice. If you have a close reference sequence, you can use that instead of de novo contigs. I usually use MAQ to align a SUBSET of the reads in paired-end mode, and MAQ itself will print out the mean and s.d. of the insert size.
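For the "subset" part, a small Python sketch of one way to pull the first few pairs out of two FASTQ streams before aligning them. This is only an illustration: it assumes plain 4-line FASTQ records with mates in the same order in both files, and the function name `take_pairs` and the toy reads are made up. Taking the first records is the simplest choice, not necessarily the most representative one.

```python
import io
from itertools import islice

def take_pairs(reads_1, reads_2, n_pairs):
    """Return the first n_pairs (record_1, record_2) tuples from two FASTQ streams.

    Assumes 4-line records and that mates appear in the same order in both streams.
    """
    def records(handle):
        while True:
            rec = list(islice(handle, 4))  # one FASTQ record = 4 lines
            if len(rec) < 4:
                return  # end of stream (or a truncated last record)
            yield "".join(rec)

    return list(islice(zip(records(reads_1), records(reads_2)), n_pairs))

# Toy in-memory data standing in for the two read files.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n@r3/1\nGGCC\n+\nIIII\n")
fq2 = io.StringIO("@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n@r3/2\nAATT\n+\nIIII\n")
subset = take_pairs(fq1, fq2, 2)
```

With real data you would pass open file handles instead of the `io.StringIO` objects and write each record of the subset back out to two smaller files.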

          And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.


          • #6
            Originally posted by Torst:
            The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the actual average might be, say, 220 bp with a standard deviation of 30 bp.
            Well, yes, if your colleague only gives you a single number then you have a problem. I always ask for a minimum and maximum insert length, knowing that those numbers are also uncertain. Also, sometimes you can have a mixture of libraries with different insert sizes, e.g. averages of 500 bp, 3 kb and 20 kb. Then one needs to know not only the range but also which reads correspond to which library.
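If you do end up with a list of observed insert sizes for a single library, one hedged way to turn it into a minimum and maximum is to take empirical quantiles rather than the raw extremes, so that a few chimeric or mis-mapped pairs do not blow up the range. The sketch below is an assumption-laden illustration: the 5% and 95% cut-offs are arbitrary, the function name is made up, and the sizes are invented. It does not attempt to separate a mixture of libraries.

```python
def insert_range(sizes, lower=0.05, upper=0.95):
    """Return (low, high) insert-size bounds as empirical quantiles of sizes."""
    ordered = sorted(sizes)
    last = len(ordered) - 1
    # Nearest-rank style lookup; cut-offs are illustrative defaults.
    return ordered[round(lower * last)], ordered[round(upper * last)]

# Invented sizes for one library, with a single far-out outlier at the end.
sizes = [150, 185, 190, 200, 210, 215, 220, 225, 230, 240, 250, 255, 260, 290, 2900]
low, high = insert_range(sizes)
print(low, high)
```

Here the outlying value falls outside the reported range, whereas a plain `min`/`max` would have included it.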

            And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
            Ah, but the original poster said he did not have a reference genome.

            It was an interesting theoretical question: how does one figure out insert sizes when given only paired ends? A question that I am glad I do not have to answer in practice!


            • #7
              How do you use maq to determine the insert size?

              Originally posted by Torst:
              The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp, but the actual average might be, say, 220 bp with a standard deviation of 30 bp.



              This is good advice. If you have a close reference sequence, you can use that instead of de novo contigs. I usually use MAQ to align a SUBSET of the reads in paired-end mode, and MAQ itself will print out the mean and s.d. of the insert size.

              And as another poster said, if this is Illumina GA Pipeline, the Summary HTML files contain an estimate of the insert size which it obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
              Hi, if you do not have a reference sequence, how do you use MAQ to determine the insert size? Could you please give a sample command line?

              Thanks


              • #8
                Originally posted by system7:
                The summary.htm file from the Pipeline should have that info at the bottom of the file.
                Where is this summary.htm that people are talking about?
