I have some Solexa paired-end data, but my colleague forgot to tell me the insert length. How can I determine the insert length of the data? Note that I have no reference genome.
-
The easiest way would be to go back to your colleague and get the insert length. But if you really insist on doing this the hard way, then I would suggest doing a de novo assembly using the reads as if they were fragments (i.e., not as paired ends). This should give you some decent-sized contigs. Then map the paired ends onto the contigs. From this you should be able to figure out how far apart the mates are in the pairs that do map. After you obtain the numbers that define the range, you can then do a new assembly, but this time as a 'paired end' instead of a 'fragment' assembly.
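As a rough sketch of the "how far apart" step, assuming a standard forward/reverse (FR) paired-end library and mates already aligned to the contigs, the insert size of a pair is the distance from the first base of the forward mate to the last base of the reverse mate. `MateHit` and `insert_size` are hypothetical names, not any aligner's output format:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    """Hypothetical record of one mate aligned to a contig.

    Coordinates are 0-based and half-open: [start, end).
    """
    contig: str
    start: int
    end: int
    reverse: bool  # True if the mate aligned to the reverse strand


def insert_size(m1: MateHit, m2: MateHit) -> Optional[int]:
    """Outer distance covered by a pair, assuming forward/reverse (FR) mates.

    Returns None when the pair gives no usable estimate: mates on different
    contigs, both on the same strand, or facing away from each other.
    """
    if m1.contig != m2.contig or m1.reverse == m2.reverse:
        return None
    fwd, rev = (m2, m1) if m1.reverse else (m1, m2)
    if fwd.start > rev.start:
        return None  # reverse mate lies upstream: not an FR pair
    # The fragment runs from the forward mate's first base to the reverse mate's last.
    return rev.end - fwd.start
```

Only pairs whose two mates land on the same contig are counted, so the contigs need to be clearly longer than the insert, and short contigs will tend to bias the estimate toward shorter inserts.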
-
Originally posted by westerman: "The easiest way would be to go back to your colleague and get the insert length."
And as another poster said, if this is Illumina GA Pipeline output, the Summary HTML files contain an estimate of the insert size, which the pipeline obtains by using ELAND to map the reads to the reference genome specified in the gerald.cfg file.
-
Originally posted by Torst: "The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp but end up with an average of, say, 220 bp and a standard deviation of 30 bp."
It was an interesting theoretical question: how does one figure out insert sizes when given only paired ends? I am glad I do not have to answer it in practice!
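Torst's point about inexact fragment selection is why the answer is best reported as a distribution (mean and s.d.) rather than one number. Pairs whose mates map to repeats, or chimeric fragments, give a few wildly wrong distances, so a minimal sketch is to drop outliers with a median/MAD filter before taking the mean and s.d. This is one generic approach, not necessarily what MAQ or the GA pipeline does internally; `estimate_insert_size` and the cutoff `k` are illustrative choices:

```python
import statistics
from typing import Iterable, Optional, Tuple


def estimate_insert_size(
    sizes: Iterable[Optional[int]], k: float = 5.0
) -> Tuple[float, float]:
    """Mean and standard deviation of insert sizes after outlier removal.

    Values more than k scaled MADs from the median are discarded first, so a
    few mis-mapped pairs cannot drag the estimate far from the true peak.
    """
    values = [s for s in sizes if s is not None and s > 0]
    if len(values) < 2:
        raise ValueError("need at least two insert sizes")
    med = statistics.median(values)
    # 1.4826 * MAD approximates the standard deviation for normally distributed data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    kept = [v for v in values if abs(v - med) <= k * mad]
    if len(kept) < 2:
        raise ValueError("too few insert sizes left after outlier filtering")
    return float(statistics.mean(kept)), statistics.stdev(kept)
```

`None` entries (pairs that gave no estimate) are skipped, so the input can simply hold one entry per read pair. A smaller `k` is stricter; the default of 5 is an arbitrary but fairly permissive choice.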
-
How do you use MAQ to determine the insert size?
Originally posted by Torst: "The problem with this is that the DNA fragment selection step is inexact. You may be aiming for 250 bp but end up with an average of, say, 220 bp and a standard deviation of 30 bp.
This is good advice. If you have a close reference sequence, you can use that instead of de novo contigs. I usually use MAQ to align a SUBSET of the reads in paired-end mode, and MAQ itself will print out the mean and s.d. of the insert size."
Thanks
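Torst aligns only a SUBSET of the reads, which is usually enough to estimate a mean and s.d. The subset has to keep both mates of each pair together, so a minimal sketch is reservoir sampling over the two FASTQ files in lockstep. It assumes unwrapped 4-line FASTQ records and that the mate 1 and mate 2 files list pairs in the same order; `fastq_records` and `sample_pairs` are hypothetical helpers, not part of MAQ:

```python
import itertools
import random
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, ...]  # header, sequence, '+' line, qualities


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of file
        if not lines[3] or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record at {lines[0]!r}")
        yield tuple(line.rstrip("\n") for line in lines)


def sample_pairs(r1: TextIO, r2: TextIO, n: int, seed: int = 1) -> List[Tuple[Record, Record]]:
    """Reservoir-sample up to n read pairs so mate 1 and mate 2 stay together."""
    rng = random.Random(seed)
    reservoir: List[Tuple[Record, Record]] = []
    pairs = itertools.zip_longest(fastq_records(r1), fastq_records(r2))
    for i, (rec1, rec2) in enumerate(pairs):
        if rec1 is None or rec2 is None:
            raise ValueError("the two files hold different numbers of reads")
        if i < n:
            reservoir.append((rec1, rec2))
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = (rec1, rec2)
    return reservoir
```

For example, `with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2: subset = sample_pairs(f1, f2, 100000)` (file names hypothetical) would draw 100,000 pairs that could then be written back out and aligned in paired-end mode. Reservoir sampling reads each file once and keeps only `n` pairs in memory, which matters for full-lane FASTQ files.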