If I have two 100 bp paired-end read files, each containing x reads (2x reads in total), is the total number of base pairs represented by the two files together 2x or x?
-
Originally posted by shyam_la: What if there are gaps (the opposite of overlaps)? That's possible, right? In that case, would the number of bases be more than 2x?
-
???
We were talking about a gap/overlap between the two members of one pair, right? The gap is going to be covered by some other pair of reads, won't it?
For example, if ABCDEFGHIJKLMNOPQRSTUVWXYZ were my target region and it was sliced up to make a library, the fragments would be something like ABCDE, BCDEF, CDEFGHI, DEFGHI and so on, of varying size. Then the sequencer sequences the library and gives me paired-end reads 3 units long. ABCDE will give me ABC and EDC (with an overlap), but CDEFGHI would give me CDE and IHG (with a gap).
Sorry, I am new to NGS and want to get the bare basics correct.
Is my concept correct?
Last edited by shyam_la; 06-11-2012, 06:48 PM.
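The letters example can be checked with a few lines of Python. This is only a toy sketch: the function names `paired_end_reads` and `gap_or_overlap` are made up for illustration, and the second read is modelled as the last letters read backwards (real reverse reads are reverse complements, which plain letters cannot show).

```python
def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Return (forward_read, reverse_read) for one library fragment.

    Toy model: the forward read is the first `read_len` letters, the
    reverse read is the last `read_len` letters read backwards.
    """
    forward = fragment[:read_len]
    reverse = fragment[-read_len:][::-1]
    return forward, reverse


def gap_or_overlap(fragment: str, read_len: int) -> int:
    """Positive: letters between the two reads that this pair does not
    sequence (a gap). Negative: letters sequenced by both reads (overlap)."""
    return len(fragment) - 2 * read_len


for frag in ["ABCDE", "CDEFGHI"]:
    fwd, rev = paired_end_reads(frag, 3)
    print(frag, fwd, rev, gap_or_overlap(frag, 3))
```

With 3-letter reads, ABCDE gives ABC and EDC with a one-letter overlap, and CDEFGHI gives CDE and IHG with a one-letter gap.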
-
If you have two 100 bp reads, the total number of bases you get is 200. Period, end of story. The question is how many of those 200 bases give you new information. If you are only sequencing a 100 bp insert, then both reads of the pair will sequence the same bases. So you get 200 bases, but 100 of them are redundant (assuming there are no errors in the reads). Hence, only 100 distinct bases of the insert are covered.
If you are sequencing a 300 bp insert, you get 200 bp of information. Here, those 200 bases are all unique, because the two reads are separated by the 100 bp in the middle of the insert that this pair does not sequence. So all 200 bases cover distinct positions.
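The arithmetic for one read pair can be sketched as follows. The helper name `pair_bases` is hypothetical, and the sketch assumes error-free reads and an insert at least as long as one read.

```python
def pair_bases(read_len: int, insert_size: int) -> tuple[int, int]:
    """Return (sequenced_bases, distinct_bases) for one read pair.

    sequenced_bases: every base call in the two reads.
    distinct_bases: how many different insert positions those calls cover.
    Assumption: insert_size >= read_len (no read-through past the insert).
    """
    if insert_size < read_len:
        raise ValueError("sketch assumes insert_size >= read_len")
    sequenced = 2 * read_len
    # The two reads cannot cover more positions than the insert contains.
    distinct = min(sequenced, insert_size)
    return sequenced, distinct


print(pair_bases(100, 100))  # (200, 100): both reads see the same bases
print(pair_bases(100, 300))  # (200, 200): 100 bp in the middle is not read
```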
Is this clear? If so, what else is confusing you?
-
Heisman: For a 300 bp insert, the 200 bp from one read pair will cover it from both ends, but the 100 bp gap will be covered by some other read pair from a different, overlapping insert in the library. What I don't understand is why your answer sounds like those 100 bp have just vanished.
Maybe I am just not able to get the big picture at the moment from our rather simple discussion, but whatever.
-
Originally posted by Jeremy: If you have 30 Gbp of sequence data then you have 30 Gbp of sequence data; what difference does it make how many files that data is separated into?
Maybe this is getting confusing because we haven't properly invoked the idea of coverage.
-
Originally posted by shyam_la: You think it makes no difference knowing whether that 30 Gbp of raw sequence data corresponds to 300 Mbp or 150 Mbp of a reference after alignment?
Maybe this is getting confusing because we haven't properly invoked the idea of coverage.
If your original question pertained to coverage of a genome, then assume a random read distribution and divide the sequence data size by the genome size. For example, if you have 30 Gbp of sequence data and a 1 Gbp genome, then theoretically you have an average of 30x coverage. It makes little difference how much of that redundancy comes from overlap within read pairs (which size selection keeps very small to nil anyway, since the fragments are 200-500 bp and you sequence 75-100 bp from each end) versus from different reads.
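As a rough sketch of that calculation (the function names and the 150 million pair count are illustrative assumptions, chosen only so the total comes to 30 Gbp):

```python
def paired_end_bases(num_pairs: int, read_len: int) -> int:
    """Total sequenced bases in a paired-end run: two reads per pair."""
    return 2 * num_pairs * read_len


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth, assuming reads are spread randomly over the genome."""
    return total_bases / genome_size


# Hypothetical run: 150 million pairs of 100 bp reads against a 1 Gbp genome.
total = paired_end_bases(150_000_000, 100)  # 30 Gbp in total
print(total, mean_coverage(total, 1_000_000_000))
```

Note that the two files of a paired-end run each hold one read per pair, so x pairs of 100 bp reads contribute 200x bases to the numerator, whichever file they sit in.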
-
Originally posted by Jeremy: I think perhaps your original question was worded in a confusing manner if you were asking about reference coverage. Your original question only talked about raw data; you mentioned nothing of coverage.
If your original question pertained to coverage of a genome, then assume a random read distribution and divide the sequence data size by the genome size. For example, if you have 30 Gbp of sequence data and a 1 Gbp genome, then theoretically you have an average of 30x coverage. It makes little difference how much of that redundancy comes from overlap within read pairs (which size selection keeps very small to nil anyway, since the fragments are 200-500 bp and you sequence 75-100 bp from each end) versus from different reads.