what is a paired-end read?

mido1951 replied

10-21-2015, 08:07 AM
How do we do an assembly if we have paired-end reads (two files, R1.fq and R2.fq)?
Thank you.
mastal replied

05-22-2014, 01:41 AM
It could be either 250 or 100.

Different software packages may have their own definition of 'insert length', so it's best to read the documentation carefully.

For example, in the case you have illustrated, velvet would define the 'insert length' as 250.

This is the definition given in the velvet manual:

"The insert length is understood to be the length of the sequenced fragment, i.e. it includes the length of the reads themselves."

https://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

Bowtie2 also uses the same definition of fragment length.
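To make the two possible answers concrete, here is a small Python sketch of the arithmetic for the layout in the question. The function names are made up for illustration and do not come from velvet or Bowtie2.

```python
def fragment_length(read1_len, inner_distance, read2_len):
    """Full fragment length: both reads plus the unsequenced gap between them."""
    return read1_len + inner_distance + read2_len


def inner_distance(fragment_len, read1_len, read2_len):
    """Gap between the two reads; negative when the reads overlap."""
    return fragment_len - read1_len - read2_len


# The layout from the question: |--75--|------100------|--75--|
frag = fragment_length(75, 100, 75)
print(frag)                          # 250, the velvet-style 'insert length'
print(inner_distance(frag, 75, 75))  # 100, the gap between the reads
```

So "250" and "100" describe the same pair of reads; which number a program expects depends on its own definition.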
binlangman replied

05-21-2014, 05:29 PM
what is the paired-end distance?

I have read papers that mention 'the paired-end distance' many times. What is the paired-end distance?
Example:
If |-----75----|----------------------100-----------------|-----75-----|,
and both paired-end reads are 75 bp, is the paired-end distance in this case 100, 250, or something else?

Thanks!
westerman replied

08-19-2013, 06:56 AM
No. Not the same as 'paired-end reads' although it has to do with paired-end. Google the term. That should be enlightening.
OTU replied

08-15-2013, 07:37 AM
Hi all!

I have a question. In some old papers (1999) I found the term "forward-reverse constraints". Is this term the same as "paired-end reads"?

OTU
mastal replied

06-14-2013, 03:56 AM
what is a paired-end read?

Originally posted by naveedakhtar:

My other question is about whole-genome shotgun assembly, where paired-end sequencing of specially prepared large-insert clones is used as a strategy to overcome the assembly problems caused by repetitive sequences in complex genomes. How does paired-end sequencing achieve this in the absence of a reference genome?

You use a program that does de novo assembly (velvet, abyss, mira, soapdenovo, many others). If there is a reference genome of a related species you can use that as a reference for the assembly. Having paired reads can help to scaffold the contigs.
naveedakhtar replied

06-13-2013, 11:18 PM
My other question is about whole-genome shotgun assembly, where paired-end sequencing of specially prepared large-insert clones is used as a strategy to overcome the assembly problems caused by repetitive sequences in complex genomes. How does paired-end sequencing achieve this in the absence of a reference genome?
naveedakhtar replied

06-13-2013, 11:07 PM
Originally posted by ECO:

biocc,

"paired end" or "mate pair" refers to how the library is made, and then how it is sequenced. Both are methodologies that, in addition to the sequence information, give you information about the physical distance between the two reads in your genome.

For example, you shear up some genomic DNA, and cut a region out at ~500bp. Then you prepare your library, and sequence 35bp from each end of each molecule. Now you have three pieces of information:

--the tag 1 sequence
--the tag 2 sequence
--that they were 500bp ± (some) apart in your genome

This gives you the ability to map to a reference (or assemble de novo, for that matter) using that distance information. It helps dramatically to resolve larger structural rearrangements (insertions, deletions, inversions), as well as helping to assemble across repetitive regions.

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
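The distance logic in the paragraph above can be sketched in Python. This is only an illustration: the function name `classify_pair`, the 500 bp expected size, and the 100 bp tolerance are assumptions chosen to match the example, not values from any particular tool.

```python
def classify_pair(mapped_distance, expected=500, tolerance=100):
    """Interpret the distance at which a read pair maps to the reference.

    expected and tolerance are illustrative values, not taken from any tool.
    """
    if mapped_distance > expected + tolerance:
        # Reads sit farther apart on the reference than in the sample,
        # so the sample is missing sequence that the reference has.
        return "possible deletion in sample"
    if mapped_distance < expected - tolerance:
        # Reads sit closer together on the reference than in the sample,
        # so the sample carries extra sequence.
        return "possible insertion in sample"
    return "consistent with library"


for distance in (1000, 100, 520):
    print(distance, classify_pair(distance))
```

In practice a single discordant pair is weak evidence; callers usually look for many pairs supporting the same event.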

Mapping over repeats is similar...if one read is unmappable because it falls in a very repetitive region (eg. LINE, LTR, SINE), but the other is unique, you can again use that distance information to map both reads. The first read would likely come from the repeat that is ~500bp away from your unique second read.

Hope that helps. It's a weird concept at first, but very useful for all types of sequencing. It's been around at some levels since the days of shotgun sequencing.

And lastly, the terminology between "paired end" and "mate pair" is typically that "paired end" refers to sequencing both ends of the same molecule, while "mate pair" (in ABI's case) refers to sequencing only two tags (made by Type IIS restriction enzymes a la SAGE) from the ends of a typically much larger molecule. I could be wrong here though...

How can paired-end sequencing detect inversions, which you mentioned alongside the detection of structural rearrangements?
carmeyeii replied

12-12-2012, 05:39 PM
Hi!

I'm analyzing a "second-hand" dataset generated using SOLiD 4. It is a transcriptome mate pair library that is 52 x 37 nt, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size selection parameters were, or how the circles were cut to produce the final fragments. This info would be very valuable for a more accurate mapping.

Any knowledge would be greatly appreciated!

Thanks a lot,

Carmen
wanfahmi replied

11-09-2012, 01:34 AM
Hey,

I just want to ask about paired-end data filtering. Do I need to filter read 1 and read 2 separately, or combine read 1 and read 2 and then filter? Later on I want to use the filtered data for RNA-seq analysis with TopHat and Cufflinks, but TopHat requires read 1 and read 2 as separate inputs, not as a single paired-end file.
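One way to keep the two files usable as a pair after filtering is to read them in lockstep and keep a pair only when both mates pass. The Python sketch below illustrates that idea under some assumptions: plain 4-line FASTQ records, Phred+33 qualities, non-empty reads, and a made-up mean-quality threshold. The function names are hypothetical and this is not how any specific filtering tool works.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, seq, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield record[0], record[1], record[3]


def mean_quality(qual, offset=33):
    """Mean Phred score, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(r1_handle, r2_handle, min_mean_q=20):
    """Keep a pair only when both mates pass, so the outputs stay in sync."""
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        if (mean_quality(rec1[2]) >= min_mean_q
                and mean_quality(rec2[2]) >= min_mean_q):
            yield rec1, rec2


# Tiny made-up example: pair "b" is dropped because its second mate is poor.
demo_r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n")
demo_r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@b/2\nTTTT\n+\n!!!!\n")
kept = list(filter_pairs(demo_r1, demo_r2))
print([mate1[0] for mate1, _ in kept])  # ['@a/1']
```

Filtering each file on its own would drop different reads from each, and the two files would no longer line up record by record.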
Arturo S.G. replied

02-08-2012, 06:22 AM
Hello to the SEQanswers community!

I came looking for the answer of a simple question on paired-end reads and I found much more (useful) info on this thread.

Thanks to all the contributors
ywlim replied

11-08-2011, 10:38 PM
Thank you so much! Now that I have made sure that both input files list the reads from the same fragments in the same order, everything is working.
swbarnes2 replied

11-07-2011, 09:53 AM
If the reads have wildly different names, they aren't supposed to be paired with each other.

bwa assumes that the first read of the first fq goes with the first read of the second fq, and so on. That doesn't appear to be the case here, that's why your "pairs" are all over the place.
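A quick way to test that assumption before aligning is to walk both files together and compare read names. The Python sketch below assumes 4-line FASTQ records and a mate suffix such as /1 or /3 at the end of the name. The helper names and the demo headers are made up for illustration.

```python
import io


def base_name(header):
    """Strip '@', any trailing description, and a /1, /2 or /3 mate suffix."""
    name = header[1:].split()[0]
    if len(name) > 2 and name[-2] == "/":
        name = name[:-2]
    return name


def headers(handle):
    """Yield the header line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield line.rstrip("\n")


def first_mismatch(r1_handle, r2_handle):
    """Return (record_index, name1, name2) for the first out-of-order pair.

    Returns None when every compared pair matches. zip() stops at the
    shorter file, so a difference in record count is not detected here.
    """
    pairs = zip(headers(r1_handle), headers(r2_handle))
    for i, (h1, h2) in enumerate(pairs):
        if base_name(h1) != base_name(h2):
            return i, base_name(h1), base_name(h2)
    return None


# Made-up headers: the second record differs between the two files.
demo_r1 = io.StringIO("@read_A#0/1\nACGT\n+\nIIII\n@read_B#0/1\nACGT\n+\nIIII\n")
demo_r2 = io.StringIO("@read_A#0/3\nTTTT\n+\nIIII\n@read_C#0/3\nTTTT\n+\nIIII\n")
print(first_mismatch(demo_r1, demo_r2))  # (1, 'read_B#0', 'read_C#0')
```

If this reports a mismatch near the top of the files, the two files need to be re-synchronized before running the aligner.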
ywlim replied

11-06-2011, 10:33 PM
Hi all, I am still struggling with using BWA to align my paired-end reads. I used the command:

bwa sampe -P -s hg19.fasta CATTCG_1.sai CATTCG_3.sai CATTCG_1.fastq CATTCG_3.fastq > CATTCG_PE.sam

and the first few lines of the program running look like this:

[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] fail to infer insert size: too few good pairs
[bwa_sai2sam_pe_core] time elapses: 10.96 sec
[bwa_sai2sam_pe_core] changing coordinates of 6 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_sai2sam_pe_core] time elapses: 0.00 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 1.99 sec
[bwa_sai2sam_pe_core] 262144 sequences have been processed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (3520, 39961, 70863)
[infer_isize] low and high boundaries: 94 and 205549 for estimating avg and std
[infer_isize] inferred external isize from 27 pairs: 37726.370 +/- 35311.353
[infer_isize] skewness: 0.341; kurtosis: -1.395; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 251007 (6.04 sigma)
[bwa_sai2sam_pe_core] time elapses: 10.87 sec
[bwa_sai2sam_pe_core] changing coordinates of 178 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
[bwa_sai2sam_pe_core] time elapses: 0.00 sec
[bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
[bwa_sai2sam_pe_core] print alignments... 1.97 sec
[bwa_sai2sam_pe_core] 524288 sequences have been processed.

The program seems to be running fine and quite quickly, but when I look at the output file, I see something like this:

DJB775P1_0215:5:1101:1262:2347#0 65 chr13 92966174 37 94M chr5 33346129 0 AATAACCACCTAGATAAATGTTCACTCATCTCGCCTGTCTAGCCTGTCTTGAGGCCGGTTTCATCATGAGTCACTCCACCAATTACTTCAAAAC cgggeghhhhfhhfffgffegbcfgffdhfffhhhdb^efgfddfhhhhffS\eefgg\W`c]RZ^__GU]UMMZ_Z]\_a^_TR_bbbbYYYW XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
DJB775P1_0215:5:1101:1495:2155#0/3 129 chr5 33346129 37 94M chr13 92966174 0 AATTAACTTCCTTTTTTTGTCTTCATATAACACTGTTGACCTACTCATATTGAGCCCTCAGTCTTTTTTGTACACATGCTCATCCCTGGCATGT ceggggiiiiiiiiiiiiihiiiiihiiihihifgffhiig`fgfghhfhghffhhihigfgfgeeceacabbcccb`b_bcccccc_X[`b^Y XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
DJB775P1_0215:5:1101:1465:2351#0 81 chr14 44601925 37 94M chr5 33346129 0 TCTCCTACCTCCTCTCCCTTATAGAAATCCCTGTGATTCCATTAGTCTCACCTGGATAAACCAGAGTATTCTTATTATCCCAAGATCCCCATCT XTR^YGG\^ZZZRbb]]ccc_db]d^dgfcb`bZZe_bSfc`fgf_fc^Iffhhhhfhfagddhfhhhhfffgeefddfedbbd_agfcgebe\ XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:38A55
DJB775P1_0215:5:1101:1483:2169#0/3 161 chr5 33346129 37 94M chr14 44601925 0 AATTAACTTCCTTTTTTTGTCTTCATATAACACTGTTGACCTACTCATATTGAGCCCTCAGTCTTTTTTGTACACATGCTCATCCCTGGCATGT ceYbgae_egihffhhd_^efhihfhhfXcghfcgacgfI^[cba\eecgZ_HWWHLaZ`VVVb`gacccZb_RZ]`]bbcbbbb^bc^`X][S XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
DJB775P1_0215:5:1101:1317:2430#0 81 chr9 8239023 37 94M chr12 51288201 0 TTAAGTATTAAATGACATAAAACCTATAAAGCACATAGCAGGTAAATGTGGTAAACTCTTGATAAATGTTATTGTTATCATCATCATCATCACT b`]a^VHRcaggggbgeghgc\bhgefbZ\MW[gfgce^[gfff^^aa^OXeaYIae[hhhhgf^[d[hgfge_hhhgd_hY^bfdbcegecZc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
DJB775P1_0215:5:1101:1461:2198#0/3 161 chr12 51288201 37 94M chr9 8239023 0 CTGTTGGCTGGAATGTAAAATGGTGCAGCTGCTGTGGAAAACTGCATGGCAGTTCCTAGAAAAATTAAAAATAGAATTACCATATGATCCAGCA egggggefdf`egg[bdgh`]gh^dbe`dfhhbffbgIIX^^e_fgffhabgH\\_\HM\d]dUGV\\VV_ZVVHHUZ__bbc]`BBBBBBBBB XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0

It is really peculiar that most if not all read pairs have been mapped to different chromosomes. How is this possible? When I use samtools to filter out the correctly paired reads, I only obtained a very small file.

Can someone tell me why my reads are paired so weirdly? Any help is greatly appreciated!
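For anyone reading the output above, the second column of each SAM line is a bitwise FLAG, and decoding it shows how the aligner treated each pair. The Python sketch below uses the bit meanings from the SAM specification; the function name `decode_flag` and the label wording are my own.

```python
# Bit values are from the SAM specification; the labels are paraphrased.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag):
    """Return the labels of the bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


# The four FLAG values that appear in the output above.
for flag in (65, 129, 81, 161):
    print(flag, decode_flag(flag))
```

None of these four values has the "proper pair" bit (0x2) set, which is consistent with samtools keeping only a very small file when filtering for properly paired reads.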
ywlim replied

11-03-2011, 05:04 PM
I have some questions about aligning paired-end sequencing reads. I am using the BWA sampe function to align my paired-end reads and it worked, but surprisingly almost all reads are being paired with reads on different chromosomes, resulting in a lot of "improper pairs". I don't understand why BWA did that, and I wonder if it was because I used the command "bwa sampe -a 15000 -A" to force bwa not to run Smith-Waterman alignment for unmapped reads.

Also, if paired end reads share the same x and y coordinates, which are indicated by the first line of their fastq files, why doesn't bwa just pair them up by their coordinates? That seems like the most straightforward way to find the right pair to me.