
**carmeyeii** · 12-12-2012, 05:39 PM

Hi!

I'm analyzing a "second-hand" dataset generated on SOLiD 4. It is a transcriptome mate-pair library with 52 x 37 nt reads, and I cannot for the life of me find the protocol that was used to generate those specific read lengths. I have F3 and R3 reads, so I am assuming it is a circularization protocol, but I do not know what the size-selection parameters were or how the circles were cut to produce the final fragments. This information would be very valuable for more accurate mapping.

Any knowledge would be greatly appreciated!

Thanks a lot,

Carmen

**naveedakhtar** · 06-13-2013, 11:07 PM

Originally posted by ECO:

biocc,

"paired end" or "mate pair" refers to how the library is made, and then how it is sequenced. Both are methodologies that, in addition to the sequence information, give you information about the physical distance between the two reads in your genome.

For example, you shear up some genomic DNA, and cut a region out at ~500bp. Then you prepare your library, and sequence 35bp from each end of each molecule. Now you have three pieces of information:

--the tag 1 sequence
--the tag 2 sequence
--that they were 500bp ± (some) apart in your genome

This gives you the ability to map to a reference (or assemble de novo, for that matter) using that distance information. It helps dramatically in resolving larger structural rearrangements (insertions, deletions, inversions), as well as in assembling across repetitive regions.

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from the fragment size the library was constructed with (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. The same goes for insertions: if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
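The distance logic above can be sketched in a few lines of Python. This is a minimal illustration, not any aligner's method: `MappedPair` and `classify` are hypothetical names, both reads are assumed to map to the same chromosome in the expected orientation, and the fixed `tolerance` stands in for the spread of a real library's insert-size distribution.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Hypothetical record for one read pair mapped to a reference.

    Coordinates are 0-based, half-open; both reads are assumed to sit on
    the same chromosome in the expected (inward-facing) orientation.
    """
    start1: int
    end1: int
    start2: int
    end2: int

    def fragment_span(self) -> int:
        # Outer distance on the reference: leftmost start to rightmost end.
        return max(self.end1, self.end2) - min(self.start1, self.start2)


def classify(pair: MappedPair, expected: int, tolerance: int) -> str:
    """Compare the mapped span with the library's expected fragment size."""
    span = pair.fragment_span()
    if span > expected + tolerance:
        # Mates map farther apart than the library allows: the sample lacks
        # sequence that the reference has (deletion in the sample).
        return "deletion"
    if span < expected - tolerance:
        # Mates map closer together than expected: the sample carries extra
        # sequence absent from the reference (insertion in the sample).
        return "insertion"
    return "concordant"
```

With a 500 bp library and 35 bp reads, `classify(MappedPair(0, 35, 965, 1000), 500, 100)` flags a deletion and `classify(MappedPair(0, 35, 65, 100), 500, 100)` an insertion, matching the two cases described above.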

Mapping over repeats is similar...if one read is unmappable because it falls in a very repetitive region (e.g. LINE, LTR, SINE), but the other is unique, you can again use that distance information to place both reads. The first read most likely comes from the repeat copy that is ~500bp away from your unique second read.
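A sketch of that repeat-rescue idea, under stated assumptions: the hypothetical `place_repeat_mate` takes the unique mate's leftmost coordinate and strand, the candidate positions of the repetitive mate, and the library's fragment size, and it assumes inward-facing pairs with equal read lengths. It returns `None` when no candidate, or more than one, falls inside the expected window.

```python
from typing import List, Optional


def place_repeat_mate(anchor_start: int, anchor_strand: str,
                      candidate_starts: List[int], fragment_size: int,
                      read_len: int, tolerance: int) -> Optional[int]:
    """Choose the repeat copy that sits where the unique mate predicts.

    Coordinates are 0-based leftmost positions of each read on the
    reference. Assumes an inward-facing pair of two reads of read_len bp.
    """
    if anchor_strand == "+":
        # Unique mate starts the fragment; its partner should end
        # fragment_size bp downstream of anchor_start.
        expected_start = anchor_start + fragment_size - read_len
    elif anchor_strand == "-":
        # Unique mate ends the fragment; its partner should start
        # fragment_size bp upstream of the anchor's right end.
        expected_start = anchor_start + read_len - fragment_size
    else:
        raise ValueError("anchor_strand must be '+' or '-'")
    in_window = [s for s in candidate_starts
                 if abs(s - expected_start) <= tolerance]
    # Only accept an unambiguous placement.
    return in_window[0] if len(in_window) == 1 else None
```

For example, a unique 35 bp mate on the plus strand at position 1000 in a 500 bp library predicts its partner near position 1465, so out of repeat hits at 200, 1470 and 5000 only 1470 is kept.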

Hope that helps. It's a weird concept at first, but very useful for all types of sequencing. It's been around at some levels since the days of shotgun sequencing.

And lastly, the terminology between "paired end" and "mate pair" is typically that "paired end" refers to sequencing both ends of the same molecule, while "mate pair" (in ABI's case) refers to sequencing only two tags (made by Type IIS restriction enzymes a la SAGE) from the ends of a typically much larger molecule. I could be wrong here though...

How can paired-end sequencing detect inversions, which you mentioned along with the detection of other structural rearrangements?

**naveedakhtar** · 06-13-2013, 11:18 PM

My other question: in whole-genome shotgun assembly, paired-end sequencing of specially prepared large-insert clones is used as a strategy to overcome the assembly problems caused by repetitive sequences in complex genomes. How does paired-end sequencing do this in the absence of a reference genome?

**mastal** · 06-14-2013, 03:56 AM

what is a paired-end read?

Originally posted by naveedakhtar:

My other question: in whole-genome shotgun assembly, paired-end sequencing of specially prepared large-insert clones is used as a strategy to overcome the assembly problems caused by repetitive sequences in complex genomes. How does paired-end sequencing do this in the absence of a reference genome?

You use a program that does de novo assembly (Velvet, ABySS, MIRA, SOAPdenovo, many others). If there is a reference genome of a related species, you can use that as a reference for the assembly. Having paired reads can help to scaffold the contigs.
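One way to picture how paired reads help scaffold contigs without a reference: once reads are mapped back to the assembled contigs, a pair whose mates land on two different contigs is evidence that those contigs are neighbors in the genome. The sketch below only counts that evidence. The input format and the `min_links` cutoff are assumptions, and real scaffolders also use the mates' orientation and the expected insert size to order, orient, and space the contigs.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def scaffold_links(pair_hits: Iterable[Tuple[Optional[str], Optional[str]]],
                   min_links: int = 3) -> Dict[Tuple[str, str], int]:
    """Count read pairs whose mates map to different contigs.

    pair_hits holds (contig_of_read1, contig_of_read2) per read pair, a
    hypothetical input obtained by mapping each read back to the contigs;
    None marks an unmapped read. Returns contig pairs supported by at
    least min_links read pairs.
    """
    counts: Counter = Counter()
    for c1, c2 in pair_hits:
        if c1 is None or c2 is None or c1 == c2:
            continue  # an unmapped mate, or both mates inside one contig
        # Sort so that A-B and B-A count as the same link.
        counts[tuple(sorted((c1, c2)))] += 1
    return {link: n for link, n in counts.items() if n >= min_links}
```

A link backed by many independent pairs is far less likely to be a chimera or a mis-mapped read than a single pair, which is why a support threshold is applied.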

**OTU** · 08-15-2013, 07:37 AM

Hi all!

I have a question. I found the term "forward-reverse constraints" in some old papers (1999). Is this term the same as "paired-end reads"?

OTU

**westerman** · 08-19-2013, 06:56 AM

No. Not the same as 'paired-end reads' although it has to do with paired-end. Google the term. That should be enlightening.

**binlangman** · 05-21-2014, 05:29 PM

what is the paired-end distance?

I read papers that mention 'the paired-end distance' many times. What is the paired-end distance?
Example:
If |-----75----|----------------------100-----------------|-----75-----|
and both paired-end reads are 75 bp, is the paired-end distance in this case 100, 250, or something else?

Thanks!

**mastal** · 05-22-2014, 01:41 AM

It could be either 250 or 100.

Different software packages may have their own definition of 'insert length', so it's best to read the documentation carefully.

For example, in the case you have illustrated, velvet would define the 'insert length' as 250.

This is the definition given in the velvet manual:

"The insert length is understood to be the length of the sequenced fragment, i.e. it includes the length of the reads themselves."


https://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

Bowtie2 also uses the same definition of fragment length.
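The two conventions differ only by the read lengths, so converting between them is simple arithmetic. The helper names below are illustrative and not part of velvet or Bowtie2; the fragment (outer) length follows the velvet definition quoted above, and the inner distance is the gap left between the two reads.

```python
def fragment_length(read1_len: int, inner_distance: int, read2_len: int) -> int:
    """Outer 'insert length' in the velvet sense: both reads plus the gap."""
    return read1_len + inner_distance + read2_len


def inner_distance(fragment_len: int, read1_len: int, read2_len: int) -> int:
    """Gap between the two reads: fragment length minus both read lengths."""
    return fragment_len - read1_len - read2_len
```

For the 75 + 100 + 75 example, `fragment_length(75, 100, 75)` gives 250 and `inner_distance(250, 75, 75)` gives 100. The inner distance becomes negative when the fragment is shorter than the two reads combined, i.e. when the reads overlap.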

**mido1951** · 10-21-2015, 08:07 AM

How do I do an assembly if I have paired-end reads (two files, R1.fq and R2.fq)?
Thank you

**OTU** · 10-21-2015, 08:15 AM

What is your data on? Metagenome, single genome?
What sequencing platform did you use? What is the processing computer power that you can use?

**mido1951** · 10-21-2015, 08:48 AM

I have Illumina paired-end data.
I want to assemble these data.
But the problem is that I do not understand the two files F1.fq and F2.fq.
Are the reads in F1.fq and the reads in F2.fq complementary or not?
For the assembly, do I overlap only the reads in F1.fq, or do I overlap the reads in both F1.fq and F2.fq?
Thanks

**GenoMax** · 10-21-2015, 09:02 AM

Originally posted by mido1951:

I have Illumina paired-end data.
I want to assemble these data.
But the problem is that I do not understand the two files F1.fq and F2.fq.
Are the reads in F1.fq and the reads in F2.fq complementary or not?
For the assembly, do I overlap only the reads in F1.fq, or do I overlap the reads in both F1.fq and F2.fq?
Thanks

Cross-posted: https://www.biostars.org/p/162806/

@mido1951: See this page for a simple explanation of "shotgun sequencing": https://en.wikipedia.org/wiki/Shotgun_sequencing In the past people used Sanger sequencing for this; it has now been replaced by NGS.

R1/R2 are merely sequences from the two ends of a fragment. They do not need to be complementary (in fact in most cases they will not be). You do not need to worry about R1/R2 reads individually but use them as a set for assembly.
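In practice, using R1/R2 "as a set" means reading the two files in lockstep: record n in R1.fq and record n in R2.fq are the two ends of the same fragment. Below is a minimal Python sketch that pairs them up and checks that the read names agree. It assumes plain four-line FASTQ records and read names that end in `/1` and `/2` or differ only after the first space (common Illumina conventions); `read_fastq`, `base_name` and `read_pairs` are hypothetical helpers, and in a real pipeline you would normally hand both files straight to the assembler.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)  # works for open files and for lists of lines
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq, qual  # drop the leading '@'


def base_name(name: str) -> str:
    """Drop any comment after whitespace and a trailing /1 or /2 suffix."""
    name = name.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_pairs(r1: Iterable[str],
               r2: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (pair_name, r1_sequence, r2_sequence), checking the files stay in sync."""
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        name1, name2 = base_name(rec1[0]), base_name(rec2[0])
        if name1 != name2:
            raise ValueError(f"mates out of sync: {name1} vs {name2}")
        yield name1, rec1[1], rec2[1]
```

Usage would look like `with open("R1.fq") as f1, open("R2.fq") as f2:` followed by `for name, s1, s2 in read_pairs(f1, f2): ...`. Raising an error on a name mismatch is deliberate: if one file has been filtered without the other, every later pair would silently be wrong.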

**mido1951** · 10-22-2015, 01:53 PM

For example:
we have the sequence S1: ATCGTTGAGCAGACT and the sequence S2: TGAGCAGACTTAAGTAGTTTT.
Suppose the reads sequenced from S1 are R1 = ATCGTTGAG
R2 = AGTCTGCTC (reverse complement, read from the right end)
and from S2: R1 = TGAGCAGAC
R2 = AAAACTACT (reverse complement, read from the right end)
So we have the two paired-end files:
F1.fq:
S1: R1=ATCGTTGAG
S2: R1=TGAGCAGAC
F2.fq:
S1: R2=AGTCTGCTC
S2: R2=AAAACTACT
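The read convention in this example (R1 read forward from the left end of the fragment, R2 the reverse complement of its right end) can be checked with a short Python sketch. `reverse_complement` and `read_pair_from_fragment` are illustrative helpers that assume uppercase A/C/G/T sequence and a fragment at least `read_len` bases long.

```python
from typing import Tuple

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_pair_from_fragment(fragment: str, read_len: int) -> Tuple[str, str]:
    """R1 = first read_len bases; R2 = reverse complement of the last read_len bases."""
    return fragment[:read_len], reverse_complement(fragment[-read_len:])
```

Running it on S1 and S2 with `read_len = 9` reproduces the four reads listed in F1.fq and F2.fq above.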

In the assembly here, there is an overlap between R1(S1) and R1(S2).
In assembly, can there be an overlap between R1 and R2 reads from two different fragments?

**OTU** · 10-22-2015, 01:58 PM

I don't see why
"F1.fq:
S1: R1=ATCGTTGAG
S2: R1=TGAGCAGAC"
would overlap. They only match at 4 bp (TGAG). Assemblers won't combine them.
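One common way to measure the overlap between two reads is the longest suffix of one that exactly matches a prefix of the other, with overlaps below some minimum length ignored. A minimal Python sketch, using exact matches only; the function name and the `min_len` values are assumptions for illustration, and real assemblers also tolerate sequencing errors.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 if no such overlap is at least min_len bases long.
    """
    # Try the longest possible overlap first and shrink down to min_len.
    for k in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0
```

For R1(S1) = ATCGTTGAG and R1(S2) = TGAGCAGAC the longest such overlap is 4 bp (TGAG), so a 5 bp minimum already rejects it, while the full fragments S1 and S2 share 10 bp (TGAGCAGACT).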

**mido1951** · 10-22-2015, 02:01 PM

I am speaking of an example; assume that it is an overlap.
I want to create an assembly tool, but first I need to know how to detect overlaps between the paired-end reads (from the two files) and how to make an assembly with paired-end reads.
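A naive way to detect overlaps between reads from the two files, assuming the inward-facing layout of the example above: reverse-complement each R2 so that it reads along the same strand as its R1, pool all reads, and test every ordered pair in both orientations, because a fragment can come from either strand of the genome. This is an all-against-all, quadratic sketch with exact matching, meant only to illustrate the idea; the function names and the `pairN/1`, `pairN/2` labels are hypothetical, and practical assemblers such as Velvet index k-mers in a de Bruijn graph instead of comparing every pair of reads.

```python
from itertools import permutations
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def find_overlaps(r1_reads: List[str], r2_reads: List[str],
                  min_len: int) -> List[Tuple[str, str, str, int]]:
    """Return (read_a, read_b, strand_of_b, overlap_len) for overlaps >= min_len.

    strand_of_b is '+' if read_b is used as stored and '-' if its reverse
    complement is used. An overlap between opposite strands is found from
    both reads' side, so it appears twice in the output.
    """
    reads: Dict[str, str] = {}
    for i, (r1, r2) in enumerate(zip(r1_reads, r2_reads)):
        reads[f"pair{i}/1"] = r1
        # Put R2 on the same strand as its R1 (inward-facing pairs assumed).
        reads[f"pair{i}/2"] = reverse_complement(r2)
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in permutations(reads.items(), 2):
        for strand, other in (("+", seq_b), ("-", reverse_complement(seq_b))):
            k = suffix_prefix_overlap(seq_a, other, min_len)
            if k:
                hits.append((name_a, name_b, strand, k))
    return hits
```

On the example files with `min_len=5`, the hits include the 8 bp overlap GAGCAGAC between R1(S2) and the reverse complement of R2(S1), i.e. an R1 read overlapping an R2 read from a different fragment once R2 is reverse-complemented, while the 4 bp overlap between the two R1 reads is filtered out.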
