**Simon Anders** · 03-15-2010, 10:39 PM

I don't know how TopHat reacts to it, but I can already tell you that Bowtie won't like it, and hence TopHat will fail, too.

I'm currently working with a similar data set and noted that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually.

Of course, this is not an ideal solution.

Simon

**KevinLam** · 03-16-2010, 12:03 AM

Originally posted by Simon Anders View Post

I don't know how TopHat reacts to it but I can already tell you that Bowtie won't like it, and hence Tophat will fail, too.

I'm currently working with a similar data set and noted that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually.

Of course, this is not an ideal solution.

Simon

How did you stitch them?
With samtools merge?

**KevinLam** · 03-16-2010, 05:51 AM

Originally posted by wenhuang View Post

Hi,

I have a paired end (2x75) Illumina data set that might have overlap at the ends. The fragment size selected was 240 and after subtracting adapter/primer sequences, there was about 120 bp left, which generated about 30bp overlap at the ends.

Thanks for your help!

Why not convert your paired-end data into single-end data?
Since there is a 30 bp overlap, the mates should assemble into a single read quite nicely.

You would end up with 120 bp SE data.

**wenhuang** · 03-16-2010, 06:05 AM

My alignment did not seem to have much of a problem. Here is a sample of the first few alignments. It appears that the two reads were processed separately, but I am not sure about that.

HWUSI-EAS787_0001:5:70:1610:809#AAATAG 99 chr1 5312 255 81M = 5366 0 GCGAGGAAAGAAATGCACTAAGTAAAAAACTTAGTCATTTTTTAAAGAGAATTAAAATGAAGTCCAATTCCTTTGAGTTAC HGHHIHHHGHHHGGGHHHHHHHHIHHHGHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHEHHFHEHGHHG NM:i:0
HWUSI-EAS787_0001:5:70:1610:809#AAATAG 147 chr1 5366 255 81M = 5312 0 AAATGAAGTCCAATTCCTTTGAGTTACAAATTTACAATCACTACTCAGTAATTAAAACTATTCAGTTATAGTGAACTGATT IHFHHIHBGHHHHHGHHFEHHHHHHHHHHHHHHHHHHEHHGHHHHHHHHHHHHGGHHHHHHHHHHIHHHHHHGHHHHHH NM:i:0

HWUSI-EAS787_0001:5:30:1504:1763#TTGTCG 163 chr1 5822 255 81M = 5860 0 CCAGAGCCCACAGCTTACTTTTGGTGGTACCCATCCTAAGGGTCTGGGCAAACATATAACGATAAATGTCCATCATTATAA HHGHHGGFHHHHHHHHHEHHHHHHHHHHHEHHGHDEGHHHHHBBBGGG7FHH2HEHBHH0FHEFHC+?6><CC-CEDDBA@ NM:i:0
HWUSI-EAS787_0001:5:30:1504:1763#TTGTCG 83 chr1 5860 255 81M = 5822 0 AGGGTCTGGGCAAACATATAACGATAAATGTCCATCATTATAATATCACACAGAGTAGTTTCACTGCCCTGAAACTCTTTT G@CBFHE?G=HHGIHHHHGHGHBHGHHHEGHDHHGHHFFHHHHHHHHHHGHHGHGFHCHHGHHHHFHHHHHHHHHHHHHHH NM:i:0
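
Since the ISIZE column is 0 in these records, one way to see whether the mates overlap is to work it out from the POS and CIGAR columns. The sketch below is only an illustration: `mate_overlap` is a hypothetical helper, and it assumes ungapped alignments such as 81M, so the reference span equals the read length.

```python
def mate_overlap(pos1, len1, pos2, len2):
    """Number of reference bases covered by both mates (0 if none).

    pos1/pos2 are 1-based leftmost positions (the SAM POS column);
    len1/len2 are reference-aligned lengths (81 for a CIGAR of 81M).
    Assumption: both mates align to the same chromosome without gaps.
    """
    end1 = pos1 + len1 - 1
    end2 = pos2 + len2 - 1
    return max(0, min(end1, end2) - max(pos1, pos2) + 1)


# The two pairs shown above, both aligned as 81M on chr1.
print(mate_overlap(5312, 81, 5366, 81))  # 27
print(mate_overlap(5822, 81, 5860, 81))  # 43
```

For the first pair this agrees with the sequences themselves: the last 27 bases of the first mate are the first 27 bases of the second.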

Originally posted by Simon Anders View Post

I don't know how TopHat reacts to it but I can already tell you that Bowtie won't like it, and hence Tophat will fail, too.

I'm currently working with a similar data set and noted that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually.

Of course, this is not an ideal solution.

Simon

**wenhuang** · 03-16-2010, 06:09 AM

I think this is a decent solution. Many of my reads suffer from bad quality at the end, though. Can you recommend a tool that might do this job? Thanks!

Originally posted by KevinLam View Post

Why not convert your paired end data into single end?
Since there is a 30 bp overlap. they should assemble into a single read quite nicely.

so you end up with a 120 bp SE data.

**KevinLam** · 03-16-2010, 06:47 AM

Originally posted by wenhuang View Post

I think this is a decent solution. Many of my reads suffered from bad quality at the end though. Can you recommend a type of tools that might do this job ? Thanks!

The only tool I know of that can do this is phrap, but I am not sure how long it would take when applied to so many reads.

**Cole Trapnell** · 03-16-2010, 08:56 AM

Originally posted by wenhuang View Post

Hi,

I have a paired end (2x75) Illumina data set that might have overlap at the ends. The fragment size selected was 240 and after subtracting adapter/primer sequences, there was about 120 bp left, which generated about 30bp overlap at the ends.

My questions are:

1) Is this going to affect TopHat alignment? How should the -m option be specified?

2) When counting coverage, my intuition is that those overlapping bases might be counted twice, while they only appear in the library once. Is there any way to get around this?

3) Is this going to affect Cufflinks transcript assembly and quantitation?

Thanks for your help!

As of TopHat 1.0.13, you should be able to specify a negative inner distance of -30. TopHat does map the reads independently, and has a different algorithm from Bowtie for handling the ends. The coverage.wig file displays depth of read coverage, not depth of physical coverage, so those bases will be double counted, as you suggest. However, Cufflinks operates at the fragment level, not the read level, and so should do the right thing here.
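
To make the difference between read depth and physical (fragment) depth concrete, here is a rough sketch. It is not how TopHat or Cufflinks compute anything; `read_depth` and `fragment_depth` are hypothetical helpers, and the code assumes ungapped mates of equal length described only by their 1-based start positions.

```python
from collections import Counter


def read_depth(pairs, read_len):
    """Per-base depth where every mate counts once (overlap counted twice)."""
    depth = Counter()
    for left, right in pairs:
        for start in (left, right):
            for pos in range(start, start + read_len):
                depth[pos] += 1
    return depth


def fragment_depth(pairs, read_len):
    """Per-base depth where each fragment counts once over its whole span."""
    depth = Counter()
    for left, right in pairs:
        first, last = min(left, right), max(left, right) + read_len - 1
        for pos in range(first, last + 1):
            depth[pos] += 1
    return depth


# One overlapping pair, using start positions from the alignments in this thread.
pairs = [(5312, 5366)]
rd = read_depth(pairs, 81)
fd = fragment_depth(pairs, 81)
print(rd[5366], fd[5366])  # 2 1: the overlapping base is double counted per read
```

Counting per fragment is what removes the double counting in the overlap, which is the concern raised in question 2.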

**ecabot** · 03-16-2010, 10:27 AM

Here are more details about Wen's run which was 2x75.

The minimum fragment size, including flanking adapters, is 150 bp. Thus fragments with the smallest insert could be diagrammed like this, with 32 bases of overlapping cDNA:

[adapter:59][cDNA 32][adapter:59]
o~~~~~~~~~~~> (with 43bp of adapter)
<~~~~~~~~~~~~o

I am assuming, however, that reads this short would fail to map because of the high proportion of adapter-derived sequence embedded in the reads.

These considerations lead me to the following questions:

1) Does the negative inner distance of, for example, -30 reflect an expected mean of 30 bp of overlap, or does it specify a maximum amount of overlap?

After all, most of Wen's reads don't overlap, and the overlap could be as high as a full 75 bp for a 193 bp fragment. If I were to calculate the actual mean inner distance, taking overlaps as negative distances, the overall mean might well turn out to be positive.

2) If we were to trim the adapters this would invariably lead to a distribution of read lengths rather than a uniform 75 bases. Can Bowtie and TopHat deal with unequal read lengths or is this likely to be a problem?
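
The arithmetic behind these numbers can be sketched as follows. This is only an illustration under the figures given in this post (59 bp of adapter on each side, 75 bp reads); `pair_geometry` is a hypothetical helper, and the list of fragment sizes at the end is made up purely to show how a mean inner distance can come out positive even though some pairs overlap.

```python
def pair_geometry(fragment_len, adapter_total=118, read_len=75):
    """Return (insert, inner_distance, overlap, adapter_readthrough) in bp.

    insert              cDNA length once both adapters are subtracted
    inner_distance      insert minus both read lengths (negative = overlap)
    overlap             cDNA bases covered by both mates
    adapter_readthrough bases of each read that run into the adapter
    """
    insert = fragment_len - adapter_total
    inner = insert - 2 * read_len
    overlap = max(0, min(insert, 2 * read_len - insert))
    readthrough = max(0, read_len - insert)
    return insert, inner, overlap, readthrough


print(pair_geometry(150))  # (32, -118, 32, 43): smallest fragment, 43 bp of adapter
print(pair_geometry(193))  # (75, -75, 75, 0): mates overlap completely
print(pair_geometry(240))  # (122, -28, 28, 0): roughly the -30 case discussed above

# Hypothetical fragment sizes, only to illustrate the mean.
sizes = [193, 240, 300, 350, 400]
inner = [pair_geometry(s)[1] for s in sizes]
print(sum(inner) / len(inner))  # 28.6, positive despite two overlapping pairs
```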

**ecabot** · 03-16-2010, 10:29 AM

Here is how the diagram from my previous posting should look (with dots replacing whitespace). Sorry for the confusion.

[adapter:59][cDNA 32][adapter:59]
.............................o~~~~~~~~~~~> (with 43bp of adapter)
...........<~~~~~~~~~~~~o

**Auction** · 03-18-2010, 11:01 AM

Originally posted by Simon Anders View Post

I don't know how TopHat reacts to it but I can already tell you that Bowtie won't like it, and hence Tophat will fail, too.

I'm currently working with a similar data set and noted that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually.

Of course, this is not an ideal solution.

Simon

In my case, bowtie 0.12.3 (and also BWA) seems to work well for overlapping paired-end reads. I have 2x59 reads, and I found that the ISIZE for many records is less than 118 while the FLAG field indicates that they are properly mapped.
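
A check along these lines could be sketched as below. The flag bits 0x1 (read is paired) and 0x2 (mapped in a proper pair) come from the SAM format; the helper name is hypothetical, and the sketch assumes the aligner reports ISIZE as the outer distance spanned by the fragment, so that any value below twice the read length implies overlapping mates.

```python
PAIRED = 0x1        # SAM flag bit: read is paired in sequencing
PROPER_PAIR = 0x2   # SAM flag bit: read is mapped in a proper pair


def is_overlapping_proper_pair(flag, isize, read_len):
    """True if the record is flagged as a proper pair and the mates overlap.

    Assumption: abs(isize) is the outer fragment length; an ISIZE of 0
    (not reported) is treated as unknown and returns False.
    """
    proper = bool(flag & PAIRED) and bool(flag & PROPER_PAIR)
    return proper and 0 < abs(isize) < 2 * read_len


# Illustrative values for 2x59 reads (not taken from a real file).
print(is_overlapping_proper_pair(99, 100, 59))    # True
print(is_overlapping_proper_pair(147, -100, 59))  # True
print(is_overlapping_proper_pair(99, 300, 59))    # False: mates do not overlap
print(is_overlapping_proper_pair(65, 100, 59))    # False: not a proper pair
```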

**Cole Trapnell** · 03-18-2010, 11:10 AM

Originally posted by Simon Anders View Post

I don't know how TopHat reacts to it but I can already tell you that Bowtie won't like it, and hence Tophat will fail, too.

I'm currently working with a similar data set and noted that Bowtie fails to find an alignment for an overlapping paired read (and so does Eland). I ended up aligning the two ends separately and then stitching things together manually.

Of course, this is not an ideal solution.

Simon

TopHat and Bowtie use completely different procedures to handle paired ends, and their policies are not the same. TopHat maps the left and right reads independently, and recent versions should have no trouble with paired end libraries with negative inner distances and overlapping reads. With TopHat 1.0.13 and Cufflinks 0.8.0, I have processed an RNA-Seq library size selected to 100bp and sequenced with 2x76bp GAII. The mean inner distance in this case is negative, and the TopHat/Cufflinks stack produced fine results.

To answer a previous question - TopHat will not handle reads of different lengths gracefully, so if you make "virtual" long reads from overlapping mates, make sure to trim the products down to a uniform length.
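
Trimming merged reads to a uniform length could look roughly like this. It is a minimal sketch: `trim_to_uniform` is a hypothetical helper, records are assumed to be (name, sequence, quality string) tuples, and dropping reads shorter than the target is just one possible policy.

```python
def trim_to_uniform(records, length):
    """Yield (name, seq, qual) with seq and qual cut to exactly `length` bases.

    Reads shorter than `length` are skipped, so every output read has the
    same length. Trimming is from the 3' end of the merged read.
    """
    for name, seq, qual in records:
        if len(seq) >= length:
            yield name, seq[:length], qual[:length]


merged = [
    ("pair1", "ACGTACGTAC", "IIIIIIIIII"),
    ("pair2", "ACGTAC", "IIIIII"),
    ("pair3", "ACG", "III"),  # too short, dropped
]
for record in trim_to_uniform(merged, 6):
    print(record)
```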

**ACTGangster** · 06-15-2010, 04:47 AM

Another possible solution

I had to edit this post. I wrote a program that assembles overlapping paired ends from illumina. It used to be public but now it's private because I want to do a paper on it.

If you want a copy, you can e-mail me and I'll send it to you.

I tested it on 1.5 million reads overlapping by ~25 bp and it assembled about 78% of them into larger contigs, which can then be de novo assembled. In the overlapping region, it chooses the nucleotide with the best quality score (if there is a discrepancy). If there is a discrepancy and the quality scores are the same, it chooses the appropriate ambiguous nucleotide.
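
The rule described above (take the higher-quality base on a mismatch, fall back to an ambiguity code on a tie) can be sketched as follows. This is not the actual program; it is a simplified illustration in which `merge_pair`, its parameters, and the longest-overlap-first search are all assumptions, and qualities are plain lists of integers.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(seq1, qual1, seq2, qual2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge two mates into one read; return (seq, quals) or None.

    seq2 is given as sequenced (reverse strand) and is reverse-complemented
    here. The longest overlap whose mismatch fraction is acceptable wins.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    s2, q2 = revcomp(seq2), list(qual2)[::-1]
    for ov in range(min(len(seq1), len(s2)), min_overlap - 1, -1):
        tail, head = seq1[-ov:], s2[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            break
    else:
        return None  # no acceptable overlap found
    seq, quals = list(seq1[:-ov]), list(qual1[:-ov])
    for a, qa, b, qb in zip(tail, qual1[-ov:], head, q2[:ov]):
        if a == b:
            base, q = a, max(qa, qb)
        elif qa > qb:
            base, q = a, qa
        elif qb > qa:
            base, q = b, qb
        else:  # equal quality: use the ambiguity code, or N if unknown
            base, q = IUPAC.get(frozenset((a, b)), "N"), qa
        seq.append(base)
        quals.append(q)
    seq.extend(s2[ov:])
    quals.extend(q2[ov:])
    return "".join(seq), quals


# Toy 20 bp fragment read as 2x12 with a 4 bp overlap (made-up sequence).
fragment = "ACGTTGCAGGATCCTAAGCT"
mate1, mate2 = fragment[:12], revcomp(fragment[8:])
print(merge_pair(mate1, [30] * 12, mate2, [30] * 12, min_overlap=4, max_mismatch_frac=0.0))
```

Trying the longest overlap first is a design choice that can pick the wrong offset in repetitive sequence; a real tool would likely score all candidate overlaps.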

**Zigster** · 07-29-2010, 12:35 PM

I uploaded a Python script I wrote for this to SVAR:

http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py

**ACTGangster** · 07-29-2010, 12:39 PM

stitch

I open-sourced my Stitch program as I do not plan on writing a paper on it specifically.

GitHub - audy/stitch: Overlap assembler of paired-end DNA sequences generated by Illumina

http://github.com/audy/stitch

It runs on as many cores as you have. I did 20 million reads in 40 minutes on a 16-core Mac Pro.

Overlapping paired end - tophat
