We recently ran an RNA-seq experiment and have paired-end RNA-seq data. We ran TopHat and generated a BAM file. When we visualize this BAM file, only 5-10% of the reads appear as paired; the rest do not. How does TopHat handle paired-end reads? I know from reading other posts that Bowtie and TopHat treat paired-end reads differently. Why is only a small fraction of the reads displayed as paired in TopHat's BAM output?
paired end reads
Thanks peromhc,
If I understand correctly, Bowtie and TopHat will treat the reads independently.
I also read a post by Cole somewhere in this forum, along similar lines, saying that TopHat and Bowtie use different algorithms.
Any advice, please?
Comment
-
I'm just playing with TopHat, but here are the mapping results for one of our human RNA-seq samples (standard Illumina protocol), run with the most recent version of TopHat and a GTF annotation file. (I noticed the percent mapped is always better if I use the GTF option.)
30795354 in total
0 QC failure
19282362 duplicates
30795354 mapped (100.00%)
30795354 paired in sequencing
15965925 read1
14829429 read2
18961808 properly paired (61.57%)
26151540 with itself and mate mapped
4643814 singletons (15.08%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
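For anyone who wants to check the pairing fractions in a summary like the one above, here is a minimal Python sketch. It assumes the text has the "count label (optional %)" layout shown in this post; the function names `parse_flagstat` and `pct` are made up for illustration and are not part of any tool.

```python
import re

# The summary posted above, copied verbatim.
FLAGSTAT = """\
30795354 in total
0 QC failure
19282362 duplicates
30795354 mapped (100.00%)
30795354 paired in sequencing
15965925 read1
14829429 read2
18961808 properly paired (61.57%)
26151540 with itself and mate mapped
4643814 singletons (15.08%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each label to its read count, dropping a trailing '(xx.xx%)' if present."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+)\s+(.*?)(?:\s+\([\d.]+%\))?$", line.strip())
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


counts = parse_flagstat(FLAGSTAT)
total = counts["in total"]
print("properly paired %:", pct(counts["properly paired"], total))
print("singletons %:", pct(counts["singletons"], total))
print("read1 + read2 == paired:",
      counts["read1"] + counts["read2"] == counts["paired in sequencing"])
```

Running the same check on your own summary is a quick way to see whether the low paired fraction is in the alignments themselves or only in how the viewer displays them.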
Comment
-
Hi,
We have Illumina paired-end sequence data that we are analyzing for an RNA-seq experiment. The reads are 101 bases long. The library size selected was 225-500 bp, which INCLUDES the 2 adapters (60 bp each, one on each end of the cDNA fragment).
Subtracting a total of 120 bp, we are left with an insert size of 105 - 380 bases.
Since our read length is 101 bases, will the two mates of a pair literally overlap, i.e. read the same bases redundantly from opposite directions, for fragments at the lower end of the size range (e.g. the 105-base fragments)?
If so, what should I set the mean inner distance to in the tophat command (-r option), and what about the standard deviation option? I know it takes an integer, but in my case the mean inner distance is negative since there is an overlap.
We also know that the median insert size is 170. I hope someone has the answers and can help me out as soon as possible. I would really appreciate it. Thanks
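The arithmetic behind the question above can be written out as a short Python sketch. It assumes the usual definition of inner distance, fragment length (adapters removed) minus twice the read length; the helper name `inner_distance` is only illustrative.

```python
def inner_distance(fragment_len, read_len):
    """Gap between the two mates of a pair; negative when the mates overlap."""
    return fragment_len - 2 * read_len


READ_LEN = 101  # read length stated in the post

# Fragment sizes from the post, with the 120 bp of adapters already subtracted.
for label, frag in [("shortest", 105), ("median", 170), ("longest", 380)]:
    print(label, frag, inner_distance(frag, READ_LEN))
```

With these numbers the shortest and median fragments both give a negative inner distance, so the mates do overlap for a large part of this library.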
Comment
-
TopHat will accept a negative value.
We calculate it from the sequencing data: we align around 1 million reads to a transcriptome reference, then use Picard to get the actual library distribution metrics and feed those values to TopHat. Check my intro thread or the code on our website (www.keatslab.org).
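As a rough illustration of the last step (turning library metrics into TopHat options), here is a minimal Python sketch. It is not the script from the website mentioned above: it uses plain Python statistics as a stand-in for Picard, the fragment lengths are made-up example values, and `tophat_mate_options` is a hypothetical helper. The only TopHat options it relies on are `-r` (mate inner distance) and `--mate-std-dev`.

```python
import statistics


def tophat_mate_options(fragment_lengths, read_len):
    """Turn observed fragment lengths into TopHat's -r / --mate-std-dev arguments.

    fragment_lengths: fragment sizes (adapters excluded) from a sample of
    properly paired alignments, e.g. against a transcriptome reference.
    """
    mean_frag = statistics.mean(fragment_lengths)
    std_dev = statistics.stdev(fragment_lengths)  # sample standard deviation
    inner = round(mean_frag - 2 * read_len)       # may be negative if mates overlap
    return ["-r", str(inner), "--mate-std-dev", str(round(std_dev))]


# Made-up fragment lengths for illustration only; real input would be
# many thousands of values taken from the test alignment.
example_lengths = [150, 160, 170, 180, 190]
print(" ".join(tophat_mate_options(example_lengths, read_len=101)))
```

Both values are rounded because TopHat expects integers, and the inner distance is passed through unchanged when it is negative.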
Comment
-
Thanks for the link to your website, it looks great, but where can I find this intro thread or script? It makes sense to use a portion of the reads to estimate the inner distance, but what is Picard?
Would I be able to estimate both the mean fragment size (from which I subtract the length of my paired-end reads) and the variation (standard deviation)?
Thanks a lot for your help
Comment