Tophat reads that weren't used for mapping?

vkartha

Member

Join Date: Feb 2012

Posts: 28
#1

04-18-2012, 07:45 AM

Hey,
I am running TopHat for the first time on paired-end reads that are 101 bp long. Since this was a test run, I used only the first 500,000 reads from each fastq file. I noticed that before aligning the left and right reads to the hg19 build, TopHat discarded 19 reads from the left file and 7 reads from the right file.

In case you are wondering about the output below, 'reads_in' is 487,355 rather than 500,000 in each file because the original fastq files were quality filtered with the FASTX-Toolkit. The filtering removes reads independently from each file, so I had to use a script to find the matching pairs among the first 500,000 reads of the two filtered fastq files. That left 487,355 pairs, with the remainder orphaned (I will map those separately later).
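For anyone curious about the pairing step, here is a minimal sketch of the idea in Python, not the exact script I ran. It assumes standard 4-line FASTQ records and read names that match between the two files once a trailing /1 or /2 and anything after the first space are dropped. The function names (read_fastq, pair_reads) are made up for this example.

```python
import itertools


def read_fastq(lines):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    lines = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in itertools.islice(lines, 4)]
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Assumption: the ID is the first token of the header, minus '@'
        # and minus an optional /1 or /2 mate suffix.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, "\n".join(record) + "\n"


def pair_reads(left_lines, right_lines):
    """Split two FASTQ inputs into matched pairs and orphans.

    The right-hand file is held in memory, which is fine for a
    500,000-read test but not for a full lane.
    """
    right_records = dict(read_fastq(right_lines))
    paired_left, paired_right, orphans = [], [], []
    matched = set()
    for read_id, record in read_fastq(left_lines):
        if read_id in right_records:
            paired_left.append(record)
            paired_right.append(right_records[read_id])
            matched.add(read_id)
        else:
            orphans.append(record)
    # Right-hand reads whose mate was filtered out are orphans too.
    orphans.extend(rec for rid, rec in right_records.items() if rid not in matched)
    return paired_left, paired_right, orphans
```

The paired lists come out in the same order, which matters because the two files have to stay in sync for paired-end mapping.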

cat left_kept_reads.info
min_read_len=101
max_read_len=101
reads_in =487355
reads_out=487336

cat right_kept_reads.info
min_read_len=101
max_read_len=101
reads_in =487355
reads_out=487348
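The 19 and 7 come straight from subtracting reads_out from reads_in in each file. A small sketch of that check in Python, assuming only the key=value layout shown above (parse_info and dropped_reads are names I made up, not part of TopHat):

```python
def parse_info(text):
    """Parse 'key=value' lines, tolerating spaces around the '='."""
    stats = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            stats[key.strip()] = int(value.strip())
    return stats


def dropped_reads(text):
    """Number of reads removed before mapping: reads_in - reads_out."""
    stats = parse_info(text)
    return stats["reads_in"] - stats["reads_out"]


left_info = "min_read_len=101\nmax_read_len=101\nreads_in =487355\nreads_out=487336\n"
right_info = "min_read_len=101\nmax_read_len=101\nreads_in =487355\nreads_out=487348\n"

print(dropped_reads(left_info), dropped_reads(right_info))  # prints: 19 7
```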

Does anyone know why TopHat discards these reads? Also, is there a way to find the discarded reads, perhaps using the output BAM file and the two original fastq files?

Also, what is typically the next step after obtaining this BAM file? This is part of an RNA-seq analysis.

Any help would be greatly appreciated.

Thanks

Last edited by vkartha; 04-18-2012, 07:54 AM.
