**dpryan** · 03-06-2015, 11:58 AM

That's not usually in the BAM file. Use htseq-count with the -o option on the file.

**mashbaugh** · 03-08-2015, 08:45 PM

Thank you Devon.
Htseq-count shows me how many transcripts mapped to each gene. What I want, however, is the opposite: a list of transcripts and the genes they mapped to. Any idea whether Htseq-count can produce that? Tophat must retain the read and alignment information; I just can't figure out how to get at it.
-Melissa

**zinky** · 03-08-2015, 10:41 PM

I don't know whether the reference used in your tophat mapping was gene level or chromosome level. If it was chromosome level, you will also need a gene annotation file (with a .gff suffix); the htseq-count command can then do what you want. If your reference was gene level, a samtools view command can convert the BAM file into a SAM file. That file is human-readable, and you can write a text-processing script to count the reads in each gene.
PS: Since the output BAM file of tophat contains mapped reads only, you can just count the reads without filtering on flags.
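
A minimal Python sketch of the text-processing script suggested above, assuming the reference was gene level so that column 3 (RNAME) of each SAM record is a gene name. The function name and the example records are hypothetical, for illustration only. It counts alignment records, so a read reported at several locations is counted once per record.

```python
from collections import Counter


def count_alignments_per_reference(sam_lines):
    """Count alignment records per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        rname = fields[2]
        if rname != "*":  # "*" means the record has no reference
            counts[rname] += 1
    return counts


# Hypothetical records, for illustration only.
example_sam = [
    "@SQ\tSN:geneA\tLN:1000\n",
    "read1\t0\tgeneA\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tgeneB\t5\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t0\tgeneA\t20\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
]

reference_counts = count_alignments_per_reference(example_sam)
for name, n in sorted(reference_counts.items()):
    print(name, n)
```

On real data the same function can be fed an open file handle, for example the text produced by samtools view from the tophat BAM file.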

**dpryan** · 03-09-2015, 12:50 AM

You seem to be using "transcript" when you mean "alignments". There's a very fundamental difference between the two concepts. I'll assume that you meant to write "alignments".

Htseq-count will normally produce a table of how many reads/pairs mapped to each gene. With the -o option, it will also produce a SAM file with each alignment annotated as to which gene (if any) it overlaps (assuming it overlaps only one gene). You can then simply use "grep" to find all alignments for each gene of interest, should you want to do that.
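
As a rough Python equivalent of the grep step, here is a sketch that keeps only the alignments annotated with one gene of interest. It assumes the annotation is stored as an `XF:Z:` optional field, and the gene names and records are made up. Matching the whole field rather than a substring avoids picking up `ABC10` when searching for `ABC1`, which a plain grep on the gene name would do.

```python
def alignments_for_gene(sam_lines, gene):
    """Yield SAM records whose XF tag is exactly the given gene."""
    wanted = "XF:Z:" + gene
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 (index 11).
        if wanted in fields[11:]:
            yield line


# Hypothetical annotated records, for illustration only.
annotated_sam = [
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:ABC1\n",
    "read2\t0\tchr1\t900\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:ABC10\n",
    "read3\t0\tchr2\t300\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:__no_feature\n",
]

abc1_hits = list(alignments_for_gene(annotated_sam, "ABC1"))
for record in abc1_hits:
    print(record.split("\t")[0])
```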

**mashbaugh** · 03-10-2015, 07:52 PM

Thank you for pointing that out, Devon. I do in fact mean read alignments rather than transcripts. Within the HTseq-count SAM output file, my alignments are still identified at the chromosome level rather than the gene level, even though it also produces the normal table you described. Is there any way to get the gene information into the SAM file?

**dpryan** · 03-11-2015, 12:04 AM

It should be adding an XF:Z:some_gene_name auxiliary tag to the output SAM file. The coordinates will still be genomic, of course. If you really want things with transcript-centric coordinates, then just align against the transcriptome with bowtie2 or bwa.
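
To get the list originally asked for (each read name with the gene it was assigned to), the XF tag can be pulled out of every record. The sketch below is one way to do that in Python. It assumes that non-gene outcomes are written with a leading double underscore (for example `__no_feature`), which may differ between HTSeq versions, so check a few lines of your own output first. The records and function name are hypothetical.

```python
def read_gene_pairs(sam_lines):
    """Yield (read_name, gene) for records carrying a gene in XF:Z:."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        for tag in fields[11:]:  # optional fields follow the 11 fixed columns
            if tag.startswith("XF:Z:"):
                gene = tag[len("XF:Z:"):]
                # Assumption: values starting with "__" are not gene names.
                if not gene.startswith("__"):
                    yield fields[0], gene
                break  # at most one XF tag per record


# Hypothetical annotated records, for illustration only.
samout_lines = [
    "@HD\tVN:1.0\n",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tXF:Z:geneA\n",
    "read2\t0\tchr1\t500\t50\t4M\t*\t0\t0\tACGT\tIIII\tXF:Z:__no_feature\n",
    "read3\t16\tchr2\t300\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXF:Z:geneB\n",
]

pairs = list(read_gene_pairs(samout_lines))
for read_name, gene in pairs:
    print(read_name, gene, sep="\t")
```

The coordinates in each record stay genomic; only the tag carries the gene assignment.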

Extracting RNAseq read names from Tophat accepted hits

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News