**dpryan** · 03-19-2015, 09:40 AM

What's most efficient really depends on what you then want to do with the data. For example, keeping track of individual reads doesn't make much sense if you just want the # of reads/frame/transcript/gene. Anyway, assuming you're not dealing with genomic multimappers, I would think that a more efficient approach would be to simply store the number of alignments starting at each position (you can do this conveniently with HTSlib) and then convert those positions to frame/transcript/gene afterward.
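
A minimal sketch of the "count alignments by start position" idea, written in Python rather than the C/HTSlib route suggested above. It assumes the alignments have already been reduced to hypothetical `(chrom, start, end, is_reverse)` tuples (0-based, half-open, unique mappers only); with pysam these fields correspond to `AlignedSegment.reference_name`, `reference_start`, `reference_end` and `is_reverse`. Counts are keyed on the 5' end of each alignment, and any P-site offset is deliberately left out.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical input: one tuple per uniquely mapped alignment,
# (chrom, start, end, is_reverse) in 0-based, half-open coordinates.
Alignment = Tuple[str, int, int, bool]


def count_5prime_ends(alignments: Iterable[Alignment]) -> Counter:
    """Count alignments by the genomic position of their 5' end.

    Forward-strand alignments start at their leftmost base (start);
    reverse-strand alignments start at their rightmost base (end - 1).
    Keys are (chrom, strand, position) so the two strands stay separate.
    """
    counts: Counter = Counter()
    for chrom, start, end, is_reverse in alignments:
        if is_reverse:
            counts[(chrom, "-", end - 1)] += 1
        else:
            counts[(chrom, "+", start)] += 1
    return counts


if __name__ == "__main__":
    reads = [
        ("chr1", 100, 130, False),
        ("chr1", 100, 128, False),
        ("chr1", 200, 230, True),
    ]
    # Two forward reads share a 5' end at 100; the reverse read's 5' end is 229.
    for key, n in sorted(count_5prime_ends(reads).items()):
        print(key, n)
```

Storing one integer per occupied position, rather than one record per read, is what keeps this compact: the memory cost scales with the number of distinct 5' ends, not the number of reads.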

**Brian Bushnell** · 03-19-2015, 11:10 AM

I don't understand why you would care which frame a read aligns to, unless you are translating them into amino acids afterward. But if you already know where a read aligns, there doesn't seem to be much point in translating the read to AA space. Can you elaborate?

**dpryan** · 03-19-2015, 11:41 AM

@Brian: Presumably they're interested in alternate reading frames, which ribosomal profiling would allow determining.

**PolPittacus7** · 03-19-2015, 02:12 PM

Thanks, dpryan and Brian, for the quick replies!

It is very important for us to know the framing of the data, as it allows us to examine ribosomal frameshifting, overlapping/alternate reading frames, and cases of ribosomes engaged in eukaryotic nonsense-mediated decay. Converting the expressed sequences to amino-acid space is one of the tangential goals, but not one of the primary aims. Hope that clarifies things!

@dpryan - thanks for the earlier suggestion. In addition to the # of reads/frame/transcript/gene, we also want to know WHERE and when changes in framing occur. For example, in one particular transcript the ribosomes appear to translate the entire coding sequence in the annotated 0-frame, from start codon to stop codon. When exposed to a chemical stimulus, however, the ribosomes elongating along this transcript shift into the -1 frame about 450 nucleotides downstream of the annotated start codon, translate for about 60 nucleotides in the -1 frame, and then hit a premature stop codon in the -1 frame. In effect, the stimulus changes the translation pattern, shifting the ribosome into an alternate reading frame along a specific sub-section of the mRNA. That is one example of an analysis we would like to be able to perform globally, and it is why we need framing data for each individual read: so that we can run automated downstream analyses and visualize sub-regions with unusual framing. Does that address your comment?
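
One way to look for the kind of localized frame change described above is to bin per-position counts along the CDS and report the dominant frame in each window. This sketch assumes a hypothetical `psite_counts` dict that is already in 0-based transcript coordinates (P-site or 5'-end positions, depending on your pipeline) plus 0-based CDS boundaries; the 30 nt window is an arbitrary choice. Frame 2 here corresponds to a -1 shift relative to the annotated frame, and frame 1 to a +1 shift.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frame_profile(
    psite_counts: Dict[int, int], cds_start: int, cds_end: int, window: int = 30
) -> List[Tuple[int, int, float]]:
    """Return (window_start, dominant_frame, fraction) for consecutive windows.

    psite_counts maps 0-based transcript coordinates to read counts.
    Frame is (position - cds_start) % 3, so frame 0 is the annotated frame
    and frame 2 corresponds to a -1 shift (frame 1 to a +1 shift).
    Windows with no reads are skipped.
    """
    profile = []
    for win_start in range(cds_start, cds_end, window):
        frames: Counter = Counter()
        for pos in range(win_start, min(win_start + window, cds_end)):
            n = psite_counts.get(pos, 0)
            if n:
                frames[(pos - cds_start) % 3] += n
        total = sum(frames.values())
        if total == 0:
            continue  # no coverage in this window
        frame, n = frames.most_common(1)[0]
        profile.append((win_start, frame, n / total))
    return profile


if __name__ == "__main__":
    # Toy data: frame 0 for the first 60 nt, then a switch to frame 2 (-1).
    counts = {pos: 5 for pos in range(0, 60, 3)}
    counts.update({pos: 5 for pos in range(62, 120, 3)})
    for win_start, frame, frac in frame_profile(counts, 0, 120):
        print(win_start, frame, round(frac, 2))
```

With real data you would probably also want a minimum-count threshold per window and some statistical test before calling a shift; this sketch only reports the raw majority frame.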

I will try implementing your suggested solution (counting with HTSlib, then converting to frame/transcript/gene downstream) tomorrow and let you know how it goes.

**dpryan** · 03-19-2015, 03:46 PM

Yes, it does indeed. In that case, remember that only one end of each alignment is informative (probably the start of read #1, but this will depend on how the libraries are made), so there's no need to store both. There's also no need to store this information per read. You'll want to run the downstream analysis per transcript anyway, so either store things as counts per genomic position and then map those to frame within each transcript, or store everything in transcript coordinates (this would have been simpler had the mapping been done directly against the transcriptome with bowtie2/bwa/bbmap/etc.).
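
If the reads were aligned to the genome, converting a counted position to transcript coordinates (and from there to a frame) could look roughly like this. The exon model format, the function names and the `cds_start_tx` parameter are all hypothetical; exons are 0-based, half-open genomic intervals listed in ascending genomic order for both strands.

```python
from typing import List, Optional, Tuple

# Hypothetical exon model: (start, end) 0-based half-open genomic intervals,
# listed in ascending genomic order regardless of strand.
Exons = List[Tuple[int, int]]


def genomic_to_transcript(pos: int, exons: Exons, strand: str) -> Optional[int]:
    """Convert a genomic position to a 0-based transcript coordinate.

    Returns None if the position falls outside every exon (e.g. intronic).
    """
    offset = 0
    # Walk exons in transcript order: left-to-right on "+", right-to-left on "-".
    ordered = exons if strand == "+" else list(reversed(exons))
    for start, end in ordered:
        if start <= pos < end:
            if strand == "+":
                return offset + (pos - start)
            return offset + (end - 1 - pos)
        offset += end - start
    return None


def frame_of(pos: int, exons: Exons, strand: str, cds_start_tx: int) -> Optional[int]:
    """Frame (0, 1 or 2) of a genomic position relative to the CDS start,
    where cds_start_tx is the transcript coordinate of the start codon."""
    tx = genomic_to_transcript(pos, exons, strand)
    if tx is None:
        return None
    return (tx - cds_start_tx) % 3


if __name__ == "__main__":
    exons = [(100, 110), (200, 220)]
    print(genomic_to_transcript(205, exons, "+"))  # 10 nt from exon 1 + 5 = 15
    print(frame_of(205, exons, "+", cds_start_tx=3))  # (15 - 3) % 3 = 0
```

The linear scan over exons is fine for a handful of transcripts; for a genome-wide pass, an interval tree or a sorted per-chromosome lookup would likely be faster. Aligning directly to the transcriptome, as suggested above, removes this step entirely.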

BTW, I should note that htslib isn't the world's fastest method, but it's the reference library for SAM/BAM/CRAM, so it will mostly just work (and do so efficiently enough). This also assumes that you're using something like C or C++. Other convenient options would be htsjdk (Java) and pysam (a Python interface to htslib). Given the wording in your post, I'm assuming that one of those is appropriate (i.e., you're not some crazy Perl person).


Efficient Data Structure for Managing Ribosome Profiling Data
