
**vinay427** · 07-05-2012, 09:26 AM

Bump. Does anyone have any neat ideas or advice about a project that would be helpful to the community? Of course, it would be open source, licensed under the GPL.

**pbluescript** · 07-05-2012, 12:09 PM

It would be nice if there were a tool that could take a BAM file aligned to a transcriptome and, using a GTF file, convert the transcriptome coordinates to genome coordinates.

I have asked here before whether such a tool exists and done my own searching, with no luck. I've cobbled together a solution to the problem, but it's not fast and it only works in my specific case.
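The core of such a conversion can be sketched in a few lines. This is only an illustration, not pbluescript's solution: the function name `transcript_to_genome` and the exon tuples are made up for the example, and the exon coordinates are assumed to be 1-based and inclusive, as in a GTF file.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive genomic start and end, as in GTF


def transcript_to_genome(pos: int, exons: List[Exon], strand: str) -> int:
    """Map a 1-based transcript position to a 1-based genomic position."""
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    # Walk exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # The position falls inside this exon.
            if strand == "-":
                return end - (remaining - 1)
            return start + (remaining - 1)
        remaining -= length  # skip this exon and keep walking
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcript, used only for illustration.
exons = [(100, 199), (300, 399)]
first_base_of_second_exon = transcript_to_genome(101, exons, "+")
```

A real tool would have to do more than map single positions: a read that spans an exon junction needs its CIGAR string split with an `N` (skipped region) operation, and reads on minus-strand transcripts need their sequence and flags adjusted.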

**vinay427** · 07-06-2012, 11:07 AM

Does anyone else feel that pbluescript's idea would be useful? I'm only trying to get a general sense of people's opinions on such a tool (one that takes a .bam aligned to a transcriptome plus a .gtf and converts the alignments to genome coordinates).

**alec** · 07-06-2012, 03:30 PM

If you go with the conversion tool, it would also be useful if it could convert from, say, hg18 to hg19 using a liftover file.

My vote would be for a Python wrapper to bowtie or bwa. For example, I often have barcoded data and would like to write a loop that reads a fastq file, removes the barcode, aligns the read, and then uses pysam to write it to a different bam file depending on the barcode. This would also give me more control over how multiply aligned or mispaired reads are handled, or let me store additional information with the alignment.

**A_Morozov** · 07-08-2012, 08:06 PM

vinay427, you can contribute quite a bit to the BioPerl and BioPython modules if you know either of these languages. Just check out the mailing lists: there are always some requests.

**pmiguel** · 07-09-2012, 05:52 AM

I would like a tool that visually displays the components of an RNA sample by length and abundance. What I have in mind would be a two-dimensional plot. The X-axis would be the length of a transcript in nucleotides. The Y-axis would be read counts for the given transcript. The Y-axis might be usefully log-transformed.

So, if you did no ribosomal depletion, and sequenced 30 million reads into a sample, somewhere on the order of 90% of the reads (27 million) would be 28S and 18S nuclear rRNA. So those might be plotted as being 4 thousand and 2 thousand nucleotides on the x-axis and 18 million reads "thick" and 9 million reads thick respectively. Below them would be the next rank of high expressers -- possibly mitochondrial rRNA, or some abundant tissue-specific gene. Or some small transcriptome contaminant, like from bacteria contaminating a cell culture.

Then subsequent rows of decreasing thickness would populate the lower part of the diagram.

The purpose of this would be to give a general visual overview of the composition of an RNA sample. I don't think you get that using a full genome viewer.

--
Phillip
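The data behind such a plot could be prepared roughly as below. This is a sketch under assumptions: the function names and the input dictionaries (transcript name to length, transcript name to read count) are invented for the example, and the optional plotting step uses a plain scatter plot rather than the "thick rows" described above. The plotting function needs matplotlib, which is a third-party package; the rest uses only the standard library.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, float]  # (transcript, length, reads, log10 reads)


def composition_rows(lengths: Dict[str, int], counts: Dict[str, int]) -> List[Row]:
    """Return one row per transcript, most abundant first."""
    rows = []
    for name, n_reads in counts.items():
        if n_reads <= 0 or name not in lengths:
            continue  # log10 is undefined for zero; skip unknown transcripts
        rows.append((name, lengths[name], n_reads, math.log10(n_reads)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def plot_composition(rows: List[Row], path: str) -> None:
    """Optional: scatter of transcript length vs. read count on a log Y-axis."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, ax = plt.subplots()
    ax.scatter([row[1] for row in rows], [row[2] for row in rows])
    ax.set_yscale("log")
    ax.set_xlabel("Transcript length (nt)")
    ax.set_ylabel("Read count")
    fig.savefig(path)


# The rRNA figures are the rough ones from the post; "mt-rRNA" is a made-up entry.
lengths = {"28S": 4000, "18S": 2000, "mt-rRNA": 1500}
counts = {"28S": 18_000_000, "18S": 9_000_000, "mt-rRNA": 500_000}
rows = composition_rows(lengths, counts)
```

The per-transcript read counts would come from whatever mapping was done against the transcriptome; the sketch does not cover that step.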

**Jean** · 07-10-2012, 06:22 AM

> Originally posted by pmiguel:
>
> I would like a tool that visually displays the components of an RNA sample by length and abundance. What I have in mind would be a two-dimensional plot. The X-axis would be the length of a transcript in nucleotides. The Y-axis would be read counts for the given transcript. The Y-axis might be usefully log-transformed.
>
> So, if you did no ribosomal depletion, and sequenced 30 million reads into a sample, somewhere on the order of 90% of the reads (27 million) would be 28S and 18S nuclear rRNA. So those might be plotted as being 4 thousand and 2 thousand nucleotides on the x-axis and 18 million reads "thick" and 9 million reads thick respectively. Below them would be the next rank of high expressers -- possibly mitochondrial rRNA, or some abundant tissue-specific gene. Or some small transcriptome contaminant, like from bacteria contaminating a cell culture.
>
> Then subsequent rows of decreasing thickness would populate the lower part of the diagram.
>
> The purpose of this would be to give a general visual overview of the composition of an RNA sample. I don't think you get that using a full genome viewer.
>
> --
> Phillip

Couldn't this be plotted in R with the mapping and transcriptome information?

**pmiguel** · 07-10-2012, 06:56 AM

> Originally posted by Jean:
>
> Couldn't this be plotted in R with the mapping and transcriptome information?

That would be fine with me.

--
Phillip

**vinay427** · 07-19-2012, 09:31 AM

> Originally posted by alec:
>
> My vote would be for a Python wrapper to bowtie or bwa. For example, I often have barcoded data and would like to write a loop that reads a fastq file, removes the barcode, aligns the read, and then uses pysam to write it to a different bam file depending on the barcode. This would also give me more control over how multiply aligned or mispaired reads are handled, or let me store additional information with the alignment.

What exactly do you mean by a wrapper? From what I've seen, would it basically pass through arguments to bowtie (or bowtie2) so that you can integrate it into your Python loop?

I found this example; is this essentially what you would want a Python wrapper for Bowtie to do, with different arguments?

Code:

import pysam

input_filenames = ['BAM-1.bam', 'BAM-2.bam']
output_filename = 'all.bam'

# pysam.merge mirrors "samtools merge": output file first, then the inputs
merge_parameters = [output_filename] + input_filenames
pysam.merge(*merge_parameters)
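For comparison, a pass-through wrapper of the kind asked about here could be as thin as the sketch below. It is not an existing package: `bowtie2_command` and `run_bowtie2` are names made up for the example, and it assumes a `bowtie2` executable on the PATH. The `-x`, `-U`, and `-S` options are bowtie2's index, unpaired-reads, and SAM-output options; anything else is passed through untouched.

```python
import subprocess
from typing import List, Sequence


def bowtie2_command(index: str, fastq: str, sam_out: str,
                    extra_args: Sequence[str] = ()) -> List[str]:
    """Build a bowtie2 command line for unpaired reads."""
    return ["bowtie2", "-x", index, "-U", fastq, "-S", sam_out, *extra_args]


def run_bowtie2(index: str, fastq: str, sam_out: str,
                extra_args: Sequence[str] = ()) -> None:
    """Run bowtie2; raises CalledProcessError if it exits non-zero."""
    subprocess.run(bowtie2_command(index, fastq, sam_out, extra_args), check=True)


# Placeholder paths, for illustration only.
cmd = bowtie2_command("/path/to/index", "reads.fastq", "reads.sam", ["-p", "4"])
```

A wrapper like this works on whole files, so it can sit inside a Python loop over samples but not over individual reads.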

**severin** · 07-19-2012, 10:56 AM

GBrowse2

I would like to see an extension to GBrowse2 that would permit on-the-fly joining of scaffolds based on evidence in the tracks.

**alec** · 07-20-2012, 06:53 AM

> Originally posted by vinay427:
>
> What exactly do you mean by a wrapper? From what I've seen, would it basically pass through arguments to bowtie (or bowtie2) so that you can integrate it into your Python loop?

I'm thinking of being able to align sequences one by one. For example, my reads often have custom barcodes at the beginning that need to be removed before alignment, with the resulting alignments written to a different file for each barcode:

Code:

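import pysam

# `bowtie` and `FastqReader` are hypothetical: they stand for the wrapper being proposed
# (with real pysam, opening a file in "wb" mode also needs a header or template)
files = {
    "ATGCG": pysam.Samfile("sample1.bam", "wb"),
    "GACTA": pysam.Samfile("sample2.bam", "wb"),
}
bwt = bowtie.load("/path/to/index")
for seq, qual in FastqReader("reads.fastq"):
    barcode = seq[0:5]
    if barcode in files:
        # align the read with the barcode trimmed off, then write to that sample's file
        files[barcode].write(bwt.align(seq[5:], qual[5:]))
    else:
        pass  # handle nonmatching barcodes here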

There are also a lot of cases where I would want to store additional information in the auxiliary data, set certain flags, or otherwise modify the alignment before writing it. Of course, this can already be done with pysam; the trick is getting the alignment in the first place.
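Until a per-read aligner interface exists, the barcode step on its own can be done in plain Python before alignment. The sketch below is a workaround, not the wrapper described above: it splits reads by barcode and trims them, so each group can then be written out and aligned as a separate file. The names `read_fastq` and `demultiplex` are invented for the example, the input is assumed to be well-formed four-line FASTQ, and all reads are held in memory, which would not suit a full lane of data.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (name, seq, qual) from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def demultiplex(reads: Iterable[Read], barcodes: Dict[str, str],
                barcode_len: int = 5) -> Dict[str, List[Read]]:
    """Group barcode-trimmed reads by sample; unmatched reads go to 'unassigned'."""
    bins: Dict[str, List[Read]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for name, seq, qual in reads:
        sample = barcodes.get(seq[:barcode_len])
        if sample is None:
            bins["unassigned"].append((name, seq, qual))  # left untrimmed
        else:
            bins[sample].append((name, seq[barcode_len:], qual[barcode_len:]))
    return bins


# Tiny in-memory example using the two barcodes from the post.
fastq_lines = [
    "@r1\n", "ATGCGTTTT\n", "+\n", "IIIIIIIII\n",
    "@r2\n", "GACTACCCC\n", "+\n", "IIIIIIIII\n",
    "@r3\n", "NNNNNAAAA\n", "+\n", "IIIIIIIII\n",
]
bins = demultiplex(read_fastq(fastq_lines), {"ATGCG": "sample1", "GACTA": "sample2"})
```

In practice `read_fastq` would be given an open file handle instead of a list, and each bin would be written to its own FASTQ file for the aligner.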

Useful bioinformatics tool ideas?