**thinkRNA** · 02-18-2010, 12:45 PM

Originally posted by genbio64

I have a smaller data set from an Illumina Solexa (~20-30 GB) and my group has been running some comparative analysis using both CLC Genomics Workbench and Bowtie. The problem we have experienced is viewing the alignments from Bowtie. The files were converted from Bowtie output to SAM format but are too big to use with the sam2bed.py script posted on here. I had considered splitting the files but worry about splitting the alignments. Our files can be converted to BAM or bigBed format but then require an HTTP/FTP accessible folder that we do not have, nor can they be viewed with another open source viewer. Is there an effective, and I dare say easy, way to visualize the alignment using an open source viewer? I am not a novice at bioinformatics but it seems that if you have a significantly sized data set then you are going to be up a creek when attempting to visualize it if you do not have a major computational framework.
Any help is appreciated.

I have about 3 GB of bowtie aligned reads that I could view using TABLET but then I had to increase the memory size to 10GB RAM. How much memory do you have? For viewing 20-30 GB, I can't imagine how much you will need. Also, there are no annotation tracks in TABLET, but I think you can load some of your own.

I ran into the same problem as you when using the Python script to convert to .bed, although I didn't know why it was breaking down.
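For files too large for an in-memory converter, one option is to stream the SAM file one line at a time. A SAM file holds one alignment record per line, so reading (or splitting) on line boundaries never cuts an alignment in half. The sketch below is not the sam2bed.py script mentioned above. It is a minimal stand-in, and the function names `sam_line_to_bed` and `sam_to_bed` are made up for illustration. It assumes BED6 output with MAPQ used as the score.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def sam_line_to_bed(line):
    """Return a BED6 line for one SAM record, or None for headers/unmapped reads."""
    if line.startswith("@"):                      # SAM header line
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:                          # not a complete SAM record
        return None
    name, flag, chrom = fields[0], int(fields[1]), fields[2]
    pos, mapq, cigar = int(fields[3]), fields[4], fields[5]
    if flag & 0x4 or chrom == "*" or cigar == "*":  # unmapped read
        return None
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1                               # SAM is 1-based, BED is 0-based
    strand = "-" if flag & 0x10 else "+"
    return "\t".join([chrom, str(start), str(start + ref_len), name, mapq, strand])


def sam_to_bed(sam_lines, out):
    """Stream SAM lines to BED, holding only one record in memory at a time."""
    for line in sam_lines:
        bed = sam_line_to_bed(line)
        if bed is not None:
            out.write(bed + "\n")
```

Something like `sam_to_bed(open("aln.sam"), open("aln.bed", "w"))` would then convert a file of any size, since memory use does not grow with the input.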

Could you load all your alignments in .bed format into a database and write a Perl/CGI script that queries this database and displays the results in the UCSC browser as "custom tracks"? I have done this before for other tracks and it worked well. This way you pull only a specific gene's worth of data at a time and the query runs fast.
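A rough sketch of that database idea, in Python with SQLite rather than Perl/CGI (a swapped-in choice, not the setup described above). The table layout, the function names `load_bed` and `region_as_custom_track`, and the track name are all assumptions for illustration. The output is a `track` line followed by BED records, which is the text form UCSC accepts for custom tracks.

```python
import sqlite3


def _parse_bed6(line):
    """Split one BED6 line into a tuple with integer coordinates and score."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    return (chrom, int(start), int(end), name, int(score), strand)


def load_bed(conn, bed_lines):
    """Create the table if needed, bulk-insert BED6 records, and index by position."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "chrom TEXT, chrom_start INTEGER, chrom_end INTEGER, "
        "name TEXT, score INTEGER, strand TEXT)"
    )
    rows = (_parse_bed6(line) for line in bed_lines if line.strip())
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_region ON reads (chrom, chrom_start)")
    conn.commit()


def region_as_custom_track(conn, chrom, start, end, track_name="bowtie_reads"):
    """Return custom-track text for reads overlapping chrom:start-end (0-based, half-open)."""
    cur = conn.execute(
        "SELECT chrom, chrom_start, chrom_end, name, score, strand FROM reads "
        "WHERE chrom = ? AND chrom_start < ? AND chrom_end > ? "
        "ORDER BY chrom_start",
        (chrom, end, start),
    )
    lines = ['track name="%s" description="reads in %s:%d-%d"'
             % (track_name, chrom, start, end)]
    lines.extend("\t".join(str(value) for value in row) for row in cur)
    return "\n".join(lines) + "\n"
```

Only the reads in the requested window are pulled out of the database, so the text pasted or uploaded into the browser stays small even when the full table holds tens of gigabytes of alignments.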

May I ask if you know of a program to convert SAM to .bed, or SAM to BAM?

**thinkRNA** · 02-18-2010, 12:48 PM

"Our files can be converted to BAM or bigbed format but then require an HTTP/FTP accessible folder "

Where did you get this information from (require an HTTP/FTP folder)? I would like to read more on this.

**genbio64** · 02-18-2010, 12:50 PM

The UCSC website gives instructions for using custom tracks in the genome browser

Genome Browser bigBed Track Format

http://genome.ucsc.edu/goldenPath/help/bigBed.html

SAMtools handles the file conversion from SAM --> BAM.

**krobison** · 02-18-2010, 01:27 PM

IGV from the Broad will read BAM files and won't display any more than can fit into memory -- so you don't see much when zoomed very far out, but can view even very deep alignments at high resolution

**imilne** · 02-18-2010, 01:49 PM

Originally posted by thinkRNA

I have about 3 GB of bowtie aligned reads that I could view using TABLET but then I had to increase the memory size to 10GB RAM.

That seems a little steep for that amount of data. Can I ask how many contigs/how long the reference sequence(s) were? Tablet doesn't (yet) cache reference data (including any protein translations that are turned on) so that's certainly one area that eats memory like crazy.

The next version will support indexed BAM assemblies, so you'll be able to browse around massive data sets (in chunks) using a fraction of the memory that the current version does. It'll still hold reference data in ram, but we'll get that cached too eventually...

**thinkRNA** · 02-18-2010, 01:59 PM

so how much RAM do you think is needed to view 20 GB of bowtie alignment in Tablet?

I didn't use the reference sequences. I just remember reading in the manual that reference sequences are not needed for SAM format (but I may be wrong). I loaded about 1 million Illumina read alignments and even with 10 GB RAM, Tablet was hanging up on me.

**imilne** · 02-19-2010, 12:39 AM

Originally posted by thinkRNA

so how much RAM do you think is needed to view 20 GB of bowtie alignment in Tablet?

I honestly couldn't say, as it depends on a number of factors. I would like to know myself, but we just don't have access to data sets of that size.

Originally posted by thinkRNA

I didn't use the reference sequences. I just remember reading in the manual that reference sequences are not needed for SAM format (but I may be wrong). I loaded about 1 million Illumina read alignments and even with 10 GB RAM, Tablet was hanging up on me.

Just to be clear... you were talking about 3GB of data? That's 3 gigabytes, not giga-bases? And 1 million what? Reads or contigs? Tablet's memory requirements do change from version to version (so far always with a downward trend), so it might be worth trying it again if you haven't done so for a while.

(And if people don't mind us having access to these problem data sets, we're more than happy to tweak what we can to help get them working with Tablet)

Iain

**simonandrews** · 02-19-2010, 12:40 AM

When you say you want to view the alignment, do you actually need to see the underlying sequence (which is the really big bit!) or just the pattern of alignment against your reference genome? If you're just looking for the distribution of reads then you might want to look at our SeqMonk viewer, which is specifically designed to view and analyse very large datasets on a normal desktop PC.
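To illustrate why the distribution of reads is so much cheaper to hold than the sequences themselves, here is a minimal sketch that reduces a BED file to read counts per fixed-size window. It is not how SeqMonk works internally. The function name `bin_read_starts` and the 1 kb default window are assumptions chosen for the example.

```python
from collections import Counter


def bin_read_starts(bed_lines, bin_size=1000):
    """Count reads per (chromosome, window) using each read's start position."""
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and UCSC-style track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start = line.split("\t")[:2]
        counts[(chrom, int(start) // bin_size)] += 1
    return counts
```

Memory use depends on the number of occupied windows, not on the number of reads, so millions of alignments collapse into a table small enough to plot on a desktop.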

**vedjohns** · 02-19-2010, 06:46 AM

IGV viewer

I found this viewer recently and used it to view a gig or so worth of reads. I had to convert the reads to a binary SAM (BAM) file, but it will handle just about any format. It will also load a genomic sequence and annotation.

Ed

**mattanswers** · 02-19-2010, 11:00 AM

I use and like SeqMonk. Working with Arabidopsis, I can load all five chromosomes for three bowtie files on a 64-bit 4 GB ram desktop.
You can go from a whole chromosome view down to single reads. There is a pull-down menu where you can type in a position or a range on a particular chromosome and it will go straight to that position. This is very useful when checking ChIP-Seq results.
SeqMonk will also take a variety of files other than bowtie, including Eland, BED, MAQ, SAM, and some others.
Simon Andrews has been very helpful if we have any problems.

**sperry** · 02-22-2010, 07:50 AM

I would also recommend IGV for viewing large nextgen alignments.

First convert your SAM alignment to the binary BAM format using Samtools (http://samtools.sourceforge.net/). You must then sort and index the BAM file with Samtools, and then view the alignment using IGV (http://www.broadinstitute.org/igv/). If you can't see your alignment after loading it into IGV, make sure that your reference sequences are labelled correctly and regenerate your alignment (i.e. chromosomes must be named chr1, chr2, ... chrX and NOT 1, 2, ... X in your reference sequence file).
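The convert, sort and index steps can be sketched as a small Python wrapper around the samtools command line. The argument order below assumes the samtools 1.x syntax (`sort -o`). The 0.1.x releases current when this thread was written took an output prefix for `sort` and needed `-S` on `view` for SAM input, so adjust to your installed version. The function names and the output file naming are assumptions for illustration.

```python
import subprocess


def samtools_commands(sam_path, prefix):
    """Build the three samtools calls: SAM -> BAM, sort, index (samtools 1.x syntax assumed)."""
    unsorted_bam = prefix + ".bam"
    sorted_bam = prefix + ".sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(sam_path, prefix):
    """Run each step in order, stopping at the first failure."""
    for cmd in samtools_commands(sam_path, prefix):
        subprocess.run(cmd, check=True)


def non_chr_names(sam_header_lines):
    """Return reference names from @SQ header lines that lack the 'chr' prefix."""
    names = []
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:") and not field[3:].startswith("chr"):
                names.append(field[3:])
    return names
```

Calling `run_pipeline("aln.sam", "aln")` would leave `aln.sorted.bam` and its index ready to load. `non_chr_names` is a quick way to spot the naming mismatch described above before loading the file into IGV.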

I have used IGV to view ~2.5 gigabyte BAM alignments on a 64-bit Ubuntu system with only 4 Gigabytes of RAM without any issues. The same system could definitely handle much larger alignments, as IGV used only a fraction of the available RAM.

**maximilianh** · 11-29-2010, 07:02 AM

To convert SAM to BED, use this command line (bamToBed is part of BEDTools):

samtools view -Sb SAMFILENAME.sam | bamToBed -i stdin > BEDFILENAME.bed

A Useable Next-gen visualization protocol
