  • A Useable Next-gen visualization protocol

    I have a smallish data set from an Illumina Solexa run (~20-30 GB), and my group has been running some comparative analysis using both CLC Genomics Workbench and Bowtie. The problem we have run into is viewing the alignments from Bowtie. The Bowtie output was converted to SAM format, but the files are too big for the sam2bed.py script posted on here. I had considered splitting the files but worry about splitting the alignments. Our files can be converted to BAM or bigBed format, but those then require an HTTP/FTP-accessible folder that we do not have, nor can they be viewed with another open source viewer. Is there an effective, and I dare say easy, way to visualize the alignments using an open source viewer? I am not a novice at bioinformatics, but it seems that if you have a significantly sized data set then you are going to be up a creek when attempting to visualize it if you do not have a major computational framework.
    Any help is appreciated.

  • #2
    Originally posted by genbio64
    I have a smallish data set from an Illumina Solexa run (~20-30 GB), and my group has been running some comparative analysis using both CLC Genomics Workbench and Bowtie. The problem we have run into is viewing the alignments from Bowtie. The Bowtie output was converted to SAM format, but the files are too big for the sam2bed.py script posted on here. I had considered splitting the files but worry about splitting the alignments. Our files can be converted to BAM or bigBed format, but those then require an HTTP/FTP-accessible folder that we do not have, nor can they be viewed with another open source viewer. Is there an effective, and I dare say easy, way to visualize the alignments using an open source viewer? I am not a novice at bioinformatics, but it seems that if you have a significantly sized data set then you are going to be up a creek when attempting to visualize it if you do not have a major computational framework.
    Any help is appreciated.
    I have about 3 GB of bowtie aligned reads that I could view using TABLET but then I had to increase the memory size to 10GB RAM. How much memory do you have? For viewing 20-30 GB, I can't imagine how much you will need. Also, there are no annotation tracks in TABLET, but I think you can load some of your own.

    I came across the same problem as you when using the Python script to convert to .bed, although I didn't know why it was breaking down.

    Could you load all your alignments in .bed format into a database and write a Perl/CGI script that queries this database and displays the results in the UCSC browser as "custom tracks"? I have done this before for setting up other tracks and it worked well. This way you pull only a specific gene's worth of data at a time, and the query runs fast.
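The database idea above could look roughly like the following. This is only a minimal sketch, written in Python with SQLite rather than the Perl/CGI setup described in the post; the table layout, function names, and track name are all made up for illustration. It loads BED intervals into an indexed table and returns just the reads overlapping one region as UCSC custom-track text.

```python
import sqlite3


def parse_bed(bed_lines):
    """Yield (chrom, start, end, name) tuples, skipping track/browser/comment lines."""
    for line in bed_lines:
        if line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name = fields[3] if len(fields) > 3 else "."
        yield fields[0], int(fields[1]), int(fields[2]), name


def load_bed(conn, bed_lines):
    """Stream BED intervals into a table (hypothetical schema) and index it by position."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads "
        "(chrom TEXT, chrom_start INTEGER, chrom_end INTEGER, name TEXT)"
    )
    # executemany consumes the generator lazily, so the file is never held in memory.
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", parse_bed(bed_lines))
    conn.execute(
        "CREATE INDEX IF NOT EXISTS reads_by_region ON reads (chrom, chrom_start)"
    )
    conn.commit()


def region_as_custom_track(conn, chrom, start, end, track_name="bowtie_reads"):
    """Return custom-track text for reads overlapping the half-open region [start, end)."""
    cursor = conn.execute(
        "SELECT chrom, chrom_start, chrom_end, name FROM reads "
        "WHERE chrom = ? AND chrom_start < ? AND chrom_end > ? "
        "ORDER BY chrom_start",
        (chrom, end, start),
    )
    lines = [
        'track name="{0}" description="reads in {1}:{2}-{3}"'.format(
            track_name, chrom, start, end
        )
    ]
    for row in cursor:
        lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines) + "\n"
```

In practice `load_bed` would be given an open file handle and a connection to an on-disk database, so the load happens once and each later query touches only the region of interest.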

    May I ask if you know of a program to convert SAM to .bed, or SAM to BAM?


    • #3
      "Our files can be converted to BAM or bigbed format but then require an HTTP/FTP accessible folder"

      Where did you get this information (that an HTTP/FTP folder is required)? I would like to read more about this.


      • #4
        The UCSC website gives instructions for using custom tracks in the Genome Browser.

        SAMtools converts files from SAM to BAM.


        • #5
          IGV from the Broad will read BAM files and won't display any more than can fit into memory, so you don't see much when zoomed very far out, but you can view even very deep alignments at high resolution.


          • #6
            Originally posted by thinkRNA
            I have about 3 GB of bowtie aligned reads that I could view using TABLET but then I had to increase the memory size to 10GB RAM.
            That seems a little steep for that amount of data. Can I ask how many contigs there were, and how long the reference sequence(s) were? Tablet doesn't (yet) cache reference data (including any protein translations that are turned on), so that's certainly one area that eats memory like crazy.

            The next version will support indexed BAM assemblies, so you'll be able to browse around massive data sets (in chunks) using a fraction of the memory that the current version does. It'll still hold reference data in RAM, but we'll get that cached too eventually...
            Our software: Tablet | Flapjack | Strudel | CurlyWhirly | TOPALi


            • #7
              So how much RAM do you think is needed to view 20 GB of Bowtie alignments in Tablet?

              I didn't use the reference sequences. I just remember reading in the manual that reference sequences are not needed for SAM format (but I may be wrong). I loaded about 1 million Illumina read alignments, and even with 10 GB RAM, Tablet was hanging up on me.


              • #8
                Originally posted by thinkRNA
                So how much RAM do you think is needed to view 20 GB of Bowtie alignments in Tablet?
                I honestly couldn't say, as it depends on a number of factors. I would like to know myself, but we just don't have access to data sets of that size.

                Originally posted by thinkRNA
                I didn't use the reference sequences. I just remember reading in the manual that reference sequences are not needed for SAM format (but I may be wrong). I loaded about 1 million Illumina read alignments, and even with 10 GB RAM, Tablet was hanging up on me.
                Just to be clear... you were talking about 3 GB of data? That's 3 gigabytes, not gigabases? And 1 million what? Reads or contigs? Tablet's memory requirements do change from version to version (so far always with a downward trend), so it might be worth trying it again if you haven't done so for a while.

                (And if people don't mind us having access to these problem data sets, we're more than happy to tweak what we can to help get them working with Tablet)

                Iain


                • #9
                  When you say you want to view the alignment, do you actually need to see the underlying sequence (which is the really big bit!) or just the pattern of alignment against your reference genome? If you're just looking for the distribution of reads then you might want to look at our SeqMonk viewer, which is specifically designed to view and analyse very large data sets on a normal desktop PC.


                  • #10
                    IGV viewer

                    I found this viewer recently and used it to view a gigabyte or so of reads. I had to convert the reads to a binary SAM (BAM) file, but it will handle just about any format. It will also load a genomic sequence and annotation.

                    Ed


                    • #11
                      I use and like SeqMonk. Working with Arabidopsis, I can load all five chromosomes for three Bowtie files on a 64-bit desktop with 4 GB of RAM.
                      You can go from a whole-chromosome view down to single reads. There is a pull-down menu where you can type in a position or a range on a particular chromosome, and it will go straight to that position. This is very useful when checking ChIP-Seq results.
                      SeqMonk will also take a variety of file formats other than Bowtie, including Eland, BED, MAQ, SAM, and some others.
                      Simon Andrews has been very helpful if we have any problems.


                      • #12
                        I would also recommend IGV for viewing large nextgen alignments.

                        First convert your SAM alignment to the binary BAM format using SAMtools (http://samtools.sourceforge.net/). You must then sort and index the BAM file with SAMtools before viewing the alignment in IGV (http://www.broadinstitute.org/igv/). If you can't see your alignment after loading it into IGV, make sure that the reference sequence names in your alignment match those of the genome loaded in IGV, and regenerate your alignment if they do not (e.g. for UCSC-style genomes, chromosomes must be named chr1, chr2, ... chrX and NOT 1, 2, ... X in your reference sequence file).
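The convert, sort, and index steps described above might be scripted along these lines. This is a hedged sketch, not a tested pipeline: the helper names are invented for illustration, the command forms assume samtools 1.x (releases from the era of this thread needed `-S` for SAM input and took an output prefix for `sort`), and the naming check assumes a UCSC-style genome where every reference name starts with "chr".

```python
import subprocess


def conversion_commands(sam_path):
    """Return the samtools command lines for SAM -> BAM -> sorted BAM -> index."""
    stem = sam_path[:-4] if sam_path.endswith(".sam") else sam_path
    bam = stem + ".bam"
    sorted_bam = stem + ".sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],   # SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # coordinate sort
        ["samtools", "index", sorted_bam],                 # writes the .bai index
    ]


def convert(sam_path):
    """Run each step in order, stopping at the first failure (needs samtools on PATH)."""
    for command in conversion_commands(sam_path):
        subprocess.run(command, check=True)


def references_without_chr_prefix(sam_lines):
    """List @SQ reference names in a SAM header that do not start with 'chr'."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:") and not field[3:].startswith("chr"):
                    names.append(field[3:])
    return names
```

Running `references_without_chr_prefix` on the open SAM file before converting gives a quick hint as to whether the naming mismatch mentioned above is likely to be the reason nothing shows up in IGV.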

                        I have used IGV to view ~2.5 GB BAM alignments on a 64-bit Ubuntu system with only 4 GB of RAM without any issues. The same system could definitely handle much larger alignments, as IGV used only a fraction of the available RAM.
                        Last edited by sperry; 02-22-2010, 08:18 AM.


                        • #13
                          To convert SAM to BED, use this command line (samtools options go before the input file name):

                          samtools view -Sb SAMFILENAME.sam | bamToBed -i stdin > BEDFILENAME.bed
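If neither samtools nor bamToBed is at hand, a streaming converter is one way around the file-size problem raised at the start of the thread, because it never holds more than one line in memory. The sketch below is not the sam2bed.py script mentioned earlier; the function names are made up, and the six output columns (chrom, start, end, read name, MAPQ as score, strand) are an assumption modelled on typical bamToBed output.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def sam_line_to_bed(line):
    """Convert one SAM record to a BED6 line; return None for headers and unmapped reads."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None
    name, flag, chrom, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 0x4 or chrom == "*" or cigar == "*":
        return None  # unmapped, or no alignment to report
    start = int(pos) - 1  # SAM positions are 1-based, BED starts are 0-based
    ref_length = sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )
    strand = "-" if flag & 0x10 else "+"
    return "\t".join([chrom, str(start), str(start + ref_length), name, mapq, strand])


def sam_to_bed(sam_handle, bed_handle):
    """Stream a SAM file to BED one line at a time."""
    for line in sam_handle:
        bed_line = sam_line_to_bed(line)
        if bed_line is not None:
            bed_handle.write(bed_line + "\n")
```

Typical use would be to open the SAM file for reading and the BED file for writing and pass both handles to `sam_to_bed`.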
