

Free & Open Environment for RNA-seq analysis: Galaxy (http://usegalaxy.org)




  • Free & Open Environment for RNA-seq analysis: Galaxy (http://usegalaxy.org)

    The Galaxy Team is excited to announce that the first free public resource for RNA-seq analysis is now available through the Galaxy public server at http://usegalaxy.org

    Galaxy now supports both Tophat and Cufflinks and also provides useful utilities for manipulating and visualizing GTF files, which are common outputs for a Tophat-Cufflinks pipeline.

    Here is an exercise for learning about how to use Galaxy for RNA-seq analysis.

    This addition brings Galaxy's current NGS offerings to:

    1. NGS QC and manipulation - contains a variety of tools for dealing with all flavors of fastq datasets as well as outputs of SOLiD and 454 instruments.
    2. NGS Mapping - currently includes bowtie (Illumina & SOLiD), BWA (Illumina), and lastz (454) mappers. PerM (SOLiD) is on the way and more will be added in the coming months. Transcriptome tools (e.g., top-hat) are also in the final stages of development.
    3. NGS SAMTools - includes a variety of utilities for SAM/BAM manipulation. Some are based on the samtools library, some are written by the Galaxy team.
    4. NGS RNA-seq tools - includes Tophat, Cufflinks, and useful utilities for manipulating and viewing GTF files.

    Galaxy is an open and free web-based platform for performing accessible, reproducible, and transparent NGS analyses. Users can start using Galaxy by going to http://usegalaxy.org ; alternatively, Galaxy can be downloaded and run on any *NIX machine: http://bitbucket.org/galaxy/galaxy-c...wiki/GetGalaxy or run on cloud computing resources such as Amazon: http://usegalaxy.org/cloud

    Here is the previous SEQAnswers announcement about Galaxy's initial NGS offerings.

    Enjoy and please send us feedback!

    The Galaxy Team

  • #2
    I have a problem while running RNA-seq on Galaxy: I cannot save BAM files (they are saved as a BAM index by default) or SAM files. Secondly, do you have any plans to integrate DESeq into Galaxy, or is that not necessary?


    • #3

      (1) Clicking on the save icon (the disk) rather than the arrow will download the BAM file rather than the index. (This is a recent UI bug, and we've fixed it in our codebase; you'll see this fix when we update our main server.)

      (2) I'm not sure why you wouldn't be able to save SAM files -- perhaps the size is very large and your browser times out or you're not waiting long enough for the file to download? Can you provide more details about the problem that you're having?

      (3) DESeq could certainly be integrated into Galaxy, but we--the Galaxy team--are not currently working on it. Galaxy has many R-based tools already available and we both welcome and try to support submissions from the community for new tool wrappers.

      Finally, Galaxy usage issues/questions are best sent to either [email protected] or [email protected]. These lists go to the entire Galaxy team and, in the case of galaxy-user, to the user community, and you should be able to get help more easily/quickly when you post there.

      Galaxy Team


      • #4
        I'm not sure if this is the best place to post this, but here goes.
        We have recently obtained an RNA-seq dataset from which we want to get differential expression lists.
        Being new to this, I evaluated the Galaxy platform and found it very useful and interesting. I used the QC and mapping programs in Galaxy to obtain mapped BAM/SAM files. I recently stumbled across the rQuant package for Galaxy but am unable to install it. I have also downloaded the BAM files from the Galaxy server. I am trying hard to understand how to proceed from these BAM files to lists of up- or down-regulated genes for the condition tested. Thanks in anticipation.


        • #5

          Thanks for sharing your experience with Galaxy. You may also like to send your message to the Galaxy users list. You have to follow the RNA-seq workflow and run Cufflinks/Cuffdiff. The problem is that I am not sure whether you can really get to a point in Galaxy where you have a differential expression list of transcripts, genes, isoforms, or splice junctions. However, you can certainly take these BAM/SAM files and run further analysis outside Galaxy. There is also a nice tutorial on working with RNA-seq data (search the Galaxy users list). Jeremy Goecks may add more information about differential expression if I am missing something.


          • #6

            rQuant was developed by Gunnar Ratsch's Lab and is available via their public Galaxy instance at http://galaxy.tuebingen.mpg.de/ Questions should be directed towards them (Help menu --> Email questions) rather than the galaxy-user mailing list.

            The galaxy-user mailing list (see my previous post) is the best place to ask questions about using Tophat/Cufflinks/compare/diff in Galaxy.


            Many users have gotten a functioning Tophat/Cufflinks/compare/diff pipeline working in Galaxy and have produced Cuffdiff quantitation and differential expression datasets. I think the Galaxy team has managed to address most of the big issues with this pipeline, but we're happy to help solve any particular problems that you may be having.
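
            As a rough illustration of the last step asked about above (going from a differential expression dataset to lists of up- or down-regulated genes), here is a minimal Python sketch. It assumes a tab-separated table with columns named `gene`, `log2(fold_change)`, and `significant`; those column names are an assumption and may differ between Cuffdiff versions, so check the header of your own file. The function name and the toy data are made up for the example.

```python
import csv
import io


def split_by_direction(diff_text, fold_col="log2(fold_change)", sig_col="significant"):
    """Return (up, down) gene lists from a tab-separated differential expression table.

    The column names are assumptions; adjust them to match your file's header.
    """
    up, down = [], []
    for row in csv.DictReader(io.StringIO(diff_text), delimiter="\t"):
        # Skip rows that were not flagged as significant.
        if row[sig_col].strip().lower() != "yes":
            continue
        fold = float(row[fold_col])  # float() also accepts "inf" and "-inf"
        if fold > 0:
            up.append(row["gene"])
        elif fold < 0:
            down.append(row["gene"])
    return up, down


# Toy table, invented for illustration only.
example = (
    "gene\tlog2(fold_change)\tsignificant\n"
    "geneA\t2.5\tyes\n"
    "geneB\t-1.8\tyes\n"
    "geneC\t0.1\tno\n"
)
up, down = split_by_direction(example)
print("up:", up)
print("down:", down)
```

            For a real file, read its contents with `open(path).read()` and pass the text to `split_by_direction`.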



            • #7
              @honey, @jgoecks
              thank you for the quick reply. I really appreciate it.
              I will write to the rQuant developers about the issues. One problem is that our Galaxy login details don't let us log in to http://galaxy.tuebingen.mpg.de/ and it would also be so much easier if the SAM/BAM files generated by read mapping with bowtie in Galaxy were available in http://galaxy.tuebingen.mpg.de/
              thank you both once again.


              • #8

                Galaxy makes it relatively easy to move files from one instance to another:

                (1) for the dataset that you want to move, right click on the save (disk) icon next to the dataset and copy its URL;
                (2) for the instance that you want to copy the dataset to, paste the URL into the upload form.

                Galaxy will then copy the dataset from one instance to another without you having to download it to your local computer.

                Complete histories can be imported and exported as well, but this functionality is still in beta.



                • #9
                  Hi Jeremy,

                  This is an awesome tool. I'm new to RNA-seq and was getting dizzy by reading the tons of reports using different programs. I'm glad Galaxy simplified a fairly complicated analysis pipeline into such a simple one.

                  My only request would be for you to post the answers to the questions you ask in the tutorial, so that I can check whether my results agree with yours and feel more confident by comparing my reasoning with the results one should get.

                  Thank you so much, this was very interesting.


                  • #10
                    Hi friend,
                    I have two questions.

                    (1) Is there a way to see the command lines run behind Galaxy's web interface?
                    (2) A few jobs are still waiting to run. If I shut down my PC, will they still run?



                    • #11

                      The answers to your questions depend on whether you're using our public server or running Galaxy locally/on the cloud.

                      If you're using our public server (main.g2.bx.psu.edu):

                      (1) you cannot see the command lines run by Galaxy;
                      (2) waiting jobs will be run even if you turn off your computer.

                      If you're running locally/on the cloud:

                      (1) you can see the command lines by viewing Galaxy's logs;
                      (2) waiting jobs will not be run unless your Galaxy is running.

                      Questions like this are best directed to one of our mailing lists: http://wiki.g2.bx.psu.edu/Support



                      • #12
                        Originally posted by jgoecks
                        The Galaxy Team is excited to announce that the first free public resource for RNA-seq analysis is now available through the Galaxy public server at http://usegalaxy.org

                        The Galaxy Team

                        I am not sure if it was advertised before but galaxy now has a disk quota for user files on the public instance (I understand the reason for the decision).

                        I learned this from a galaxy mailing list answer yesterday. I feel that this should be pointed out as a footnote for this post and mentioned on the main page of galaxy.

                        Thanks for the great work you all do!


                        • #13

                          Quotas on our public instance are a new feature (within the last couple of weeks) and are being phased in slowly. Moreover, we're still in the process of determining what the quotas should be; currently they are:

                          (a) 50 GB per dataset;
                          (b) 200 GB per history;
                          (c) 4 concurrent NGS jobs.

                          Once we've determined what these will be, I'll update my initial post and we'll ensure that this information is prominently featured on the public site.



                          • #14
                            Nucleotide bias in a specific position of the reads- Galaxy analysis

                            Hi, I'm analyzing my small-RNA-seq data (Illumina 1.9 quality scores) and I'm using Galaxy for the preliminary QC tasks. I find it a great and easy tool! I'd like to ask how I should interpret a graph: the nucleotide distribution chart produced after sample grooming and 3' adapter trimming. I attach it here so anybody can see it. So far I've loaded two samples into Galaxy and both show this kind of bias at the 3rd nucleotide of the reads. What does it mean? Would you suggest eliminating all reads that contain an "N" at the 3rd position?
                            Any suggestion would be appreciated! Thanks a lot.
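
                            One way to put a number on this kind of bias before deciding what to do with the reads is to count how often "N" occurs at the position in question, and to see how many reads a filter would remove. The Python sketch below is only an illustration run on a made-up toy FASTQ; the function names are hypothetical, and it does not say whether discarding those reads is the right choice for your data.

```python
from collections import Counter


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def base_fraction(reads, position, base="N"):
    """Fraction of reads carrying `base` at the 1-based `position`.

    Reads shorter than `position` are ignored.
    """
    counts = Counter(seq[position - 1] for _, seq, _ in reads if len(seq) >= position)
    total = sum(counts.values())
    return counts[base] / total if total else 0.0


def drop_reads_with_n(reads, position):
    """Keep reads that do not have 'N' at the 1-based `position`."""
    return [r for r in reads if len(r[1]) < position or r[1][position - 1] != "N"]


# Toy records, invented for illustration only.
toy = """@r1
ACNTGA
+
IIIIII
@r2
ACGTGA
+
IIIIII
@r3
TTNAAC
+
IIIIII
@r4
GGCA
+
IIII
""".splitlines()

reads = list(parse_fastq(toy))
n_frac = base_fraction(reads, 3)
kept = drop_reads_with_n(reads, 3)
print(f"N at position 3: {n_frac:.0%}; reads kept after filtering: {len(kept)} of {len(reads)}")
```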


                            • #15
                              Trimming Paired-End Data


                              So if I use a quality threshold of 20 to trim my Illumina dataset, which contains paired-end 100 bp reads, would both reads of a pair be removed if one of them has a base quality < 20? My worry is that when I use the trimmed dataset for de novo assembly, a program might decide that the dataset is not paired-end if both reads are not removed at the same time.
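
                              To make the concern concrete, here is a minimal Python sketch of pair-aware trimming: each mate is trimmed from the 3' end, and the pair is kept only if both mates are still long enough, so the two files stay in sync. This is not a description of how any particular Galaxy tool behaves. The function names, the Phred+33 offset, and the `min_len` cutoff are assumptions made for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases from the 3' end while their Phred quality is below min_q.

    Assumes Phred+33 encoded quality strings.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_pairs(pairs, min_q=20, min_len=3):
    """Trim both mates; keep a pair only if both mates are still >= min_len."""
    kept = []
    for (seq1, qual1), (seq2, qual2) in pairs:
        mate1 = trim_3prime(seq1, qual1, min_q)
        mate2 = trim_3prime(seq2, qual2, min_q)
        if len(mate1[0]) >= min_len and len(mate2[0]) >= min_len:
            kept.append((mate1, mate2))
    return kept


# Toy pairs, invented for illustration: 'I' encodes Q40 and '#' encodes Q2.
pairs = [
    (("ACGTAC", "IIIIII"), ("TTGACA", "IIII##")),  # mate 2 loses two bases, pair kept
    (("ACGTAC", "IIIIII"), ("TTGACA", "I#####")),  # mate 2 trimmed to one base, pair dropped
]
kept = trim_pairs(pairs)
print(f"{len(kept)} of {len(pairs)} pairs kept")
```

                              Dropping the whole pair keeps read 1 and read 2 files the same length and in the same order. A tool that treats each file independently would not guarantee this.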