  • malachig
    Senior Member
    • Aug 2010
    • 117

    ALEXA-Seq : Alternative expression analysis by RNA sequencing paper

    The Marra lab is pleased to announce the recent publication of a manuscript describing the use of RNA-seq data for alternative expression analysis.

    The advance online publication can be found at Nature Methods here:
    Griffith et al. 2010

    Briefly, the method utilizes RNA-seq data to profile transcriptomes, identify transcript features expressed above background noise levels, identify differentially expressed genes, and identify alternatively processed transcripts. Particular emphasis is placed on comparisons between experimental conditions (tumor vs. normal, drug sensitive vs. resistant, etc.).

    Results generated using this method/pipeline can be found here:
    ALEXA-seq.

    To date, 76 libraries corresponding to 16 projects have been analyzed by the ALEXA-seq approach.

    Some specific examples of the output can be viewed here:
    UMPS expression and splicing in 5-FU sensitive and resistant cell lines

    CA12 expression and splicing among normal breast tissue sub-types

    To view these examples, your browser must support SVG (scalable vector graphics). Firefox produces the best results in my experience.
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    Hi,

    Nice looking application. Do you have any suggestion for the minimum number of read pairs per sample? For the hypothetical events in the database, would this include all possible exon junctions (i.e., assuming no known transcript or EST support for the alternatives), as in the following example:

    Exon1--Exon2--Exon3--Exon4--Exon5

    Canonical transcript/EST-supported junctions

    Exon1-Exon2
    Exon2-Exon3
    Exon3-Exon4
    Exon4-Exon5

    Hypothetical junctions generated

    Exon1-Exon3
    Exon1-Exon4
    Exon1-Exon5
    Exon2-Exon4
    Exon2-Exon5
    Exon3-Exon5

    • malachig
      Senior Member
      • Aug 2010
      • 117

      #3
      Hello Jon,

      Thanks for the encouraging word. Your two questions are quite different so I will answer them separately.

      "Do you have any suggestion for the minimum number of read pairs per sample?"

      This is a straightforward and reasonable question to ask, but it is difficult to answer directly. It is by far the number one question I am asked about RNA-seq analysis. It has been discussed in various places in this forum, including by me here: How much coverage we need?

      The answer really depends on the particulars of your input material (e.g. RNA quality, cell heterogeneity), the type of library construction (e.g. polyA+ RNA vs. ribominus RNA), the tissues the libraries were created from, the goals of the analysis, etc. I would always rather have more data than less. When absolutely forced to give a hard number, I say that for alternative expression analysis with ALEXA-seq, the results really started to shine when I had at least 100 million paired 42-mers (of which say ~40-70% map to known transcripts depending on the library). If you have longer paired reads, you can get away with fewer of them.

      I have analyzed libraries of highly varying depth and quality and many of these analyses are summarized here. You can browse through these and see what the outcome looks like to get a more hands-on feel for what increasing depth gets you in terms of alternative expression analysis. For example, the REMC, Morgen, and 5-FU datasets have ~100-200 million mapped paired-end reads (36-mers to 75-mers) and produce beautiful alternative expression results. On the other hand the Sutent dataset has only ~10 million mapped reads and is really only good for gene-level analysis. Similarly, the AllenBrain libraries suffered from poor quality input RNA and this caused all kinds of problems with the analysis even though the number of reads was reasonable.

      • malachig
        Senior Member
        • Aug 2010
        • 117

        #4
        Your second question is more straightforward. Yes, that is how we create the hypothetical events in the junction databases. Using Ensembl exons as a starting point, we create the combinatorial pairwise connections of these exons. A subset thus corresponds to canonical junctions, but the majority correspond to hypothetical connections. The number of possible junctions for a gene with n known exons is n!/(2!(n - 2)!) = n(n - 1)/2.
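        As a rough illustration of the pairwise counting described above, here is a minimal Python sketch. It is not the ALEXA-seq code: the function names are made up for this example, exons are assumed to be sorted by position and non-overlapping, and "adjacent exons" stands in for "canonical", as in the five-exon example earlier in the thread (the real databases use transcript and EST support instead).

```python
from itertools import combinations
from math import comb


def pairwise_junctions(exons):
    """Return every (upstream, downstream) exon pair for one gene.

    Assumes `exons` is sorted by genomic position and the exons do not
    overlap (both are simplifications made for this sketch).
    """
    return list(combinations(exons, 2))


def split_by_adjacency(exons):
    """Split junctions into adjacent-exon pairs and exon-skipping pairs.

    Adjacency is only a stand-in for 'canonical' here; it is an assumption
    of this example, not how support is determined in the real database.
    """
    adjacent = list(zip(exons, exons[1:]))
    skipping = [j for j in pairwise_junctions(exons) if j not in adjacent]
    return adjacent, skipping


exons = ["Exon1", "Exon2", "Exon3", "Exon4", "Exon5"]
adjacent, skipping = split_by_adjacency(exons)

# Both numbers equal n(n - 1)/2 for n exons.
print(len(pairwise_junctions(exons)), comb(len(exons), 2))
for up, down in skipping:
    print(f"{up}-{down}")
```

        For the five-exon gene this enumerates the same exon-skipping junctions listed in post #2.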

        For the human hg19 transcriptome annotated by Ensembl, this results in 3,305,170 junctions, only 284,796 of which correspond to a known transcript. If you think such a database might be useful to you, please refer to the downloads page. Junction databases are available for human hg18 and hg19 here. See links to 'additional junction DBs' on this page.

        Junction databases including the sequences (fasta format) and corresponding annotation info for each are provided for 20 lengths of junction sequences (from 60mers up to 150mers). Included in the annotation files are chromosome coordinates, number of exons skipped, Ensembl support, EST and mRNA support from human and all other species, predicted peptide sequence, etc.

        • Lee Sam
          Member
          • Oct 2008
          • 57

          #5
          I'm playing with the ALEXA-Seq image, and I'm wondering what kind of data path the scripts require. I ask because I just point it at a common folder /home/alexa-seq/seq_files with .fastq files named s_n_1/2_sequence.txt (just for test, 2 lanes). Does it need the full pipeline analysis path?

          • malachig
            Senior Member
            • Aug 2010
            • 117

            #6
            I assume you mean in the config file where you point to the data... If so, then the data path can be anywhere, but it has to be a complete path to a directory that contains your data files... This doesn't have to be where the data files were originally generated. If you are using fastq files, you will have to change the SeqFileType column to fastq. I recommend using qseq files instead as the first step will be faster.

            • Lee Sam
              Member
              • Oct 2008
              • 57

              #7
              Originally posted by malachig
              I assume you mean in the config file where you point to the data... If so, then the data path can be anywhere, but it has to be a complete path to a directory that contains your data files... This doesn't have to be where the data files were originally generated. If you are using fastq files, you will have to change the SeqFileType column to fastq. I recommend using qseq files instead as the first step will be faster.
              I figured out my issue. Now I have another question: have you processed any HiSeq data with the pipeline? I started a couple of HiSeq lanes 4 hours ago and it still hasn't finished the read pre-processing step (processRawSolexaReads.sh). The last message was that the Berkeley DB was being created to save memory. Thanks for the help.

              • hong_sunwoo
                Member
                • Jan 2010
                • 11

                #8
                Hello malachig,
                I checked the ALEXA-Seq web site and found that this tool supports only paired-end data.
                Do you have a plan to develop a tool for single-end data?

                • malachig
                  Senior Member
                  • Aug 2010
                  • 117

                  #9
                  Lee Sam. Yes, the support for fastq was added near the end of development to support another user. It still needs some optimization as the initial read processing step is slow. If you are impatient you can convert your fastq file to either qseq or seq format and this step will run faster. We have processed some HiSeq data, and because each lane is so much larger it did tend to take longer for each step (and use more memory).

                  micrornas, no we don't have a specific plan to develop a tool for single-end data as we never generate single end RNA-seq data... I am aware of another user that processed single end data by creating 'dummy' read pairs (somewhat of a hack but apparently it worked).

                  • obig
                    Member
                    • Nov 2010
                    • 12

                    #10
                    single-end data

                    micrornas. I have processed single-end data with alexa-seq. I created dummy R2 qseq files with sequences of Ns at the same length as the real read and quality strings comprised of all "B" values. This allows the pipeline to run and all dummy reads are filtered out at the first step as "Low Quality" reads. A few of the library summary figures and stats will be affected by this. But, the results I got out were still usable and useful.
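                    For anyone wanting to try the same trick, a minimal Python sketch of generating the dummy R2 records follows. It is not the script I actually used. It assumes the usual 11-column, tab-separated qseq layout (read number in column 8, sequence in column 9, quality in column 10), and the function names are invented for this example. Note that qseq files conventionally write no-calls as "." rather than "N", so the filler base is left as a parameter.

```python
import io


def dummy_r2_line(r1_line, base="N", qual="B"):
    """Build a dummy read-2 qseq record from a read-1 qseq record.

    Assumes 11 tab-separated qseq columns; the column positions used
    below are an assumption of this sketch.
    """
    fields = r1_line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 qseq columns, found {len(fields)}")
    read_len = len(fields[8])
    fields[7] = "2"                # mark the record as read 2
    fields[8] = base * read_len    # filler sequence, same length as R1
    fields[9] = qual * read_len    # all-"B" quality string
    return "\t".join(fields) + "\n"


def write_dummy_r2(r1_handle, r2_handle, base="N", qual="B"):
    """Write one dummy R2 record per non-empty R1 record; return the count."""
    count = 0
    for line in r1_handle:
        if line.strip():
            r2_handle.write(dummy_r2_line(line, base, qual))
            count += 1
    return count


# Tiny in-memory demonstration with one made-up R1 record.
r1 = io.StringIO("M1\t1\t3\t1\t100\t200\t0\t1\tACGTACGT\tIIIIIIII\t1\n")
r2 = io.StringIO()
n_written = write_dummy_r2(r1, r2)
print(n_written, r2.getvalue(), end="")
```

                    With real files you would pass open file handles instead of the StringIO objects, one R1 file in and one R2 file out per lane.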

                    • Lee Sam
                      Member
                      • Oct 2008
                      • 57

                      #11
                      We're trying to get the heavy lifting (preprocessing, alignment) parts of ALEXA going on our cluster, which uses the Torque scheduler. I know that ALEXA was designed to run on a cluster; was there a particular configuration it was designed to work with? I was hoping to edit some of the configuration and the batch script generation code to generate jobs that could be submitted.

                      • malachig
                        Senior Member
                        • Aug 2010
                        • 117

                        #12
                        Our cluster uses Sun Grid Engine (SGE). Submitting jobs to the cluster is accomplished using a wrapper for the 'qsub' utility of SGE. Basically, the submission command just points to a batch file containing bash commands (one job per line). I assume this is a somewhat common theme in cluster job submission. If this is the case for you, it shouldn't be too hard to modify the 'createAnalysisCommands' step. You would just need to modify all the lines containing 'mqsub' to match the submission style of your cluster and then, when you run createAnalysisCommands, use the option '--cluster_commands=1'.

                        • obig
                          Member
                          • Nov 2010
                          • 12

                          #13
                          alexa-seq cluster

                          I guess there are too many different cluster configurations for alexa-seq to anticipate. So, simple bash files are produced which can be run serially (for very small libraries) or submitted to your cluster according to its protocols. You will probably have to work with your cluster administrator to get things running optimally.

                          Our cluster here (lawrencium) uses the PBS Torque resource manager and the Moab job scheduler. With some work, I have been able to submit alexa-seq jobs to it, and I have processed four projects with over 100 libraries to date, so it is doable. Instead of trying to edit all the parts of the alexa-seq pipeline code that produce job batch files and submission commands, I created a simple perl script that takes an alexa-seq job batch file (essentially just an sh file with one "task/command" per line) and produces submission files compatible with our scheduler. I strongly recommend this strategy. Changing the alexa-seq code will be a lot more work.

                          What I do is run the alexa-seq pipeline as instructed for steps 0 to 5B. Step 5C (submitMapBatch.sh) is the first step that requires submitting to a cluster. That sh file contains a whole bunch of bash commands for additional sh files (e.g., blast_vs_intergenics.sh). It is those files that should be submitted to a cluster, not the parent submitMapBatch.sh file. You can do them individually or cat them into combined files. I create one combined batch file for all libraries, separated only by feature type (repeats, transcripts, etc.), because they have different memory and runtime requirements. I can thus optimize cluster submission parameters for each of the 6 feature types. This is necessary because our cluster uses wallclock estimates and task number to determine job priority in the queue. Maybe your cluster has a simpler setup and this step will be unnecessary for you.

                          Once I have combined the bash files, I run my submitjobs.pl script on the combined file and wait for it to finish. In later steps, whenever alexa says to submit some jobs to a cluster, the bash file typically contains the tasks/commands directly (instead of additional bash commands as above). I just run my submitjobs.pl script on each of those bash files, check the .output and .error files for problems, and then proceed to the next step.

                          For each project, once the alexa-seq .commands file is produced, I make a new copy of this file and edit it to add my own commands that are necessary for job submission. This file can then be used as a template for running future projects.
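                          To give a feel for the batch-file-to-jobs conversion, here is a minimal Python sketch. It is not my submitjobs.pl; the function names, job naming scheme, and default resource values are all made up for illustration, and it only builds the text of Torque-style job scripts (using ordinary #PBS directives) without submitting anything. Your cluster will almost certainly need different directives.

```python
def read_tasks(batch_text):
    """Return the non-empty, non-comment lines of a batch file (one task per line)."""
    tasks = []
    for line in batch_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            tasks.append(stripped)
    return tasks


def pbs_script(task, name, walltime="04:00:00", mem="4gb"):
    """Wrap one task in a Torque/PBS job script.

    The walltime and mem defaults are placeholders, not recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l mem={mem}",
        f"#PBS -o {name}.output",
        f"#PBS -e {name}.error",
        'cd "$PBS_O_WORKDIR"',
        task,
        "",
    ]
    return "\n".join(lines)


def build_jobs(batch_text, prefix, **resources):
    """Map hypothetical job names (prefix_1, prefix_2, ...) to job script text."""
    return {
        f"{prefix}_{i}": pbs_script(task, f"{prefix}_{i}", **resources)
        for i, task in enumerate(read_tasks(batch_text), start=1)
    }


# Made-up batch content: two tasks, a blank line, and a comment line.
batch = "bash blast_lib1.sh\n\n# comment\nbash blast_lib2.sh\n"
jobs = build_jobs(batch, "repeats", walltime="12:00:00", mem="8gb")
for name, script in jobs.items():
    print(name)
    print(script)
```

                          Keeping one batch file per feature type means you can call build_jobs once per type with its own walltime and memory settings, then write each script to disk and hand it to qsub.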

                          • bioinfosm
                            Senior Member
                            • Jan 2008
                            • 483

                            #14
                            Originally posted by obig
                            micrornas. I have processed single-end data with alexa-seq. I created dummy R2 qseq files with sequences of Ns at the same length as the real read and quality strings comprised of all "B" values. This allows the pipeline to run and all dummy reads are filtered out at the first step as "Low Quality" reads. A few of the library summary figures and stats will be affected by this. But, the results I got out were still usable and useful.
                            Could you share what advantage you gained from tweaking this particular tool rather than using any of the dedicated microRNA tools?
                            --
                            bioinfosm

                            • obig
                              Member
                              • Nov 2010
                              • 12

                              #15
                              Dear bioinfosm,

                              I was responding to a question from the user with user name = "micrornas". This thread doesn't actually have anything specifically to do with the biological entity called microRNA. And, I'm afraid I have no experience to share regarding microRNA tools. This is perhaps a cautionary tale for those choosing a user name that has specific meaning and is commonly used and searched for in the forums.
                              Last edited by obig; 11-17-2010, 03:13 PM. Reason: grammar
