Illumina - RNA Seq. - Gene Expression Analysis


  • Illumina - RNA Seq. - Gene Expression Analysis

    Hello all,
    I have just delved into the world of RNA-Sequencing via Illumina, and as such, do not have much knowledge on the software available. Our lab has just performed RNA-seq. analysis on soybean RNA. The current goal of our lab is to compare our soybean samples and attempt to detect genes that are upregulated or downregulated between the different samples. I've noticed that Illumina offers a software package called "GenomeStudio" that can do analysis like this, but it currently seems to only be compatible with human, mouse, or rat genomes. Does anyone have suggestions on available software that could be used to compare transcription levels between samples? We just received our data from Illumina today, and the reads have been aligned to the soybean genome using Eland.

    Thank you for your advice!

  • #2
    Is there a soybean reference genome available?

    If there is, then consider using TopHat or ERANGE (use the latter only if you are ambitious enough to build your own Cistematic genome) to get RPKM numbers. Once you have those, you can treat them like microarray intensities and put them through the same normalization procedures as microarrays.
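
For readers new to RPKM (reads per kilobase per million mapped reads), the quantity mentioned above can be sketched as follows. This is only an illustration of the formula, not code from TopHat or ERANGE; the function names, gene IDs, counts, and the pseudocount in the fold-change helper are all assumptions made up for the example.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(rpkm_a, rpkm_b, pseudocount=1.0):
    """log2 ratio of sample B over sample A.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for genes with no reads in one sample.
    """
    return math.log2((rpkm_b + pseudocount) / (rpkm_a + pseudocount))


if __name__ == "__main__":
    # Hypothetical gene IDs with (read count, gene length in bp).
    total_reads = 10000000
    genes = {"geneA": (500, 2000), "geneB": (500, 1000)}
    for gene_id, (count, length) in sorted(genes.items()):
        print(gene_id, rpkm(count, length, total_reads))
```

Because RPKM divides by gene length and by sequencing depth, values from different genes and different samples land on a comparable scale, which is what makes the microarray-style downstream treatment possible.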



    • #3
      Yes, a soybean reference genome is available at Phytozome. I'm not exactly sure how complete it is. It's currently in its first chromosome-scale assembly.


      • #4
        Good - then take a look at TopHat & see if it does what you need it to do!


        • #5
          DNAstar has a new add-on RNAseq module, Qseq, for its ArrayStar. You should be able to get a 30 day free demo.


          • #6
            Yes, there is a reference genome for soybean, made by JGI and DOE. Here is a link to the Soybean:


            • #7
              If you have a reference genome, then you can use GenomeStudio; the human, rat, and mouse genomes are just supplied for convenience.


              • #8
                Originally posted by alim:
                Is there a soybean reference genome available?

                If there is, then consider using TopHat or ERANGE (use the latter only if you are ambitious enough to build your own Cistematic genome) to get RPKM numbers. Once you have those, you can treat them like microarray intensities and put them through the same normalization procedures as microarrays.

                Hi Ali and all,
                If I were to be so ambitious as to want to assemble my own Cistematic genome, can you suggest where to start? I am browsing around the Cistematic code looking for some documentation on this, but haven't found any. Sorry if I missed it.


                • #9
                  Hi all

                  Regarding @kkamerath's question (it is already quite old, but the answer may help someone else, as it would have helped me ^^):
                  Important: the following will NOT work if you renamed any folders or scripts in the cistematic directory. If you did so, download it again and unpack it to get a clean installation.

                  If you want to build your own cistematic genome:
                  1. Check whether your genome is supported: go to .../cistematic/genomes/ and search for the line:

                  supportedGenomes = [...]

                  2. If your organism is listed there, you are lucky: proceed with the next steps. Otherwise I'm not able to help (most probably you will have to write your own genome script and update the corresponding file in the cistematic/genomes folder).

                  3. Now you need to download the files required to build the genome. I will use the example of building the TAIR9 genome (so if you want a human genome instead, you will have to look at the corresponding script and change it later on):

                  go to .../cistematic/genomes/ and have a look at the function buildArabidopsisDB - namely search for lines where the script points to directories that potentially contain information about your genome. You will find that for arabidopsis the script requires:
                  ---> the fasta files: chr1.fas, chr2.fas, chr3.fas, chr4.fas, chr5.fas, chrM.fas, chrC.fas
                  ---> GFF3 file with genes/transposons/whatever: TAIRX_GFF3_XXX.gff
                  ---> functional descriptions: TAIRX_functional_descriptions
                  ---> GO terms: ATH_GO_GOSLIM.txt
                  Luckily, the files in the original script are named exactly the way they are named on the TAIR FTP server, so the required files are easy to find. Once found, download them. I guess that you will find similar things for other supported organisms. Just get the required files and continue with step 4.

                  4. Update the paths in the script. Here is the example (excerpt):
                  geneDB = cisRoot + '/A_thalianaTAIR9/arabidopsis.genedb'
                  def buildArabidopsisDB(db=geneDB, downloadDir=cisRoot + '/A_thalianaTAIR9/FASfiles'):
                      genePath = downloadDir + '/TAIR9_GFF3_genes_transposons.gff'
                      annotPath = downloadDir + '/TAIR9_functional_descriptions'
                      goPath = downloadDir + '/ATH_GO_GOSLIM.txt'

                      chromos = {'1': downloadDir + '/chr1.fas', '2': downloadDir + '/chr2.fas',
                                 '3': downloadDir + '/chr3.fas', '4': downloadDir + '/chr4.fas', '5': downloadDir + '/chr5.fas',
                                 'C': downloadDir + '/chrC.fas', 'M': downloadDir + '/chrM.fas'}
                  You may need several tries to get all the paths correct. In particular, the "CISTEMATIC_ROOT" variable needs to be set properly before running the scripts (see below).

                  5. Now everything should be ready for building the genome. Here are the commands that need to be passed to the shell (with explanations):
                  - > Very important: set PYTHONPATH and CISTEMATIC_ROOT. I set both to the directory where the folder "cistematic" is located:

                  export PYTHONPATH=/home/marc/ERANGE/ERANGE31
                  export CISTEMATIC_ROOT=/home/marc/ERANGE/ERANGE31

                  - > open python:


                  - > import the whole cistematic package:

                  from cistematic import *

                  - > run the command that builds your genome. Note that this name will most probably be something like: genomes.GOI.buildGOIDB(). [Hint: "genomes" points to the directory genomes, "GOI" points to your script, and buildGOIDB calls the function that builds the genome (this function is defined in that script, which is where you updated the paths).]


                  -> exit python environment:


                  This is it. In case you also want to update the genome sizes, type the following in a Python shell:

                  fasDir = '/home/marc/ERANGE/ERANGE31/A_thalianaTAIR9/FASfiles'
                  chromos = {'1': fasDir + '/chr1.fas', '2': fasDir + '/chr2.fas', '3': fasDir + '/chr3.fas', '4': fasDir + '/chr4.fas', '5': fasDir + '/chr5.fas', 'C': fasDir + '/chrC.fas', 'M': fasDir + '/chrM.fas'}

                  def chromSize(chromID, chromPath):
                      # Count the bases in a single-sequence FASTA file.
                      seqLen = 0
                      inFile = open(chromPath, 'r')
                      inFile.readline()  # skip the '>' header line
                      for line in inFile:
                          seqLen += len(line.strip())
                      inFile.close()
                      return seqLen

                  def genomeSize():
                      # Map each chromosome ID to its sequence length.
                      chroLen = {}
                      for chromID in ['1', '2', '3', '4', '5', 'C', 'M']:
                          chroLen[chromID] = chromSize(chromID, chromos[chromID])
                      return chroLen

                  resultingSizes = genomeSize()
                  # print(...) with a single argument works in both Python 2 and Python 3
                  print(resultingSizes)
                  print(sum(resultingSizes.values()))

                  Hope this helps. In case you're not familiar with Python: search for a course called "Introduction to Programming using Python - Programming Course for Biologists at the Pasteur Institute". Thanks to chapter 14 I understood what to do ^^

                  By the way, I have a question of my own:

                  There is also the entry background = {...}. Can anyone tell me what this is, what it is required for, and how it is calculated?


                  • #10
                    I'm very sorry that I cannot answer your question. It would be interesting to know.

                    I would just like to thank you for the superb guide. It worked very well for me!

                    The only thing I would like to add is that the folder, in the above case "cisRoot + '/A_thalianaTAIR9'", needs to be created manually beforehand.
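
The folder creation mentioned here, together with the "several tries to get all the paths correct" from step 4 of the guide, could be handled by a small helper like the one below. This is a rough sketch and not part of Cistematic or ERANGE: the function name prepare_genome_dir and its parameters are hypothetical, the file names are the ones listed in the guide for TAIR9, and it is written for Python 3 (the exist_ok argument of os.makedirs is not available in Python 2), so it would be run separately from the build itself.

```python
import os
import tempfile


def prepare_genome_dir(cis_root, genome_name, expected_files, subdir="FASfiles"):
    """Create <cis_root>/<genome_name>/<subdir> if it does not exist yet.

    Returns the download directory and the list of expected files
    that are not present in it.
    """
    download_dir = os.path.join(cis_root, genome_name, subdir)
    os.makedirs(download_dir, exist_ok=True)  # no error if it already exists
    missing = [name for name in expected_files
               if not os.path.isfile(os.path.join(download_dir, name))]
    return download_dir, missing


if __name__ == "__main__":
    # File names as listed in the guide for the TAIR9 example.
    tair9_files = ["chr1.fas", "chr2.fas", "chr3.fas", "chr4.fas", "chr5.fas",
                   "chrC.fas", "chrM.fas", "TAIR9_GFF3_genes_transposons.gff",
                   "TAIR9_functional_descriptions", "ATH_GO_GOSLIM.txt"]
    # A temporary directory stands in for CISTEMATIC_ROOT in this demo.
    with tempfile.TemporaryDirectory() as demo_root:
        folder, missing = prepare_genome_dir(demo_root, "A_thalianaTAIR9", tair9_files)
        print("download directory:", folder)
        print("files still to download:", missing)
```

In real use, the first argument would be the same directory that CISTEMATIC_ROOT points to, for example os.environ["CISTEMATIC_ROOT"], and an empty "missing" list would suggest the paths in the build function can be resolved.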