  • cerebralrust
    Junior Member
    • Jan 2012
    • 8

    Differential gene expression analysis without reference

    I need to find differentially expressed genes between two genotypes of a plant species, sequenced on a 454 pyrosequencer.

    There is no reference genome, and the reads align poorly to the closest sequenced species, Glycine max.

    How does one go about DE analysis in this case?

    Should I combine the reads of both genotypes, assemble them, and use that assembly as a reference? Then map the reads of each genotype to this reference and continue the analysis?

    Thank you.
  • phoss
    Member
    • Aug 2011
    • 12

    #2
    Hi cerebralrust,

    I was curious if you've tried M. truncatula? This is a close relative of G. max.
    Have you had any luck with genomes from Phytozome?
    Last edited by phoss; 05-03-2012, 05:11 PM.

    • sdriscoll
      I like code
      • Sep 2009
      • 436

      #3
      You might try one of the "no genome" assemblers like Trinity or ABySS to build a "gene" library from your data. I think those put together a consensus set of sequences assembled from your reads. Then you could build a bowtie reference from those FASTA sequences and align your reads to it with bowtie. Finally, you can count the reads aligned to each sequence and compare samples using something like DESeq.

      You'll need some major computer power to run Trinity, from what I hear. Assembling sequences from reads is much more resource-intensive than the bowtie alignment stage.
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */
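The counting step described above can be sketched in Python. This is only an illustration, assuming the aligner was asked for SAM output and that one SAM file exists per sample; the function names `count_reads_per_contig` and `build_count_table` are made up for this example and are not part of bowtie or DESeq. It keeps primary alignments only and produces raw integer counts, which is the kind of input DESeq expects.

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count primary alignments per reference sequence from SAM-format lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        contig = fields[2]
        if flag & 0x4 or contig == "*":   # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        counts[contig] += 1
    return counts


def build_count_table(per_sample_counts):
    """Merge {sample: Counter} into (sample order, {contig: [count per sample]})."""
    samples = sorted(per_sample_counts)
    contigs = sorted(set().union(*(c.keys() for c in per_sample_counts.values())))
    table = {
        contig: [per_sample_counts[sample][contig] for sample in samples]
        for contig in contigs
    }
    return samples, table
```

Each SAM file can be passed as an open file handle, since the function only iterates over lines. The resulting table can then be written out as a tab-delimited file, one row per assembled sequence and one column per sample, and loaded into the DE tool of choice.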

      • jujubix
        Member
        • May 2011
        • 14

        #4
        De novo assembly, as sdriscoll mentioned, is the typical solution when no decent reference genome exists.

        Given that you're dealing with gene expression, I assume you have transcriptome reads, in which case you could look into Trans-ABySS, which is the transcriptome-specific version of ABySS. It is a single software pipeline that aims to assemble reads into transcripts and quantify transcript abundance, all without a reference genome. In theory you would end up with two sets of transcripts and expression levels, after which standard DE analysis could be conducted. Although finding corresponding transcripts between the two sets could be tricky...

        Software link is here and paper is here
        Last edited by jujubix; 05-03-2012, 01:26 PM.

        • sdriscoll
          I like code
          • Sep 2009
          • 436

          #5
          indeed. do you think that one would have to engage in a massive pairwise BLAST session between assemblies in order to match them up?

          Maybe, for that reason, it would be easiest to pool all reads into a massive FASTQ and run them through ABySS at once to get a master list of transcripts and then perform quantification through other means.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */
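Pooling the reads could look something like the sketch below. It assumes the reads are already in plain four-line FASTQ (no wrapped sequence lines), which may require a conversion step first for 454 data; the function name `pool_fastq` and the `label|` prefix on read names are arbitrary choices for this example. Tagging each read with its genotype keeps the origin recoverable if the pooled file is reused later.

```python
def pool_fastq(inputs, out_handle):
    """Write reads from several FASTQ handles into one, tagging each read name.

    inputs: dict mapping a sample label to an open text handle (4-line FASTQ).
    Returns a dict with the number of reads written per label.
    """
    totals = {}
    for label, handle in inputs.items():
        written = 0
        while True:
            header = handle.readline()
            if not header:                # end of file
                break
            if not header.strip():        # tolerate blank lines between records
                continue
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            if not header.startswith("@") or not plus.startswith("+") or not qual:
                raise ValueError("malformed FASTQ record in sample " + label)
            # Prefix the read name with the sample label, e.g. @genoA|read1
            out_handle.write("@" + label + "|" + header[1:].rstrip("\n") + "\n")
            out_handle.write(seq.rstrip("\n") + "\n")
            out_handle.write("+\n")
            out_handle.write(qual.rstrip("\n") + "\n")
            written += 1
        totals[label] = written
    return totals
```

The pooled file would go to the assembler once, while quantification is still done per genotype by mapping each original read set back to the shared set of transcripts.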

          • jujubix
            Member
            • May 2011
            • 14

            #6
            Yeah, at this point building a common reference via assembly is looking mighty tempting. This is, of course, assuming cerebralrust has the major computer power to run everything.

            • sdriscoll
              I like code
              • Sep 2009
              • 436

              #7
              yeah. i'd be a little nervous to try it myself. but that's why i have more than one computer.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */

              • cerebralrust
                Junior Member
                • Jan 2012
                • 8

                #8
                Hello members. Thank you for your valuable advice.

                I've run assemblies on my data using Trinity, Newbler, MIRA, and velvet on my HP laptop, which has 4 GB RAM and an i3 processor. About 800k reads, with both genotypes pooled together.
                No, I have not tried M. truncatula, phoss. I will, thanks!
                I've pooled all the reads and assembled them using MIRA + CAP3. Trinity, although a really good assembler, is quite bad for plant genomes (poor annotation, poor N50, etc.).
                Yes, it is transcriptome. Now I suppose I will map the reads back to this 'reference', quantify, and continue with the analyses.

                Thanks for the ABySS suggestion and paper, jujubix & sdriscoll. I will try it out.
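For comparing assemblies by N50, as mentioned above, a small Python sketch is given here. N50 is the contig length at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembled bases. The helper names `fasta_lengths` and `n50` are made up for this example, and the FASTA reader assumes ordinary `>name` headers.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""   # first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)         # sequence may span several lines
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
```

Calling `n50(fasta_lengths(open_fasta_handle).values())` on each assembler's output gives one number per assembly. For a transcriptome, N50 is only a rough guide, since real transcripts vary widely in length.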
