  • aurorasea1
    Junior Member
    • May 2016
    • 5

    Running TopHat/Bowtie2: help with index base name & transcript files

    Hello, I am new to analyzing RNA-seq data and I am confused by the terminology used to run the program.
    Right now, I am trying to run TopHat & Bowtie2 through a UGENE workflow.
    It requires me to enter:
    1. Bowtie index base name
    2. Known transcript file
    3. Raw junctions

    UGENE's tutorial page is not very detailed, so I visited Bowtie's website.
    I found these files available for download:
    A) H. sapiens, NCBI GRCh38 (ftp://ftp.ncbi.nlm.nih.gov/genomes/a...e_index.tar.gz), a file of more than 3.5 GB
    and also this:
    B) H. sapiens, Ensembl GRCh37 (ftp://igenome:[email protected]..._GRCh37.tar.gz), a file of more than 18 GB

    Is it correct to use (A) as the index file and enter GRCh38 as the Bowtie index base name?

    And is it correct to use (B) as the transcript file?

    As for "raw junctions", where can I find a list of raw junctions?
    I would really appreciate your help.
  • Michael.Ante
    Senior Member
    • Oct 2011
    • 127

    #2
    Hi,

    A) and B) are different releases of the human genome assembly (GRCh38 and GRCh37), so don't mix them.
    If you want a Bowtie index and its corresponding transcriptome index, download the FASTA files and the GTF from the same source. I'd recommend the Ensembl annotation (both the FASTA files and the GTF are on Ensembl's download site). There is some ongoing discussion about whether to use the primary assembly (all chromosomes) or the toplevel assembly (all chromosomes plus patches and haplotype sequences). To begin with, I'd start with the primary one.
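Some mismatches between a genome FASTA and a GTF from different sources can be spotted before building anything by comparing sequence names. A minimal sketch, assuming standard grep, sed, cut, sort and comm: it prints each sequence name in column 1 of the GTF that has no matching FASTA header. The function name missing_seqnames is hypothetical. Matching names do not prove both files come from the same release; this only catches the obvious naming mismatches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GTF sequence names (column 1) absent from the FASTA.
missing_seqnames() {
  local fasta=$1 gtf=$2 tmp
  tmp=$(mktemp -d) || return 1
  # FASTA headers such as ">1 dna:chromosome ...": keep only the first word.
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$tmp/fa"
  # Skip GTF comment lines (e.g. "#!genome-build"), take the seqname column.
  grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u > "$tmp/gtf"
  # comm -13 prints lines found only in the second (GTF) list.
  LC_ALL=C comm -13 "$tmp/fa" "$tmp/gtf"
  rm -rf "$tmp"
}
```

An empty result from `missing_seqnames Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.84.gtf` (assumed, decompressed Ensembl release 84 file names) means every GTF sequence name has a FASTA record.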

    After building the genome index with bowtie2-build (say you name it GRCh38.84), you can create the transcriptome index with TopHat2.
    Code:
    tophat -G Homo_sapiens.GRCh38.84.gtf --transcriptome-index=transcriptome_data/known GRCh38.84
    Cheers,

    Michael
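The two steps in this post can be sketched as a small shell script. This is a minimal sketch, assuming bowtie2-build and tophat are on the PATH; the helper names run and build_indexes and the DRY_RUN switch are hypothetical, added only so the commands can be printed and reviewed before starting a long index build. The tophat line is the command from this post, with the file names passed in as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the two steps described above.
build_indexes() {
  local fasta=$1 gtf=$2 base=$3
  # Step 1: Bowtie 2 genome index; writes $base.1.bt2 ... $base.rev.2.bt2
  run bowtie2-build "$fasta" "$base" || return 1
  # Step 2: no reads given, so TopHat only builds the transcriptome index
  # from the GTF under transcriptome_data/known (as in the command above).
  run tophat -G "$gtf" --transcriptome-index=transcriptome_data/known "$base"
}
```

For example, `DRY_RUN=1 build_indexes Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.84.gtf GRCh38.84` only prints the two commands; the file names are assumed Ensembl release 84 names after decompression. Drop `DRY_RUN=1` to actually run them.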


    • aurorasea1
      Junior Member
      • May 2016
      • 5

      #3
      Hi Michael, thanks.
      I'm running it in UGENE but got these errors:
      [2016-05-14 16:19:04] Beginning TopHat run (v2.0.9)
      -----------------------------------------------
      [2016-05-14 16:19:04] Checking for Bowtie
      Bowtie version: 2.1.0.0
      [2016-05-14 16:19:04] Checking for Samtools
      Samtools version: 0.1.19.0
      [2016-05-14 16:19:04] Checking for Bowtie index files (genome)..
      Error: Could not find Bowtie 2 index files (/Users/*.bt2)

      Now it's asking for *.bt2 files and I'm at a loss as to which files I should be using to run the analysis properly.
      Thanks to all the experts here for your useful tips.


      • mastal
        Senior Member
        • Mar 2009
        • 666

        #4
        You need to build a Bowtie 2 index of the reference genome with the bowtie2-build command before you run TopHat. This produces six files with the suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2, all sharing the index base name as prefix.

        You then need to specify the path and prefix of the genome index files in your tophat command, i.e. the path up to and including the base name, without the .bt2 suffixes.
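Before pointing TopHat at an index, it can help to confirm that all six files are present under the prefix you plan to pass. A minimal sketch in shell; the function name check_bt2_index is hypothetical. It only checks the six .bt2 suffixes of a standard index; the .bt2l files that bowtie2-build writes for very large genomes are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all six Bowtie 2 index files exist and
# are non-empty for the given prefix (path + base name, no .bt2 suffix).
check_bt2_index() {
  local prefix=$1 missing=0 suffix
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [ ! -s "$prefix.$suffix" ]; then
      echo "missing or empty: $prefix.$suffix" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

`check_bt2_index /path/to/GRCh38.84` (a placeholder path) should succeed before the same `/path/to/GRCh38.84` is given to tophat as the index argument. The error in #3, which looked for `/Users/*.bt2`, suggests the prefix TopHat received ended at a directory rather than at an index base name.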


        • maxter
          Junior Member
          • Apr 2016
          • 2

          #5
          Hi,

          In UGENE, under Settings > Preferences > External Tools, you have to set the path for every program (TopHat, Bowtie, etc.).

          You can build the index first, before running the whole workflow, via Tools > NGS data analysis > Build index for reads mapping. That worked for me; now I'm just trying to figure out how to retrieve all the results of the Tuxedo protocol.

          Regards
