  • blancha
    replied
    @GenoMax,@kmcarr

    Thank you both for your help.

    I've downloaded Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz, which excludes haplotypes and patches.
    From this file, bowtie2-build built the smaller .bt2 index files.

    Since I was interested in novel transcript discovery in addition to gene expression quantification, I wanted to use the most complete genome version available, so I was using Homo_sapiens.GRCh38.dna.toplevel.fa.gz. In hindsight, Homo_sapiens.GRCh38.dna.primary_assembly.fa was probably more appropriate.
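A minimal sketch of the steps described above, assuming bash, gzip, and bowtie2-build on the PATH. `index_prefix` and `build_index` are hypothetical helper names, not part of Bowtie 2 or TopHat; the only Bowtie 2 call is the two-argument `bowtie2-build <reference_in> <bt2_index_base>` form used elsewhere in this thread.

```shell
#!/usr/bin/env bash

# Derive an index basename from a reference FASTA name by dropping a
# trailing .gz and then a trailing .fa or .fasta.
index_prefix() {
    local name="${1%.gz}"
    name="${name%.fa}"
    name="${name%.fasta}"
    printf '%s\n' "$name"
}

# Hypothetical helper: decompress a gzipped primary assembly and index it.
build_index() {
    local gz="$1"
    local fasta="${gz%.gz}"
    local prefix
    prefix=$(index_prefix "$gz")
    gzip -cd "$gz" > "$fasta"        # keep the .fa next to the index files
    bowtie2-build "$fasta" "$prefix" > "$prefix.build.log" 2>&1
    ls "$prefix".*.bt2               # errors out if only .bt2l files exist
}

# Usage (not run here):
#   build_index Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
```

Leaving the uncompressed .fa beside the index under the same basename should let TopHat pick up the reference directly rather than rebuilding it from the index.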

    The following description of the files mentions GRCh37, but I downloaded it from the GRCh38 directory on the Ensembl FTP site.
    Code:
    ---------
    TOPLEVEL
    ---------
    These files contains all sequence regions flagged as toplevel in an Ensembl
    schema. This includes chromsomes, regions not assembled into chromosomes and
    N padded haplotype/patch regions.
    
    EXAMPLES
    
      Toplevel sequences unmasked:
        Homo_sapiens.GRCh37.dna.toplevel.fa.gz
      
      Toplevel soft/hard masked sequences:
        Homo_sapiens.GRCh37.dna_sm.toplevel.fa.gz
        Homo_sapiens.GRCh37.dna_rm.toplevel.fa.gz
    
    -----------------
    PRIMARY ASSEMBLY
    -----------------
    Primary assembly contains all toplevel sequence regions excluding haplotypes
    and patches. This file is best used for performing sequence similarity searches
    where patch and haplotype sequences would confuse analysis.   
    
    EXAMPLES
    
      Primary assembly sequences unmasked:
        Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
      
      Primary assembly soft/hard masked sequences:
        Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa.gz
        Homo_sapiens.GRCh37.dna_rm.primary_assembly.fa.gz
    Last edited by blancha; 10-07-2014, 09:35 AM.



  • GenoMax
    replied
    Homo_sapiens.GRCh38.dna.toplevel.fa from Ensembl is 36 GB in size. It appears to contain alternate haplotypes for a number of locations/scaffolds in addition to the chromosomes. No wonder bowtie2-build is producing large indexes.

    I am going to see if I can find a link for just the chromosomes.
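One way to check a reference before indexing it is to count its nucleotides. This is a minimal sketch, assuming bash, awk, and gzip; `count_bases` and `fits_small_index` are hypothetical helper names, and the 4,000,000,000 cutoff is only the "4 billion nucleotides" figure from the bowtie2-build help text, not necessarily the exact test the wrapper applies.

```shell
#!/usr/bin/env bash

# Count sequence characters in a FASTA file (plain or .gz), skipping
# header lines. N padding is counted too, since it is part of the sequence.
count_bases() {
    local fasta="$1"
    case "$fasta" in
        *.gz) gzip -cd "$fasta" ;;
        *)    cat "$fasta" ;;
    esac | awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }'
}

# Succeed if the reference is under the assumed ~4 billion base cutoff.
fits_small_index() {
    local limit=4000000000   # assumption taken from the help text
    [ "$(count_bases "$1")" -lt "$limit" ]
}

# Usage (not run here):
#   count_bases Homo_sapiens.GRCh38.dna.toplevel.fa
#   fits_small_index Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz && echo "small index expected"
```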



  • kmcarr
    replied
    Originally posted by blancha
    How do I generate the smaller (32-bit) index files?

    There is no option in bowtie2-build.

    I used the following simple command to generate the index files.

    Code:
    bowtie2-build Homo_sapiens.GRCh38.dna.toplevel.fa Homo_sapiens.GRCh38.dna.toplevel \
    &> bowtie2_build.sh.log
    Code:
    Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
    Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
        reference_in            comma-separated list of files with ref sequences
        bt2_index_base          write bt2 data to files with this dir/basename
    *** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
    Options:
        -f                      reference files are Fasta (default)
        -c                      reference sequences given on cmd line (as
                                <reference_in>)
        --large-index           force generated index to be 'large', even if ref
                                has fewer than 4 billion nucleotides
        -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
        -p/--packed             use packed strings internally; slower, less memory
        --bmax <int>            max bucket sz for blockwise suffix-array builder
        --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
        --dcv <int>             diff-cover period for blockwise (default: 1024)
        --nodc                  disable diff-cover (algorithm becomes quadratic)
        -r/--noref              don't build .3/.4 index files
        -3/--justref            just build .3/.4 index files
        -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
        -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
        --seed <int>            seed for random number generator
        -q/--quiet              verbose output (for debugging)
        -h/--help               print detailed description of tool and its options
        --usage                 print this usage message
        --version               print version information and quit
    bowtie2-build is really just a small wrapper script that calls either bowtie2-build-s (for 'small' genomes) or bowtie2-build-l (for 'large' ones). While not recommended, you could try calling bowtie2-build-s directly, e.g.
    Code:
    bowtie2-build-s Homo_sapiens.GRCh38.dna.toplevel.fa Homo_sapiens.GRCh38.dna.toplevel \
    &> bowtie2_build.sh.log
    I do not know if this will work for GRCh38.
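Whichever builder ends up running, the extension of the output files shows which kind of index was written. A minimal sketch, assuming bash; `index_kind` is a hypothetical helper, and it relies only on the .bt2 (small) and .bt2l (large) extensions discussed in this thread.

```shell
#!/usr/bin/env bash

# Report whether a Bowtie 2 index basename has small (.bt2) or
# large (.bt2l) index files, or neither.
index_kind() {
    local prefix="$1"
    if [ -e "$prefix.1.bt2" ]; then
        echo small
    elif [ -e "$prefix.1.bt2l" ]; then
        echo large
    else
        echo missing
    fi
}

# Usage (not run here):
#   index_kind Homo_sapiens.GRCh38.dna.toplevel
```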



  • blancha
    replied
    How do I generate the smaller (32-bit) index files?

    There is no option in bowtie2-build.

    I used the following simple command to generate the index files.

    Code:
    bowtie2-build Homo_sapiens.GRCh38.dna.toplevel.fa Homo_sapiens.GRCh38.dna.toplevel \
    &> bowtie2_build.sh.log
    Code:
    Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
    Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
        reference_in            comma-separated list of files with ref sequences
        bt2_index_base          write bt2 data to files with this dir/basename
    *** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
    Options:
        -f                      reference files are Fasta (default)
        -c                      reference sequences given on cmd line (as
                                <reference_in>)
        --large-index           force generated index to be 'large', even if ref
                                has fewer than 4 billion nucleotides
        -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
        -p/--packed             use packed strings internally; slower, less memory
        --bmax <int>            max bucket sz for blockwise suffix-array builder
        --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
        --dcv <int>             diff-cover period for blockwise (default: 1024)
        --nodc                  disable diff-cover (algorithm becomes quadratic)
        -r/--noref              don't build .3/.4 index files
        -3/--justref            just build .3/.4 index files
        -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
        -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
        --seed <int>            seed for random number generator
        -q/--quiet              verbose output (for debugging)
        -h/--help               print detailed description of tool and its options
        --usage                 print this usage message
        --version               print version information and quit
    Last edited by blancha; 10-07-2014, 08:07 AM.



  • GenoMax
    replied
    The TopHat web page no longer says so explicitly (the last mention was for v2.0.11), but it is still likely that TopHat does not support 64-bit bowtie2 indexes. I think that is what you have generated.

    According to the manual, bowtie2-build should generate normal (small) indexes if the reference is under 4 gigabases. I'm not sure why you are getting large indexes.
    Last edited by GenoMax; 10-07-2014, 07:58 AM.



  • blancha
    started a topic How to run Tophat2 with GRCh38?

    Hi,

    This is a very simple question that I'm hopeful someone has resolved already.

    How does one run Tophat2 with GRCh38?

    I've downloaded the reference genome from Ensembl.
    I've indexed the reference genome with bowtie2-build.

    The problem is that bowtie2-build generates large index files with the extension .bt2l, which TopHat does not recognize.

    What should I do?
    Would an older version of Bowtie2 allow me to generate bt2 files?

    Someone must have resolved this problem.
    iGenomes does not yet provide indexes for GRCh38.
    I'm happy with Tophat, and don't want to switch to STAR, although I find this issue annoying and perplexing.

    The problem has been reported in the Tuxedo user group, but no solution has been provided.


    TopHat v2.0.12
    Bowtie2 version 2.2.3

    Error: Could not find Bowtie 2 index files (/stockage/genomes/Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/Homo_sapiens.GRCh38.dna.toplevel.*.bt2)

    Thank you for your help.
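The error above comes from TopHat looking for `<basename>.*.bt2` files. A minimal sketch for checking a basename before launching a run, assuming bash; `missing_bt2_files` is a hypothetical helper, and the six suffixes are the ones a small Bowtie 2 index normally consists of.

```shell
#!/usr/bin/env bash

# Print every small-index file that is absent for the given basename.
# No output means all six .bt2 files are present.
missing_bt2_files() {
    local prefix="$1" suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -e "$prefix.$suffix.bt2" ] || printf '%s\n' "$prefix.$suffix.bt2"
    done
    return 0
}

# Usage (not run here):
#   missing_bt2_files /stockage/genomes/Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/Homo_sapiens.GRCh38.dna.toplevel
```

If every file is reported missing while .bt2l files sit in the same directory, the index was built as a large index.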
