  • ctseto
    replied
    If you just need to retrieve known regions.

  • gringer
    replied
    Originally posted by lh3
    Although bowtie index essentially keeps the genome, I doubt it is optimized or designed for your purpose.

    The bowtie index is optimised for searching, but it's overkill (and inefficient) for getting subsequences. If you want compressed, indexed storage purely for DNA sequence retrieval, then the 2bit format is probably best:



    The code points to a way to retrieve ranges:


    Code:
    /* Parse a .2bit file and sequence spec into an object.
     * The spec is a string in the form:
     *
     *    file/path/input.2bit[:seqSpec1][,seqSpec2,...]
     *
     * where seqSpec is either
     *     seqName
     *  or
     *     seqName:start-end

    So there's probably a program somewhere for getting subsequences out of that file using seqName:start-end notation.
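
    To make the notation in that comment concrete, here is a minimal Python sketch of parsing such a spec string. It is not the UCSC code; the names parse_twobit_spec and SeqSpec are made up for illustration, and it assumes the file path ends in ".2bit" and that sequence names contain no commas. It does not interpret whether start-end is 0-based or 1-based, since the comment above doesn't say.

```python
from typing import List, NamedTuple, Optional, Tuple


class SeqSpec(NamedTuple):
    """One requested sequence; start/end are None when the whole sequence is wanted."""
    name: str
    start: Optional[int]
    end: Optional[int]


def parse_twobit_spec(spec: str) -> Tuple[str, List[SeqSpec]]:
    """Split 'path.2bit[:seqSpec1][,seqSpec2,...]' into (path, [SeqSpec, ...])."""
    marker = ".2bit"
    cut = spec.find(marker)
    if cut < 0:
        raise ValueError("spec does not contain a .2bit file name: %r" % spec)
    cut += len(marker)
    path, rest = spec[:cut], spec[cut:]
    if not rest:
        return path, []  # bare file name, no sequences requested
    if not rest.startswith(":"):
        raise ValueError("expected ':' after the .2bit file name: %r" % spec)

    seqs = []
    for item in rest[1:].split(","):
        # rpartition keeps any earlier ':' inside the sequence name itself
        name, sep, rng = item.rpartition(":")
        if sep and "-" in rng:
            start, end = rng.split("-", 1)
            seqs.append(SeqSpec(name, int(start), int(end)))
        else:
            seqs.append(SeqSpec(item, None, None))
    return path, seqs
```

    For example, parse_twobit_spec("file/path/input.2bit:chr1:100-200,chr2") gives the path plus one ranged request for chr1 and one whole-sequence request for chr2.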

    Edit: indeed, BLAT includes such functions. See here for a bit of discussion about 2bit retrieval using Perl:

    Last edited by gringer; 10-31-2013, 03:43 PM.

  • shawn.mek
    replied
    Yeah, I'm torn on whether to hold it in memory or not. I'll toy with different workflows.

  • dpryan
    replied
    If you really have a LOT of positions, then it's best to read the genome into memory. samtools faidx is great for a smallish number of sites, but it grabs the sequence from disk, making it a bit slow for a large number of queries.
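
    A minimal sketch of the in-memory approach in Python, assuming a plain FASTA file and 1-based inclusive coordinates (the function names load_fasta and fetch are just illustrative):

```python
from typing import Dict, Iterable


def load_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {record name: full sequence}.

    The record name is the header text up to the first space.
    """
    genome: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name = line[1:].partition(" ")[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def fetch(genome: Dict[str, str], chrom: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) of one record."""
    return genome[chrom][start - 1:end]
```

    You would load once, e.g. with open("hg19.fa") as fh: genome = load_fasta(fh) (file name hypothetical), and then every fetch is just a string slice. The cost is roughly one byte per base of RAM.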

  • shawn.mek
    replied
    I want to retrieve lots of regions efficiently, but thanks for pointing me to faidx, I'll see how it works.

  • lh3
    replied
    Although bowtie index essentially keeps the genome, I doubt it is optimized or designed for your purpose. Use faidx if you only want to retrieve a few regions.
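
    For context on why an index makes this cheap: a .fai-style index stores, per record, its length, the byte offset of its first base, the bases per line, and the bytes per line, so a region maps to a byte range with simple arithmetic. The Python sketch below illustrates that idea only; it is not samtools' code, the names build_index and fetch are made up, and it assumes every sequence line in a record (except possibly the last) has the same length and that every header has a name.

```python
from typing import BinaryIO, Dict, NamedTuple


class FaiEntry(NamedTuple):
    length: int      # total bases in the record
    offset: int      # byte offset of the record's first base
    line_bases: int  # bases per sequence line
    line_width: int  # bytes per sequence line, including the newline


def build_index(fasta: BinaryIO) -> Dict[str, FaiEntry]:
    """Scan a FASTA file once and record where each sequence starts."""
    index: Dict[str, FaiEntry] = {}
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the start of the current line
    for raw in fasta:
        if raw.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_width)
            name = raw[1:].split()[0].decode()
            length = line_bases = line_width = 0
            offset = pos + len(raw)
        elif name is not None:
            bases = len(raw.rstrip(b"\r\n"))
            if line_bases == 0:
                line_bases, line_width = bases, len(raw)
            length += bases
        pos += len(raw)
    if name is not None:
        index[name] = FaiEntry(length, offset, line_bases, line_width)
    return index


def fetch(fasta: BinaryIO, entry: FaiEntry, start: int, end: int) -> str:
    """Read bases start..end (1-based, inclusive) by seeking straight to them."""
    def byte_pos(base0: int) -> int:
        # full lines before this base, plus the position within its line
        return (entry.offset
                + (base0 // entry.line_bases) * entry.line_width
                + base0 % entry.line_bases)

    begin = byte_pos(start - 1)
    stop = byte_pos(end - 1) + 1
    fasta.seek(begin)
    chunk = fasta.read(stop - begin)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

    Each query is one seek and one short read, which is why this works well for a few regions and gets slower when there are very many.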

  • shawn.mek
    replied
    The bowtie-inspect thing does get all the info out, but that's 3 GB of data, since I can't select a location.
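
    One workaround would be to filter that dump as it streams past, so nothing is stored. This is a rough Python sketch, assuming the inspect output is ordinary FASTA and coordinates are 1-based inclusive (stream_region is a made-up name). It still has to read the stream up to the requested region, so it only makes sense for a one-off lookup.

```python
from typing import Iterable


def stream_region(lines: Iterable[str], chrom: str, start: int, end: int) -> str:
    """Pull bases start..end (1-based, inclusive) of one record out of a FASTA stream."""
    pieces = []
    in_target = False
    seen = 0  # bases of the target record consumed so far
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if in_target:
                break  # the target record has ended
            in_target = line[1:].partition(" ")[0] == chrom
            continue
        if not in_target:
            continue
        # Part of this line that overlaps the requested window
        lo = max(start - 1 - seen, 0)
        hi = min(end - seen, len(line))
        if lo < hi:
            pieces.append(line[lo:hi])
        seen += len(line)
        if seen >= end:
            break  # window fully read, stop early
    return "".join(pieces)
```

    It could be fed from a pipe by passing sys.stdin as lines.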

  • shawn.mek
    replied
    Just to clarify, I mean using the index - giving it a chromosome name (fasta header) and location numbers, and getting back a sequence.

    I don't want to run an alignment, just pull out the sequence. So no SAM output.

    For this I'm using bowtie, not bowtie2. But if bowtie2 can do this...

    Thanks

  • winsettz
    replied
    Originally posted by shawn.mek
    We have the fasta files (obviously) for the hg19 genome, we used them to create a big Bowtie index.

    I was hoping not to have to keep the fasta file. Instead just look up sequences in the Bowtie index when I get chromosome locations.

    I know when the alignment comes back it tells me where the alignment occurs and which fasta record (header) that it came from. So all the info is there, but I can't figure out how to pull out a sequence given a location.

    Does anyone know if this is possible, or know much about the index format (perhaps I could write a little program to fish out a sequence)?


    Thanks

    You should be able to extract that information from the SAM output. I've not used bowtie2-inspect before, but it could be what you are looking for.

    Code:
    bowtie2-inspect
    No index name given!
    Bowtie 2 version 2.1.0 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
    Usage: bowtie2-inspect [options]* <bt2_base>
      <bt2_base>         bt2 filename minus trailing .1.bt2/.2.bt2
    
      By default, prints FASTA records of the indexed nucleotide sequences to
      standard out.  With -n, just prints names.  With -s, just prints a summary of
      the index parameters and sequences.  With -e, preserves colors if applicable.
    
    Options:
      -a/--across <int>  Number of characters across in FASTA output (default: 60)
      -n/--names         Print reference sequence names only
      -s/--summary       Print summary incl. ref names, lengths, index properties
      -e/--bt2-ref      Reconstruct reference from .bt2 (slow, preserves colors)
      -v/--verbose       Verbose output (for debugging)
      -h/--help          print detailed description of tool and its options
      --help             print this usage message

  • Use Bowtie Index to get sequences using locations

    We have the fasta files (obviously) for the hg19 genome, we used them to create a big Bowtie index.

    I was hoping not to have to keep the fasta file. Instead just look up sequences in the Bowtie index when I get chromosome locations.

    I know when the alignment comes back it tells me where the alignment occurs and which fasta record (header) that it came from. So all the info is there, but I can't figure out how to pull out a sequence given a location.

    Does anyone know if this is possible, or know much about the index format (perhaps I could write a little program to fish out a sequence)?


    Thanks
