Extracting all microbial sequences from NT


  • kga1978
    replied
    Wow, thanks so much guys - this was incredibly helpful! I've got it all covered now.



  • Richard Finney
    replied
    Easiest method to get taxonomy ids ...
    Just check out this directory: ftp://ftp.ncbi.nih.gov/pub/taxonomy/
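
    The taxonomy dump in that directory can be turned into a list of every taxon ID under a given domain. What follows is only a minimal sketch. It assumes nodes.dmp has been extracted from the taxdump.tar.gz archive found there, and that each row starts with the taxon ID and the parent taxon ID, separated by a tab, a pipe and a tab. The function name descendants is made up for illustration.

```shell
# descendants ROOT NODES_DMP
# Print ROOT and every taxon ID below it, one per line, sorted numerically.
# Assumes nodes.dmp rows look like: tax_id<TAB>|<TAB>parent_tax_id<TAB>|...
descendants() {
    awk -F '\t' -v root="$1" '
        { parent[$1] = $3 }              # $2 is the "|" column separator
        END {
            for (t in parent) {
                cur = t
                # climb towards the top until we reach ROOT or a self-parented node
                while (cur != root && (cur in parent) && parent[cur] != cur)
                    cur = parent[cur]
                if (cur == root) print t
            }
        }' "$2" | sort -n
}
```

    Something like "descendants 2 nodes.dmp > bacteria.taxids" would then write out every taxon ID under Bacteria (taxon ID 2); the output file name is just an example.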

    ________________
    If you want bacterial and viral genomes in FASTA format ...

    Check out the documentation here:
    http://defindit.com/readme_files/ncb...on_format.html
    for NCBI file name extensions.

    You can download the data from NCBI by FTP here:
    ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
    Look for the all* files. The ftp://ftp.ncbi.nlm.nih.gov/genomes/B...all.fna.tar.gz file should be all bacterial genomes.

    Viruses are here: ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/

    "WGS bacteria OLD" is thereabouts, just look around. Draft genomes are thereabouts, too.

    _____

    An alternate way to get taxon IDs, for example for bacteria ...

    You can get the "all rpt" file via wget:
    wget ftp://ftp.ncbi.nlm.nih.gov/genomes/B...all.rpt.tar.gz
    Then unzip and untar it.

    Run the command
    find . -name '*.rpt' -exec grep Taxid {} \; | sort -u
    There you go.



  • nickloman
    replied
    Hey

    Not a full solution, but MEGAN provides files which map GIs to taxon IDs for nt and nr via this link: http://ab.inf.uni-tuebingen.de/data/...d/welcome.html

    Hope that helps



  • kga1978
    started a topic Extracting all microbial sequences from NT

    Extracting all microbial sequences from NT

    Hi all,

    I have been trying to find a way to extract all microbial (and eukaryotic) sequences in the NT database but I am running into a bunch of problems.

    I have tried to download the GI lists for all bacterial entries from the NCBI nucleotide database, but the downloads always time out before the file is complete. Then I thought maybe I could get the GIs using blastdbcmd, but that also fails. I tried the following:

    Code:
    blastdbcmd -db nt -entry all -outfmt '%g %T' | awk '{ if ($2 == "2") print $1 }' > ../gi/bacteria.gi
    But that also failed, since each entry carries its species-level taxon ID in the %T field rather than the domain-level ID (2 for Bacteria), so the comparison against "2" matches almost nothing.

    Then I thought maybe I could get a list of all taxon IDs for bacteria, eukaryota, etc., but that also doesn't appear to exist.
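
    If such a list were available, the two-column '%g %T' output could be filtered against it instead of against a single ID. What follows is only a rough sketch. It assumes a hypothetical file with one taxon ID per line (every ID under Bacteria, say) and GI/taxon ID pairs separated by whitespace on standard input. filter_by_taxid is an illustrative name, not part of BLAST+.

```shell
# filter_by_taxid TAXID_LIST
# Reads "gi taxid" pairs on stdin and prints the GI of every pair whose
# taxid appears in TAXID_LIST (a hypothetical file, one taxon ID per line).
filter_by_taxid() {
    awk -v list="$1" '
        BEGIN { while ((getline t < list) > 0) keep[t] = 1 }   # load the wanted IDs
        ($2 in keep) { print $1 }'                             # keep matching GIs only
}
```

    It would slot into the earlier pipeline in place of the awk step, e.g. "blastdbcmd -db nt -entry all -outfmt '%g %T' | filter_by_taxid bacteria.taxids > ../gi/bacteria.gi", where bacteria.taxids is the assumed list file.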

    So in short - does anybody have an idea how I can extract all microbial sequences (to make a custom database) from the NT database? Whatever method works....

    Thanks guys!
