Extracting all microbial sequences from NT

    Hi all,

    I have been trying to find a way to extract all microbial (and eukaryotic) sequences in the NT database but I am running into a bunch of problems.

    I have tried to download the GI lists for all bacterial entries from the NCBI Nucleotide database, but the downloads always time out before the file is complete. Then I thought I could get the GIs with blastdbcmd instead, but that also fails. I tried the following:

    Code:
    blastdbcmd -db nt -entry all -outfmt '%g %T' | awk '{ if ($2 == "2") print $1 }' > ../gi/bacteria.gi
    That failed too, because the %T field holds each entry's own taxid (e.g. its species), not the taxid of its domain or any other higher rank.

    Next I looked for a ready-made list of all taxon IDs under Bacteria, Eukaryota, etc., but no such list seems to exist.

    So, in short: does anybody know how I can extract all microbial sequences from the NT database (to build a custom database)? Any method that works is fine.

    Thanks guys!

  • #2
    Hey

    Not a full solution, but MEGAN provides files that map GIs to taxon IDs for nt and nr via this link: http://ab.inf.uni-tuebingen.de/data/...d/welcome.html
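Once you have a GI-to-taxid mapping (such as the MEGAN files, or the `%g %T` output of blastdbcmd) and a list of wanted taxids, turning them into a GI list is a simple join. Below is a minimal sketch, assuming the mapping has the GI in column 1 and the taxid in column 2 (whitespace-separated) and the taxid list has one ID per line. The exact layout of the MEGAN files is an assumption here, and `gis_for_taxids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: select the GIs whose taxid appears in a list of wanted taxids.
# ASSUMPTIONS: mapping = "GI taxid" per line (whitespace-separated);
# taxid list = one taxid per line. Neither is a documented MEGAN format.

# Usage: gis_for_taxids TAXID_LIST MAPPING_FILE   (MAPPING_FILE may be "-" for stdin)
gis_for_taxids() {
  local taxid_list="$1" mapping="$2"
  awk '
    # First file (NR == FNR): remember every wanted taxid as an array key.
    NR == FNR { want[$1]; next }
    # Second file: print the GI (column 1) when its taxid (column 2) is wanted.
    ($2 in want) { print $1 }
  ' "$taxid_list" "$mapping"
}
```

For a compressed mapping, something like `zcat mapping.gz | gis_for_taxids bacteria.taxids - > bacteria.gi` should work (file names are placeholders). The resulting list could then be passed to `blastdbcmd -db nt -entry_batch bacteria.gi > bacteria.fna` to pull out the sequences. Only the taxid list is held in memory; the much larger mapping is streamed line by line.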

    Hope that helps

    • #3
      The easiest way to get taxonomy IDs:
      Just check out this directory: ftp://ftp.ncbi.nih.gov/pub/taxonomy/
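A list of every taxon under Bacteria (or Eukaryota) is not published as such, but it can be derived from nodes.dmp in the taxdump.tar.gz archive in that directory, because each line gives a taxid and its parent. Below is a minimal sketch, assuming the standard nodes.dmp layout (fields separated by tab-pipe-tab, taxid first, parent taxid second); `descendant_taxids` is a hypothetical helper name, not an NCBI tool. It climbs each taxid's parent chain until it reaches the requested ancestor or the root (taxid 1, which is its own parent).

```shell
#!/usr/bin/env bash
# Sketch: print every taxid in nodes.dmp that has ANCESTOR in its lineage
# (ANCESTOR itself included). ASSUMPTION: standard taxdump nodes.dmp layout.

# Usage: descendant_taxids NODES_DMP ANCESTOR_TAXID
descendant_taxids() {
  local nodes_dmp="$1" ancestor="$2"
  awk -F '\t' -v anc="$ancestor" '
    # Splitting on tab turns "id\t|\tparent\t|..." into $1=id, $2="|", $3=parent.
    { parent[$1] = $3 }
    END {
      for (t in parent) {
        cur = t
        # Walk up the tree; stop at the ancestor, an unknown id, or the root.
        while (cur != anc && (cur in parent) && parent[cur] != cur)
          cur = parent[cur]
        if (cur == anc) print t
      }
    }' "$nodes_dmp"
}
```

For example, `tar -xzf taxdump.tar.gz nodes.dmp && descendant_taxids nodes.dmp 2 > bacteria.taxids` should list all taxids under Bacteria (2); use 2759 for Eukaryota. That list can then be matched against the species-level taxids that `blastdbcmd -outfmt '%g %T'` reports. Holding the whole tree in one awk array keeps the script short, at the cost of enough memory for the full taxonomy.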

      ________________
      If you want bacterial and viral genomes as FASTA files:

      See the documentation on NCBI file name extensions here:
      http://defindit.com/readme_files/ncb...on_format.html

      You can download the data from the NCBI FTP site here:
      ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/
      Look for the all* files. The ftp://ftp.ncbi.nlm.nih.gov/genomes/B...all.fna.tar.gz file should contain all bacterial genomes.
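To build a custom BLAST database from those genomes, the per-genome .fna files need to be combined into a single multi-FASTA file first. Below is a minimal sketch, assuming the archive was saved as all.fna.tar.gz and unpacked into a directory (e.g. `mkdir genomes && tar -xzf all.fna.tar.gz -C genomes`); `merge_fna` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: concatenate every *.fna file under DIR into one multi-FASTA file OUT.
# Keep OUT outside DIR, otherwise find could pick up the output as an input.

# Usage: merge_fna DIR OUT
merge_fna() {
  local dir="$1" out="$2"
  # "-exec cat {} +" passes many files per cat call instead of one per file.
  find "$dir" -type f -name '*.fna' -exec cat {} + > "$out"
}
```

After `merge_fna genomes bacteria_all.fna`, running `makeblastdb -in bacteria_all.fna -dbtype nucl -out bacteria_all` should produce a searchable nucleotide database.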

      Viruses are here: ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/

      "WGS bacteria OLD" is somewhere nearby, so just look around; the draft genomes are in the same area.

      _____

      An alternative way to get taxon IDs, e.g. for bacteria:

      You can get the "all rpt" file via wget:
      wget ftp://ftp.ncbi.nlm.nih.gov/genomes/B...all.rpt.tar.gz
      Unzip and untar.

      Run this command:
      find . -name '*.rpt' -exec grep Taxid {} \; | sort | uniq
      There you go.
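The command above prints whole "Taxid" lines. If you want bare numbers, one per line (for instance to match against a GI-to-taxid mapping), a small variation can strip the label. This is a sketch assuming each .rpt file contains a line of the form `Taxid: 562`; `rpt_taxids` is a hypothetical helper name, and `grep -o` is a common GNU/BSD extension rather than POSIX.

```shell
#!/usr/bin/env bash
# Sketch: print the unique numeric taxids from "Taxid" lines of all *.rpt files.
# ASSUMPTION: each such line looks like "Taxid: 562" (one number per line).

# Usage: rpt_taxids [DIR]   (defaults to the current directory)
rpt_taxids() {
  local dir="${1:-.}"
  find "$dir" -type f -name '*.rpt' -exec grep -h 'Taxid' {} + \
    | grep -o '[0-9][0-9]*' \
    | sort -nu   # numeric sort, duplicates removed
}
```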

      • #4
        Wow, thanks so much guys - this was incredibly helpful! I've got it all covered now.