-
Do you have access to the nt BLAST database and the BLAST+ programs? If so, you can extract a single sequence with the first command below, and easily wrap it in a loop to extract the whole range (let me know if you need more details).
Code:
$ blastdbcmd -entry JN713152 -db /path_to/nt -outfmt '%f'
Code:
for i in {713151..713566}; do blastdbcmd -entry JN$i -db /path_to/nt -outfmt '%f' >> filename.fa; done
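A variation on the loop above, as a minimal sketch: `blastdbcmd` also accepts a file of identifiers through `-entry_batch`, so the accessions can be generated once and fetched in a single call. The helper names `gen_accessions` and `fetch_fasta_range` are hypothetical (they are not from the thread), and `/path_to/nt` remains a placeholder.

```shell
#!/bin/sh
# gen_accessions PREFIX START END
# Print PREFIX followed by each number START..END (zero-padded to six digits),
# one accession per line.
gen_accessions() (
    i=$2
    while [ "$i" -le "$3" ]; do
        printf '%s%06d\n' "$1" "$i"
        i=$((i + 1))
    done
)

# fetch_fasta_range DB OUT START END   (hypothetical helper)
# Write the JN accessions to a temporary list, then fetch them all with one
# blastdbcmd call instead of one call per accession.
fetch_fasta_range() (
    list=$(mktemp) || exit 1
    gen_accessions JN "$3" "$4" > "$list"
    blastdbcmd -db "$1" -entry_batch "$list" -outfmt '%f' > "$2"
    status=$?
    rm -f "$list"
    exit "$status"
)
```

With the functions defined, `fetch_fasta_range /path_to/nt filename.fa 713151 713566` would cover the same range as the loop. Unlike `>>`, the output file is overwritten, so re-running does not duplicate sequences.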
Last edited by GenoMax; 07-22-2015, 07:57 AM.
-
Hi GenoMax,
Thanks for the tip. I got the db and it works -- got the fasta sequences! Now I have another question: how do I retrieve the information under the 'organism' tag? For example, for JN713151 I would also like to get the bacterial lineage (marked in red in the original post; the two lines under ORGANISM in the record below) for each query id. I tried many of the specifiers listed in the -help output, but none has worked so far. Any thoughts?
John
LOCUS       JN713151                1526 bp    DNA     linear   ENV 09-MAY-2012
DEFINITION  Filifactor alocis canine oral taxon 001 clone OB017 16S ribosomal
            RNA gene, partial sequence.
ACCESSION   JN713151
VERSION     JN713151.1  GI:373279114
KEYWORDS    ENV.
SOURCE      Filifactor alocis
  ORGANISM  Filifactor alocis
            Bacteria; Firmicutes; Clostridia; Clostridiales;
            Peptostreptococcaceae; Filifactor.
-
You won't find that information in the blast database. Here is one way (I am sure there are others):
Save the following code in a file (e.g. retr.sh).
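Code:
#!/bin/bash
# Fetch records JN713151 through JN713567 from NCBI, one request per accession
j=713151
while [ $j -le 713567 ]
do
    num=`printf "JN%06d" $j`   # build the accession, e.g. JN713151
    curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${num}&rettype=genbank"
    j=$((j+1))
done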
Code:
$ chmod u+x retr.sh
Code:
$ ./retr.sh > genbank_records
Code:
$ grep -e JN -e OrgName_lineage genbank_records | sed -e 's/<Textseq-id_accession>//' -e 's/<\/Textseq-id_accession>//' -e 's/<OrgName_lineage>//' -e 's/<\/OrgName_lineage>//' > id_you_need
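An alternative to grepping XML tags, offered only as a sketch: if the records are requested as GenBank flat files (`rettype=gb&retmode=text`, which is a different request from the one in the script above), the lineage can be read straight from the lines that follow ORGANISM. The function name `genbank_lineage` is hypothetical, and the parser assumes the organism name fits on one line and the lineage ends with a period, as in the JN713151 record quoted earlier.

```shell
#!/bin/sh
# genbank_lineage: read GenBank flat-file records on stdin and print one
# "ACCESSION<TAB>lineage" row per record.
# Assumptions: the organism name occupies a single line, and the lineage is
# the run of lines after ORGANISM up to the first line ending in a period.
genbank_lineage() {
    awk '
        $1 == "ACCESSION" { acc = $2 }
        $1 == "ORGANISM"  { collecting = 1; lineage = ""; next }
        collecting {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)          # trim indentation
            lineage = (lineage == "" ? line : lineage " " line)
            if (line ~ /\.$/) {                        # last lineage line
                sub(/\.$/, "", lineage)
                print acc "\t" lineage
                collecting = 0
            }
        }
    '
}

# Example use (needs network access, so it is left commented out):
# curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=JN713151&rettype=gb&retmode=text" | genbank_lineage
```

Because each record yields a single tab-separated row, the output can be joined to the FASTA identifiers without further cleanup.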
Last edited by GenoMax; 07-28-2015, 04:57 PM.
-
Hi GenoMax,
thank you very much for your help and code. It works very nicely!
Just curious -- when I used grep to extract the locus ids and lineages from the downloaded records, the locus ids went beyond 713567, all the way to 713709 (marked in red in the original post; excerpt below). Interestingly, everything beyond 713566 had a human HIV lineage, not a bacterial one. I thought the loop condition while [ $j -le 713567 ] meant we only download up to 713567? Anyway, it is easily cleaned up and not an issue; I just wonder why.
John
***
JN713566
Lachnospiraceae bacterium canine oral taxon 399 clone 1K033 16S ribosomal RNA gene, partial sequence</Seqdesc_title>
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae
Human immunodeficiency virus 1 pol protein (pol) gene, partial cds.</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713567
HIV-1 isolate HIV_PRRT_PJ01967_1 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713568
HIV-1 isolate HIV_PRRT_PJ01967_2 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713569
HIV-1 isolate HIV_PRRT_PJ01967_3 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713570
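The cleanup John mentions could be sketched as follows, assuming `id_you_need` has the layout of the excerpt above: a bare accession line, then title lines ending in `</Seqdesc_title>`, then a semicolon-separated lineage line. The function name `pair_lineage_in_range` is hypothetical. It pairs each accession with the first lineage line that follows it and drops every accession outside the requested numeric range.

```shell
#!/bin/sh
# pair_lineage_in_range LO HI
# Read the grep/sed output on stdin and print "accession<TAB>lineage" for
# every JN accession whose numeric part lies in LO..HI.
# Assumption: the first line after an accession that contains ";" and is not
# a title line is that accession's lineage.
pair_lineage_in_range() {
    awk -v lo="$1" -v hi="$2" '
        /^JN[0-9]+$/ {
            n = substr($0, 3) + 0                  # numeric part of the accession
            acc = (n >= lo && n <= hi) ? $0 : ""   # forget out-of-range accessions
            next
        }
        acc != "" && !/Seqdesc_title/ && /;/ {
            print acc "\t" $0
            acc = ""                               # later lineage lines are ignored
        }
    '
}
```

For the thread's range this would be run as `pair_lineage_in_range 713151 713566 < id_you_need`, which keeps the bacterial lineage for JN713566 and discards the HIV lines that follow it.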