-
Do you have access to the nt BLAST database and the BLAST+ programs? If so, you can fetch a single sequence with the first command below and loop over the whole accession range with the second (let me know if you need more details).
Code:
$ blastdbcmd -entry JN713152 -db /path_to/nt -outfmt '%f'
Code:
for i in {713151..713566}; do blastdbcmd -entry "JN$i" -db /path_to/nt -outfmt '%f' >> filename.fa; done
Last edited by GenoMax; 07-22-2015, 07:57 AM.
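A possible variation on the loop above, as a minimal sketch: write the accessions to a file first and hand the whole list to blastdbcmd in one call through its -entry_batch option, instead of starting the program once per accession. The helper name make_accessions and the file name accessions.txt are made up for illustration; /path_to/nt is the same placeholder path used in the post.

```shell
#!/bin/sh
# Print one accession per line for an inclusive numeric range,
# e.g. "make_accessions JN 713151 713566".
make_accessions() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s%06d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Hypothetical list file holding the same range as the loop in the post.
make_accessions JN 713151 713566 > accessions.txt

# One blastdbcmd call for the whole list; skipped when BLAST+ is not installed.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db /path_to/nt -entry_batch accessions.txt -outfmt '%f' > filename.fa
fi
```

Keeping the accession list in a file also makes it easy to check which IDs were requested if some turn out to be missing from the database.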
-
Hi GenoMax,
Thanks for the tip. I got the db and it works: I have the FASTA sequences! Now I have another question, about how to retrieve the information under the 'organism' tag. For example, for JN713151 (record below) I would also like to get the bacterial lineage, i.e. the lines under ORGANISM, for each query ID. I tried many of the format specifiers listed in the -help output, but none has worked so far. Any thoughts?
John
LOCUS       JN713151                1526 bp    DNA     linear   ENV 09-MAY-2012
DEFINITION  Filifactor alocis canine oral taxon 001 clone OB017 16S ribosomal
            RNA gene, partial sequence.
ACCESSION   JN713151
VERSION     JN713151.1  GI:373279114
KEYWORDS    ENV.
SOURCE      Filifactor alocis
  ORGANISM  Filifactor alocis
            Bacteria; Firmicutes; Clostridia; Clostridiales;
            Peptostreptococcaceae; Filifactor.
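If records are available as GenBank flat-file text like the one above, the lineage can be pulled out of the ORGANISM block directly. The following is only a sketch under stated assumptions: the function name genbank_lineage is made up, the organism name is assumed to fit on the ORGANISM line itself, and the lineage is assumed to be the lines that follow it up to the first line ending in a period.

```shell
#!/bin/sh
# Print "accession<TAB>lineage" for each record of GenBank flat-file text on stdin.
genbank_lineage() {
    awk '
        $1 == "ACCESSION" { acc = $2 }
        # The organism name sits on the ORGANISM line; the lineage starts on the next line.
        $1 == "ORGANISM"  { collecting = 1; lineage = ""; next }
        collecting {
            sub(/^[ \t]+/, "")       # drop the continuation-line indent
            sub(/[ \t\r]+$/, "")     # drop trailing blanks or a stray CR
            lineage = (lineage == "" ? $0 : lineage " " $0)
            if ($0 ~ /\.$/) {        # assumed: the lineage ends with a period
                sub(/\.$/, "", lineage)
                print acc "\t" lineage
                collecting = 0
            }
        }
    '
}

# Demo on the relevant lines of the record above.
genbank_lineage <<'EOF'
ACCESSION   JN713151
SOURCE      Filifactor alocis
  ORGANISM  Filifactor alocis
            Bacteria; Firmicutes; Clostridia; Clostridiales;
            Peptostreptococcaceae; Filifactor.
EOF
```

Because awk splits fields on any run of whitespace, the sketch does not depend on the exact column layout of the flat file.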
-
You won't find that information in the blast database. Here is one way (I am sure there are others):
Save the following code in a file (e.g. retr.sh).
Code:
#!/bin/bash
# Fetch records JN713151 through JN713567, one efetch request per accession.
j=713151
while [ "$j" -le 713567 ]
do
    num=$(printf "JN%06d" "$j")
    curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${num}&rettype=genbank"
    j=$((j+1))
done
Code:
$ chmod u+x retr.sh
Code:
$ ./retr.sh > genbank_records
Code:
$ grep -e JN -e OrgName_lineage genbank_records | sed 's/<Textseq-id_accession>//' | sed 's/<\/Textseq-id_accession>//' | sed 's/<OrgName_lineage>//' | sed 's/<\/OrgName_lineage>//' > id_you_need
Last edited by GenoMax; 07-28-2015, 04:57 PM.
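As a tighter alternative to the grep/sed chain above, here is a sketch that prints only the text inside the two tags of interest, so unrelated lines that merely contain "JN" are not picked up. The function name extract_ids_and_lineage is made up; like the original pipeline, it assumes each tag pair sits on a single line of genbank_records.

```shell
#!/bin/sh
# Print the contents of <Textseq-id_accession> and <OrgName_lineage> elements, one per line.
extract_ids_and_lineage() {
    sed -n \
        -e 's:.*<Textseq-id_accession>\(.*\)</Textseq-id_accession>.*:\1:p' \
        -e 's:.*<OrgName_lineage>\(.*\)</OrgName_lineage>.*:\1:p'
}

# Same input and output file names as in the post; runs only if the download exists.
if [ -f genbank_records ]; then
    extract_ids_and_lineage < genbank_records > id_you_need
fi
```

Using ":" as the sed delimiter avoids having to escape the "/" in the closing tags.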
-
Hi GenoMax,
Thank you very much for your help and code. It works very nicely!
Just curious: when I used grep to extract the locus IDs and lineages from the web download, the locus IDs went beyond 713567, all the way to 713709 (see the excerpt below). Interestingly, everything beyond 713566 had an HIV lineage, not a bacterial one. I thought the loop condition, while [ $j -le 713567 ], meant we only download up to 713567? Anyway, it is easily cleaned up and not an issue; I just wonder.
John
***
JN713566
Lachnospiraceae bacterium canine oral taxon 399 clone 1K033 16S ribosomal RNA gene, partial sequence</Seqdesc_title>
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae
Human immunodeficiency virus 1 pol protein (pol) gene, partial cds.</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713567
HIV-1 isolate HIV_PRRT_PJ01967_1 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713568
HIV-1 isolate HIV_PRRT_PJ01967_2 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713569
HIV-1 isolate HIV_PRRT_PJ01967_3 from Dominican Republic pol protein (pol) gene, partial cds.</Seqdesc_title>
pol protein [Human immunodeficiency virus 1]</Seqdesc_title>
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Lentivirus; Primate lentivirus group
JN713570
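One way to do the cleanup mentioned above, as a rough sketch: keep only the accessions whose lineage line starts with "Bacteria;" and print each next to its lineage. The function name bacterial_only is made up, and the sketch assumes the layout of the excerpt: an accession alone on a line, followed later by its lineage line.

```shell
#!/bin/sh
# Print "accession<TAB>lineage" for records whose lineage starts with "Bacteria;".
bacterial_only() {
    awk '
        /^JN[0-9]+$/ { acc = $0; next }        # remember the most recent accession
        /^Bacteria;/ { print acc "\t" $0 }     # emit only bacterial lineages
    '
}

# Runs only if the file produced earlier in the thread exists;
# bacterial_ids is a hypothetical output file name.
if [ -f id_you_need ]; then
    bacterial_only < id_you_need > bacterial_ids
fi
```

Title lines and viral lineages are simply skipped, so the extra HIV entries drop out without having to know where the accession range ends.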