I am trying to access metagenomes referenced in a paper (Thurber et al. 2009) that are hosted on NCBI. I have the project numbers and genome project IDs. I found the GenBank record for one of the genomes I want (I'd like to download just one first and run a test analysis, to see whether my idea will even work). Clicking the WGS ID link takes me to a page with a "downloads" tab that lists three file formats. I downloaded the FASTA file and got this:
ã3ËQ‘˝Àédª≤-ˆ˘Á
B∆€≥#Ä≈{Ÿ"™Ø*\ ›ï$®s?ær⁄É4ìúÛ»\KqŒ^È····∆áŸ∞a∆¡ˇÒø¸ˇ˚À„«ÁÀ◊ÁÀÀˇ/ˇÔˇûˇüø˛_?^~l_/ˇ√ÀˇèÚfl˛ˇ”ˇÚˇÎ˛ˇ”˘œˇ˙fl˛◊ˇ¸¸«ˇı?ˇ∑ˇÂ?ˇÉø˚èˇ˝˛oˇ«˘?ˇÎ¸Ôˇ˘ø˝üˇ˘_ˇ?ˇô2}ïfl_ı˜W˚˝UÌÉf~“_Èa˚'µ¶øQä}≥˛ÊÚo©Ê∑˙_”Áì¸ä¸˛ÉµŸØ*_çˇLŸ^Ì~æ=ë‰Ω˚áØ˙`{√j_˙˚[˛*ÚÜy|•˛fi¸™ÒÜÚ⁄¢O˝É˝ØñÒmo20˙ «'®„'ÙCyœd_Î∆Ä^c˛(}ofl„˜+“ˇhŒX8ØëÖ”
·œî˚,ÍxËá§ü±yc¶ŸÃ\S˘˝ê>¸Ô˝~/25ùVdäh¶ç’2lÛ∂p‡øˇj+ç˛8ˇÕ÷£πw´c˚.…ÃRu#˜„'å‹[hÀ…ßfil¢ïP∂øX∑Å$≥yŸ—¬ßü¸~Ç>s£Å†…˝˝øî…J⁄!€Øn/‹fiåfl0ÀÊêeëe]eyÔ\‰ÔÚͧq∑olƱ‚∂W஺»¯£mi€+ª¡z¿`ΩáΘ͟>¨Ó≤FK•ÚÓîœNˇ≠MMTkËsnØÙ=ÿ∫œê˘ßY~¶¶ÎØ»Á=X›/pÛ߆y”Ó˚ü˝W¶FÛ˘˚ˇìflè?æ`†>b´*ãcÆ44J4G¥˜M¨ÿAkh[<ïVê˘OR˜”flÑ∞ık∫ÌÈù≥â*Ƀ˛´l÷∆ã#Àe9˝Í∏2ÊùkËÜÊ3Ë™ÿËfl˜∑…ê‘qµ∫M›ˆ£flèrÎæJ6ÚN-ˇ˙Ì™∂!.›±f≥;dTºÅ t˘»üMÙΩ
AÙ2›¬µØŒ⁄W†¸!7000_°Åi»Pù'ä&¥X∑Aa∑,ªoõTfi(∫”$‹*FÎ{üÀ}t⁄à≥ÚCfi√Iˆ≤›ÜÇ5ƒˇÁ˚˝4≤D›®º√®<B£AbÑa≤fiÏ/i!îB·*—≤°Õ∑Y˘˚á<®mãḱ–ºÉ}7é]=ÏõÁ∆KŸøï˙k˚+e[⁄y{˚ÏGÂ
FÂgdTZ·OΩm˝òU¶H`ôná™Ö¥V
∏íãflc¯G8ÔQKê¿.:Ω˙Ay˘ÒWúӑζr≈ÎŒ‹Ó∫øwE≤>ófl}ÓàåU>óH)^b)≈ˆâ¯è+ç˝`tJ£S≤¯ò*†¨ fëßÚÊn˘ªmHŸÈH.—AÁ πˆ¿ú’µˆ*y‰/Uùn?C˛IflS‹-lÀ/a∞º€∏NwB:¬ƒìç`wÇŸ©˙›RÁfinº´èèüÅ_B∏nQvŒv%˘± ç≈Ê4≥dNE
ß∫˝põ€>UÇ-˝rn˛ó$©’øCøëÂ9v±„7˙°e 1ÅO?r_B ∑
\%≥–MΩÒ8¡hÔ
G∏
etc.
Now either I'm showing my complete ignorance or this is not in fact anything usable. Why is it giving me complete gibberish?
Ideally I would like to do this all through the terminal, and I tried it there as well using wget with the same results.
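One quick check from the terminal is whether the downloaded file is compressed rather than plain text. The sketch below assumes the download might be a gzip archive (that is an assumption, not something confirmed for this file); it tests for the two-byte gzip signature and, if present, decompresses the file so the first FASTA header can be read. The function names are made up for illustration.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress(src, dst):
    """Stream-decompress a gzip file at src into a plain file at dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def first_header(path):
    """Return the first line of a text file (a FASTA header starts with '>')."""
    with open(path, "r") as handle:
        return handle.readline().rstrip("\n")
```

For example, with a hypothetical download saved as `download.fasta`, calling `is_gzip("download.fasta")` and, if it returns `True`, `decompress("download.fasta", "plain.fasta")` followed by `first_header("plain.fasta")` should show whether readable FASTA is hiding behind the gibberish.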
This is my first time trying to access sequence data from NCBI, so I must be missing something. Any help would be much appreciated, especially tips on doing this cleanly and sexily through the CLI.
Thanks!
Tori