**dpryan** · 09-11-2013, 06:51 AM

Do you plan to have the hg18/hg19 genomes locally available where you're deploying the package? If so, you can use Rsamtools to load just the relevant region for processing.
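For readers wondering what "load just the relevant region" amounts to, here is a minimal sketch in plain Python. Rsamtools works against a samtools-style FASTA index; the code below is not Rsamtools' API, only an illustration of the underlying idea of recording where each sequence starts and how its lines are laid out, then seeking straight to the bytes needed. The function names `build_index` and `fetch_region` and the toy record `chrDemo` are made up for this example, and it assumes every line of a record (except the last) has the same length.

```python
import os
import tempfile


def build_index(fasta_path):
    """Return {name: (seq_length, first_base_offset, bases_per_line, bytes_per_line)}."""
    index = {}
    name = None
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Record name is the first word after '>'.
                name = line[1:].split()[0].decode()
                index[name] = [0, fh.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # Line layout is taken from the first sequence line.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch_region(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking into the file."""
    length, offset, bases_per_line, bytes_per_line = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError("region outside sequence")
    first, last = start - 1, end - 1
    byte_start = offset + (first // bases_per_line) * bytes_per_line + first % bases_per_line
    byte_end = offset + (last // bases_per_line) * bytes_per_line + last % bases_per_line + 1
    with open(fasta_path, "rb") as fh:
        fh.seek(byte_start)
        chunk = fh.read(byte_end - byte_start)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Toy usage: a 24-base record wrapped at 10 bases per line.
fd, toy_path = tempfile.mkstemp(suffix=".fa")
os.close(fd)
with open(toy_path, "wb") as out:
    out.write(b">chrDemo toy record\nACGTACGTAC\nGGGGCCCCAA\nTTTT\n")
toy_index = build_index(toy_path)
print(fetch_region(toy_path, toy_index, "chrDemo", 9, 13))  # ACGGG
os.remove(toy_path)
```

Only the requested bytes are read, which is why this approach stays cheap even for a whole-genome file, provided the file is reachable through the file system.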

**RyanLCollins** · 09-11-2013, 06:56 AM

Originally posted by dpryan:

Do you plan to have the hg18/hg19 genomes locally available where you're deploying the package? If so, you can use Rsamtools to load just the relevant region for processing.

Hi dpryan, thanks for the prompt reply! At present, we are planning on using RMySQL to access the UCSC MySQL database (which has all tables associated with hg18/hg19), but we don't plan on having the entire genomes locally available.

Thanks for the suggestion though, I'll look into it further!

Having never used Rsamtools before, would it be possible to source hg18/hg19 if we were to place them on a secure server? Or do the genomes both have to be strictly local?

For further info, we are planning on distributing this package amongst roughly one dozen bioinformaticians in our group, all of whom will have access to a central cluster, but who will all be working from different local machines.

Thanks again!

**GenoMax** · 09-11-2013, 07:07 AM

UCSC limits programmatic access to its services (based on the number of access attempts per IP block per unit of time): https://genome.ucsc.edu/goldenPath/help/mysql.html

If several people are going to query the database it may be more useful to have the data locally. You can find the database dumps for hg19 here: ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ (look for others elsewhere on the same ftp server)
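If the remote database is used anyway, one way to stay polite is to cache repeated lookups and space out the requests that do go over the wire. The following is a rough sketch under stated assumptions: `ThrottledCache` is a hypothetical helper, the fetch function is a stand-in for whatever actually queries the server, and the interval is a placeholder, since the real limits are defined by UCSC and not reproduced here.

```python
import time


class ThrottledCache:
    """Memoize lookups and keep a minimum gap between real fetches."""

    def __init__(self, fetch, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                  # function that does the remote query
        self._min_interval = min_interval_s  # placeholder value, not a UCSC figure
        self._clock = clock
        self._sleep = sleep
        self._last_call = None
        self._cache = {}

    def get(self, key):
        # Repeated requests never reach the remote service.
        if key in self._cache:
            return self._cache[key]
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        value = self._fetch(key)
        self._cache[key] = value
        return value


def fake_remote_query(region):
    """Stand-in for a real database or web query (hypothetical)."""
    return "result for " + region


client = ThrottledCache(fake_remote_query, min_interval_s=0.0)
print(client.get("chr1:100-200"))
print(client.get("chr1:100-200"))  # served from the cache
```

This only helps per process; a dozen analysts each running their own copy still add up at the shared IP block, which is the argument for a local mirror of the dumps.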

**dpryan** · 09-11-2013, 07:09 AM

Well, I believe that it needs to be available from the local file system, though that doesn't preclude just mounting a remote drive (we have a group drive available via smb/cifs and nfs). If you're running this on a cluster, then copying the files to one of the mountpoints available to each node might prove easiest (I do this with genome indices for alignments, though each node also has access to a filesystem that's also mounted on my desktop).

**RyanLCollins** · 09-11-2013, 08:36 AM

Thank you both for the replies!

@GenoMax: Thank you for the heads up! I was unaware of the access limits per IP block. I'll ask around our group to estimate our expected requirements and go from there.

@dpryan: Hmm, OK, thank you for the suggestion. Ideally I would prefer to find a workaround and keep this package running on our analysts' local machines, although we could run it on a cluster if necessary.

**RyanLCollins** · 09-12-2013, 03:34 PM

Hello all,

I believe I have found the solution to my problem in the package "DASiR". It allows sequence retrieval from DAS servers (including UCSC, of course).

If others are interested in tackling a similar problem with R, you can find the details regarding DASiR here:
http://www.bioconductor.org/packages...tml/DASiR.html

Thanks for the help,
Ryan
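For context on what a DAS sequence request looks like underneath, here is a small Python sketch. It is not DASiR's API. The base URL and the `dna?segment=chrom:start,end` query form are written from memory of UCSC's DAS endpoint and should be treated as assumptions, as should the sample XML, which is hand-written in the general shape of a DAS DNA response. No network request is made; `das_dna_url` and `parse_das_dna` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumed endpoint; verify against current UCSC documentation before use.
ASSUMED_DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"


def das_dna_url(base, genome, chrom, start, end):
    """Build a DAS 'dna' request URL for one segment (1-based coordinates)."""
    return f"{base}/{genome}/dna?segment={chrom}:{start},{end}"


def parse_das_dna(xml_text):
    """Return {(id, start, stop): sequence} from a DAS DNA-style XML document."""
    root = ET.fromstring(xml_text)
    result = {}
    for seq in root.iter("SEQUENCE"):
        dna = seq.find("DNA")
        # The sequence is wrapped over several lines; join it back together.
        bases = "".join(dna.text.split()) if dna is not None and dna.text else ""
        key = (seq.get("id"), int(seq.get("start")), int(seq.get("stop")))
        result[key] = bases
    return result


# Hand-written sample in the assumed response shape, not real genome data.
SAMPLE_RESPONSE = """<DASDNA>
<SEQUENCE id="chr1" start="100" stop="111" version="1.00">
<DNA length="12">
acgtac
gtacgt
</DNA>
</SEQUENCE>
</DASDNA>"""

print(das_dna_url(ASSUMED_DAS_BASE, "hg19", "chr1", 100, 111))
print(parse_das_dna(SAMPLE_RESPONSE))
```

Because each request is an HTTP call to a shared server, the access-limit caveat raised earlier in the thread applies here as well.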


R human genomic sequence acquisition?
