**dpryan** · 09-11-2013, 06:51 AM

Do you plan to have the hg18/hg19 genomes locally available where you're deploying the package? If so, you can use Rsamtools to load just the relevant region for processing.
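For anyone wondering how "load just the relevant region" can work without reading a whole genome into memory, here is a rough Python sketch of the idea behind an indexed FASTA file, which is the mechanism Rsamtools relies on through its `FaFile`/`scanFa` interface. The function names `build_index` and `fetch_region` are made up for illustration and are not part of any library; the sketch also assumes every sequence line in a record (except the last) has the same width, as `.fai`-style indexing requires.

```python
import os
import tempfile


def build_index(fasta_path):
    """Map each sequence name to (length, first_base_offset, bases_per_line, bytes_per_line)."""
    index = {}
    name = None
    offset = 0  # byte offset of the line currently being read
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                name = raw[1:].split()[0].decode()
                # The first base sits immediately after the header line.
                index[name] = [0, offset + len(raw), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(raw.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # Record the line geometry from the first sequence line.
                    entry[2], entry[3] = bases, len(raw)
                entry[0] += bases
            offset += len(raw)
    return {key: tuple(value) for key, value in index.items()}


def fetch_region(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking, not scanning."""
    length, first, bases_per_line, bytes_per_line = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError("region outside sequence bounds")

    def byte_offset(pos0):
        # Convert a 0-based base position into a byte offset in the file.
        full_lines, within = divmod(pos0, bases_per_line)
        return first + full_lines * bytes_per_line + within

    begin = byte_offset(start - 1)
    stop = byte_offset(end - 1) + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(begin)
        chunk = handle.read(stop - begin)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    # Tiny made-up FASTA file, only to exercise the two functions.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.fa")
    with open(demo_path, "wb") as out:
        out.write(b">chrA demo\nACGTACGTAC\nGGGGCCCCTT\nAAT\n>chrB\nTTTTT\n")
    demo_index = build_index(demo_path)
    print(fetch_region(demo_path, demo_index, "chrA", 9, 12))
```

Because only the requested bytes are read, the cost of a lookup depends on the size of the region rather than the size of the genome, which is why the file just needs to be reachable through the file system.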

**RyanLCollins** · 09-11-2013, 06:56 AM

> Originally posted by dpryan:
>
> Do you plan to have the hg18/hg19 genomes locally available where you're deploying the package? If so, you can use Rsamtools to load just the relevant region for processing.

Hi dpryan, thanks for the prompt reply! At present, we are planning on using RMySQL to access the UCSC MySQL database (which has all tables associated with hg18/hg19), but we don't plan on having the entire genomes locally available.

Thanks for the suggestion though, I'll look into it further!

Having never used Rsamtools before, would it be possible to source hg18/hg19 if we were to place them on a secure server? Or do both genomes have to be strictly local?

For further info, we are planning on distributing this package amongst roughly one dozen bioinformaticians in our group, all of whom will have access to a central cluster, but who will all be working from different local machines.

Thanks again!

**GenoMax** · 09-11-2013, 07:07 AM

UCSC limits programmatic access to their services (based on the number of access attempts per IP block over a given time). See https://genome.ucsc.edu/goldenPath/help/mysql.html

If several people are going to query the database it may be more useful to have the data locally. You can find the database dumps for hg19 here: ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ (look for others elsewhere on the same ftp server)
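If you do stay with remote queries for a while, one way to reduce the number of hits is to cache results and space out the calls that actually reach the server. Below is a minimal Python sketch of that idea. `ThrottledCache` and the `fetch` callable are hypothetical names, and `min_interval` is an arbitrary placeholder, not UCSC's actual limit (check the help page above for current policy).

```python
import time


class ThrottledCache:
    """Cache results and space out calls to a rate-limited remote service."""

    def __init__(self, fetch, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # hypothetical callable that does the remote query
        self._min_interval = min_interval  # assumed spacing in seconds, not an official value
        self._clock = clock
        self._sleep = sleep
        self._last_call = None
        self._cache = {}

    def get(self, key):
        # Repeated requests are answered locally and never reach the server.
        if key in self._cache:
            return self._cache[key]
        # Wait until at least min_interval has passed since the last remote call.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._fetch(key)
        self._cache[key] = result
        return result
```

Note that this only throttles a single process. A dozen analysts behind the same IP block would each have their own counter, which is another argument for the local dumps.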

**dpryan** · 09-11-2013, 07:09 AM

Well, I believe that it needs to be available from the local file system, though that doesn't preclude just mounting a remote drive (we have a group drive available via smb/cifs and nfs). If you're running this on a cluster, then copying the files to one of the mountpoints available to each node might prove easiest (I do this with genome indices for alignments, though each node also has access to a filesystem that's also mounted on my desktop).

**RyanLCollins** · 09-11-2013, 08:36 AM

Thank you both for the replies!

@GenoMax: Thank you for the heads up! I was unaware of the access limits per IP block. I'll ask around our group to estimate our expected requirements and go from there.

@dpryan: Hmm ok, thank you for the suggestion. I would prefer to find a workaround, although we have the capabilities to go that route if necessary. Ideally I'd like to keep this package running on our analysts' local machines, although if necessary we could run it on a cluster.

**RyanLCollins** · 09-12-2013, 03:34 PM

Hello all,

I believe I have found the solution to my problem in the package "DASiR". It allows sequence retrieval from DAS servers (including UCSC, of course).

If others are interested in tackling a similar problem with R, you can find the details regarding DASiR here:
http://www.bioconductor.org/packages...tml/DASiR.html

Thanks for the help,
Ryan
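
For readers unfamiliar with DAS, here is a rough Python sketch of what a sequence request looks like at the protocol level, assuming the DAS 1.x `dna` command with a `segment=chrom:start,stop` query. The base URL is an assumption that should be verified before use, the helper names are hypothetical, and the sample response is made up to show the XML layout (it is not real hg19 sequence). This is not how DASiR itself is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed UCSC DAS endpoint; verify that it is still served before relying on it.
DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"

# Made-up response illustrating the DAS 1.x DASDNA layout (not real genome data).
SAMPLE_RESPONSE = """<DASDNA>
<SEQUENCE id="chr1" start="100000" stop="100010" version="1.00">
<DNA length="11">
acgtacgtacg
</DNA>
</SEQUENCE>
</DASDNA>"""


def das_dna_url(assembly, chrom, start, end, base=DAS_BASE):
    """Build a DAS 'dna' request URL for one segment (1-based, inclusive)."""
    query = urlencode({"segment": f"{chrom}:{start},{end}"}, safe=":,")
    return f"{base}/{assembly}/dna?{query}"


def parse_das_dna(xml_text):
    """Return a list of (id, start, stop, bases) tuples from a DASDNA document."""
    root = ET.fromstring(xml_text)
    results = []
    for seq in root.iter("SEQUENCE"):
        dna = seq.find("DNA")
        # Sequence text may be wrapped over several lines; strip all whitespace.
        bases = "".join((dna.text or "").split()) if dna is not None else ""
        results.append((seq.get("id"), int(seq.get("start")),
                        int(seq.get("stop")), bases))
    return results


if __name__ == "__main__":
    print(das_dna_url("hg19", "chr1", 100000, 100010))
    print(parse_das_dna(SAMPLE_RESPONSE))
```

Since each request goes over the network, the access limits GenoMax mentioned earlier in the thread would presumably still apply to heavy use.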

R human genomic sequence acquisition?