For international database collaborations (INSDC)
Quoting directly from the ENA web page:
"The European Nucleotide Archive (ENA) accepts data generated by next-generation sequencing methodologies such as 454, Illumina Genome Analyzer and ABI SOLiD into the Sequence Read Archive (SRA). ENA works in close collaboration with the NCBI and DDBJ as part of the International Nucleotide Sequence Database Collaboration (INSDC). All submitted public data is exchanged between the partners on a daily basis. All three partners use the same data and metadata formats.
For all questions and enquiries please contact [email protected]."
-
The ENA will continue to accept open-access data, and the EGA will continue to accept human data with consent agreements and Data Access Committees. Let's hope these changes don't stop people from releasing data into the public domain; certainly all the 1000 Genomes data will remain freely available to everyone.
-
Can DAS or another off-the-shelf system address the concern for security? US government-sponsored research comes with some weighty patient-privacy restrictions. I'm not sure that dishing up BAMs via FTP by just "setting up something as simple as installing apache" is going to work. I'd like it to, but I'm thinking someone's going to say "no go". I hope there's an easing of the "must lock down data; only high priests can even think about looking at the data" mentality. But ... we're not there yet.
Last edited by Richard Finney; 02-19-2011, 10:47 AM.
-
Originally posted by Joann: This is where I wish use of our institutional libraries would come to mind....
For those "centralists" in the group: in theory one can pull data off one DAS server and host it on your own. Thus a centralized DAS server could be built that continuously updates its repository from other DAS servers. This kind of defeats the purpose of a Distributed Annotation System, though.
Large groups (Institutions, Universities, Journals, NIH (if they can fund it)) would be the best final repositories for genomic data. The SRA was swamped almost from the start. I think the only way to keep up with the deluge is to distribute the data.
-
central solution
This is where I wish use of our institutional libraries would come to mind. They have and maintain the long-term academic infrastructure and well understand inter-institutional standards (including publishing standards). Why can't institutional repositories hosting open-access scientific research also include next-gen sequencing data? If a set of deposit and access standards were worked out and agreed upon (as is now being discussed in this thread), and if that effort could be the beneficiary of a collaborative grant proposal for start-up funding, library consortia could be forged and linked up to handle specialized sequence data deposit and access, large and small. This is truly the kind of non-profit academic research purpose that is for the good and advancement of society and science, and it enhances and builds upon our existing, traditional academic resources.
-
Originally posted by Nix: The problem with both of these approaches is the lack of a formal language to associate annotation with the data. Species, genome version, file format are often anyone's guess. Every group encodes this information differently. Thus making use of such data requires manual download and interpretation.
Don't believe me? Try intersecting ChIP-seq peak calls from 20 different datasets hosted by UCSC, Ensembl, and GMOD. That's at minimum a day of work for a skilled bioinformatician. Now do it again and again with each new release. Analysis applications that speak DAS can do this in minutes with no human error.
DAS provides the "glue language" (in XML) for brokering queries and returning slices of data based on coordinates or feature searches. It is defined by a community-vetted specification with several server and client implementations. Among other things, these define how you represent each species, their genome, and their coordinate system.
Another key advantage of DAS is that it allows one to separate data distribution from data use. I love the UCSC data tracks but can't always use their browser to analyze them.
I think we'd be lucky if labs would even provide you with more than a list of variants if they were forced to host data themselves. That's the whole reason for having a centralized host. Most people these days know how to deal with FTP or HTTP, so that's what we'd end up with.
SRA was a good idea, but the implementation was clunky. A group, government or academic, ought to pick up the mantle and host the world's genomic data. Hey, it could be you. Host it all on a DAS/2 server.
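As a concrete sketch of what "speaking DAS" buys a client: a DAS/1 features response is plain XML in the DASGFF shape, so pulling coordinate-sliced annotations is just parsing, with no per-group format guessing. The payload below is hand-written for illustration (feature IDs and coordinates are invented), not fetched from a live server:

```python
# Parse a DAS/1-style DASGFF features response with the standard library.
# A real client would fetch this XML from a server URL carrying a
# segment (coordinate range) query; here we use an illustrative string.
import xml.etree.ElementTree as ET

DAS_XML = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="chr7" start="27100000" stop="27200000" version="hg19">
      <FEATURE id="peak1" label="ChIP peak 1">
        <TYPE id="chip-seq-peak">ChIP-seq peak</TYPE>
        <START>27128510</START>
        <END>27129004</END>
      </FEATURE>
      <FEATURE id="peak2" label="ChIP peak 2">
        <TYPE id="chip-seq-peak">ChIP-seq peak</TYPE>
        <START>27150220</START>
        <END>27150790</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

root = ET.fromstring(DAS_XML)
features = []
for seg in root.iter("SEGMENT"):
    for feat in seg.iter("FEATURE"):
        # Every feature carries its segment, coordinates, and an ID,
        # so intersecting calls from many servers needs no manual curation.
        features.append((
            seg.get("id"),
            int(feat.find("START").text),
            int(feat.find("END").text),
            feat.get("id"),
        ))

print(features)
# [('chr7', 27128510, 27129004, 'peak1'), ('chr7', 27150220, 27150790, 'peak2')]
```

Because species, genome build ("hg19" above), and coordinates travel inside the response itself, the same loop works against any compliant server.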
-
The trouble with any non-centralised solution is consistency. Databases like Ensembl and UCSC have been around for a long time, as have the sequence archives (and long may they continue). How many labs have the resources to put up their data pretty much forever? I would suspect the answer to that question is very few.
-
Originally posted by lh3: In general, hosting data over HTTP/FTP is much more convenient for most researchers. If we want to look at data in small regions, we can use IGV or the UCSC browser to view remote BAM/bigBed/bigWig files. IGV also supports tabix indexing, and thus VCF files.
Don't believe me? Try intersecting ChIP-seq peak calls from 20 different datasets hosted by UCSC, Ensembl, and GMOD. That's at minimum a day of work for a skilled bioinformatician. Now do it again and again with each new release. Analysis applications that speak DAS can do this in minutes with no human error.
DAS provides the "glue language" (in XML) for brokering queries and returning slices of data based on coordinates or feature searches. It is defined by a community-vetted specification with several server and client implementations. Among other things, these define how you represent each species, their genome, and their coordinate system.
Another key advantage of DAS is that it allows one to separate data distribution from data use. I love the UCSC data tracks but can't always use their browser to analyze them.
-
In general, hosting data over HTTP/FTP is much more convenient for most researchers. If we want to look at data in small regions, we can use IGV or the UCSC browser to view remote BAM/bigBed/bigWig files. IGV also supports tabix indexing, and thus VCF files.
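The remote-viewing trick described here rests on ordinary HTTP range requests: the browser's index tells it which byte slice it needs, and the web server returns just that slice, so a whole BAM never has to move. A minimal, self-contained sketch (the in-memory byte string and the tiny local server stand in for a real BAM behind a lab's Apache):

```python
# Demonstrate the HTTP Range mechanism that lets IGV/UCSC fetch a
# byte slice of a remote file. Stand-in server and data, for illustration.
import http.server
import threading
import urllib.request

DATA = bytes(range(256))  # stand-in for an indexed BAM/bigWig file


class RangeHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        rng = self.headers.get("Range")
        if rng and rng.startswith("bytes="):
            # Honour a single "bytes=start-end" range, as Apache would.
            start, end = (int(x) for x in rng[len("bytes="):].split("-"))
            body = DATA[start:end + 1]
            self.send_response(206)  # Partial Content
            self.send_header("Content-Range",
                             f"bytes {start}-{end}/{len(DATA)}")
        else:
            body = DATA
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example's output quiet
        pass


server = http.server.HTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The client asks for bytes 10-19 only, exactly as a genome browser
# asks for the file region its index points at.
req = urllib.request.Request(f"http://127.0.0.1:{port}/data.bin",
                             headers={"Range": "bytes=10-19"})
with urllib.request.urlopen(req) as resp:
    chunk = resp.read()
server.shutdown()

print(len(chunk))            # 10
print(chunk == DATA[10:20])  # True
```

Any stock web server that honours Range headers gives you this for free, which is why "just install apache" covers the transport side; the harder parts, as this thread notes, are metadata and access control.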
-
Originally posted by Nix: If your group can run a web site, it can run a DAS/2 server. It really is rather easy to set up: just MySQL and Tomcat. Then have the biologists use the point-and-click web interface to load the data.
It is probably overkill to have every lab run a DAS/2 server, although they can. It is best if your department/institute/organization maintains one alongside their web server.
It's doable, but I know I'd be happier having a centrally managed repository than depending on every group/department to have a server running properly that contains the data in a reasonable format.
-
If your group can run a web site, it can run a DAS/2 server. It really is rather easy to set up: just MySQL and Tomcat. Then have the biologists use the point-and-click web interface to load the data.
It is probably overkill to have every lab run a DAS/2 server, although they can. It is best if your department/institute/organization maintains one alongside their web server.
Forcing folks to properly annotate their data is another issue. Best to have the journals hold the stick and require that datasets for publication be MINSEQE-compliant.
-
Originally posted by Nix: Why wait for the government to fix our problems when we can do it ourselves?
Having a central repository makes it a lot easier to ensure that the data is consistent and has enough detail to be useful.
-
We could definitely DIY the process; the limitation is the bandwidth demand on any lab that does so. Again, this could be solved by FedEx-ing hard disks, but who wants to take lab time to do that? And who has the expertise to set such a thing up? If not properly set up, backup would be an issue, as would the general performance of the server, since it would be used solely for fetching data all the time.
-
DAS/2 to the rescue?!
Why aren't people hosting and publishing their own data? There's no need to centralize this activity.
You are responsible for providing the plasmids, cell lines, etc. used in your papers; why not the genomic data too, in both its raw and processed forms?
It's quite easy and useful to do this, provided everyone uses the same communication protocol to enable programmatic access, so one doesn't have to manually download and reprocess the data before using it.
DAS (Distributed Annotation System) is one such protocol designed to do exactly this; it's been in use for more than 10 years, with hundreds of servers worldwide. DAS/2 is a modification of the original DAS/1 protocol, optimized for large-scale genomic data distribution using any file format (BAM, bar, GFF, BED, etc.).
Check out http://www.biodas.org and http://www.biodas.org/wiki/DAS/2 and feel free to play around with our DAS/2 server http://bioserver.hci.utah.edu/BioInf.../Software:DAS2 or install your own http://bioserver.hci.utah.edu/BioInf...GenoPubInstall .
We've written up some of these tools in a recent paper if folks want to take a look: http://www.biomedcentral.com/1471-2105/11/455
Why wait for the government to fix our problems when we can do it ourselves?