Currently, I'm looking into the Gene Expression Omnibus. Are there any other good websites that curate RNA-seq data (or perhaps provide links to where RNA-seq data can be found)? A bit of background: I'm looking for any publicly available RNA-seq data sets containing at least 10 individuals with some form of cancer (ideally breast cancer).
-
You can check the TCGA data portal for cancer sample data.
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that sequenced and molecularly characterized over 11,000 cases of primary cancer samples. Learn more about how the program transformed the cancer research community and beyond.
There are access tiers so you may need to look through those: https://tcga-data.nci.nih.gov/tcga/tcgaAccessTiers.jsp
Comment
-
TCGA sequence data is censored. There is no public primary or metastatic tumor data for U.S. studies. There are cell lines. But, are cell lines cancer? Patients should be allowed to release their genomes (both tumor and normal) for public inspection. Hopefully future studies will accommodate them and there will be great benefits from making the data public.
Comment
-
ENCODE data
ENCODE RNA-seq data from human cell lines can be found here:
For raw .fastq Read1/Read2 files select fastqRd1/2 in the "View" column.
Comment
-
"Asian Gastric Cancer" : http://trace.ncbi.nlm.nih.gov/Traces...tudy=SRP012016
Great! I knew that Asian samples were showing up in GEO with Affy SNP6 data.
Good to see SOLiD RNA-seq data is now showing up in SRA.
(I wish NCBI SRA would turn off their "freeze the machine" ajax/javascript nonsense)
Comment
-
Thanks for all the great suggestions. So far, I've mostly been looking into TCGA (https://tcga-data.nci.nih.gov/tcga/findArchives.htm). I downloaded the BRCA RNASeqV2 dataset (https://tcga-data.nci.nih.gov/tcga/s...rchiveId=10418). Does anyone know how to determine whether each of the sample IDs came from a distinct individual? If they did, there would be ~800-900 individuals in this data set, which seems unlikely given the size of other data sets. It would help to have a way to tell which samples came from distinct individuals.
The ENCODE database also looks very promising. Here is a tool that I've been using to find RNASeq data: http://genome.crg.es/~jlagarde/encode_RNA_dashboard/
Comment
-
Note that TCGA BRCA (breast cancer) data from UNC is just the idf/sdrf MAGE-TAB files.
It's just a description of the data processing.
See here: http://tab2mage.sourceforge.net/docs/magetab_docs.html for MAGE-TAB info.
The RNA-Seq "Asian Gastric Cancer" samples can be downloaded here:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra...012/SRP012016/
Use "wget -r" to get the whole thing.
Separating out the SRA study/sample/experiment/run levels is frustrating, but doable. (The scars heal eventually.)
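For anyone untangling those levels, here is a rough Python sketch that reports which level an accession belongs to from its prefix. It relies on the usual convention that the third letter marks the level (P = study, S = sample, X = experiment, R = run) and the first letter marks the archive (S = NCBI SRA, E = ENA, D = DDBJ). The function name `accession_level` is my own, and every accession below other than SRP012016 is only an illustration, not taken from that study.

```python
import re

# Third letter of the accession prefix -> level in the SRA hierarchy.
LEVELS = {"P": "study", "S": "sample", "X": "experiment", "R": "run"}

# First letter is the archive (S = NCBI SRA, E = ENA, D = DDBJ),
# second letter is always "R", third letter is the level, then digits.
_ACCESSION = re.compile(r"^([SED])R([PSXR])(\d+)$")


def accession_level(accession: str) -> str:
    """Return 'study', 'sample', 'experiment' or 'run' for an SRA-style accession."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not an SRA-style accession: {accession!r}")
    return LEVELS[match.group(2)]


if __name__ == "__main__":
    # Only SRP012016 comes from this thread; the rest are made-up examples.
    for acc in ["SRP012016", "SRS000001", "SRX000001", "SRR000001"]:
        print(acc, "->", accession_level(acc))
```

This only sorts accessions by level. It does not look up which runs belong to which study, which still has to come from the SRA metadata itself.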
Comment
-
Original data files for TCGA are available from CGHub: https://cghub.ucsc.edu/
You will have to apply to get access: https://cghub.ucsc.edu/get_access.html
All the samples should be unique. Breast cancer was one of the major types included so there are many samples.
Additional information here: http://www.ncbi.nlm.nih.gov/projects...hs000178.v5.p5
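If you want to check the uniqueness yourself, here is a minimal Python sketch. It assumes the sample IDs are standard TCGA barcodes, where the first three hyphen-separated fields (project, tissue source site, participant) identify the individual and the fourth field starts with a two-digit sample type code (01-09 for tumor, 10-19 for normal). The helper names and the barcodes below are made up for illustration and are not taken from the BRCA archive.

```python
from collections import defaultdict


def participant_id(barcode: str) -> str:
    """Return the participant part of a TCGA barcode (its first three fields)."""
    fields = barcode.strip().upper().split("-")
    if len(fields) < 3 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(fields[:3])


def group_by_participant(barcodes):
    """Map each participant ID to the list of sample barcodes that share it."""
    groups = defaultdict(list)
    for barcode in barcodes:
        groups[participant_id(barcode)].append(barcode)
    return dict(groups)


# Hypothetical barcodes: the first two share a participant
# (sample type 01 = tumor, 11 = normal), the third is a different person.
example_barcodes = [
    "TCGA-A1-0001-01A-11R-0000-07",
    "TCGA-A1-0001-11A-11R-0000-07",
    "TCGA-B2-0002-01A-11R-0000-07",
]

groups = group_by_participant(example_barcodes)

if __name__ == "__main__":
    print(len(example_barcodes), "samples from", len(groups), "individuals")
    for pid, samples in groups.items():
        if len(samples) > 1:
            print(pid, "has multiple samples:", samples)
```

If the number of participants comes out lower than the number of sample IDs, the difference is most likely matched tumor/normal pairs or repeated aliquots from the same person.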
Comment
-
Here's a dataset with 79 samples of RNA-seq for breast cancer patient samples.
https://www.ebi.ac.uk/ega/studies/EGAS00001000132
Comment
-
NB: The EMBL-EBI data is controlled access:
From https://www.ebi.ac.uk/ega/datasets/EGAD00001000113
Who controls access to this dataset
For each dataset that requires access control, there is a corresponding Data Access Committee (DAC) who determine access permissions. Data access is not the responsibility of the EGA. If you need to request access to this data set, please contact: Department of Molecular Oncology, BC Cancer Research Centre, Data Access Committee
Comment
-
Wow, the EBI dataset looks really promising. It also has a nice Nature article to accompany it (http://www.nature.com/nature/journal...ture10933.html). Very cool.
Comment