Currently, I'm looking into the Gene Expression Omnibus. Are there any other good websites that curate RNA-seq data (or perhaps provide links to where RNA-seq data can be found)? A bit of background: I'm looking for any publicly available RNA-seq data sets containing at least 10 individuals with some form of cancer (ideally breast cancer).
-
You can check the TCGA data portal for cancer sample data.
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that sequenced and molecularly characterized over 11,000 cases of primary cancer samples. Learn more about how the program transformed the cancer research community and beyond.
There are access tiers so you may need to look through those: https://tcga-data.nci.nih.gov/tcga/tcgaAccessTiers.jsp
-
TCGA sequence data is censored. There is no public primary or metastatic tumor data for U.S. studies. There are cell lines. But, are cell lines cancer? Patients should be allowed to release their genomes (both tumor and normal) for public inspection. Hopefully future studies will accommodate them and there will be great benefits from making the data public.
-
ENCODE data
ENCODE RNA-seq data from human cell lines can be found here:
For raw .fastq Read1/Read2 files select fastqRd1/2 in the "View" column.
-
"Asian Gastric Cancer" : http://trace.ncbi.nlm.nih.gov/Traces...tudy=SRP012016
Great! I knew that Asian samples were showing up in GEO with Affy SNP6 data.
Good to see SOLiD RNA-seq data is now showing up in SRA.
(I wish NCBI SRA would turn off their "freeze the machine" ajax/javascript nonsense)
-
Thanks for all the great suggestions. So far, I've mostly been looking into TCGA (https://tcga-data.nci.nih.gov/tcga/findArchives.htm). I downloaded the BRCA RNASeqV2 dataset (https://tcga-data.nci.nih.gov/tcga/s...rchiveId=10418). Does anyone know how to determine whether each of the sample IDs came from a distinct individual? If they all did, there would be ~800-900 individuals in this data set, which seems unlikely given the size of other data sets.
The ENCODE database also looks very promising. Here is a tool that I've been using to find RNASeq data: http://genome.crg.es/~jlagarde/encode_RNA_dashboard/
-
Note that TCGA BRCA (breast cancer) data from UNC is just the idf/sdrf MAGE-TAB files.
It's just a description of the data processing.
See here: http://tab2mage.sourceforge.net/docs/magetab_docs.html for MAGE-TAB info.
The RNA-Seq "Asian Gastric Cancer" samples can be downloaded here:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra...012/SRP012016/
Use "wget -r" to get the whole thing.
Separating out the SRA study/sample/experiment/run hierarchy is frustrating, but doable. (The scars heal eventually.)
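To make the study/sample/experiment/run hierarchy a bit less painful, here is a minimal Python sketch that groups run accessions by sample and experiment. It assumes you have exported a run-info style CSV for the study with columns named `Run`, `Experiment`, `Sample` and `SRAStudy`; those column names are an assumption about the export, and every accession below except the study ID SRP012016 is a made-up placeholder.

```python
import csv
import io
from collections import defaultdict


def runs_by_sample(runinfo_csv: str) -> dict:
    """Group run accessions as {sample: {experiment: [runs]}}.

    Assumes CSV columns named Run, Experiment and Sample (an assumption
    about the metadata export, not something stated in this thread).
    """
    tree = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        tree[row["Sample"]][row["Experiment"]].append(row["Run"])
    # Convert nested defaultdicts to plain dicts for easier printing/comparison.
    return {sample: dict(exps) for sample, exps in tree.items()}


# Placeholder metadata: the SRR/SRX/SRS accessions are invented for illustration.
example_csv = """
Run,Experiment,Sample,SRAStudy
SRR0000001,SRX0000001,SRS0000001,SRP012016
SRR0000002,SRX0000001,SRS0000001,SRP012016
SRR0000003,SRX0000002,SRS0000002,SRP012016
""".strip()

grouped = runs_by_sample(example_csv)
for sample, experiments in grouped.items():
    for experiment, runs in experiments.items():
        print(sample, experiment, ",".join(runs))
```

Counting the keys of `grouped` gives the number of samples, which is usually the number you care about when checking whether a study has enough individuals.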
-
Original data files for TCGA are available from CGHub: https://cghub.ucsc.edu/
You will have to apply to get access: https://cghub.ucsc.edu/get_access.html
All the samples should be unique. Breast cancer was one of the major types included so there are many samples.
Additional information here: http://www.ncbi.nlm.nih.gov/projects...hs000178.v5.p5
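On the earlier question of whether each sample ID comes from a distinct individual: TCGA barcodes are dash-separated, and the first three fields (project, tissue source site, participant) identify the patient, while the first two digits of the fourth field give the sample type (01-09 tumor, 10-19 normal). A rough Python sketch for checking this yourself is below. The barcodes in it are made-up placeholders, and the function names are just for illustration; substitute the IDs from your own download.

```python
from collections import defaultdict


def participant_id(barcode: str) -> str:
    """Return the participant part of a TCGA barcode (first three fields)."""
    fields = barcode.strip().split("-")
    if len(fields) < 3 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(fields[:3])


def is_tumor(barcode: str) -> bool:
    """True if the sample-type code (4th field, first two digits) is 01-09."""
    fields = barcode.strip().split("-")
    if len(fields) < 4:
        raise ValueError(f"barcode has no sample field: {barcode!r}")
    return int(fields[3][:2]) < 10


def group_by_participant(barcodes):
    """Map each participant ID to the list of sample barcodes it owns."""
    groups = defaultdict(list)
    for barcode in barcodes:
        groups[participant_id(barcode)].append(barcode)
    return dict(groups)


# Placeholder barcodes, invented for illustration only.
barcodes = [
    "TCGA-XX-0001-01A-11R-0000-07",  # tumor sample, participant 0001
    "TCGA-XX-0001-11A-11R-0000-07",  # normal sample, same participant
    "TCGA-XX-0002-01A-11R-0000-07",  # tumor sample, participant 0002
]

groups = group_by_participant(barcodes)
print(len(barcodes), "samples from", len(groups), "distinct participants")
for pid, samples in groups.items():
    if len(samples) > 1:
        print(pid, "has multiple samples:", samples)
```

If the number of participants is smaller than the number of samples, the extra samples are typically matched normals or additional vials from the same patient rather than new individuals.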
-
NB: The EMBL-EBI data is controlled access:
From https://www.ebi.ac.uk/ega/datasets/EGAD00001000113
Who controls access to this dataset
For each dataset that requires access control, there is a corresponding Data Access Committee (DAC) who determine access permissions. Data access is not the responsibility of the EGA. If you need to request access to this data set, please contact: Department of Molecular Oncology, BC Cancer Research Centre, Data Access Committee
-
Wow, the EBI dataset looks really promising. And it has a nice Nature article to accompany it too (http://www.nature.com/nature/journal...ture10933.html). Very cool.