
  • How to find TCGA data in cghub?

    We have gotten access approval for some TCGA data, but how do I find them? I have GeneTorrent (gtdownload and cgquery) installed, but it seems mightily difficult to find anything I'm looking for.

    For instance, I'm trying to download a couple of data sets from TCGA's lung adenocarcinoma studies: http://www.sciencedirect.com/science...92867412010616

    "The dbGAP accession number for the data reported in this paper is phs000488.v1.p1."

    The dbGAP page can be found here: http://www.ncbi.nlm.nih.gov/projects...hs000488.v1.p1

    cgquery "study=phs000488" returned zero results, as is the case for pretty much every accession number I've found in any paper.

    I downloaded some supplementary files from the article's website, but couldn't find any of the patient IDs in cghub's data manifest file, e.g., http://www.broadinstitute.org/pubs/l...UAD-5V8LT.html


    So... does anyone actually know how to find the data set you're looking for?

    Thanks.

  • #2
    FOR BAM FILES ...

    Have you seen this ...

    I usually don't like these JavaScript GUI click click click things, but this is not so bad.

    You want the analysis_ids.

    My script for this (which you must customize to your location) is ...

    # point to your executable and libraries for the cghub client, wherever you put them
    export LD_LIBRARY_PATH=/data/data04/CG/cghub/lib/:/data/data04/CG/cghub/lib/GeneTorrent/:/h1/finneyr/xerces-c-3.0.1/src/.libs/:/h1/finneyr/XQilla-2.2.3/.libs/:$LD_LIBRARY_PATH
    export PATH=/data/data04/CG/cghub/bin/:$PATH

    # download one analysis_id using your own key file, then pause briefly
    function f
    {
    gtdownload -vv -c /h1/finneyr/finneyr.key -d "$1"
    sleep 2;
    }

    # just add "f analysis_id"

    f 038d680d-4a29-4be1-9568-72d80a52c782
    f 059e80af-c614-4424-8075-d42f072705b2
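
If the list of IDs gets long, one option is to keep them in a text file and loop over it instead of adding one "f" line per ID. This is only a sketch: download_ids, the GTDOWNLOAD and PAUSE variables, and the file names are made up for illustration, and only the -vv, -c and -d flags are taken from the script above.

```shell
# Hypothetical helper: download every analysis_id listed in a text file.
# GTDOWNLOAD and PAUSE are illustrative variables, not part of GeneTorrent.
GTDOWNLOAD="${GTDOWNLOAD:-gtdownload}"

download_ids() {
    key_file=$1      # your CGHub credential (key) file
    id_list=$2       # text file with one analysis_id per line
    while IFS= read -r id || [ -n "$id" ]; do
        # skip blank lines and lines starting with '#'
        case $id in ''|'#'*) continue ;; esac
        # stdin is redirected so the client cannot swallow the rest of the list
        "$GTDOWNLOAD" -vv -c "$key_file" -d "$id" < /dev/null
        sleep "${PAUSE:-2}"
    done < "$id_list"
}
```

Usage would be something like "download_ids /path/to/your.key my_ids.txt", with both paths replaced by your own.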



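    ALTERNATIVELY
    You can grab info on the BAMs for a whole project like this ...
    i=0
    function f
    {
    echo "$1"
    # adjust the path to cgquery for your install
    cghub/bin/cgquery "disease_abbr=$1" > "cghub.$1.txt"
    # on every 5th query, wait an extra second to go easy on the server
    n=$((i%5))
    if [ $n -eq 0 ]; then sleep 1; fi
    ((i=i+1))
    sleep 2;
    }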

    f LIHC
    f LUAD

    This creates reports for LIHC (liver) and LUAD (lung adeno).
    You can parse out the analysis_ids.
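
A rough way to do that parsing, assuming only that each ID shows up as a UUID on a line that mentions "analysis_id" (I have not relied on the exact report layout; extract_analysis_ids is just an illustrative name):

```shell
# Print the unique analysis_id UUIDs found in a saved cgquery report.
# Assumption: every ID sits on a line containing the text "analysis_id".
extract_analysis_ids() {
    grep 'analysis_id' "$1" \
        | grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | sort -u
}
```

For example, "extract_analysis_ids cghub.LUAD.txt > LUAD.ids.txt" would leave one ID per line, ready to feed to gtdownload.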



    FOR OTHER STUFF ...

    The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that sequenced and molecularly characterized over 11,000 cases of primary cancer samples. Learn more about how the program transformed the cancer research community and beyond.


    FOR PROTECTED OTHER STUFF ...

    (you need to log in)

    For the tcga-data.nci.nih.gov sites, you can write a script to grab a listing of all files.
    Last edited by Richard Finney; 08-26-2014, 04:51 PM.



    • #3
      Hi,

      Yep, like Richard has pointed out, you need to get a hold of analysis IDs.

      We typically use https://browser.cghub.ucsc.edu/ to search for the samples that we're interested in, and then link the samples to the patient & sample metadata available for the various cohorts here: https://tcga-data.nci.nih.gov/tcgafi...onymous/tumor/
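
One thing that may help when matching samples to patient metadata: a full TCGA barcode is hyphen-separated, and its first three fields (for example TCGA-05-4244) identify the patient. A small sketch for collapsing barcodes to patient-level IDs follows; barcode_to_patient is an illustrative name, and where the barcodes come from (a manifest column, a metadata file) is left to you.

```shell
# Reduce full TCGA barcodes (one per line on stdin) to unique patient-level IDs.
# Keeps the first three hyphen-separated fields, e.g. TCGA-05-4244.
barcode_to_patient() {
    cut -d- -f1-3 | sort -u
}
```

Piping a column of barcodes through barcode_to_patient gives a list that can be compared directly against the patient IDs in a paper's supplementary tables.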

      For what it's worth, Station X has spent a lot of time organizing and curating the patient & sample metadata, and subsequently attaching it to the various genomics assays generated by The Cancer Genome Atlas. This data is all prepped and ready for analysis in GenePool.

      If you're interested, here are some related posts about it:

      Registered SEQanswers sponsors/vendors can post commercial content here. Please support our sponsors!


      Good Luck!

      ------------------------------
      GenePool is making genomics data management, analysis, and sharing easier!
      Products @ www.stationxinc.com
      Last edited by GenePool; 11-23-2014, 09:25 PM.
