  • Pulling Data from NCBI Based on Simple Criteria

    I have spent a good hour reading about NCBI BioSamples and BioProjects, and using their search tools to look for 16S rRNA gene amplicon sequencing datasets whose metadata describe "polar" and "marine" environments.

    I am posting this thread because I have had very little success finding useful datasets. I expected my search criteria to be simple enough to generate a decent list of datasets, but I've had to pick manually through a very non-specific list of hits.

    Can someone comment on a good workflow to achieve what I am aiming at, or provide me with a good walk-through? Surely, this kind of basic data retrieval is common practice and should be easier than I'm finding it... right?

    Thanks in advance,
    Roli

  • #2
    I am going to hazard a guess that you can only find what is there in the first place. Sounds like sequence submitters may not be doing a good job of submitting adequate metadata.

    That said, I recently found an R-based package for searching SRA metadata and posted it in one of the threads. You can search for it here, or I can look for that thread. Perhaps it will help.

    Here is that post: http://seqanswers.com/forums/showpos...45&postcount=9
    Last edited by GenoMax; 11-13-2015, 06:06 PM.
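
    For anyone who would rather script the search than click through the web forms, here is a minimal Python sketch of the same idea using NCBI's E-utilities ESearch endpoint against the SRA database. This is not the R package mentioned above. The helper names (build_sra_term, esearch_url, fetch_sra_ids) are made up for illustration, and the [All Fields] and [Strategy] field tags are assumptions about how SRA indexes its records, so check them in the SRA Advanced Search builder first. It will only return what submitters actually put in their metadata.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_term(keywords, strategy="amplicon"):
    """Combine free-text keywords with a library-strategy filter.

    The [All Fields] and [Strategy] tags are assumptions about the SRA
    search fields; verify them before relying on the results.
    """
    parts = [f'"{kw}"[All Fields]' for kw in keywords]
    parts.append(f'"{strategy}"[Strategy]')
    return " AND ".join(parts)


def esearch_url(term, retmax=100):
    """Build the ESearch request URL for the SRA database (JSON output)."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def fetch_sra_ids(term, retmax=100):
    """Run the search and return the list of matching SRA record IDs.

    Requires network access; not called automatically in this sketch.
    """
    with urlopen(esearch_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

    Calling fetch_sra_ids(build_sra_term(["polar", "marine", "16S"])) would send the query; the returned IDs still have to be resolved to run accessions and checked by hand, since the keyword match says nothing about metadata quality.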

    • #3
      @GenoMax's answer is certainly more useful than mine.

      I think it's a pipe dream to think that you will find all the datasets neatly organized on NCBI. You might be better off just using Google to find the articles published by the studies, and then locate the datasets.

      Inevitably, the raw datasets will be hard to locate, and incomplete. The file formats will differ from one study to the other. The library preparation protocols, and data processing steps, will vary from one study to another, and will be poorly documented.

      I myself submit data to NCBI, albeit related to human health, and I can tell you that it's a mess. Even today, there is no consensus on what data to submit, or in what format. You can imagine how it was for datasets collected a few years ago, at the dawn of next-generation sequencing.

      Researchers are mainly interested in getting their paper published. Since most journals now require that the dataset be uploaded to NCBI, researchers will do so. However, providing the data to the public in a neat and organized manner is not a major preoccupation, and even for a conscientious researcher, it's not always clear in what format the data should be uploaded or with which accompanying information.
      Last edited by blancha; 11-13-2015, 07:39 PM.
