  • bair
    Member
    • Jan 2010
    • 65

    NCBI SRA database

    Hello,

    Is anyone familiar with the NCBI SRA database (http://www.ncbi.nlm.nih.gov/sites/entrez)?

    Searching SRA for SRP000607 (the Korean genome study), I got 5 experiments.

    What is the relationship between experiments, runs, and spots?

    These 5 experiments were sampled from the same person and are all supposed to have paired reads, but SRX002757 does not have paired data.

    Under SRX002761, the read files look strange to me, for example:

    06/11/2009 12:00AM 239 SRR016027.fastq.gz
    06/11/2009 12:00AM 788,821,597 SRR016027_1.fastq.gz
    06/11/2009 12:00AM 797,621,364 SRR016027_2.fastq.gz
    06/11/2009 12:00AM 22,470 SRR016028.fastq.gz
    06/11/2009 12:00AM 809,891,610 SRR016028_1.fastq.gz
    06/11/2009 12:00AM 810,659,524 SRR016028_2.fastq.gz

    SRR016027_1.fastq.gz mates to SRR016027_2.fastq.gz, how about SRR016027.fastq.gz?

    I want to play with this dataset. Can I just use all the paired files in these 5 experiments and ignore the unpaired files like SRR016027.fastq.gz?

    There are lots of experts here; any help will be appreciated!
  • krobison
    Senior Member
    • Nov 2007
    • 734

    #2
    This link may be helpful to you (it really should be featured more prominently on the SRA)
    The NCBI now maintains the Short Read Archive (SRA) (www.ncbi.nlm.nih.gov/Traces/sra/) as a repository for data from sequencing projects that use the new massively parallel sequencing technologies, often called next-generation sequencing. These methods can generate hundreds of megabases to gigabases of data in a single instrument run, millions of times the output of a standard Sanger sequencing instrument. Applications of these technologies include sequencing of new genomes, re-sequencing of targeted genomic regions, sequencing complete genomes of multiple individuals to mine for variations, transcriptome sequencing to sample splice variants and expression levels, environmental samples and other metagenome sequencing, and chromatin DNA binding protein analysis. SRA provides the ability to search and display aspects of SRA project data through the SRA homepage (Figure 1, top panel), and the Entrez system (Figure 1, bottom panel). The SRA site also provides direct access to download data through the Aspera Connect (www.aspera.com) client that offers much faster transfers than traditional FTP. A recently added BLAST service allows searches against the transcriptome sequencing studies from the SRA data.


    Excerpt (I've added line breaks for clarity). One might think that in your case each Experiment had different instrument parameters or library characteristics and that this would be documented somewhere, but as far as I can tell these were all 80x1 runs. Weird.
    An Experiment describes specifically what was sequenced and the method used. It includes information about the source of the DNA, the Sample, the sequencing platform, and the processing of the data.

    Each Experiment is made up of one or more instrument Runs.

    A Run contains the results or reads from each spot in the instrument run.

    In the future, some data will also have an associated Analysis. These Analyses may include assemblies of the short reads into genomic or transcript contigs and alignment to existing genomes or alignments with SRA data.

    Records at each level have unique accession identifiers with a specific three letter prefix that indicates the type of record: ERP or SRP for Studies, SRS for samples, SRX for Experiments, and SRR for Runs.
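
    The prefix convention in the excerpt above can be turned into a small lookup. This is only a sketch: the table contains just the prefixes listed in the excerpt, and the function name and the "three letters followed by digits" pattern are assumptions made for illustration.

```python
import re

# Prefix-to-record-type table, limited to the prefixes named in the excerpt.
RECORD_TYPES = {
    "SRP": "Study",
    "ERP": "Study",
    "SRS": "Sample",
    "SRX": "Experiment",
    "SRR": "Run",
}

# Assumed shape of an accession: three letters followed by digits.
ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d+)$")


def record_type(accession: str) -> str:
    """Return the record type implied by an accession's three-letter prefix."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None or match.group(1) not in RECORD_TYPES:
        raise ValueError(f"unrecognised accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


if __name__ == "__main__":
    for acc in ("SRP000607", "SRX002761", "SRR016027"):
        print(acc, "->", record_type(acc))
```

    With the accessions from this thread, SRP000607 is the Study, SRX002761 is one of its Experiments, and SRR016027 is a Run within that Experiment.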

    Comment

    • bair
      Member
      • Jan 2010
      • 65

      #3
      Thank you, krobison

      That information is quite helpful.

      Comment

      • dvanic
        Member
        • Jan 2012
        • 61

        #4
        SRR016027_1.fastq.gz mates to SRR016027_2.fastq.gz, how about SRR016027.fastq.gz?
        Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the (much smaller), but still "properly" formatted and reasonably sized (~25 Mb in my case) SRR123456.fastq file???
        Thanks in advance!

        Comment

        • vadim
          Member
          • Sep 2009
          • 37

          #5
          Originally posted by dvanic View Post
          Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the (much smaller), but still "properly" formatted and reasonably sized (~25 Mb in my case) SRR123456.fastq file???
          Thanks in advance!
          I believe SRR123456.fastq contains the "leftovers": reads with missing mates (due to filtering, etc.).
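
          If you decide to use only the _1/_2 files, it may be worth checking that the two files really are in step before aligning. The sketch below assumes well-formed, gzipped, 4-line-per-record FASTQ and that mates share the same read ID (optionally with a /1 or /2 suffix); the function names are made up for this example.

```python
import gzip
from itertools import zip_longest


def fastq_ids(path):
    """Yield one read ID per 4-line FASTQ record in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                read_id = line[1:].split()[0]  # drop '@' and any description
                # Some dumps append /1 or /2 to mark the mate; strip it.
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def mates_in_sync(path_1, path_2):
    """Return True if both files hold the same read IDs in the same order."""
    pairs = zip_longest(fastq_ids(path_1), fastq_ids(path_2))
    # zip_longest pads the shorter file with None, so a length mismatch fails too.
    return all(a == b for a, b in pairs)
```

          For example, mates_in_sync("SRR016027_1.fastq.gz", "SRR016027_2.fastq.gz") streams both files once, so memory use stays small even for files of several hundred megabytes.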

          Comment

          • VC87
            Member
            • Oct 2015
            • 18

            #6
            Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, etc.)? Thanks in advance!

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by VC87 View Post
              Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, etc.)? Thanks in advance!
              Not sure what exactly you are looking for but have you tried the advanced search: http://www.ncbi.nlm.nih.gov/sra/advanced
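
              The same kind of fielded query that the advanced search page builds can also be sent to NCBI's E-utilities. The sketch below only constructs the ESearch URL and does not contact NCBI. The helper name is hypothetical, and the field tags used in the example ([Strategy], [Organism]) are assumptions that should be checked against the field list on the advanced search page; attributes such as tissue, age, or sex may not be indexed as fields at all.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(filters, retmax=20):
    """Build an ESearch URL for the SRA database from {field: value} filters.

    Each filter becomes a quoted term tagged with its field, and the
    terms are combined with AND.
    """
    term = " AND ".join(f'"{value}"[{field}]' for field, value in filters.items())
    query = urlencode({"db": "sra", "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"


if __name__ == "__main__":
    # Field tags here are assumptions; verify them before relying on the results.
    print(sra_search_url({"Strategy": "bisulfite seq", "Organism": "Homo sapiens"}))
```

              The response is an XML list of record IDs, which can then be passed to further E-utilities calls to retrieve the run accessions.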

              Comment

              • VC87
                Member
                • Oct 2015
                • 18

                #8
                Yes, I have. I want to search for all SRA files from bisulfite-seq libraries, filtering on certain features such as organism, tissue, age, sex, etc. Thanks anyway for your reply!

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  A search found this: http://sra.dbcls.jp/search

                  Project here: https://github.com/inutano/soylatte

                  R-solution: https://www.bioconductor.org/package...tml/SRAdb.html

                  Comment

                  • VC87
                    Member
                    • Oct 2015
                    • 18

                    #10
                    GenoMax, thanks for your reply! I'll check that out.

                    Comment

                    • VC87
                      Member
                      • Oct 2015
                      • 18

                      #11
                      Does anyone know how to get the raw SRA files associated with the samples that can be searched in the browser of NCBI's Epigenomics database? I suppose it should be possible to get them from the sample ID, but I don't know how...

                      Comment

                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        Do you want the SRA files or the fastq files?

                        Comment

                        • VC87
                          Member
                          • Oct 2015
                          • 18

                          #13
                          SRA, for now

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            The SRA Toolkit makes it easy to download the actual fastq data, since you would have to uncompress the SRA files locally anyway. The toolkit saves you a step. You are most likely going to use the "fastq-dump" program. Help here: http://www.ncbi.nlm.nih.gov/Traces/s...ew=toolkit_doc
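
                            As a rough sketch of the fastq-dump step, the helper below assembles a command line for one run accession. The function name and output directory are made up for this example. The --split-3 option writes mates to _1 and _2 files and puts reads without a mate into an unsuffixed file, which matches the file layout asked about earlier in this thread.

```python
import shlex


def fastq_dump_command(run_accession, outdir="."):
    """Return the argument list for dumping one SRA run to gzipped FASTQ."""
    return [
        "fastq-dump",
        "--split-3",   # mates go to *_1 / *_2, unpaired reads to a third file
        "--gzip",      # compress the output files
        "--outdir", outdir,
        run_accession,
    ]


if __name__ == "__main__":
    cmd = fastq_dump_command("SRR016027", outdir="reads")
    # Print the command for inspection; to run it, pass the list to
    # subprocess.run(cmd, check=True) on a machine with the SRA Toolkit installed.
    print(shlex.join(cmd))
```

                            Keeping the command as a list rather than a single string avoids shell quoting problems if it is later handed to subprocess.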

                            Comment

                            • VC87
                              Member
                              • Oct 2015
                              • 18

                              #15
                              Thanks again. By the way, do you know if it is possible to convert WIG to FASTA (or SRA)?

                              Comment
