  • NCBI SRA database

    Hello,

    Is anyone familiar with the NCBI SRA database: http://www.ncbi.nlm.nih.gov/sites/entrez

    Searching the SRA for SRP000607 (a Korean genome study), I got 5 experiments.

    What is the relationship between experiments, runs and spots?

    These 5 experiments were sampled from the same person and are all supposed to have paired reads, but SRX002757 does not have paired data.

    Under SRX002761, the read files look strange to me:

    06/11/2009 12:00AM 239 SRR016027.fastq.gz
    06/11/2009 12:00AM 788,821,597 SRR016027_1.fastq.gz
    06/11/2009 12:00AM 797,621,364 SRR016027_2.fastq.gz
    06/11/2009 12:00AM 22,470 SRR016028.fastq.gz
    06/11/2009 12:00AM 809,891,610 SRR016028_1.fastq.gz
    06/11/2009 12:00AM 810,659,524 SRR016028_2.fastq.gz

    SRR016027_1.fastq.gz mates with SRR016027_2.fastq.gz, but what about SRR016027.fastq.gz?

    I want to work with these datasets. Can I just use all the paired files in these 5 experiments and ignore unpaired files like SRR016027.fastq.gz?

    There are lots of experts here; any help will be appreciated!

  • #2
    This link may be helpful to you (it really should be featured more prominently on the SRA):
    The NCBI now maintains the Short Read Archive (SRA) (www.ncbi.nlm.nih.gov/Traces/sra/) as a repository for data from sequencing projects that use the new massively parallel sequencing technologies, often called next-generation sequencing. These methods can generate hundreds of megabases to gigabases of data in a single instrument run, millions of times the output of a standard Sanger sequencing instrument. Applications of these technologies include sequencing of new genomes, re-sequencing of targeted genomic regions, sequencing complete genomes of multiple individuals to mine for variations, transcriptome sequencing to sample splice variants and expression levels, environmental samples and other metagenome sequencing, and chromatin DNA binding protein analysis. SRA provides the ability to search and display aspects of SRA project data through the SRA homepage (Figure 1, top panel) and the Entrez system (Figure 1, bottom panel). The SRA site also provides direct access to download data through the Aspera Connect (www.aspera.com) client, which offers much faster transfers than traditional FTP. A recently added BLAST service allows searches against the transcriptome sequencing studies from the SRA data.


    Excerpt (I've added line breaks for clarity). One might think that in your case each Experiment had different instrument parameters or library characteristics and that this would be documented somewhere, but as far as I can tell these were all 80x1 runs. Weird.
    An Experiment describes specifically what was sequenced and the method used. It includes information about the source of the DNA, the Sample, the sequencing platform, and the processing of the data.

    Each Experiment is made up of one or more instrument Runs.

    A Run contains the results or reads from each spot in the instrument run.

    In the future, some data will also have an associated Analysis. These Analyses may include assemblies of the short reads into genomic or transcript contigs and alignment to existing genomes or alignments with SRA data.

    Records at each level have unique accession identifiers with a specific three-letter prefix that indicates the type of record: ERP or SRP for Studies, SRS for Samples, SRX for Experiments, and SRR for Runs.
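The hierarchy in the excerpt (a Study contains Experiments, each Experiment contains one or more Runs, and each accession's prefix encodes its record type) can be sketched as a small data model. This is a minimal illustration, not an NCBI API: `record_type`, `Study`, `Experiment` and `Run` are hypothetical names, and the prefix table covers only the prefixes named in the excerpt.

```python
from dataclasses import dataclass, field

# Prefix -> record type, limited to the prefixes listed in the excerpt:
# ERP/SRP = Study, SRS = Sample, SRX = Experiment, SRR = Run.
ACCESSION_TYPES = {
    "ERP": "Study",
    "SRP": "Study",
    "SRS": "Sample",
    "SRX": "Experiment",
    "SRR": "Run",
}


def record_type(accession: str) -> str:
    """Return the record type implied by an accession's three-letter prefix."""
    prefix = accession[:3].upper()
    try:
        return ACCESSION_TYPES[prefix]
    except KeyError:
        raise ValueError(f"unrecognised SRA accession prefix: {accession!r}") from None


@dataclass
class Run:
    accession: str  # SRR...: the reads from each spot of one instrument run


@dataclass
class Experiment:
    accession: str  # SRX...: what was sequenced and with which method
    runs: list[Run] = field(default_factory=list)  # one or more Runs


@dataclass
class Study:
    accession: str  # SRP... or ERP...
    experiments: list[Experiment] = field(default_factory=list)
```

With the accessions from the first post, `record_type("SRX002761")` returns `"Experiment"`, and the two runs listed under that experiment could be modelled as `Study("SRP000607", [Experiment("SRX002761", [Run("SRR016027"), Run("SRR016028")])])`.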



    • #3
      Thank you, krobison

      That information is quite helpful.



      • #4
        SRR016027_1.fastq.gz mates to SRR016027_2.fastq.gz, how about SRR016027.fastq.gz?
        Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the much smaller, but still properly formatted and reasonably sized (~25 Mb in my case), SRR123456.fastq file?
        Thanks in advance!



        • #5
          Originally posted by dvanic
          Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the much smaller, but still properly formatted and reasonably sized (~25 Mb in my case), SRR123456.fastq file?
          Thanks in advance!
          I believe SRR123456.fastq contains the "leftovers": reads whose mates are missing (due to filtering, etc.).
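If that explanation is right, the _1 and _2 files should contain the same set of read IDs, and none of the IDs in the unpaired file should appear in either mate file. A rough way to check this for one run is sketched below. It assumes strict 4-line FASTQ records and that the read ID is the first whitespace-delimited token of each header, optionally ending in /1 or /2; `read_ids` and `check_run` are hypothetical helper names. It keeps every ID in memory, so for files the size of SRR016027_1.fastq.gz (~790 MB compressed) expect heavy RAM use.

```python
import gzip
from typing import Iterator


def read_ids(path: str) -> Iterator[str]:
    """Yield read IDs from a FASTQ file (plain or .gz), one per 4-line record.

    The ID is the first whitespace-delimited token of the header line, with
    any trailing /1 or /2 mate suffix removed so that mates share one key.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:  # only the first line of each record is a header
                continue
            if not line.startswith("@"):
                raise ValueError(f"{path}: expected '@' header at line {line_no + 1}")
            token = line[1:].split()[0]
            if token.endswith(("/1", "/2")):
                token = token[:-2]
            yield token


def check_run(prefix: str, suffix: str = ".fastq.gz") -> dict:
    """Compare <prefix>_1, <prefix>_2 and <prefix> FASTQ files of one run."""
    mate1 = set(read_ids(f"{prefix}_1{suffix}"))
    mate2 = set(read_ids(f"{prefix}_2{suffix}"))
    single = set(read_ids(f"{prefix}{suffix}"))
    return {
        "paired_ids_match": mate1 == mate2,  # every read has its mate
        "singles_have_no_mate": single.isdisjoint(mate1 | mate2),
        "n_pairs": len(mate1 & mate2),
        "n_singles": len(single),
    }
```

For the files in the first post this would be called as `check_run("SRR016027")`; for uncompressed files like dvanic's, pass `suffix=".fastq"`. If both flags come back true, using only the paired files and setting the unpaired one aside is consistent with the "leftovers" interpretation.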



          • #6
            Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA...)? Thanks in advance!



            • #7
              Originally posted by VC87
              Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA...)? Thanks in advance!
              Not sure exactly what you are looking for, but have you tried the advanced search: http://www.ncbi.nlm.nih.gov/sra/advanced



              • #8
                Yes, I have. I want to search all SRA files from bisulfite-seq libraries, filtering on certain features such as organism, tissue, age, sex, etc. Thanks anyway for your reply!



                • #9
                  A search found this: http://sra.dbcls.jp/search

                  Project here: https://github.com/inutano/soylatte

                  R-solution: https://www.bioconductor.org/package...tml/SRAdb.html
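Another option for #8 is NCBI's E-utilities ESearch service, which runs the same kind of Entrez query as the Advanced search page from a script. The sketch below only builds the query URL and parses the returned XML. The `[Strategy]` and `[Organism]` field tags and the exact strategy string are assumptions to verify in the Advanced search builder linked in #7; tissue, age and sex are not guaranteed to be indexed fields, so they are added as plain free-text terms. `build_sra_term`, `esearch_url` and `parse_ids` are hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Sequence
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_term(strategy: str, organism: str, extra: Sequence[str] = ()) -> str:
    """Join field-tagged clauses with AND, in Entrez query syntax."""
    clauses = [f'"{strategy}"[Strategy]', f'"{organism}"[Organism]']
    # Assumption: tissue/age/sex are matched as free text, not dedicated fields.
    clauses += [f'"{text}"' for text in extra]
    return " AND ".join(clauses)


def esearch_url(term: str, retmax: int = 100) -> str:
    """Return an ESearch URL listing up to `retmax` SRA UIDs matching `term`."""
    params = {"db": "sra", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_ids(xml_bytes: bytes) -> List[str]:
    """Extract the <Id> values from an ESearch XML response."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.iter("Id")]


def search_sra(term: str, retmax: int = 100) -> List[str]:
    """Run the query over the network (keep well within NCBI's rate limits)."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        return parse_ids(resp.read())
```

For example, `search_sra(build_sra_term("bisulfite seq", "Homo sapiens", ["brain"]))` would return SRA UIDs, which still need to be mapped to SRX/SRR accessions (for instance with the EFetch service) before downloading.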



                  • #10
                    GenoMax, thanks for your reply! I'll check that out.



                    • #11
                      Does anyone know how to get the raw SRA files associated with the samples that can be browsed in NCBI's Epigenomics database? I suppose it should be possible to get them from the sample ID, but I don't know how...



                      • #12
                        Do you want the SRA files or the fastq files?



                        • #13
                          SRA, for now



                          • #14
                            SRAtoolkit makes it easy to download the actual fastq data since you would have to uncompress the SRA files locally anyway. The toolkit saves you a step. You are most likely going to use the "fastq-dump" program. Help here: http://www.ncbi.nlm.nih.gov/Traces/s...ew=toolkit_doc
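A minimal sketch of scripting the toolkit from Python, assuming the SRA Toolkit's `fastq-dump` and `prefetch` programs are installed and on PATH. `fastq-dump --split-3` writes complete mate pairs to `<run>_1.fastq` and `<run>_2.fastq` and reads without a mate to `<run>.fastq`, which is the same three-file layout asked about at the top of this thread; `prefetch` downloads the `.sra` file itself, which is what #13 asked for. The helper names below are hypothetical.

```python
import shlex
import shutil
import subprocess
from typing import List


def prefetch_cmd(run: str) -> List[str]:
    """Command that downloads the .sra file for one run accession."""
    return ["prefetch", run]


def fastq_dump_cmd(run: str, outdir: str = ".", gzip_output: bool = True) -> List[str]:
    """Command that converts one run to FASTQ with mate-aware splitting.

    --split-3: pairs go to <run>_1/<run>_2, mateless reads to <run>.fastq.
    """
    cmd = ["fastq-dump", "--split-3", "--outdir", outdir]
    if gzip_output:
        cmd.append("--gzip")  # write .fastq.gz instead of plain .fastq
    cmd.append(run)
    return cmd


def run_tool(cmd: List[str]) -> None:
    """Run a toolkit command, failing early if the binary is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; install the SRA Toolkit")
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_tool(fastq_dump_cmd("SRR016027", outdir="data"))` should produce `SRR016027_1.fastq.gz`, `SRR016027_2.fastq.gz` and, if any mateless reads exist, `SRR016027.fastq.gz` in `data/`.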



                            • #15
                              Thanks again. By the way, do you know whether it is possible to convert wig to FASTA (or SRA)?
