NCBI SRA database

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • NCBI SRA database

    Hello,

    Is anyone familiar with the NCBI SRA database: http://www.ncbi.nlm.nih.gov/sites/entrez

    Searching SRA for SRP000607, a Korean genome study, I got 5 experiments.

    What is the relationship between experiments, runs and spots?

    These 5 experiments were sampled from the same person and are all supposed to have paired reads, but SRX002757 does not have paired data.

    Under SRX002761, the read files look strange to me, for example:

    06/11/2009 12:00AM 239 SRR016027.fastq.gz
    06/11/2009 12:00AM 788,821,597 SRR016027_1.fastq.gz
    06/11/2009 12:00AM 797,621,364 SRR016027_2.fastq.gz
    06/11/2009 12:00AM 22,470 SRR016028.fastq.gz
    06/11/2009 12:00AM 809,891,610 SRR016028_1.fastq.gz
    06/11/2009 12:00AM 810,659,524 SRR016028_2.fastq.gz

    SRR016027_1.fastq.gz mates to SRR016027_2.fastq.gz, but what about SRR016027.fastq.gz?

    I want to play with these datasets. Can I just use all the paired files in these 5 experiments and ignore the unpaired files like SRR016027.fastq.gz?

    There are lots of experts here; any help will be appreciated!

  • #2
    This link may be helpful to you (it really should be featured more prominently on the SRA)
    http://www.ncbi.nlm.nih.gov/bookshel...cbi&part=Aug09

    Excerpt (I've added line breaks for clarity). One might think that in your case each Experiment had different instrument parameters or library characteristics and somewhere it would be documented, but as far as I can tell these were all 80x1 runs. Weird.
    An Experiment describes specifically what was sequenced and the method used. It includes information about the source of the DNA, the Sample, the sequencing platform, and the processing of the data.

    Each Experiment is made up of one or more instrument Runs.

    A Run contains the results or reads from each spot in the instrument run.

    In the future, some data will also have an associated Analysis. These Analyses may include assemblies of the short reads into genomic or transcript contigs and alignment to existing genomes or alignments with SRA data.

    Records at each level have unique accession identifiers with a specific three letter prefix that indicates the type of record: ERP or SRP for Studies, SRS for samples, SRX for Experiments, and SRR for Runs.
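
    As a minimal sketch of the prefix scheme in the excerpt, the mapping could be written as a small lookup. The function name `record_type` is made up for illustration, and only the prefixes named in the excerpt are covered:

```python
import re

# Prefix-to-record-type mapping, taken only from the excerpt above.
RECORD_TYPES = {
    "SRP": "Study",
    "ERP": "Study",
    "SRS": "Sample",
    "SRX": "Experiment",
    "SRR": "Run",
}


def record_type(accession: str) -> str:
    """Return the record type implied by an accession's three-letter prefix."""
    match = re.fullmatch(r"([A-Z]{3})(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable accession: {accession!r}")
    prefix = match.group(1)
    if prefix not in RECORD_TYPES:
        raise ValueError(f"prefix {prefix!r} is not covered by the excerpt")
    return RECORD_TYPES[prefix]


if __name__ == "__main__":
    # Accessions mentioned in this thread.
    for acc in ("SRP000607", "SRX002761", "SRR016027"):
        print(acc, "->", record_type(acc))
```

    So in this thread SRP000607 is the Study, SRX002757 and SRX002761 are Experiments, and SRR016027 and SRR016028 are Runs within an Experiment.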


    • #3
      Thank you, krobison

      That information is quite helpful.


      • #4
        SRR016027_1.fastq.gz mates to SRR016027_2.fastq.gz, how about SRR016027.fastq.gz?
        Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the (much smaller), but still "properly" formatted and reasonably sized (~25 Mb in my case) SRR123456.fastq file???
        Thanks in advance!


        • #5
          Originally posted by dvanic
          Hi! I've actually got the same question, albeit for a different dataset. If SRR123456_1.fastq mates with SRR123456_2.fastq, then what is the (much smaller), but still "properly" formatted and reasonably sized (~25 Mb in my case) SRR123456.fastq file???
          Thanks in advance!
          I believe SRR123456.fastq contains the "leftovers": reads whose mates are missing (due to filtering, etc.).
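
          If that is right, picking out only the mate files is a matter of file naming. Here is a rough sketch that groups file names by run and keeps the `_1`/`_2` pairs. It assumes the naming seen in this thread (`SRRnnn_1`, `SRRnnn_2`, and a bare `SRRnnn` for leftovers); the function names are made up for illustration:

```python
import re
from collections import defaultdict

# Matches e.g. SRR016027.fastq.gz, SRR016027_1.fastq.gz, SRR123456_2.fastq
FASTQ_NAME = re.compile(r"^(SRR\d+)(?:_([12]))?\.fastq(?:\.gz)?$")


def group_by_run(filenames):
    """Group FASTQ file names by run into mate1 / mate2 / unpaired slots."""
    runs = defaultdict(lambda: {"mate1": None, "mate2": None, "unpaired": None})
    slot_for = {"1": "mate1", "2": "mate2", None: "unpaired"}
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # not a run FASTQ file under the assumed naming
        run, mate = match.groups()
        runs[run][slot_for[mate]] = name
    return dict(runs)


def paired_files(filenames):
    """Return (mate1, mate2) tuples for runs that have both mate files."""
    grouped = group_by_run(filenames)
    return [
        (files["mate1"], files["mate2"])
        for run, files in sorted(grouped.items())
        if files["mate1"] and files["mate2"]
    ]


if __name__ == "__main__":
    listing = [
        "SRR016027.fastq.gz", "SRR016027_1.fastq.gz", "SRR016027_2.fastq.gz",
        "SRR016028.fastq.gz", "SRR016028_1.fastq.gz", "SRR016028_2.fastq.gz",
    ]
    for mate1, mate2 in paired_files(listing):
        print(mate1, mate2)
```

          This only looks at names, not contents, so it does not verify that the reads in the two mate files actually line up one to one.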


          • #6
            Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, ...)? Thanks in advance!


            • #7
              Originally posted by VC87
              Hi! Can someone tell me how I can search SRA files through metadata features (whether in GEO, ENA, ...)? Thanks in advance!
              Not sure what exactly you are looking for but have you tried the advanced search: http://www.ncbi.nlm.nih.gov/sra/advanced
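
              The kind of fielded query the advanced search builds can also be composed in a script and sent to NCBI's E-utilities `esearch` endpoint. This is only a sketch: the field names `Organism` and `Strategy` are my assumption of what the SRA index accepts (check them in the advanced search builder), and the helper names are made up. It only builds the URL and does not contact NCBI:

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(filters):
    """Join {field: value} pairs into an Entrez-style fielded query string."""
    return " AND ".join(f'"{value}"[{field}]' for field, value in filters.items())


def esearch_url(term, retmax=20):
    """Build an esearch URL against the SRA database for the given query term."""
    return ESEARCH + "?" + urlencode({"db": "sra", "term": term, "retmax": retmax})


if __name__ == "__main__":
    # Field names below are assumptions; verify them in the advanced search page.
    term = build_sra_query({"Organism": "Homo sapiens", "Strategy": "bisulfite seq"})
    print(term)
    print(esearch_url(term))
```

              Attributes such as tissue, age or sex may not be searchable as dedicated fields, so they might need free-text terms or filtering of the returned records afterwards.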


              • #8
                Yes, I have. I want to search for all SRA files from bisulfite-seq libraries while fixing certain features such as organism, tissue, age, sex, etc. Thanks anyway for your reply!


                • #9
                  A search found this: http://sra.dbcls.jp/search

                  Project here: https://github.com/inutano/soylatte

                  R-solution: https://www.bioconductor.org/package...tml/SRAdb.html


                  • #10
                    Genomax, thanks for your reply! I'll check that out.


                    • #11
                      Does anyone know how to get the raw SRA files associated with the samples that can be searched in the browser of NCBI's epigenomics database? I suppose it should be possible to get them from the sample ID, but I don't know how.


                      • #12
                        Do you want the SRA files or the fastq files?


                        • #13
                          SRA, for now


                          • #14
                            The SRA Toolkit makes it easy to download the actual fastq data directly. Since you would have to convert the SRA files locally anyway, the toolkit saves you a step. You are most likely going to use the "fastq-dump" program. Help here: http://www.ncbi.nlm.nih.gov/Traces/s...ew=toolkit_doc
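
                            For scripting it, here is a hedged sketch of wrapping the call from Python. It assumes `fastq-dump` from the SRA Toolkit is installed and on your PATH; the wrapper name `fastq_dump_command` is made up. `--split-3` writes mates to `_1`/`_2` files and reads without a mate to a separate file, which matches the file layout discussed earlier in this thread:

```python
import shlex
import subprocess


def fastq_dump_command(accession, outdir=".", gzip=True):
    """Build a fastq-dump argument list for one run accession."""
    cmd = ["fastq-dump", "--split-3"]  # mates to _1/_2, leftovers to a third file
    if gzip:
        cmd.append("--gzip")  # compress the output files
    cmd += ["--outdir", outdir, accession]
    return cmd


def run_fastq_dump(accession, outdir=".", gzip=True):
    """Run fastq-dump; raises CalledProcessError if the tool reports failure."""
    subprocess.run(fastq_dump_command(accession, outdir, gzip), check=True)


if __name__ == "__main__":
    # Print the command only; call run_fastq_dump() to actually execute it.
    print(shlex.join(fastq_dump_command("SRR016027", outdir="fastq")))
```

                            Check `fastq-dump --help` for your installed version before relying on these options, since the toolkit has changed over the years.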


                            • #15
                              Thanks again. By the way, do you know if it is possible to convert wig to fasta (or SRA)?
