  • Deutsche
    Junior Member
    • Apr 2011
    • 9

    SRA format

    Does anybody know of a specification for the SRA format? Does one exist?
    I have only found an API on the NCBI site that helps with reading SRA files, but no information about the format specification itself.
  • vadim
    Member
    • Sep 2009
    • 37

    #2
    ask them: [email protected]


    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      You don't mean the SRA XML Specification, which is documented?
      This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


      Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?


      • Deutsche
        Junior Member
        • Apr 2011
        • 9

        #4
        Originally posted by maubp
        You don't mean the SRA XML Specification, which is documented?
        This documentation provides application notes for the Sequence Read Archive (SRA) at the National Center for Biotechnology Information.


        Rather I assume you mean the binary SRA files whose first 8 bytes are "NCBI.sra"? If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
        Yes, I mean the binary file format. OK, I will write to them and report back if I find something.


        • vadim
          Member
          • Sep 2009
          • 37

          #5
          You could also check their source code, say fastq-dump.c or one of the other dumping tools. It worked for us.


          • Deutsche
            Junior Member
            • Apr 2011
            • 9

            #6
            Originally posted by vadim
            You could also check their source code, say fastq-dump.c or one of the other dumping tools. It worked for us.
            It is very difficult to work out a format specification from 12 MB of code, so I want to try the simplest route first. If that fails, I will try analysing the source code.


            • Deutsche
              Junior Member
              • Apr 2011
              • 9

              #7
              Originally posted by vadim
              Could you please tell me how you got this address?


              • vadim
                Member
                • Sep 2009
                • 37

                #8
                Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For anything more complicated you could use the API from the SDK, which in my view is much easier than understanding the format specs.
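For readers who take the dump route, here is a minimal Python sketch of reading a FASTQ dump once a standard tool has produced it. It assumes the simple layout of four lines per record (header, sequence, separator, quality) with no line-wrapped sequences; the function name `read_fastq` and the example records are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        yield header[1:].strip(), sequence, quality


# Made-up records for illustration only.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+read2\n!!#I\n")
records = list(read_fastq(example))
```

For real work an established parser (Biopython's `SeqIO`, for instance) is a safer choice than a hand-rolled reader like this one.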


                • Deutsche
                  Junior Member
                  • Apr 2011
                  • 9

                  #9
                  Originally posted by vadim
                  Could you please explain what you are planning to do with the SRA data? Most people are happy with the fastq/fasta dumps produced by the standard tools. For anything more complicated you could use the API from the SDK, which in my view is much easier than understanding the format specs.
                  I'm working on the UGENE project, and my next task is to integrate SRA format support into our tool. Simply bundling the SRA SDK into UGENE is not straightforward, because our tool is cross-platform while the SDK only supports UNIX.


                  • vadim
                    Member
                    • Sep 2009
                    • 37

                    #10
                    I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                    Is UGENE written in C++? If so, I would definitely consider reusing NCBI's code.


                    • Deutsche
                      Junior Member
                      • Apr 2011
                      • 9

                      #11
                      Originally posted by vadim
                      I believe the SRA SDK can be built for Windows and Mac as well, although I have never actually tried this.
                      Is UGENE written in C++? If so, I would definitely consider reusing NCBI's code.
                      Yes, it is. C++ with Qt4.


                      • Deutsche
                        Junior Member
                        • Apr 2011
                        • 9

                        #12
                        Originally posted by maubp
                        If you find a link, or you prompt the NCBI to publish this, could you post the URL here please?
                        The people at NCBI told me that they don't give this documentation to anybody. If you want to use the SRA format, you need to use their API.


                        • maubp
                          Peter (Biopython etc)
                          • Jul 2009
                          • 1544

                          #13
                          Originally posted by Deutsche
                          The people at NCBI told me that they don't give this documentation to anybody. If you want to use the SRA format, you need to use their API.
                          Well, at least they are clear about it.

                          Hurrah for the principles of openness and sharing in science! </sarcasm>


                          • jkbonfield
                            Senior Member
                            • Jul 2008
                            • 146

                            #14
                            It's perhaps a reasonable stance to take, as it gives them the flexibility to change the format without having to keep notifying people, as long as the API remains constant. However, it does rather block others from writing interfaces in alternative languages.

                            The format is almost certainly quite complex though. I remember lots of discussions and to-ing and fro-ing on the best algorithms for compressing traces, qualities and sequences, with different methods for each type. As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                            I'm not sure what licence they use though and whether that would be a hindrance.
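The trade-off described above (a constant API in front of a format that is free to change) can be sketched in Python as follows. Everything here is hypothetical: `ReadSource`, `LayoutV1` and `LayoutV2` are invented names, and the two layouts are toy text encodings that have nothing to do with the real SRA format.

```python
from abc import ABC, abstractmethod


class ReadSource(ABC):
    """Hypothetical stable interface: callers only ever see this."""

    @abstractmethod
    def reads(self):
        """Yield (name, sequence) pairs."""


class LayoutV1(ReadSource):
    """Made-up layout: one 'name<TAB>sequence' record per line."""

    def __init__(self, blob):
        self._blob = blob

    def reads(self):
        for line in self._blob.splitlines():
            name, sequence = line.split("\t")
            yield name, sequence


class LayoutV2(ReadSource):
    """Made-up layout: all names, a blank line, then all sequences."""

    def __init__(self, blob):
        self._blob = blob

    def reads(self):
        names, sequences = self._blob.split("\n\n")
        yield from zip(names.splitlines(), sequences.splitlines())


def total_bases(source):
    """Client code: depends on the interface only, not on the layout."""
    return sum(len(sequence) for _, sequence in source.reads())


v1 = LayoutV1("r1\tACGT\nr2\tGG")
v2 = LayoutV2("r1\nr2\n\nACGT\nGG")
```

`total_bases` works unchanged on either layout, which is the flexibility the maintainers get. The cost is the one noted above: anyone who cannot call the interface from their own language has no documented layout to fall back on.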


                            • vadim
                              Member
                              • Sep 2009
                              • 37

                              #15
                              Originally posted by jkbonfield
                              As others have suggested, I'd recommend using their API, and if it doesn't port cleanly to Windows then fixing that may be an easier task than reimplementing.

                              I'm not sure what licence they use though and whether that would be a hindrance.
                              It should work under Windows, see here:
                              SRA Tools. Contribute to ncbi/sra-tools development by creating an account on GitHub.


                              It is not licensed; I asked them recently about it and they said "no restrictions", whatever that means.
