  • tbusch0000
    Junior Member
    • Nov 2010
    • 5

    How to convert sra-lite format to fastq?

    I am trying to dump sra-lite (sequence read archive) files to fastq format. On the NCBI Sequence Read Archive site it states:

    ...users are asked download runs of interest and execute dumps into the desired format using the SRA SDK toolkit available at http://www.ncbi.nlm.nih.gov/Traces/s...are&s=software

    I downloaded the precompiled toolkit for 64-bit architecture onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".

    Any guidance would be much appreciated!
  • SongLi
    Member
    • Oct 2010
    • 19

    #2
    Although I can get their CentOS 64-bit build running, it's really slow: it takes about 10 hours to unpack one file. I am also interested in knowing more about these new SRA tools.

    • tbusch0000
      Junior Member
      • Nov 2010
      • 5

      #3
      I just noticed they released a new MacOSX beta package.

      I downloaded that one and entered this in the terminal: $ ./fastq-dump -A SRP000910 -D SRR070499.lite.sra

      Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"

      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        Originally posted by tbusch0000:
        I downloaded the precompiled toolkit for 64-bit architecture onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but got the error message "cannot execute binary file".
        My guess is you downloaded a 64-bit Linux binary, which won't work on the Mac.
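One way to check that guess is to look at the first four bytes of the executable, which identify its format. The sketch below is only illustrative: the function name `binary_format` is made up here, and it recognizes just the common ELF (Linux) and Mach-O (Mac) signatures. The standard `file` utility reports the same kind of information in more detail.

```shell
#!/bin/sh
# binary_format is a hypothetical helper, not part of the SRA toolkit.
# It reads the first four bytes of a file and names the executable format.
binary_format() {
    # First four bytes as lowercase hex with no separators, e.g. "7f454c46".
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $magic in
        7f454c46)          echo "ELF" ;;               # Linux and most other Unix systems
        cffaedfe|cefaedfe) echo "Mach-O" ;;            # Mac, 64-bit or 32-bit Intel
        cafebabe)          echo "Mach-O universal" ;;  # also the Java class-file magic
        *)                 echo "unknown" ;;
    esac
}
```

Running `binary_format ./fastq-dump` on the Mac and getting `ELF` would be consistent with a Linux build having been downloaded.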

        • tbusch0000
          Junior Member
          • Nov 2010
          • 5

          #5
          Originally posted by maubp:
          My guess is you downloaded a 64-bit Linux binary, which won't work on the Mac.
          Thanks, they've only just released the Mac binaries. It will execute now, but gives the error message above.

          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Originally posted by tbusch0000:
            Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"
            How much RAM do you have, and how big is SRR070499.lite.sra?

            • tbusch0000
              Junior Member
              • Nov 2010
              • 5

              #7
              Originally posted by maubp:
              How much RAM do you have, and how big is SRR070499.lite.sra?
              I have 6GB RAM and the file is 3.5 GB

              • seb567
                Senior Member
                • Jul 2008
                • 260

                #8
                I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                It is slow, but it works. My guess is that data are compressed, using something like LIBBZ2 (it is just a guess). That explains the compression ratio as well as the slowness.

                [boiseb01@ls30 MyShortReadArchive]$ ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump
                linux-vdso.so.1 => (0x00007fff361ff000)
                libdl.so.2 => /lib64/libdl.so.2 (0x00000033f5a00000)
                libz.so.1 => /lib64/libz.so.1 (0x00000033f6600000)
                libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003403e00000)
                libm.so.6 => /lib64/libm.so.6 (0x00000033f5600000)
                libc.so.6 => /lib64/libc.so.6 (0x00000033f5200000)
                /lib64/ld-linux-x86-64.so.2 (0x00000033f4e00000)
                Binaries are linked against libz and libbz2, but the slowness indicates that they probably rely on libbz2.
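The check above can be narrowed to just the compression libraries with a small filter. This is a sketch only: `compression_libs` is a made-up name, and it simply picks the libz and libbz2 entries out of `ldd`-style output on standard input. Being linked against a library does not show which one is actually used when reading a given file, so the result still leaves the libbz2 guess unconfirmed.

```shell
#!/bin/sh
# compression_libs is a hypothetical helper: it prints the library names
# from ldd-style output (on stdin) that look like zlib or bzip2.
compression_libs() {
    awk '$1 ~ /^lib(z|bz2)\.so/ { print $1 }'
}

# Example use (the path is the one quoted in the post above):
#   ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump | compression_libs
```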

                • maubp
                  Peter (Biopython etc)
                  • Jul 2009
                  • 1544

                  #9
                  I'm not 100% sure how memmap works on the Mac, but it sounds like you should have enough RAM to load the whole file into memory (assuming no other memory hungry applications are running at the same time). Can you find a smaller example to test?
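For the "smaller example" suggestion, a quick way to pick a test file is to find the smallest .sra file among those already downloaded. The sketch below assumes the files sit in one directory; `smallest_sra` is a made-up helper name, and it only compares file sizes without telling you anything about why the larger file fails.

```shell
#!/bin/sh
# smallest_sra is a hypothetical helper: it prints the path of the
# smallest *.sra file in the directory given as its first argument.
smallest_sra() {
    best=
    best_size=
    for f in "$1"/*.sra; do
        [ -f "$f" ] || continue                 # skip if the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')        # size in bytes, padding removed
        if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
            best=$f
            best_size=$size
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

If fastq-dump handles the file this prints but still fails on the 3.5 GB one, that would point toward file size rather than the installation.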

                  • SongLi
                    Member
                    • Oct 2010
                    • 19

                    #10
                    Hi seb567,

                    How slow is fastq-dump for you?

                    My experience is this: my computer is a 4-core 2.4 GHz Xeon with 12 GB of RAM, and fastq-dump takes 600 minutes to finish one sra file.

                    I have tried the newest release and also different sra files. fastq-dump is always very slow.

                    Thanks,

                    Originally posted by seb567:
                    I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                    My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                    It is slow, but it works. My guess is that data are compressed, using something like LIBBZ2 (it is just a guess). That explains the compression ratio as well as the slowness.



                    Binaries are linked against libz and libbz2, but the slowness indicates that they probably rely on libbz2.

                    • seb567
                      Senior Member
                      • Jul 2008
                      • 260

                      #11
                      About 1-2 hours for a 2 GB sra file, though that is a very rough estimate.

                      I downloaded all sra files for SRA010766, converted them from sra to fastq, then to fastq.gz. The script started yesterday at 6 PM (EST).

                      So yours is slower, way slower.

                      [boiseb01@ls30 Illumina-SRX015621]$ ls
                      batch-3 SRR033559_1.fastq.gz SRR033570_1.fastq.gz SRR033581_1.fastq.gz SRR033592_1.fastq.gz SRR033603_1.fastq.gz SRR033614_1.fastq.gz SRR033625_1.fastq.gz
                      download.log SRR033559_2.fastq.gz SRR033570_2.fastq.gz SRR033581_2.fastq.gz SRR033592_2.fastq.gz SRR033603_2.fastq.gz SRR033614_2.fastq.gz SRR033625_2.fastq.gz
                      files.txt SRR033560_1.fastq.gz SRR033571_1.fastq.gz SRR033582_1.fastq.gz SRR033593_1.fastq.gz SRR033604_1.fastq.gz SRR033615_1.fastq.gz SRR033626_1.fastq.gz
                      list-sra.sh SRR033560_2.fastq.gz SRR033571_2.fastq.gz SRR033582_2.fastq.gz SRR033593_2.fastq.gz SRR033604_2.fastq.gz SRR033615_2.fastq.gz SRR033626_2.fastq.gz
                      newFiles SRR033561_1.fastq.gz SRR033572_1.fastq.gz SRR033583_1.fastq.gz SRR033594_1.fastq.gz SRR033605_1.fastq.gz SRR033616_1.fastq.gz SRR033627_1.fastq.gz
                      nohup.out SRR033561_2.fastq.gz SRR033572_2.fastq.gz SRR033583_2.fastq.gz SRR033594_2.fastq.gz SRR033605_2.fastq.gz SRR033616_2.fastq.gz SRR033627_2.fastq.gz
                      README SRR033562_1.fastq.gz SRR033573_1.fastq.gz SRR033584_1.fastq.gz SRR033595_1.fastq.gz SRR033606_1.fastq.gz SRR033617_1.fastq.gz SRR033628_1.fastq
                      SRA010766 SRR033562_2.fastq.gz SRR033573_2.fastq.gz SRR033584_2.fastq.gz SRR033595_2.fastq.gz SRR033606_2.fastq.gz SRR033617_2.fastq.gz SRR033628_2.fastq
                      SRR033552_1.fastq.gz SRR033563_1.fastq.gz SRR033574_1.fastq.gz SRR033585_1.fastq.gz SRR033596_1.fastq.gz SRR033607_1.fastq.gz SRR033618_1.fastq.gz SRR033629_1.fastq
                      SRR033552_2.fastq.gz SRR033563_2.fastq.gz SRR033574_2.fastq.gz SRR033585_2.fastq.gz SRR033596_2.fastq.gz SRR033607_2.fastq.gz SRR033618_2.fastq.gz SRR033629_2.fastq
                      SRR033553_1.fastq.gz SRR033564_1.fastq.gz SRR033575_1.fastq.gz SRR033586_1.fastq.gz SRR033597_1.fastq.gz SRR033608_1.fastq.gz SRR033619_1.fastq.gz SRR033630_1.fastq
                      SRR033553_2.fastq.gz SRR033564_2.fastq.gz SRR033575_2.fastq.gz SRR033586_2.fastq.gz SRR033597_2.fastq.gz SRR033608_2.fastq.gz SRR033619_2.fastq.gz SRR033630_2.fastq
                      SRR033554_1.fastq.gz SRR033565_1.fastq.gz SRR033576_1.fastq.gz SRR033587_1.fastq.gz SRR033598_1.fastq.gz SRR033609_1.fastq.gz SRR033620_1.fastq.gz SRR033631_1.fastq
                      SRR033554_2.fastq.gz SRR033565_2.fastq.gz SRR033576_2.fastq.gz SRR033587_2.fastq.gz SRR033598_2.fastq.gz SRR033609_2.fastq.gz SRR033620_2.fastq.gz SRR033631_2.fastq
                      SRR033555_1.fastq.gz SRR033566_1.fastq.gz SRR033577_1.fastq.gz SRR033588_1.fastq.gz SRR033599_1.fastq.gz SRR033610_1.fastq.gz SRR033621_1.fastq.gz SRR033632_1.fastq
                      SRR033555_2.fastq.gz SRR033566_2.fastq.gz SRR033577_2.fastq.gz SRR033588_2.fastq.gz SRR033599_2.fastq.gz SRR033610_2.fastq.gz SRR033621_2.fastq.gz SRR033632_2.fastq
                      SRR033556_1.fastq.gz SRR033567_1.fastq.gz SRR033578_1.fastq.gz SRR033589_1.fastq.gz SRR033600_1.fastq.gz SRR033611_1.fastq.gz SRR033622_1.fastq.gz SRR033633_1.fastq
                      SRR033556_2.fastq.gz SRR033567_2.fastq.gz SRR033578_2.fastq.gz SRR033589_2.fastq.gz SRR033600_2.fastq.gz SRR033611_2.fastq.gz SRR033622_2.fastq.gz SRR033633_2.fastq
                      SRR033557_1.fastq.gz SRR033568_1.fastq.gz SRR033579_1.fastq.gz SRR033590_1.fastq.gz SRR033601_1.fastq.gz SRR033612_1.fastq.gz SRR033623_1.fastq.gz
                      SRR033557_2.fastq.gz SRR033568_2.fastq.gz SRR033579_2.fastq.gz SRR033590_2.fastq.gz SRR033601_2.fastq.gz SRR033612_2.fastq.gz SRR033623_2.fastq.gz
                      SRR033558_1.fastq.gz SRR033569_1.fastq.gz SRR033580_1.fastq.gz SRR033591_1.fastq.gz SRR033602_1.fastq.gz SRR033613_1.fastq.gz SRR033624_1.fastq.gz
                      SRR033558_2.fastq.gz SRR033569_2.fastq.gz SRR033580_2.fastq.gz SRR033591_2.fastq.gz SRR033602_2.fastq.gz SRR033613_2.fastq.gz SRR033624_2.fastq.gz
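The sra to fastq to fastq.gz workflow described above can be sketched as a short batch loop. This is not the script used in the post, which is not shown in the thread. `convert_all` and the `DUMP` variable are names invented here, and no fastq-dump flags are assumed because the required options differ between toolkit versions (the beta used earlier in this thread was called with `-A` and `-D`). Set `DUMP` to whatever invocation works for your installation.

```shell
#!/bin/sh
# DUMP holds the converter command. It defaults to plain "fastq-dump";
# add flags here if your toolkit version needs them (an assumption to adjust).
DUMP=${DUMP:-fastq-dump}

# convert_all is a hypothetical helper: it converts every .sra file in a
# directory, then gzips whatever .fastq files were produced there.
convert_all() {
    dir=$1
    for sra in "$dir"/*.sra; do
        [ -e "$sra" ] || continue               # skip if the glob matched nothing
        echo "converting $sra"
        # Run inside the directory so output lands next to the input.
        ( cd "$dir" && $DUMP "$(basename "$sra")" ) || {
            echo "failed: $sra" >&2
            continue
        }
    done
    for fq in "$dir"/*.fastq; do
        [ -e "$fq" ] || continue
        gzip -f "$fq"                           # replaces X.fastq with X.fastq.gz
    done
}
```

Calling `convert_all .` under `nohup` would leave a log in nohup.out, similar to the file visible in the listing above.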

                      • tbusch0000
                        Junior Member
                        • Nov 2010
                        • 5

                        #12
                        Thanks for the tips.

                        I got fastq-dump working on an extra-large Amazon cloud instance running a CentOS AMI.

                        • babaref
                          Junior Member
                          • Jul 2011
                          • 2

                          #13
                          How can I convert FASTQ files to SRA format? Is there a Perl script for this conversion?

                          • gsgs
                            Senior Member
                            • Oct 2009
                            • 139

                            #14
                            I want the table that converts a byte from the sra file into a sequence of nucleotides.

                            The SRA toolkit source code has "4na" and "2na".
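As an illustration of what a "2na" table looks like, the sketch below decodes one byte under the widely used NCBI2na convention: two bits per base, A=0, C=1, G=2, T=3, with the first base in the most significant bits. Whether raw bytes read straight from an .sra file can be decoded this way is an assumption this thread does not confirm, especially since the data may also be compressed, as guessed earlier. `decode_2na_byte` is a made-up name.

```shell
#!/bin/sh
# decode_2na_byte is a hypothetical helper: it takes a byte value (0-255)
# and prints the four bases it encodes under the 2-bit A/C/G/T convention.
decode_2na_byte() {
    byte=$1
    out=
    # Walk the four 2-bit fields from most to least significant.
    for s in 6 4 2 0; do
        code=$(( (byte >> s) & 3 ))
        case $code in
            0) out=${out}A ;;
            1) out=${out}C ;;
            2) out=${out}G ;;
            3) out=${out}T ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For example, `decode_2na_byte 27` prints `ACGT`, because 27 is 00 01 10 11 in binary. The "4na" form instead uses four bits per base as a bitmask so that ambiguous bases can be represented, which gives two bases per byte.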

                            • dpryan
                              Devon Ryan
                              • Jul 2011
                              • 3478

                              #15
                              Why don't you either use fastq-dump or just download the gzipped fastq files from ENA (such as this one)?
                              Last edited by dpryan; 08-21-2013, 03:40 AM. Reason: forgot a word
