  • How to convert sra-lite format to fastq?

    I am trying to dump sra-lite (sequence read archive) files to fastq format. On the NCBI Sequence Read Archive site it states:

    ...users are asked download runs of interest and execute dumps into the desired format using the SRA SDK toolkit available at http://www.ncbi.nlm.nih.gov/Traces/s...are&s=software

    I downloaded the precompiled 64-bit toolkit onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but I get the error message "cannot execute binary file".

    Any guidance would be much appreciated!

  • #2
    Although I can get their CentOS 64-bit build running, it's really slow: it takes about 10 hours to unpack one file. I am also interested in learning more about the new SRA tools.



    • #3
      I just noticed they released a new MacOSX beta package.

      I downloaded that one and ran the following in the terminal: $ ./fastq-dump -A SRP000910 -D SRR070499.lite.sra

      Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"



      • #4
        Originally posted by tbusch0000 View Post
        I downloaded the precompiled 64-bit toolkit onto my MacBook Pro running Snow Leopard and tried to run the fastq-dump executable from the terminal, but I get the error message "cannot execute binary file".
        My guess is that you downloaded a 64-bit Linux binary, which won't work on the Mac.



        • #5
          Originally posted by maubp View Post
          My guess is that you downloaded a 64-bit Linux binary, which won't work on the Mac.
          Thanks, they've only just released the Mac binaries. It executes now, but gives the error message above.



          • #6
            Originally posted by tbusch0000 View Post
            Received error message: "memory exhausted while constructing memory map within file system module - failed to open 'SRR070499.lite.sra'"
            How much RAM do you have, and how big is SRR070499.lite.sra?



            • #7
              Originally posted by maubp View Post
              How much RAM do you have, and how big is SRR070499.lite.sra?
              I have 6 GB of RAM and the file is 3.5 GB.



              • #8
                I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                It is slow, but it works. My guess is that the data are compressed with something like libbz2 (it is just a guess). That would explain both the compression ratio and the slowness.

                [boiseb01@ls30 MyShortReadArchive]$ ldd /software/sratoolkit.2.0b4-2-centos_linux64/fastq-dump
                linux-vdso.so.1 => (0x00007fff361ff000)
                libdl.so.2 => /lib64/libdl.so.2 (0x00000033f5a00000)
                libz.so.1 => /lib64/libz.so.1 (0x00000033f6600000)
                libbz2.so.1 => /lib64/libbz2.so.1 (0x0000003403e00000)
                libm.so.6 => /lib64/libm.so.6 (0x00000033f5600000)
                libc.so.6 => /lib64/libc.so.6 (0x00000033f5200000)
                /lib64/ld-linux-x86-64.so.2 (0x00000033f4e00000)
                The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.
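The guess above rests on bzip2 decompression generally being slower than zlib decompression. A minimal sketch for checking that on your own machine, using Python's standard `zlib` and `bz2` modules on a synthetic payload (the payload and the helper name `time_roundtrip` are illustrative assumptions, not anything from the toolkit). It says nothing about which library fastq-dump actually uses for a given file.

```python
import bz2
import time
import zlib


def time_roundtrip(name, compress, decompress, payload):
    """Compress once, then time a single decompression of the result."""
    packed = compress(payload)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == payload  # sanity check: lossless round trip
    return {"codec": name, "packed_bytes": len(packed), "seconds": elapsed}


# Synthetic stand-in for read data (an assumption): a repetitive nucleotide string.
payload = b"ACGTTGCAAGCTNNACGT" * 200_000

results = [
    time_roundtrip("zlib", zlib.compress, zlib.decompress, payload),
    time_roundtrip("bz2", bz2.compress, bz2.decompress, payload),
]
for row in results:
    print(f"{row['codec']:>4}: {row['packed_bytes']} bytes, {row['seconds']:.4f} s")
```

Timings depend heavily on the data, so real reads would give a fairer comparison than this repetitive string.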



                • #9
                  I'm not 100% sure how memory mapping (mmap) works on the Mac, but it sounds like you should have enough RAM to load the whole file into memory (assuming no other memory-hungry applications are running at the same time). Can you find a smaller example to test?
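One thing worth checking alongside RAM: a memory map consumes virtual address space, not just physical memory. If the Mac beta binary was a 32-bit build (an assumption; the thread does not say), a 3.5 GB mapping would take most of the 4 GiB a 32-bit process can address. A minimal Python sketch of that arithmetic, plus a read-only mmap of a small temporary file; the helper names are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def pointer_bits():
    """Width of a pointer in the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def address_space_fraction(size_bytes, bits):
    """Fraction of a `bits`-wide virtual address space one mapping would use."""
    return size_bytes / 2 ** bits


def first_bytes_via_mmap(path, count=4):
    """Memory-map a file read-only and return its first `count` bytes."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:count]


# The file size from the thread (3.5 GB), checked against a 32-bit address space.
fraction_32 = address_space_fraction(3_500_000_000, 32)
print(f"interpreter is {pointer_bits()}-bit")
print(f"a 3.5 GB mapping would use {fraction_32:.0%} of a 32-bit address space")

# Small demonstration file (hypothetical content), removed afterwards.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo bytes")
    demo_path = tmp.name
head = first_bytes_via_mmap(demo_path)
os.remove(demo_path)
```

A mapping that large leaves little room for the program's own code, heap, stack and libraries in a 32-bit process, which would be consistent with the "memory exhausted while constructing memory map" message even on a machine with 6 GB of RAM.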



                  • #10
                    Hi seb567,

                    How slow is fastq-dump for you?

                    My experience is this: on my computer (Xeon 2.4 GHz, 4 cores, 12 GB RAM), fastq-dump takes 600 minutes to finish one sra file.

                    I have tried the newest release and also different sra files. fastq-dump is always very slow.

                    Thanks,

                    Originally posted by seb567 View Post
                    I have to download and convert files to test Ray, the assembler I am working on (see a thread elsewhere on this forum).

                    My take on sratoolkit (I use /software/sratoolkit.2.0b4-2-centos_linux64/):

                    It is slow, but it works. My guess is that the data are compressed with something like libbz2 (it is just a guess). That would explain both the compression ratio and the slowness.



                    The binaries are linked against both libz and libbz2, but the slowness suggests that they probably rely on libbz2.



                    • #11
                      About 1-2 hours for a 2 GB sra file, though that is a very rough estimate.

                      I downloaded all sra files for SRA010766, converted them from sra to fastq, and then compressed them to fastq.gz. The script started yesterday at 6 PM (EST).

                      So yours is slower, way slower.

                      [boiseb01@ls30 Illumina-SRX015621]$ ls
                      batch-3 SRR033559_1.fastq.gz SRR033570_1.fastq.gz SRR033581_1.fastq.gz SRR033592_1.fastq.gz SRR033603_1.fastq.gz SRR033614_1.fastq.gz SRR033625_1.fastq.gz
                      download.log SRR033559_2.fastq.gz SRR033570_2.fastq.gz SRR033581_2.fastq.gz SRR033592_2.fastq.gz SRR033603_2.fastq.gz SRR033614_2.fastq.gz SRR033625_2.fastq.gz
                      files.txt SRR033560_1.fastq.gz SRR033571_1.fastq.gz SRR033582_1.fastq.gz SRR033593_1.fastq.gz SRR033604_1.fastq.gz SRR033615_1.fastq.gz SRR033626_1.fastq.gz
                      list-sra.sh SRR033560_2.fastq.gz SRR033571_2.fastq.gz SRR033582_2.fastq.gz SRR033593_2.fastq.gz SRR033604_2.fastq.gz SRR033615_2.fastq.gz SRR033626_2.fastq.gz
                      newFiles SRR033561_1.fastq.gz SRR033572_1.fastq.gz SRR033583_1.fastq.gz SRR033594_1.fastq.gz SRR033605_1.fastq.gz SRR033616_1.fastq.gz SRR033627_1.fastq.gz
                      nohup.out SRR033561_2.fastq.gz SRR033572_2.fastq.gz SRR033583_2.fastq.gz SRR033594_2.fastq.gz SRR033605_2.fastq.gz SRR033616_2.fastq.gz SRR033627_2.fastq.gz
                      README SRR033562_1.fastq.gz SRR033573_1.fastq.gz SRR033584_1.fastq.gz SRR033595_1.fastq.gz SRR033606_1.fastq.gz SRR033617_1.fastq.gz SRR033628_1.fastq
                      SRA010766 SRR033562_2.fastq.gz SRR033573_2.fastq.gz SRR033584_2.fastq.gz SRR033595_2.fastq.gz SRR033606_2.fastq.gz SRR033617_2.fastq.gz SRR033628_2.fastq
                      SRR033552_1.fastq.gz SRR033563_1.fastq.gz SRR033574_1.fastq.gz SRR033585_1.fastq.gz SRR033596_1.fastq.gz SRR033607_1.fastq.gz SRR033618_1.fastq.gz SRR033629_1.fastq
                      SRR033552_2.fastq.gz SRR033563_2.fastq.gz SRR033574_2.fastq.gz SRR033585_2.fastq.gz SRR033596_2.fastq.gz SRR033607_2.fastq.gz SRR033618_2.fastq.gz SRR033629_2.fastq
                      SRR033553_1.fastq.gz SRR033564_1.fastq.gz SRR033575_1.fastq.gz SRR033586_1.fastq.gz SRR033597_1.fastq.gz SRR033608_1.fastq.gz SRR033619_1.fastq.gz SRR033630_1.fastq
                      SRR033553_2.fastq.gz SRR033564_2.fastq.gz SRR033575_2.fastq.gz SRR033586_2.fastq.gz SRR033597_2.fastq.gz SRR033608_2.fastq.gz SRR033619_2.fastq.gz SRR033630_2.fastq
                      SRR033554_1.fastq.gz SRR033565_1.fastq.gz SRR033576_1.fastq.gz SRR033587_1.fastq.gz SRR033598_1.fastq.gz SRR033609_1.fastq.gz SRR033620_1.fastq.gz SRR033631_1.fastq
                      SRR033554_2.fastq.gz SRR033565_2.fastq.gz SRR033576_2.fastq.gz SRR033587_2.fastq.gz SRR033598_2.fastq.gz SRR033609_2.fastq.gz SRR033620_2.fastq.gz SRR033631_2.fastq
                      SRR033555_1.fastq.gz SRR033566_1.fastq.gz SRR033577_1.fastq.gz SRR033588_1.fastq.gz SRR033599_1.fastq.gz SRR033610_1.fastq.gz SRR033621_1.fastq.gz SRR033632_1.fastq
                      SRR033555_2.fastq.gz SRR033566_2.fastq.gz SRR033577_2.fastq.gz SRR033588_2.fastq.gz SRR033599_2.fastq.gz SRR033610_2.fastq.gz SRR033621_2.fastq.gz SRR033632_2.fastq
                      SRR033556_1.fastq.gz SRR033567_1.fastq.gz SRR033578_1.fastq.gz SRR033589_1.fastq.gz SRR033600_1.fastq.gz SRR033611_1.fastq.gz SRR033622_1.fastq.gz SRR033633_1.fastq
                      SRR033556_2.fastq.gz SRR033567_2.fastq.gz SRR033578_2.fastq.gz SRR033589_2.fastq.gz SRR033600_2.fastq.gz SRR033611_2.fastq.gz SRR033622_2.fastq.gz SRR033633_2.fastq
                      SRR033557_1.fastq.gz SRR033568_1.fastq.gz SRR033579_1.fastq.gz SRR033590_1.fastq.gz SRR033601_1.fastq.gz SRR033612_1.fastq.gz SRR033623_1.fastq.gz
                      SRR033557_2.fastq.gz SRR033568_2.fastq.gz SRR033579_2.fastq.gz SRR033590_2.fastq.gz SRR033601_2.fastq.gz SRR033612_2.fastq.gz SRR033623_2.fastq.gz
                      SRR033558_1.fastq.gz SRR033569_1.fastq.gz SRR033580_1.fastq.gz SRR033591_1.fastq.gz SRR033602_1.fastq.gz SRR033613_1.fastq.gz SRR033624_1.fastq.gz
                      SRR033558_2.fastq.gz SRR033569_2.fastq.gz SRR033580_2.fastq.gz SRR033591_2.fastq.gz SRR033602_2.fastq.gz SRR033613_2.fastq.gz SRR033624_2.fastq.gz
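The listing above shows some files still waiting to be compressed (plain .fastq next to finished .fastq.gz). The compression step of such a batch can be sketched in Python with the standard `gzip` module; this is only an illustration, not the script used in the post, and the function names and the example file name are hypothetical. The sra-to-fastq conversion itself is left out because the fastq-dump options depend on the toolkit version.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_fastq(path, remove_original=False):
    """Compress one FASTQ file to <name>.gz, streaming so large files fit."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    if remove_original:
        source.unlink()
    return target


def pending_fastq(directory):
    """List *.fastq files that do not yet have a .fastq.gz next to them."""
    folder = Path(directory)
    return sorted(
        p for p in folder.glob("*.fastq")
        if not p.with_name(p.name + ".gz").exists()
    )


# Demonstration on a throwaway directory with one tiny, made-up record.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "example_1.fastq"
    sample.write_text("@read1\nACGT\n+\nIIII\n")
    todo_before = [p.name for p in pending_fastq(workdir)]
    packed_name = gzip_fastq(sample, remove_original=True).name
    with gzip.open(Path(workdir) / packed_name, "rt") as handle:
        restored = handle.read()
    todo_after = [p.name for p in pending_fastq(workdir)]
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for multi-gigabyte FASTQ files.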



                      • #12
                        Thanks for the tips.

                        I got fastq-dump working on an extra-large Amazon cloud instance running a CentOS AMI.



                        • #13
                          How can I convert FASTQ files to SRA format? Is there a Perl script for this conversion?



                          • #14
                            I am looking for the table that converts a byte from the sra file into a sequence of nucleotides.

                            The SRA toolkit source code has "4na" and "2na" encodings.
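As a rough guide, "2na" and "4na" usually refer to the standard NCBI packings: 2 bits per base (A=0, C=1, G=2, T=3, four bases per byte) and 4 bits per base (a bitmask with A=1, C=2, G=4, T=8, so ambiguity codes are unions, two bases per byte). A minimal Python sketch of both tables follows. It assumes the first base sits in the high-order bits and that 4na value 0 is shown as a gap; both are assumptions to check against the toolkit source. It also only describes the unpacked base values: the bytes stored in an .sra file go through further column encoding and compression, so this table cannot be applied to raw file bytes directly.

```python
# ASSUMPTION: standard NCBI conventions, first base in the high-order bits.
BASES_2NA = "ACGT"              # 2 bits per base: A=0, C=1, G=2, T=3
BASES_4NA = "-ACMGRSVTWYHKDBN"  # 4 bits per base: bitmask A=1, C=2, G=4, T=8


def decode_2na(packed, n_bases):
    """Unpack 2na bytes (four bases per byte) into a nucleotide string."""
    out = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            out.append(BASES_2NA[(byte >> shift) & 0b11])
    return "".join(out[:n_bases])


def decode_4na(packed, n_bases):
    """Unpack 4na bytes (two bases per byte) into an IUPAC string."""
    out = []
    for byte in packed:
        out.append(BASES_4NA[byte >> 4])
        out.append(BASES_4NA[byte & 0x0F])
    return "".join(out[:n_bases])


# 256-entry lookup table: byte value -> the four nucleotides it encodes in 2na.
TABLE_2NA = [decode_2na(bytes([value]), 4) for value in range(256)]

print(TABLE_2NA[0x1B])                    # bits 00 01 10 11 -> ACGT
print(decode_4na(bytes([0x12, 0x48]), 4))  # nibbles 1, 2, 4, 8 -> ACGT
```

The `n_bases` argument is needed because the last byte may be only partly filled when the read length is not a multiple of four (2na) or two (4na).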



                            • #15
                              Why don't you either use fastq-dump or just download the gzipped fastq files from ENA (such as this one)?
                              Last edited by dpryan; 08-21-2013, 03:40 AM. Reason: forgot a word
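If you go the gzipped FASTQ route, the files can be read directly without decompressing them to disk first. A minimal sketch with Python's standard `gzip` module, assuming plain four-line FASTQ records (no wrapped sequence lines); the reader function and the sample records are made up for illustration.

```python
import gzip
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a text handle of 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


# In-memory stand-in for a downloaded .fastq.gz file (made-up records).
sample_gz = gzip.compress(b"@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n##\n")

with gzip.open(io.BytesIO(sample_gz), "rt") as handle:
    records = list(read_fastq(handle))

for name, sequence, quality in records:
    print(name, sequence, quality)
```

For a real file, pass its path to `gzip.open(path, "rt")` instead of the in-memory buffer.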

