

GB File Storage/Transfer Solutions




  • GB File Storage/Transfer Solutions


    I'd appreciate any advice on moving large sequence/read files (5 GB to 25 GB) among different servers such as Galaxy or the UCSC Genome Browser.

    Our campus Linux servers provide 50 GB of storage and only allow secure FTP file transfers (SFTP). To upload a large file to Galaxy, I first have to download the file from the campus server to my Mac desktop, then upload the file to Galaxy or UCSC. This is a painfully slow process (several hours per file).

    What I am looking for is a cloud storage site that allows "server-to-server" file transfer so that I can eliminate the download/upload process to my desktop. Most of the services I have tried don't allow file transfers larger than 2 GB or don't support SFTP.

    If anyone could help me out with some suggestions, I'd really appreciate it.


  • #2
    I'm interested in this as well.


    • #3
      The only service I've found that allows storage and transfer of large files (>10 GB) is called Humyo, but it is a paid service, and doesn't support FTP transfer. For now, one of our lab desktop computers is being used to download the files from our Linux server and then upload them to Galaxy.

      If I use my free Dropbox folder, it takes about 10 minutes to send a 1 GB file (gzip compressed) from the Dropbox server to Galaxy. I'm happy with that, except that Dropbox doesn't support file transfers larger than 2 GB.



      • #4
        Depending on how your Linux SSH server is configured, you might consider an sshfs-like approach: you "mount" the remote SSH file system on your local computer as if it were a local disk. Data are still downloaded to your desktop and then uploaded to Galaxy, but the two steps happen at the same time, so you do not need any extra disk space. (I hope I understood your question correctly.)
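A minimal sketch of the sshfs approach described above, for a Linux or Mac desktop with sshfs installed. The host name and paths are placeholders, `DRY_RUN` is a hypothetical switch added here only to preview the command, and `-o reconnect` is an sshfs option chosen for illustration rather than something the thread specifies.

```shell
#!/usr/bin/env bash
# Sketch: mount a remote directory over SSH with sshfs.
# The remote and mount point are supplied by the caller; nothing here is
# specific to any real server.

mount_remote() {
    local remote="$1"      # e.g. user@campus.example.edu:/home/user/reads
    local mountpoint="$2"  # e.g. $HOME/sshfs/campus
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "sshfs $remote $mountpoint -o reconnect"
        return 0
    fi
    mkdir -p "$mountpoint"
    # -o reconnect asks sshfs to re-establish the connection if it drops.
    sshfs "$remote" "$mountpoint" -o reconnect
}

# Example (placeholder host and paths):
# mount_remote user@campus.example.edu:/home/user/reads "$HOME/sshfs/campus"
```

Once mounted, the remote files appear under the mount point and can be picked in a browser upload form like any local file.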


        • #5

          Bingo! You did understand my question correctly, and you answered it perfectly.

          I installed an application called ExpanDrive on the desktop PC that mounts the Linux ssh file system as a local drive (sshfs). To test it out, I uploaded a small 1 MB FASTQ file to Galaxy directly with the "Get Data" Tool, and it uploaded in less than a minute.

          Then I uploaded a 1.6 GB gzipped file to Galaxy. It took just over an hour to get the file onto Galaxy. As you said, it was a one-step process.
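For a rough sense of what that rate means for the 5 GB to 25 GB files from the original question, here is a back-of-the-envelope sketch. It treats "just over an hour" as 60 minutes and assumes throughput scales linearly with file size, which is an assumption and not something measured in the thread; `estimate_hours` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: scale an observed transfer time linearly to a larger file.
# Assumes throughput stays constant, which real networks rarely guarantee.

estimate_hours() {
    local size_gb="$1"      # size of the file you want to transfer
    local ref_gb="$2"       # size of the file you timed
    local ref_minutes="$3"  # how long the timed transfer took
    LC_ALL=C awk -v s="$size_gb" -v r="$ref_gb" -v m="$ref_minutes" \
        'BEGIN { printf "%.1f\n", (s / r) * m / 60 }'
}

estimate_hours 5 1.6 60    # smallest file size mentioned in the thread
estimate_hours 25 1.6 60   # largest file size mentioned in the thread
```

Under those assumptions the two calls print 3.1 and 15.6, i.e. roughly 3 and 16 hours.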

          Your advice is much appreciated. This is very smooth compared to some of the clunky methods I was using without success.



          • #6
            Good to know it works. For Linux, there is sshfs. For Mac, there is MacFUSE + Macfusion, which I have used in the past. I have heard that MacFUSE does not work on Snow Leopard. If that happens, you may be able to find a solution by searching online. I have not tried, though.


            • #7
              You're right (again). I read on the Google Code page that MacFuse doesn't support Snow Leopard's 64-bit kernel, but you can run it if OS X is started in 32-bit mode. ExpanDrive uses the MacFuse library, so the Mac version doesn't have 64-bit support. The Windows version of ExpanDrive seems to work fine.

              I use a Mac as my desktop computer for data analysis and an older lab PC running Windows XP to transfer the files via sshfs. Essentially the XP machine is just a file server, so I don't have to use additional resources on my Mac.



              • #8
                This seems to be a very interesting and useful discussion, but it is too technical for my background. I will do some reading, but if someone could put it in layman's terms...!


                • #9
                  If it seems "too technical", it's only because I didn't explain it clearly.

                  The poster "lh3" gave me the advice that worked best for transferring large files from our Linux server to online tools such as Galaxy when a secure connection (SSH) is required. I can post the steps I used to do this. However, if you're looking for the specifics of SSH and SSHFS, other Linux or Unix gurus could probably explain those.

                  I'm one of those people with just enough knowledge to be dangerous, but I'd be glad to help where I can.


                  • #10
                    I don't know if this helps, but back in the day, everyone used FXP for this. It's still supported by a bunch of FTP servers, though I'm not sure how well it is supported with FTPS. It essentially lets you initiate a file transfer directly between two FTP servers, without the data passing through your own connection.
                    For Windows, the go-to FXP app was FlashFXP. Googling around a bit, I found this though,
                    might be useful?



                    • #11
                      SRA to Galaxy?

                      Hello All,

                      Does anyone know how to transfer files directly from NCBI's SRA to Galaxy? It seems that a direct transfer would save a fair amount of bandwidth. I've tried pasting the SRA dataset's download FTP URL into the Galaxy Get Data box, but an error is generated. The other issue is the .bz2 compression that SRA uses.

                      Any ideas out there? Thanks.


                      • #12
                        I thought the only way to upload data to Galaxy (and others) was to use the web interface. I did not know SSH was allowed.

                        BTW, I am using sshfs (on top of MacFUSE) without problems (on a 32-bit system, though). One caveat: if you are on a laptop or a machine that goes online and offline, you may leave your sshfs mount point in an unusable state. I recently discovered a way to unmount it:

                        $ sudo diskutil unmount force /Users/drio/sshfs/ardmore
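The command above is specific to OS X. As a sketch, the same idea can be wrapped so that a Linux desktop gets a comparable command. `unmount_cmd` is a hypothetical helper that only prints the command; the Linux branch uses `fusermount -u -z` (a lazy unmount), which is my assumption about a suitable equivalent and is not something from the thread.

```shell
#!/usr/bin/env bash
# Sketch: print the force-unmount command for a stale sshfs mount point.
# The OS name is passed in (as reported by `uname -s`) so the choice is explicit.

unmount_cmd() {
    local os="$1"          # "Darwin" for OS X, "Linux" for Linux
    local mountpoint="$2"
    case "$os" in
        Darwin) echo "sudo diskutil unmount force $mountpoint" ;;
        Linux)  echo "fusermount -u -z $mountpoint" ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Example: show the command for this machine without running it.
# unmount_cmd "$(uname -s)" "$HOME/sshfs/ardmore"
```

Printing the command first, instead of running it, leaves the decision to force an unmount with the user.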


                        • #13
                          Upload fastqs quickly to Galaxy

                          I just came across this courtesy of Subio support on YouTube. It saves me so much time and local disk space.

                          Sharing it in case other people don't know this already.

                          If you want to upload FASTQ files directly from the SRA server to Galaxy, use the DRA site at DDBJ: http://trace.ddbj.nig.ac.jp/

                          Look for your SRx experiment/run and copy the FASTQ link from the right-hand side.

                          Paste this into the URL box under "Get Data" in Galaxy.

                          The upload usually takes less than 1 minute.

                          Have fun



                          • #14
                            Originally posted by jjw14:
                            ExpanDrive uses the MacFuse library, so the Mac version doesn't have 64-bit support.
                            There's OSXFUSE, which replaces MacFuse and supports 64-bit kernels.

