  • chayan
    Member
    • Nov 2012
    • 52

    Multiple fastq alignment with bowtie2 in server

    Hi!
    I'm trying to map multiple SRA files (>6500) with bowtie2 against my reference genome, using a Slurm script on a server. Mapping a single file works fine, but whenever I run a bash loop I get the following error:

    "path/to/slurm_script: line 16: path/to/file1.fastq: Permission denied"

    Here is my slurm script

    #!/bin/bash
    #BATCH --job-name=ERR1135336.clean.reads.Assembly
    #SBATCH -N 1 # Number of nodes, not cores
    #SBATCH -t 2-00:00:00 # Walltime
    #SBATCH --ntasks-per-node 40 # Number of cores
    #SBATCH --output=out-%j.log # Output (console)
    #SBATCH --partition=test # Queue

    module use /gpfs/shared/modulefiles_local
    module use /gpfs/shared/modulefiles_local/bio
    module load bio/bowtie2/2.3.4

    for i in $(path/to/*.fastq)
    do
    bowtie2 -x PC_805 --threads 40 -U ${i} -S path/to/${i%%.fastq}.sam
    done


    I am not sure whether this is really a permission issue or a bash scripting issue.

    Output of ls -l for the directory from which I am running the Slurm job:

    drwxr-xr-x 2 chayan.roy domain users 4096 Apr 23 10:14 PC_805


    Output of ls -l for the directory where I am storing my fastq is

    drwxr-xr-x 22 chayan.roy domain users 4096 Apr 22 14:44 HMP_2017

    Any help will be much appreciated

    Thanks
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    You can't run a bash script inside one SLURM job and expect the jobs to be parallelised. Instead, you should run a bash script on the command line that in turn submits multiple individual SLURM jobs.

    "path/to/": I assume this is a real path on your system that you are obfuscating here? If not, you need to have a real value there.
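
The "Permission denied" message is consistent with the loop header `for i in $(path/to/*.fastq)` in the script above: `$(...)` is command substitution, so bash expands the glob and then tries to execute the first FASTQ file as a program. A plain glob without `$(...)` avoids that. The sketch below combines that fix with the "one job per file" approach suggested here. It is only an illustration: `FASTQ_DIR`, `INDEX`, `DRY_RUN` and the function names are placeholders that do not appear in the thread, and the resource flags are copied from the original script rather than tuned.

```shell
#!/bin/bash
# Sketch: submit one Slurm job per FASTQ file instead of looping inside one job.
# FASTQ_DIR, INDEX and DRY_RUN are placeholders, not values from the thread.

FASTQ_DIR="${FASTQ_DIR:-path/to}"
INDEX="${INDEX:-PC_805}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the sbatch commands, submit nothing

# Build the bowtie2 command line for one input file.
# The .sam file is written next to the input, with .fastq replaced by .sam.
bowtie2_cmd() {
    local fq="$1"
    printf 'bowtie2 -x %s --threads 40 -U %s -S %s' "$INDEX" "$fq" "${fq%.fastq}.sam"
}

submit_all() {
    local fq
    # Plain glob, no $(...): the pattern is expanded to file names, not executed.
    for fq in "$FASTQ_DIR"/*.fastq; do
        [ -e "$fq" ] || continue   # skip the literal pattern if nothing matches
        if [ "$DRY_RUN" = 1 ]; then
            # Printed for inspection only; quoting of --wrap is lost in the echo.
            echo sbatch -N 1 --ntasks-per-node 40 --wrap "$(bowtie2_cmd "$fq")"
        else
            sbatch -N 1 --ntasks-per-node 40 --wrap "$(bowtie2_cmd "$fq")"
        fi
    done
}

submit_all
```

Run it from a login node with `DRY_RUN=1` first to inspect the generated commands, then with `DRY_RUN=0` to actually submit. Note also that the first directive in the original script reads `#BATCH` rather than `#SBATCH`, so the job name is probably being ignored.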


    • chayan
      Member
      • Nov 2012
      • 52

      #3
      Thanks for your prompt response.

      If I understood correctly, I have to submit a Slurm array of >6500 jobs? This particular server has 56 nodes, each with 40 threads, and every single job takes more than 3 hours. Are there any other ways to make it faster?

      p.s. I have shortened the long real path in my post.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        If you want true parallelization then yes, you would need to submit 6500 jobs to the queue. You are likely not the only user, so most of them will pend, but they will finish eventually.
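
One common way to queue that many jobs without writing thousands of scripts is a Slurm job array sized from a list of input files, with a cap on how many tasks run at once. The helper below is a minimal sketch under stated assumptions: `jobs` (one FASTQ name per line), the cap of 50 and `array_job.sh` are hypothetical names, and the cluster's own array size limit may force the list to be split into several submissions.

```shell
#!/bin/bash
# Sketch: size a Slurm job array from a file list and cap concurrent tasks.
# The list file, the cap and array_job.sh are placeholders for illustration.

# Print an --array spec covering every line of the list (0-based indices),
# with at most $2 tasks running at once (the %N suffix).
array_spec() {
    local list="$1" max_running="$2" n
    n=$(grep -c '' "$list")          # number of lines in the list
    [ "$n" -gt 0 ] || return 1       # refuse to build a spec for an empty list
    echo "0-$((n - 1))%${max_running}"
}

# Example (not executed here):
#   sbatch --array="$(array_spec jobs 50)" array_job.sh
```

Inside the array script, each task would pick its own input using `SLURM_ARRAY_TASK_ID` as the index into the list.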


        • archana87
          Junior Member
          • Jul 2018
          • 6

          #5
          Hi,
          Instead of giving the path in the for loop, you can first add a serial-number prefix to all your fastq files and then try something like this:

          for i in $(seq 1 6500);
          do
          bowtie2 -x PC_805 --threads 40 -U $i -S path/to/$i\_.fastq.sam;
          done

          Hoping it will help.
          Last edited by archana87; 04-29-2019, 02:10 PM.


          • chayan
            Member
            • Nov 2012
            • 52

            #6
            Hi

            I am now running parallel jobs, but I keep getting the following error, and I am not sure whether it comes from my array script or from something else.

            Slurm Array

            #!/bin/bash

            #SBATCH --job-name=Bowtie_Array    # Job name
            #SBATCH --nodes=12                 # Number of nodes
            #SBATCH --ntasks-per-node=40       # CPUs per node (MAX=40 for CPU nodes and 80 for GPU)
            #SBATCH --output=bowtie-%A_%a.out  # Standard output (log file)
            #SBATCH --partition=test           # Partition/Queue
            #SBATCH --time=7-00:00:00          # Maximum walltime
            #SBATCH --array=0-12               # Job array index

            module use /cm/shared/modulefiles_local
            module use /gpfs/shared/modulefiles_local/bio
            module load bio/bowtie2/2.3.4

            names=($(cat jobs))

            echo ${names[${SLURM_ARRAY_TASK_ID}]}

            bowtie2 --threads 40 -x /gpfs/scratch/chayan.roy/Pc_project/HGM_Genomes/Index/PC_1969.fasta -U ${names[${SLURM_ARRAY_TASK_ID}]} -S alignments/${names[${SLURM_ARRAY_TASK_ID}]}.sam

            Error message

            SRR1789035.fastq
            /gpfs/shared/apps_local/bowtie2/2.3.4.3/bin/bowtie2-align-s: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
            (ERR): Description of arguments failed!
            Exiting now ...

            Any help?


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Did you download the bowtie2 binaries or compile the program yourself? It looks like the Threading Building Blocks (TBB) library is missing on your cluster. See the section on "building from source" in the manual.
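
To confirm which shared libraries the loader cannot resolve, `ldd` can be run against the binary named in the error message. The small filter below is a sketch that reduces `ldd` output to the unresolved library names; the function name `missing_libs` is made up for this example.

```shell
#!/bin/bash
# Sketch: list shared libraries a binary needs but the loader cannot find.

# Filter ldd-style output (read from stdin) down to the names of
# libraries reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example on a compute node (path taken from the error message above):
#   ldd /gpfs/shared/apps_local/bowtie2/2.3.4.3/bin/bowtie2-align-s | missing_libs
```

Running the check on a compute node rather than only on the login node matters, because the two may not have the same libraries installed.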


              • chayan
                Member
                • Nov 2012
                • 52

                #8
                I don't have installation access. I have just asked the admins, but I know they will take a month to respond. In the meantime I am trying to bypass it using Anaconda. Do let me know if there are any better ways to do it.

                Thanks


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  If you use the conda option make sure to remove "module load bio/bowtie2/2.3.4 " from your script.

                  Hopefully your home directory is available on all cluster nodes because conda will install programs in your home directory by default.
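
A quick way to check both points inside a job is to verify that the `bowtie2` found on `PATH` actually lives under the active conda environment and not under a module directory. This is a rough sketch: the environment name `bowtie2` and the function `from_conda_env` are hypothetical, and the channel setup shown in the comments is the usual Bioconda arrangement, not something confirmed in the thread.

```shell
#!/bin/bash
# Sketch: confirm that a program on PATH comes from the active conda environment
# and not from a cluster module. The environment name "bowtie2" is hypothetical.

# One-time setup on a login node (not executed here):
#   conda create -n bowtie2 -c conda-forge -c bioconda bowtie2
#   conda activate bowtie2

# Succeeds if the given program resolves to a path under $CONDA_PREFIX.
from_conda_env() {
    local prog="$1" path
    [ -n "${CONDA_PREFIX:-}" ] || return 1      # no conda environment active
    path=$(command -v "$prog") || return 1      # program not on PATH at all
    case "$path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if from_conda_env bowtie2; then
    echo "bowtie2 comes from conda: $(command -v bowtie2)"
else
    echo "bowtie2 is not provided by an active conda environment" >&2
fi
```

If the check fails on a compute node but passes on the login node, the home directory (or wherever conda was installed) is probably not mounted on that node.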
