Multiple fastq alignment with bowtie2 in server

  • Multiple fastq alignment with bowtie2 in server

    I'm trying to map more than 6,500 SRA FASTQ files with bowtie2 against my reference genome, using a SLURM script on a server. Mapping a single file works fine, but when I run a bash loop I always get the following error:

    "path/to/slurm_script: line 16: path/to/file1.fastq: Permission denied"

    Here is my SLURM script:

    #!/bin/bash
    #SBATCH --job-name=ERR1135336.clean.reads.Assembly
    #SBATCH -N 1 # Number of nodes, not cores
    #SBATCH -t 2-00:00:00 # Walltime
    #SBATCH --ntasks-per-node 40 # Number of cores
    #SBATCH --output=out-%j.log # Output (console)
    #SBATCH --partition=test # Queue

    module use /gpfs/shared/modulefiles_local
    module use /gpfs/shared/modulefiles_local/bio
    module load bio/bowtie2/2.3.4

    for i in $(path/to/*.fastq); do
        bowtie2 -x PC_805 --threads 40 -U ${i} -S path/to/${i%%.fastq}.sam
    done

    I am not sure whether this is really a permissions issue or a bash scripting issue.

    Output of ls -l for the directory from which I submit the SLURM job:

    drwxr-xr-x 2 chayan.roy domain users 4096 Apr 23 10:14 PC_805

    Output of ls -l for the directory where I store my FASTQ files:

    drwxr-xr-x 22 chayan.roy domain users 4096 Apr 22 14:44 HMP_2017

    Any help would be much appreciated.
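A likely cause, reading the script as posted: $(path/to/*.fastq) is command substitution, so bash expands the glob and then tries to run the first match, path/to/file1.fastq, as a program. A FASTQ file is not executable, which would explain "Permission denied" on line 16 without any real permissions problem. Separately, ${i%%.fastq} keeps the directory part, so the -S argument would come out as path/to/path/to/file1.sam. A minimal sketch of the loop without those two problems, keeping PC_805 and --threads 40 from the post (the helper name align_dir is hypothetical):

```shell
#!/bin/bash
# align_dir INDIR OUTDIR: align every INDIR/*.fastq against PC_805, one file
# after another. Hypothetical helper; PC_805 and 40 threads come from the post.
align_dir() {
    local indir=$1 outdir=$2 fq sample
    shopt -s nullglob                      # no matching files: loop body never runs
    mkdir -p "$outdir"                     # make sure the output directory exists
    for fq in "$indir"/*.fastq; do         # iterate over the glob itself, no $( )
        sample=$(basename "$fq" .fastq)    # strip directory and .fastq suffix
        bowtie2 -x PC_805 --threads 40 -U "$fq" -S "$outdir/${sample}.sam"
    done
}
```

Called as align_dir path/to path/to/alignments inside the job script, this still aligns the files one after another within a single job.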


  • #2
    Running a bash loop inside a single SLURM job will not parallelise anything: the files are still aligned one after another. Instead, run a bash script on the command line that in turn submits multiple individual SLURM jobs.

    I assume "path/to/" stands in for a real path on your system that you are obfuscating here? If not, you need to put a real path there.
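A minimal sketch of that approach, run on the login node rather than inside a job: one single-node job per FASTQ file, submitted with sbatch --wrap. The helper name, the 8 threads per job and the logs/ and alignments/ directories are illustrative assumptions, not values from this thread. Some sites cap how many jobs a user may queue at once, in which case a job array is the usual alternative.

```shell
#!/bin/bash
# submit_per_file INDIR: submit one SLURM job per INDIR/*.fastq (hypothetical helper).
# 8 threads per job and the logs/ and alignments/ directories are assumptions.
submit_per_file() {
    local indir=$1 fq sample
    shopt -s nullglob                       # no matching files: nothing is submitted
    mkdir -p logs alignments                # SLURM will not create a missing --output directory
    for fq in "$indir"/*.fastq; do
        sample=$(basename "$fq" .fastq)
        sbatch --job-name="bt2_${sample}" --nodes=1 --ntasks=1 --cpus-per-task=8 \
            --output="logs/${sample}-%j.out" \
            --wrap="bowtie2 -x PC_805 --threads 8 -U ${fq} -S alignments/${sample}.sam"
    done
}
```

For example, run submit_per_file path/to from the directory containing the PC_805 index files, since by default each job starts in the directory it was submitted from.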


    • #3
      Thanks for your prompt response.

      If I understand correctly, I would have to submit a SLURM array of more than 6,500 jobs? This particular cluster has 56 nodes, each with 40 threads, and every single job takes more than 3 hours. Is there any other way to make it faster?

      P.S. I shortened the long real path in my post.
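A rough estimate of why spreading the work over many jobs matters, assuming about 3 h per file on one 40-thread node (the figure quoted above) and that all 56 nodes were free for this work, which is unlikely on a shared cluster:

```shell
#!/bin/bash
# Back-of-envelope wall-clock estimate; inputs are the figures quoted in the post.
files=6500
hours_per_file=3
nodes=56

serial_hours=$(( files * hours_per_file ))        # one file after another, one job
rounds=$(( (files + nodes - 1) / nodes ))         # ceil(files / nodes) waves of jobs
parallel_hours=$(( rounds * hours_per_file ))     # one file per node in each wave

echo "serial: ${serial_hours} h; ${nodes} nodes, best case: ${parallel_hours} h"
```

That works out to roughly 812 days in a single job versus about 15 days even with the whole cluster to yourself.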


      • #4
        If you want true parallelization, then yes, you would need to submit 6,500 jobs to the queue. You are likely not the only user, so most of them will sit pending, but they will finish eventually.
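One common way to queue thousands of alignments without thousands of separate sbatch calls is a job array. A minimal sketch, assuming a list file named fastq.list with one FASTQ path per line (for example created with ls path/to/*.fastq > fastq.list), existing logs/ and alignments/ directories, and 8 threads per task; all of these names are hypothetical. Many clusters cap the array size (SLURM's default MaxArraySize is 1001, i.e. indices 0 to 1000), so the script adds an OFFSET variable so that 6,500 files can be covered in several submissions, and the %100 suffix limits how many tasks run at once.

```shell
#!/bin/bash
#SBATCH --job-name=bt2_array
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=12:00:00
#SBATCH --output=logs/bt2-%A_%a.out
# 1000 tasks per submission, at most 100 running at the same time
#SBATCH --array=0-999%100

# file_for_task LIST N: print line N (0-based) of LIST.
file_for_task() {
    sed -n "$(( $2 + 1 ))p" "$1"
}

# Only do work when running as an array task under SLURM.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
    idx=$(( ${OFFSET:-0} + SLURM_ARRAY_TASK_ID ))
    fq=$(file_for_task fastq.list "$idx")
    [[ -n "$fq" ]] || exit 0               # past the end of the list: nothing to do
    sample=$(basename "$fq" .fastq)
    bowtie2 -x PC_805 --threads "${SLURM_CPUS_PER_TASK:-1}" \
        -U "$fq" -S "alignments/${sample}.sam"
fi
```

Saved as, say, bt2_array.sh, it would be submitted with sbatch bt2_array.sh for the first thousand files, then sbatch --export=ALL,OFFSET=1000 bt2_array.sh, and so on up to OFFSET=6000; tasks beyond the last file simply exit.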


        • #5
          Instead of giving the path in the for loop, you can first prefix all your FASTQ files with a serial number and then try something like this:

          for i in $(seq 1 6500); do
              bowtie2 -x PC_805 --threads 40 -U path/to/${i}_.fastq -S path/to/${i}_.fastq.sam
          done

          Hope this helps.
          Last edited by archana87; 04-29-2019, 02:10 PM.


          • #6

            I am now running parallel jobs, but I keep getting the following error, and I am not sure whether it comes from my array script or from something else.

            Slurm Array

            #!/bin/bash
            #SBATCH --job-name=Bowtie_Array # Job name
            #SBATCH --nodes=12               # Number of nodes
            #SBATCH --ntasks-per-node=40     # CPUs per node (MAX=40 for CPU nodes and 80 for GPU)
            #SBATCH --output=bowtie-%A_%a.out  # Standard output (log file)
            #SBATCH --partition=test        # Partition/Queue
            #SBATCH --time=7-00:00:00          # Maximum walltime
            #SBATCH --array=0-12        # job array index

            module use /cm/shared/modulefiles_local
            module use /gpfs/shared/modulefiles_local/bio
            module load bio/bowtie2/2.3.4

            names=($(cat jobs))
            echo ${names[${SLURM_ARRAY_TASK_ID}]}

            bowtie2 --threads 40 -x /gpfs/scratch/chayan.roy/Pc_project/HGM_Genomes/Index/PC_1969.fasta -U ${names[${SLURM_ARRAY_TASK_ID}]} -S alignments/${names[${SLURM_ARRAY_TASK_ID}]}.sam

            Error message

            /gpfs/shared/apps_local/bowtie2/ error while loading shared libraries: cannot open shared object file: No such file or directory
            (ERR): Description of arguments failed!
            Exiting now ...

            Any help?


            • #7
              Did you download the bowtie2 binaries or compile the program yourself? It looks like the Threading Building Blocks (TBB) library is missing on your cluster. See the section on "building from source" in the manual.
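To confirm which library is missing before asking the admins, ldd lists the shared libraries a binary needs and marks the ones the loader cannot find as "not found". A small sketch; the bowtie2 command is a wrapper script, so it should be pointed at the aligner binary the wrapper calls, bowtie2-align-s (the helper name missing_libs is hypothetical):

```shell
#!/bin/bash
# missing_libs BINARY: print the shared libraries that ldd reports as "not found".
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}
```

For example, after module load bio/bowtie2/2.3.4, run missing_libs "$(command -v bowtie2-align-s)"; an empty result means every library resolved.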


              • #8
                I don't have permission to install software. I have asked the admins, but I know they will take a month to respond. In the meantime I am trying to work around it using Anaconda. Do let me know if there is a better way to do it.



                • #9
                  If you use the conda option, make sure to remove "module load bio/bowtie2/2.3.4" from your script.

                  Hopefully your home directory is available on all cluster nodes because conda will install programs in your home directory by default.
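A minimal sketch of the conda route, assuming conda is already installed in your home directory and using a hypothetical environment named bt2: create the environment once on the login node, then activate it in the job script in place of the module lines. Sourcing conda.sh is what makes conda activate work in a non-interactive batch shell.

```shell
#!/bin/bash
# One-time setup on the login node (environment name "bt2" is hypothetical):
#   conda create -n bt2 -c conda-forge -c bioconda bowtie2
#
# activate_bt2: call near the top of the job script, instead of
# "module load bio/bowtie2/2.3.4", before running bowtie2.
activate_bt2() {
    local base
    base=$(conda info --base) || return 1          # conda not on PATH
    # conda.sh defines the conda shell function that "conda activate" needs
    source "$base/etc/profile.d/conda.sh" || return 1
    conda activate bt2
}
```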