  • Multiple fastq alignment with bowtie2 in server

    Hi!
    I'm trying to map multiple SRA files (>6500) with bowtie2 against my reference genome. I am running a SLURM script on a server. Mapping a single file works fine, but when I run a bash loop I always get the following error:

    "path/to/slurm_script: line 16: path/to/file1.fastq: Permission denied"

    Here is my slurm script

    #!/bin/bash
    #BATCH --job-name=ERR1135336.clean.reads.Assembly
    #SBATCH -N 1 # Number of nodes, not cores
    #SBATCH -t 2-00:00:00 # Walltime
    #SBATCH --ntasks-per-node 40 # Number of cores
    #SBATCH --output=out-%j.log # Output (console)
    #SBATCH --partition=test # Queue

    module use /gpfs/shared/modulefiles_local
    module use /gpfs/shared/modulefiles_local/bio
    module load bio/bowtie2/2.3.4

    for i in $(path/to/*.fastq)
    do
    bowtie2 -x PC_805 --threads 40 -U ${i} -S path/to/${i%%.fastq}.sam
    done


    I am not sure whether this is really a permission issue or bash scripting issue.

    Output of ls -l for the directory from where I am running slurm job

    drwxr-xr-x 2 chayan.roy domain users 4096 Apr 23 10:14 PC_805


    Output of ls -l for the directory where I am storing my fastq is

    drwxr-xr-x 22 chayan.roy domain users 4096 Apr 22 14:44 HMP_2017

    Any help will be much appreciated

    Thanks

  • #2
    You can't run a bash script inside one SLURM job and expect the jobs to be parallelised. Instead, you should run a bash script on the command line that in turn submits multiple individual SLURM jobs.
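A minimal sketch of that approach, assuming one single-node job per FASTQ file is wanted. The function name `submit_one`, the `DRY_RUN` switch, the `bt2_` job-name prefix, and the resource values (copied from the first post) are illustrative assumptions, not part of the original script. With `DRY_RUN=1` the loop only prints the `sbatch` commands, so they can be inspected before anything reaches the queue. It also assumes bowtie2 is already on the PATH of the submitting shell and that file names contain no spaces.

```shell
#!/bin/bash
# Hypothetical submission loop: one SLURM job per FASTQ file.
# DRY_RUN=1 (the default here) prints the sbatch commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
fastq_dir=${1:-path/to}          # placeholder path, as in the thread

submit_one() {
    fq=$1
    sample=$(basename "$fq" .fastq)
    # Build the sbatch command line as the function's positional parameters.
    set -- sbatch --job-name="bt2_$sample" --nodes=1 --cpus-per-task=40 \
        --time=2-00:00:00 --partition=test --output="bt2_$sample-%j.log" \
        --wrap="bowtie2 -x PC_805 --threads 40 -U $fq -S $sample.sam"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

for fq in "$fastq_dir"/*.fastq; do
    [ -e "$fq" ] || continue     # skip when the glob matched nothing
    submit_one "$fq"
done
```

Running it as `DRY_RUN=0 ./submit_all.sh /real/path/to/fastq` (script name hypothetical) would submit the jobs for real; the scheduler then decides how many run at once.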

    "path/to/" I assume this is a real path on your system that you are obfuscating here? If not, you need to have a real value there.
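The "Permission denied" message in the first post is consistent with how the loop is written: `$(path/to/*.fastq)` is command substitution, so the shell expands the glob and then tries to execute the first FASTQ file as a program. A plain glob avoids that. The sketch below is a dry run under stated assumptions: the temporary directories and the empty placeholder files are hypothetical, and the bowtie2 commands are only written to a file and printed. It also strips the directory from each file name before building the .sam name, because `${i%%.fastq}` keeps the full path. Separately, the first directive in that script reads `#BATCH`, which sbatch would likely treat as an ordinary comment rather than as the job name.

```shell
#!/bin/bash
# Dry-run sketch: iterate over FASTQ files with a glob, not $(...).
fastq_dir=$(mktemp -d)           # stands in for the real FASTQ directory
sam_dir=$(mktemp -d)             # stands in for the real output directory
touch "$fastq_dir/file1.fastq" "$fastq_dir/file2.fastq"   # placeholder inputs

cmd_file="$sam_dir/commands.txt"
: > "$cmd_file"

for fq in "$fastq_dir"/*.fastq; do
    sample=$(basename "$fq" .fastq)      # file1.fastq -> file1
    echo "bowtie2 -x PC_805 --threads 40 -U $fq -S $sam_dir/$sample.sam" >> "$cmd_file"
done

cat "$cmd_file"
```

Once the printed commands look right, the `echo` can be dropped so that bowtie2 runs directly inside the loop.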


    • #3
      Thanks for your prompt response.

      If I understood correctly, I have to submit a SLURM array of >6500 jobs? This particular server has 56 nodes, each with 40 threads, and every single job takes more than 3 hours. Are there any other ways to make it faster?

      p.s. I have shortened the long real path in my post.


      • #4
        If you want true parallelization then yes, you would need to submit 6500 jobs to the queue. You are likely not the only user, so most of them will pend, but they will finish eventually.


        • #5
          Hi,
          Instead of giving the path in the for loop, you can first add a serial-number prefix to all your fastq files and then try something like this

          for i in $(seq 1 6500);
          do
          # assumes the renamed files are called path/to/1_.fastq, path/to/2_.fastq, ...
          bowtie2 -x PC_805 --threads 40 -U path/to/${i}_.fastq -S path/to/${i}_.fastq.sam;
          done

          Hoping it will help.
          Last edited by archana87; 04-29-2019, 02:10 PM.


          • #6
            Hi

            I am running parallel jobs, but I keep getting the following error, and I am not sure whether it comes from my array script or from something else.

            Slurm Array

            #!/bin/bash

            #SBATCH --job-name=Bowtie_Array    # Job name
            #SBATCH --nodes=12                 # Number of nodes
            #SBATCH --ntasks-per-node=40       # CPUs per node (MAX=40 for CPU nodes and 80 for GPU)
            #SBATCH --output=bowtie-%A_%a.out  # Standard output (log file)
            #SBATCH --partition=test           # Partition/Queue
            #SBATCH --time=7-00:00:00          # Maximum walltime
            #SBATCH --array=0-12               # job array index

            module use /cm/shared/modulefiles_local
            module use /gpfs/shared/modulefiles_local/bio
            module load bio/bowtie2/2.3.4

            names=($(cat jobs))

            echo ${names[${SLURM_ARRAY_TASK_ID}]}

            bowtie2 --threads 40 -x /gpfs/scratch/chayan.roy/Pc_project/HGM_Genomes/Index/PC_1969.fasta -U ${names[${SLURM_ARRAY_TASK_ID}]} -S alignments/${names[${SLURM_ARRAY_TASK_ID}]}.sam

            Error message

            SRR1789035.fastq
            /gpfs/shared/apps_local/bowtie2/2.3.4.3/bin/bowtie2-align-s: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
            (ERR): Description of arguments failed!
            Exiting now ...

            Any help?
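Independently of the library error, the index lookup in the array script can be checked outside the queue. The sketch below shows one way to map a zero-based array index to one line of a list file and to notice an index that has no entry. The helper name `pick_name`, the temporary list file, and the three file names in it are made up for illustration; on the cluster the list would be the real `jobs` file and the index would come from `SLURM_ARRAY_TASK_ID`. It also trims the `.fastq` suffix so the output is not named `*.fastq.sam`.

```shell
#!/bin/bash
# Sketch: map a SLURM array index to one line of a file list.
pick_name() {
    # $1 = list file, $2 = zero-based array index
    sed -n "$(( $2 + 1 ))p" "$1"
}

list=$(mktemp)                                    # stands in for the "jobs" file
printf '%s\n' SRR0000001.fastq SRR0000002.fastq SRR0000003.fastq > "$list"

task_id=${SLURM_ARRAY_TASK_ID:-1}                 # falls back to 1 outside SLURM
name=$(pick_name "$list" "$task_id")

if [ -z "$name" ]; then
    echo "no entry for array index $task_id" >&2
else
    echo "task $task_id -> $name -> alignments/${name%.fastq}.sam"
fi
```

Because each array task is scheduled as its own job and bowtie2 runs as a single multithreaded process, requesting one node per task is probably sufficient; that is an inference from how the script is written, not something confirmed in the thread.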


            • #7
              Did you download the bowtie2 binaries or compile the program yourself? It looks like the Threading Building Blocks (TBB) library is missing on your cluster. See the section on "building from source" in the manual.
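One way to confirm which shared libraries cannot be resolved is to run `ldd` on the binary named in the error message. The sketch below wraps that in a small helper; the name `missing_libs` is hypothetical, and the default of checking `sh` is only there so the snippet runs anywhere. On the cluster it would be pointed at the `bowtie2-align-s` path from the error, ideally on a compute node after the same `module load` lines, since those may change the library search path.

```shell
#!/bin/bash
# Sketch: list shared libraries that a binary cannot resolve.
missing_libs() {
    # Prints the "not found" lines from ldd; returns non-zero if there are none.
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Placeholder default; pass the real bowtie2-align-s path as the first argument.
binary=${1:-$(command -v sh)}

if missing_libs "$binary"; then
    echo "unresolved libraries in $binary" >&2
else
    echo "no unresolved libraries reported for $binary"
fi
```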


              • #8
                I don't have installation access. I have just asked them, but I know they will take a month to respond. In the meantime I am trying to bypass it using Anaconda. Do let me know if there are any better ways to do it.

                Thanks


                • #9
                  If you use the conda option, make sure to remove "module load bio/bowtie2/2.3.4" from your script.

                  Hopefully your home directory is available on all cluster nodes because conda will install programs in your home directory by default.
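A sketch of what the conda route could look like inside the job script, in place of the `module load` line. The environment name `bt2` and the helper `activate_bowtie2_env` are assumptions; the environment would have to be created once beforehand, for example with `conda create -n bt2 -c conda-forge -c bioconda bowtie2`.

```shell
#!/bin/bash
# Sketch: activate a conda environment that provides bowtie2.
env_name=bt2                     # hypothetical environment name

activate_bowtie2_env() {
    # Fails cleanly when conda is not on the PATH of this node.
    command -v conda >/dev/null 2>&1 || return 1
    . "$(conda info --base)/etc/profile.d/conda.sh" && conda activate "$1"
}

if activate_bowtie2_env "$env_name"; then
    bowtie2 --version | head -n 1    # confirm which bowtie2 the job will use
else
    echo "conda environment $env_name not available" >&2
fi
```

Printing the version at the start of the job makes it easy to see in the log whether the conda build, rather than the cluster module, was actually picked up.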
