  • #16
    I'm not sure if the problem is running up against my computation time limit - the job I specify to the scheduler is listed as completed in much less than the time allotted, and I am only running alignment on one individual to my reference at a time.

    I'm currently submitting jobs with different thread values. In my output I receive a message indicating the number of reads processed in the initial iteration of bwa mem, but no further iterations of that message appear before I get a list of all of the nodes in my reference assembly, followed by what appears to be the SAM-formatted bwa mem alignment output.

    One thing I am curious about is how the output for bwa mem is presented and stored. Is there normally separate output in the terminal for bwa mem's run details (# of reads processed, iteration, etc.), with only the SAM-formatted alignment data written into the .sam file, or is all of this information included in the .sam file specified using ">"?
    In the scheduling system I'm working on, all of this information is written into a user-specified output file using "-o", and when I add ">" redirection, I get an additional blank file. All of the bwa mem run details and the SAM-format data appear to be written into the "-o" output file.

    I'm wondering if additional reads are being processed after the first sam formatted alignment output, but I can't see it because it's buried ~400,000 lines in my output file.
    Hopefully this isn't the case, but I'm wondering if there is any easy way to check?
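
One way to check is to scan the combined output file and record where the run-detail lines fall relative to the alignment records. The sketch below is in Python and rests on two assumptions that are not stated in the thread: that bwa's informational messages begin with "[M::", and that any line with at least 11 tab-separated fields is a SAM alignment record. The function name is made up for illustration.

```python
from collections import Counter


def summarize_bwa_output(lines):
    """Classify the lines of a combined bwa mem output (log + SAM).

    Assumptions: informational log messages start with "[M::", SAM header
    lines start with "@", and alignment records have at least 11
    tab-separated fields. Any other non-empty line is counted as "other".
    """
    counts = Counter()
    log_line_numbers = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("[M::"):
            counts["log"] += 1
            log_line_numbers.append(number)
        elif line.startswith("@"):
            counts["header"] += 1
        elif len(line.rstrip("\n").split("\t")) >= 11:
            counts["alignment"] += 1
        elif line.strip():
            counts["other"] += 1
    return counts, log_line_numbers


# Usage sketch (the file name is hypothetical):
# with open("scheduler_output.txt") as handle:
#     counts, log_line_numbers = summarize_bwa_output(handle)
# print(dict(counts), log_line_numbers)
```

If log lines turn up at several positions spread through the file, more than one batch of reads was processed; a single cluster at the top would suggest only the first batch ran.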


    • #17
      Ah. So the trick here is to write a little shell script that runs bwa and uses redirection (">") internally. You then schedule that to run and you'll find that you suddenly don't have blank SAM files.

      BTW, bwa writes only the alignments to standard output, so only they end up in the file specified with ">"; the run details go to standard error. At the moment the ">" never reaches bwa, because the shell you submit from applies it to the scheduling command instead.
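
The suggestion above is a shell script. The same idea can be sketched in Python: the wrapper itself opens the output files and hands them to bwa, so the redirection happens inside the scheduled job and not on the submission command line. The function name, file names, and thread count below are all hypothetical placeholders.

```python
import subprocess


def run_with_redirects(command, sam_path, log_path):
    """Run `command`, sending stdout to sam_path and stderr to log_path.

    For bwa mem this puts the SAM alignments in one file and the run
    details in another. Returns the command's exit code.
    """
    with open(sam_path, "w") as sam, open(log_path, "w") as log:
        result = subprocess.run(command, stdout=sam, stderr=log)
    return result.returncode


# Usage sketch (hypothetical file names and thread count); the script
# containing this call is what gets submitted to the scheduler:
# run_with_redirects(
#     ["bwa", "mem", "-t", "4", "reference.fa", "reads.fq"],
#     "sample.sam",
#     "sample.bwa.log",
# )
```

Keeping the run details in their own log file also makes it easy to confirm afterwards how many batches of reads bwa processed.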


      • #18
        If you are using LSF then you could enclose the entire bwa command in double quotes. The file redirect will then work.


        • #19
          @dpryan - I'm fairly new to this stuff, what do you mean by getting bwa to run internally? Do you mean that I write a script with run details for bwa mem in it with the specified output, and then submit a scheduled job for that script?

          @GenoMax - the scheduling environment is SQ, which is a unified frontend for LSF+RMS and Maui+Torque. I'm not sure which scheduling environment I'm actually working in beyond that, is there usually something equivalent to your double quotes solution for other schedulers?


          • #20
            Yes, that's what I mean.


            • #21
              @TKTKTK: Here are some examples of job submissions under SQ: https://www.sharcnet.ca/help/index.p...mmand_to_SQ.3F. After the SQ options, you can try enclosing the remainder of the bwa command in double quotes (all the way to the end, including all bwa options and the ">" redirect). This works for LSF alone and may work for SQ.

              There is also an example of the wrapper script that Devon was referring to at the link above (scroll down on the page).


              • #22
                Thanks for everyone's advice - running a script through the scheduler worked - the specified .sam file was written correctly, and I was able to confirm that bwa mem ran multiple iterations through the log file.
