Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • AnushaC
    Member
    • Sep 2013
    • 78

    shell scripting searching directories and subdirectories

    Hi Everybody ,
    I have a directory which 8 subdirectories each subdirectory has single sra file . Now my problem i have take each sra file and input to my script which i created to converting sra to bam file . I want all my output under the same as parent directory name . How to do it ?.


    Thanks,
    Anusha.Ch
  • winsettz
    Member
    • Sep 2012
    • 91

    #2
    It sounds like you have

    upper_dir
    -dir1
    --sra_in_dir_1
    -..
    -dir8
    --sra_in_dir_8

    From NCBI (http://www.ncbi.nlm.nih.gov/Traces/s...lkit_doc&f=std)

    The following dumpers access sequence data
    sff-dump
    abi-dump
    illumina-dump
    Use sam-dump to convert SRA to SAM format. [...]
    You can convert the SRA file to FASTQ format using the fastq-dump program: fastq-dump SRR390728 will produce a file with the same name as the SRA file with the extension of .fastq[...]
    Conversion to BAM requires an additional step of converting produced SAM file to BAM using samtools from the SAMtools website http://samtools.sourceforge.net/.
    [...]
    If your SRA is aligned, then sam-dump, then convert with samtools to bam.
    If your SRA is unaligned, then you'd have to fastq-dump, map to reference, then convert sam output to bam with samtools.

    Most likely the ideal way to set up your shell script is to use a loop to go through each directory and execute on the sra found in each one.

    From a pseudo-code standpoint, it could be

    for DIR in dir1 dir2 dir3
    do
    sam-dump ./$DIR/*.sra | samtools > bam file output
    done

    Note: sam-dump output can be piped to samtools, which in turn can take streaming stdin and redirect to a bam of your choice.

    Though *.sra is rather sloppy way of capturing the name of the SRA file.

    Not sure if all SRA are mapped to a reference, this would affect what you do with the bam file. The other option is to fastq-dump, then map, then pipe the map output to samtools and convert to bam.

    Is this a homolog of (http://seqanswers.com/forums/showthread.php?t=34247)
    Last edited by winsettz; 10-04-2013, 04:22 PM.

    Comment

    • AnushaC
      Member
      • Sep 2013
      • 78

      #3
      Hi ,
      I create alignment from sam to bam using fastq-dump ,bow tie then sam tools view to convert from sra to bam format( i did for one sra file in sub directory .
      I have a /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364 home directory and 8 other sub directories under it . I want iterate over 8 subdirectories and take sra files under each subdirectory repeat my shell . Don't scolde me if i am dumb question i am beginner

      Thanks,
      Anusha.Ch

      Comment

      • rhinoceros
        Senior Member
        • Apr 2013
        • 372

        #4
        find . -name *.sra

        p.s. Your questions are not dumb. The problem is that you don't seem to put any effort from your own part into solving the problem. You're beginner? So maybe you should read some bash guide for beginners (or at the very least some relevant parts)?
        Last edited by rhinoceros; 10-05-2013, 03:49 AM.
        savetherhino.org

        Comment

        • harryzs
          Member
          • Dec 2010
          • 30

          #5
          Originally posted by rhinoceros View Post
          find . -name *.sra

          p.s. Your questions are not dumb. The problem is that you don't seem to put any effort from your own part into solving the problem.
          One vote for you

          Comment

          • AnushaC
            Member
            • Sep 2013
            • 78

            #6
            Thank you so much guys ,
            I will read the link .


            Thanks,
            Anusha.ch

            Comment

            • AnushaC
              Member
              • Sep 2013
              • 78

              #7
              Hi ,
              I understood how to use find command and list all the subdirectories in a file but now my problem is

              this is my shell script

              ./SRAtoBAM.copy.sh /raid/development/anusha/python_test/shelltest/SRR203400.sra /raid/references-and-indexes/hg19/bowtie-indexes/hg19 /raid/development/anusha/pypython_test/shelltest/SRR203400.sort.bam ---for a single sra file .

              now my find command gives me this

              find /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062 -type f -print
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203400/SRR203400.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203401/SRR203401.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203402/SRR203402.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203403/SRR203403.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203404/SRR203404.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203405/SRR203405.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203406/SRR203406.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203407/SRR203407.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203408/SRR203408.sra
              /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203409/SRR203409.sra

              which are list of sea files but when i want to execute my shell which takes sra files and store the output result with same as given sra ,i mean suppose SRR203408.sra is running i want to name my output SRR203408.bam

              Comment

              • AnushaC
                Member
                • Sep 2013
                • 78

                #8
                sorry my question was 1/2 cut
                my find command gives me list of sra files
                anusha@cn1:~> find /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062 -type f -print
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203400/SRR203400.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203401/SRR203401.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203402/SRR203402.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203403/SRR203403.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203404/SRR203404.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203405/SRR203405.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203406/SRR203406.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203407/SRR203407.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203408/SRR203408.sra
                /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203409/SRR203409.sra
                but if i want to name output of my shell that is bam file with same name what shouuld I do ?.

                Thanks,
                Anusha.Ch

                Comment

                • winsettz
                  Member
                  • Sep 2012
                  • 91

                  #9
                  This would be an excellent time for you to do a bash tutorial.

                  $variable is an initialized variable (that's the dollar sign). So for instance, one can design a for-loop that changes the value of $variable for each cycle, which is critical to your problem.

                  For instance, changing variable from SRR1, SRR2, SRR3, SRR4, you could write your workflow like so:


                  for variable in SRR1 SRR2 SRR3 ...
                  {
                  fastq-dump ./blablabla/$variable/$variable.sra | stuff | stuff
                  }

                  where the for-loop repeats the contents of the for-loop (fastq-dump ./blabla...) over and over for each valuable of $variable.

                  You could probably do something like

                  for NAME in SRR203400 SRR203401 SRR203402 SRR203403 SRR203404 SRR203405 SRR203406 SRR203407 SRR203408
                  do
                  fastq-dump /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/$NAME/$NAME.sra > $NAME.fastq
                  bowtie2 -x /raid/references-and-indexes/hg19/bowtie-indexes/hg19 -U $NAME.fastq -S $NAME.sam
                  samtools view -bS $NAME.sam | samtools sort - $NAME.sorted
                  done

                  I am not sure if bowtie can take piped stdout from a command and feed it in as stdout. But you can pipe stdout directly from bowtie2 to samtools, and obviate the need to write output to a sam file before sending it to samtools.

                  I don't have any sra files on hand for me to test at the moment, but this should be enough to get the ball rolling. You already know what directory structure you have in common (/raid/databases.../SRX062364. And considering the parent name of the directory and the name of the sra are exactly the same, you can store those as a character when iterating through.

                  The only caveat here is that you're doing all of this in serial, which may not be very time-efficient. It would be more efficient to execute these as one job at a time on cluster computing.
                  Last edited by winsettz; 10-08-2013, 02:14 PM.

                  Comment

                  • AnushaC
                    Member
                    • Sep 2013
                    • 78

                    #10
                    Hi ,
                    Firstly i don't want hardcode file names because latter i want to extend my shell many programs .
                    this is my shell


                    anusha@cn1:/raid/development/anusha/python_test/shelltest> cat SRAtoBAM.copy.sh
                    #!/bin/bash
                    # generating a fastq file using fastq dump
                    sraip=$1
                    hg19ref=$2
                    fastq-dump -Z $1 | bowtie -v 3 -k 2 --sam $2 - | samtools view -bS -o $3 -

                    i am not using bowtie2 but instead i am using old bow tie .

                    Is there any way to write generic instead of giving hardcore subdirectory names ?

                    Thanks,
                    Anusha.Ch

                    Comment

                    • AnushaC
                      Member
                      • Sep 2013
                      • 78

                      #11
                      To be specific

                      /raid/development/anusha/python_test/shelltest> ./SRAtoBAM.wc1.sh find /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062 -type f /raid/references-and-indexes/hg19/bowtie-indexes/hg19 ?
                      what should i give if i want to name my output with same name as sra file name?
                      Thanks,
                      Anusha.ch

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #12
                        In bash, given a filename with the full path stored in a variable called $f, you can strip the path and extension using:
                        Code:
                        f=${f##*/} #Strip the path
                        ${f%.sra} #strip the extension
                        BTW, the google search for this is "bash basename" and the first has the syntax and examples (I use this in a lot of scripts but always forget the exact syntax myself since I don't actually write it frequently). Much of figuring things out is just learning where to look.

                        Comment

                        • winsettz
                          Member
                          • Sep 2012
                          • 91

                          #13
                          Originally posted by dpryan View Post
                          In bash, given a filename with the full path stored in a variable called $f, you can strip the path and extension using:
                          Code:
                          f=${f##*/} #Strip the path
                          ${f%.sra} #strip the extension
                          BTW, the google search for this is "bash basename" and the first has the syntax and examples (I use this in a lot of scripts but always forget the exact syntax myself since I don't actually write it frequently). Much of figuring things out is just learning where to look.
                          Fascinating. Learn something new everyday.

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            Originally posted by dpryan View Post
                            In bash, given a filename with the full path stored in a variable called $f, you can strip the path and extension using:
                            Code:
                            f=${f##*/} #Strip the path
                            ${f%.sra} #strip the extension
                            BTW, the google search for this is "bash basename" and the first has the syntax and examples (I use this in a lot of scripts but always forget the exact syntax myself since I don't actually write it frequently). Much of figuring things out is just learning where to look.
                            tcsh equivalent: http://stackoverflow.com/questions/1...a-path-in-tcsh

                            Comment

                            • AnushaC
                              Member
                              • Sep 2013
                              • 78

                              #15
                              Hi ,
                              I #bin/bash
                              set $f =/raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203400/SRR203400.sra
                              f=${f##*/} #Strip the path
                              ${f%.sra} #strip the extension
                              $echo $f
                              It did not work for me
                              I tried to
                              set j = `basename ~/foo/bar.c`
                              echo $j
                              I did try like this also
                              set i = ~/foo/bar.c
                              I did try like this

                              #!/bin/bash
                              #basename /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRX062364/SRR203400/SRR203400.sra
                              #set f=basename /raid/databases/ROADMAP/DGF/ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX062/SRR203400/SRR203400.sra
                              filename=`rev <<< "$1" | cut -d"." -f2- | rev`
                              why it is not working ?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                New Genomics Tools and Methods Shared at AGBT 2025
                                by seqadmin


                                This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                The Headliner
                                The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                03-03-2025, 01:39 PM
                              • seqadmin
                                Investigating the Gut Microbiome Through Diet and Spatial Biology
                                by seqadmin




                                The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                                02-24-2025, 06:31 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-20-2025, 05:03 AM
                              0 responses
                              17 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-19-2025, 07:27 AM
                              0 responses
                              18 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-18-2025, 12:50 PM
                              0 responses
                              19 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-03-2025, 01:15 PM
                              0 responses
                              185 views
                              0 reactions
                              Last Post seqadmin  
                              Working...