
  • #31
    Originally posted by Kurt View Post
    Noticed this a second ago. It looks like you are working with single-end sequencing data, for which you wouldn't want to remove duplicates.

    Download SAMtools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignments. SAMtools provides efficient utilities for manipulating alignments in the SAM format.


    (Item #6)

    The duplicate removal across chromosomes would only apply for paired end data.
    You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.
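A minimal sketch of what duplicate removal keys on for single-end data (mock chrom/position/strand tuples, not a real BAM; on real data the era-appropriate command would be something like "samtools rmdup -s in.sorted.bam out.bam"): reads sharing chromosome, start position, and strand collapse to one.

```shell
# Mock single-end reads as chrom/start/strand tuples (hypothetical data)
printf 'chr1\t100\t+\nchr1\t100\t+\nchr1\t150\t-\n' > reads.txt
# Duplicate removal keeps one read per identical start position and strand
sort -u reads.txt > dedup.txt
wc -l < dedup.txt   # 2 of the 3 mock reads survive
rm reads.txt dedup.txt
```

At 200x coverage many legitimately distinct reads share a start position, which is why the advice above is conditioned on coverage not being too high.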



    • #32
      Originally posted by nilshomer View Post
      You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.
      Would this still apply for a capture enrichment technology (say Agilent's SureSelect platform or RainDance)? We haven't done those here for single-end (and I'm not sure if we ever would), but I'm just wondering out loud at this point, I guess. Sorry, I know that this doesn't necessarily apply to Keats's post.
      Last edited by Kurt; 05-07-2010, 11:13 AM. Reason: clarification



      • #33
        In my case the data we have is paired-end, but the first alignment test I've done was using bwa in single-end mode. Odd choice, I know, but this is actually an mRNA-seq dataset, and when aligned to the genome the paired-end mode causes a lot of artifacts as it tries to pair reads across exon junctions whose genomic gaps often exceed the typical insert size.
        Regardless, I would always recommend removing duplicates, whether single-end, paired-end, or mate-pair for that matter. A real-life example: a whole-genome sequencing project (multiple runs), one library, duplicates removed per run but NOT across all runs; the interesting biological hit turned out to be a PCR artifact (an identical read in multiple runs).
        Remember that for single-end reads duplicate removal limits your coverage to a maximum of your read length x2 (one read per start position on each strand). Obviously it can be higher for paired-end reads, where one read may be identical but the other read is different.

        I like your version golharam, thanks for sharing
        Last edited by Jon_Keats; 01-05-2011, 11:31 PM. Reason: found error



        • #34
          Thanks heaps for your posts Jonathan, this is very useful.
          I'm now in the same position you were a few months ago (on a SOLiD) and took the approach to first learn Linux and Perl and then do some analysis (mainly because the data are not there yet...). Looking forward to more interesting posts from you soon, as it really helps newbies (at least me) to have a better overview of the pipeline to implement.



          • #35
            Time to Git a Linux Machine

            I'm slowly getting back up and running after moving from my post-doc to an independent position. Other than learning how damn expensive everything is, I'm slowly deciding that I should swear off the idea of a new MacPro workstation in favor of a Linux workstation, given all the issues I seem to run into with the "not quite so standard Mac OS X 10.6 implementation of Unix". But with the idea of sharing my ongoing experiences, and leaving a trail I can follow to build my next machine, I thought I'd update my thread. I hope some people have found it useful...

            Some new ideas and an update

            1) I'm becoming increasingly certain that I'm getting good enough at command-line issues to REALLY mess up my system
            2) My list of used programs continues to increase as I try each new sequencing method
            3) As per issue 1 - I'm also not reading instructions very well. New Rule - If at first you don't succeed... go back and read the damn instructions again, because most likely you didn't follow them correctly!

            New Applications to Install

            As previously stated in the post, if you are using a Mac OS environment you need to do a couple of special things:

            A) Install Xcode on your system (See earlier post)
            B) Install Fink on your system (See earlier post)
            - install the following fink packages: md5deep and pkgconfig
            - "fink install md5deep" (needed for bfast install)
            - "fink install pkgconfig" (needed for fastx-toolkit install)
            C) Install Git on your system (http://git-scm.com/)
            D) Create a $PATH Directory and update this directory in your .profile (See earlier post for instructions)
            - In my case "$HOME/local/bin"
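If the $PATH directory idea is new to you, the .profile addition amounts to something like this (the directory name is just the one I use; adjust to taste):

```shell
# Create the personal bin directory and put it on PATH
# (add the export line to ~/.profile so it survives new Terminal sessions)
mkdir -p "$HOME/local/bin"
export PATH="$HOME/local/bin:$PATH"
echo "$PATH" | grep -q "local/bin" && echo "PATH updated"
```

Anything you copy or install into that directory is then callable by name from any folder in Terminal.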

            1) Install FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)

            ***Why did I get this package***
            Because I have some Illumina mate-pair data that I want to analyze with BWA using sampe. By my understanding the reads need to be reverse complemented to work correctly, so I'm using the fastx_reverse_complement application in the package, which seems to be very fast and correctly reverse complements the reads and reverses the quality values.
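For the curious, here is what the tool does, sketched with plain rev/tr on a single mock read (the real invocation is along the lines of "fastx_reverse_complement -i in.fastq -o out.fastq", plus a quality-offset flag if your files need it):

```shell
# Reverse complement a mock read: reverse the sequence, complement each base
seq='ACGTTG'
printf '%s\n' "$seq" | rev | tr 'ACGT' 'TGCA'   # prints CAACGT
# The matching quality string is simply reversed, not complemented
qual='IIHHGG'
printf '%s\n' "$qual" | rev                      # prints GGHHII
```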

            Instructions:
            - Go to download page and download the following:
            a) fastx_toolkit-0.0.13.tar.bz2
            b) libgtextutils-0.6.tar.bz2
            - Move both to ngs/applications folder and unpack both packages
            - In Terminal navigate to the libgtextutils folder: "cd ngs/applications/libgtextutils-0.6"
            - Install the package as follows:
            ./configure
            make
            sudo make install (this will ask for your password; your account must have admin-level privileges)

            - Move to fastx_toolkit folder "cd ../fastx_toolkit-0.0.13"
            - Install the package as follows:
            ./configure --prefix=$HOME/local/bin
            make
            make install

            - Test the install by typing "fastx_uncollapser -h"; this should pop up the usage documentation for this app
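Since the steps above just say "unpack both packages", here is the unpacking command sketched on a mock .tar.bz2 archive (the demo file name is made up; substitute the downloaded archives):

```shell
# Create a mock .tar.bz2 and unpack it, as you would for the downloads above
mkdir -p demo-0.1 && echo hello > demo-0.1/README
tar -cjf demo-0.1.tar.bz2 demo-0.1
rm -r demo-0.1
tar -xjf demo-0.1.tar.bz2      # recreates the demo-0.1 folder
cat demo-0.1/README            # hello
rm -r demo-0.1 demo-0.1.tar.bz2
```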

            2) Install Bfast, DNAA, and Breakway

            *** Why these packages***
            As you might guess from the above install, I now have some mate-pair data and want to try out the Breakway package from the UCLA group, but it depends on two of their other packages, Bfast and DNAA.

            Bfast - This package seems to be the bane of my existence, but thankfully Nils and the help list have been amazingly helpful

            Mac Related Issues:
            a) You must have fink and have installed the md5deep package otherwise "make check" will fail
            b) The current sourceforge version (0.6.4e) does not install correctly though the previous version does; however, this is a known issue and has been fixed in the master branch (if that's a new term to you, we are in the same boat), but this means you need to use the git repository version
            c) Using "./configure --prefix=$HOME/local" works but makes DNAA mad when you install it, so use sudo (time to be superman again)

            Instructions:
            a) In Terminal navigate to ngs/applications
            b) Get current Bfast version from Git (restart Terminal after git installation)
            - type "git clone git://bfast.git.sourceforge.net/gitroot/bfast/bfast"
            - this will create a folder called "bfast" in the current directory
            - Move into the directory "cd bfast"
            - Install bfast by typing the following:
            sh autogen.sh
            ./configure
            make
            make check
            sudo make install (requests a password with admin level privileges)

            - Test install and check current version by typing "bfast" in Terminal

            c) Navigate back to ngs directory by typing "cd ../"
            d) Get current version of DNAA from Git
            - type "git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa dnaa"
            - this will create a directory named "dnaa" in the current directory (ngs/applications)
            - move into the dnaa directory by typing "cd dnaa"
            e) Because this package depends on both BFAST and SAMTOOLS you need to provide links to these application directories even though you already have them in a $PATH directory (/usr/local/bin and $HOME/local/bin respectively)
            - create a link to the BFAST package you just installed by typing "ln -s ../bfast bfast"
            - create a link to your current SAMTOOLS package by typing "ln -s ../samtools-0.1.8 samtools"
            f) Install DNAA by typing the following:
            sh autogen.sh
            ./configure
            make
            sudo make install (requests a password with admin level privileges)

            g) Download the current version of BREAKWAY from sourceforge (http://sourceforge.net/projects/breakway/), move it to the ngs/applications folder, unpack it, and you should be ready to go
            Last edited by Jon_Keats; 02-17-2011, 10:18 PM. Reason: found typo in url



            • #36
              Building a Paired-End Pipeline

              Up till now I've been frustrated because I could not automate a variety of pairing steps that occur as I process raw data to BAM files, usually either in the SAMPE step of BWA or when I want to merge multiple lanes into one BAM file. I've convinced myself that I can just use "cat" to merge the multiple lanes together before processing, which ends up being a simple solution as long as all the lanes are available at the same time. For the SAMPE pairing, I spent some time with my Unix guru from France when he came over to visit his wife, and I seem to have a workable solution as long as a specific file tree structure is used in conjunction with two Unix scripts: one that processes each pair from raw data to two sorted BAM files (one with and one without duplicates), and a second that pulls each sample into the analysis framework and launches the aforementioned script. Since this requires a specific directory structure, I've updated my directory structure script to version 3.
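The "cat" trick is as simple as it sounds; a sketch with two mock lanes (file names are hypothetical):

```shell
# Two mock lanes of read-1 fastq data (4 lines per read)
printf '@lane1_read1\nACGT\n+\nIIII\n' > s_1_1_sequence.txt
printf '@lane2_read1\nTTGG\n+\nIIII\n' > s_2_1_sequence.txt
# Concatenate the lanes into one read-1 input file before alignment
cat s_1_1_sequence.txt s_2_1_sequence.txt > MySample_R1.txt
wc -l < MySample_R1.txt   # 8 lines = 2 reads x 4 fastq lines each
rm s_1_1_sequence.txt s_2_1_sequence.txt MySample_R1.txt
```

Just be sure to cat the lanes in the same order for the read-1 and read-2 files, or the pairing will be scrambled downstream.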

              Code:
              #!/bin/sh
              
              # Create_NGS_DirectoryStructure_V3.sh
              # 
              #
              # Created by Jonathan Keats on 9/3/10 based on suggestion from Ryan Golhar on my Seqanswers thread.
              # Translational Genomics Research Institute
              #
              #########################################################################
               #  CREATES A DIRECTORY STRUCTURE TO SUPPORT A VARIETY OF NGS PIPELINES  #
              #########################################################################
              #
               # Designed for a Mac OS environment and requires initiation from your home folder (/Users/You/)
              
               # Check to confirm current location is $HOME/ (i.e. /Users/You/)
              
              echo "Confirming Script Initiation Directory"
              var1=$HOME
              if [ "`pwd`" != "$var1" ] 
              	then 
              	echo " The script must be launched from your home directory "
              	echo " The script was automatically killed due to a launch error - See Above Error Message" 
              	exit 2                              
              fi
              echo "1) Launch Location is Correct ($HOME/)"
              
              # Create required directories to support pipelines (BWAse, BWApe, and others to come...)
              
               echo "***Creating Pipeline Directory Structure***"
              mkdir -p ngs/{analysisnotes,applications,scripts}
              mkdir -p ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,genome_downloads}
              mkdir -p ngs/refgenomes/genome_downloads/{hg18,hg19}
              mkdir -p ngs/finaloutputs/{alignmentresults_bwa,illumina,sangerfastq}
              mkdir -p ngs/finaloutputs/bamfiles/{merged,sorted,nodups}
              mkdir -p ngs/bwase/inputsequences/{illumina,sangerfastq}
              mkdir -p ngs/bwase/samfiles
              mkdir -p ngs/bwase/bamfiles/{merged,original,sorted,nodups}
              mkdir -p ngs/bwape/samfiles
              mkdir -p ngs/bwape/bamfiles/{merged,original,sorted,nodups}
              mkdir -p ngs/bwape/inputsequences/{illumina,sangerfastq,hold}
              mkdir -p ngs/bwape/inputsequences/illumina/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/sangerfastq/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/{lane1,lane2,lane3,lane4,lane5,lane6,lane7,lane8}
              mkdir -p ngs/bwape/inputsequences/hold/lane1/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane2/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane3/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane4/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane5/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane6/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane7/{read1,read2}
              mkdir -p ngs/bwape/inputsequences/hold/lane8/{read1,read2}
              
              mv create_ngs_directorystructure_v3.sh ngs/scripts/
               echo "***Pipeline Directory Structure Created***"
              Last edited by Jon_Keats; 09-13-2010, 04:58 PM.



              • #37
                Just wanted to chime in and affirm the sentiment that several people have already expressed: thank you for your posts, they are quite helpful. I am just starting out, and this forum and your thread have been an excellent guide. Thank you for taking the time to post!



                • #38
                  Hello, I am Ronnie. I am from Chandigarh city; currently I am a BTech student. This is a nice, informative thread.



                  • #39
                    BWA SAMPE Pipeline Version

                    As I mentioned before, it's taken a while to sort out a method that can automate a paired-end analysis using BWA, but it seems to work now. Feel free to use the scripts below in conjunction with the "create_ngs_directorystructure_v3.sh" script that creates the required directory structure.
                    The following two scripts automate a paired-end BWA analysis from the raw "s_x_sequence.txt" output files to aligned, indexed, and duplicate-removed BAM files. The design of the pipeline has a couple of requirements:

                    1) You need to have all the required applications in a $PATH directory. As detailed in this thread I personally use "$HOME/local/bin".
                    2) You will need; MAQ with ill2sanger patch installed, BWA, SAMTOOLS, and PICARD MarkDuplicates.jar in this path directory.
                    NOTE: If you use a different path directory you need to alter line 623 of BWApe_hg18_v1.sh as MarkDuplicates.jar is being called specifically from this directory while all others are being called through the $PATH directory. ****If you know how to put a directory in the JAVA path on a Mac drop me a line****
                    3) Both shell scripts are designed to be in your $PATH directory so you can call them from the ngs directory using "BWApe_hg18_v1.sh" for a single-sample analysis or "multi_bwape_analysis_v1.sh" for a multiple-sample analysis. Alternatively, you can place them in the "/ngs" folder and call them directly using "./BWApe_hg18_v1.sh" or "./multi_bwape_analysis_v1.sh" (NOTE: If you do this you need to modify the lines that launch BWApe_hg18_v1.sh to include the direct launch indicator "./").
                    4) The input file names must be unique and end with a "_R1.txt" read identifier such as "YourSample_R1.txt" and "YourSample_R2.txt"

                    NOTE: The name BWApe_hg18_v1.sh only reflects the reference genome used in the development of the script. You can easily change to whatever genome (mouse, human, etc.) you want to use; you just need to generate the bwa index and update the BWApe_hg18_v1.sh script as indicated in the script.

                    NOTE: If using "BWApe_hg18_v1.sh" you need to place the raw illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2". If using "multi_bwape_analysis_v1.sh" you need to place the raw illumina files in "ngs/bwape/inputsequences/hold/laneX/read1" and "ngs/bwape/inputsequences/hold/laneX/read2" as appropriate to your sample set. The script is only designed for 8 lanes/samples, so if you have more you need to copy/paste to extend the script. After completing each lane/sample it checks whether there is data in the next sequential lane/sample folder and processes it if available, or ends the script if that folder is empty, so you need to put files in the hold/lane1, 2, 3, 4, 5, 6, 7, and 8 read folders in order.
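To make requirement 4 concrete, renaming a raw lane file pair to the expected pattern might look like this (the sample name is hypothetical):

```shell
# Mock raw Illumina lane files (read 1 and read 2)
touch s_1_1_sequence.txt s_1_2_sequence.txt
# Rename to the unique *_R1.txt / *_R2.txt pattern the scripts expect
mv s_1_1_sequence.txt MySample_R1.txt
mv s_1_2_sequence.txt MySample_R2.txt
ls MySample_R1.txt MySample_R2.txt
rm MySample_R1.txt MySample_R2.txt
```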

                    Code:
                    #!/bin/sh
                    
                    # BWApe_hg18_V1.sh
                    # Created by Jonathan Keats
                    # Translational Genomics Research Institute
                    
                    # This script is designed to take a batch of raw Illumina 1.3+ reads to sorted and indexed BAM files with and without duplicates using BWA in paired end mode.
                    # It is designed to be initiated from a folder called "ngs" in your $HOME folder with a specific subdirectory structure
                     # To create the directory structure launch "create_ngs_directorystructure_v3.sh" from your "$HOME" folder
                    
                    ####################################################################################################
                    ##  To Run This Script You Must Have The Following Applications In One Of Your $PATH Directories  ##
                    ##						1) MAQ with ill2sanger patch installed									  ##
                    ##						2) BWA																	  ##
                    ##						3) SAMTOOLS																  ##
                    ##						4) PICARD - MarkDuplicates.jar (Must be in $HOME/local/bin)				  ##
                    ####################################################################################################
                    
                    # To run this script you MUST first place your reference file in ngs/refgenomes/bwa_indexed and have run the "bwa index" command to create the BWT index files
                    
                    ######################################################################################################
                    # WARNING - YOU MUST ENSURE THE NAME OF YOUR REFERENCE GENOME FILE MATCHES LINES (274, 310, and 367) #
                    ###################################################################################################### 
                    
                    # The script is based on having ***RENAMED*** Illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2"
                    # The renamed format ***MUST*** be "YourSampleName_R1.txt" and "YourSampleName_R2.txt" otherwise pairing and renaming will not occur correctly
                     # Multiple lanes should be concatenated together before initiating the script, unless you want to manually merge in samtools 
                    # At each step it queries specific folders for available files and passes them to the next analysis module
                    # After each step the filename extension of the output files are corrected. (ie. "MySequenceFile_R1.txt.fastq" to "MySequenceFile_R1.fastq")
                    # Order of Embedded Steps	- Converts Illumina 1.3+ fastq files "s_1_sequence.txt" to Sanger fastq files "s_1_sequence.fastq" using "maq ill2sanger" command
                    #							- Aligns created fastq files to reference genome using "bwa aln" command
                    #							- Generates SAM files from alignment files using "bwa sampe" command
                    #							- Converts SAM files to BAM files using "samtools view" command
                    #							- Sorts BAM files using "samtools sort" command
                    #							- Indexes the sorted BAM files for use in IGV browser using "samtools index" command
                    #							- Removes duplicates from the sorted bam files using "picard - MarkDuplicates.jar" command
                    #							- Indexes the no duplicates BAM files for use in IGV browser using "samtools index" command
                    #							- Final output files are archived then the input and analysis directories are cleaned-up and readied for the next analysis batch
                    # The script creates a log file in /ngs/analysisnotes to track the steps completed and the time each step started and finished
                    # Some of the log events will print to both the terminal screen and the log file so you can see what is going on
                     # Much of this would not be possible without the help of a former colleague's husband who is a Unix programmer in France, so I've kept some French terms such as "ligne" instead of "line" in his honor (thanks Charabelle)
                    
                    #Starting directory = $HOME/ngs
                    
                     #In this step	- We check that you are launching the script from the correct location in case you are using it from a path directory
                    #				- We check that the destination directories used by the script are empty to prevent deleting erroneous files and unexpected analysis events
                    #				- Hope to add a check for available disk space
                    
                     echo "***Checking Directory Structure***"
                    
                     #List of directories to check
                    var1=$HOME/ngs
                    var2=$HOME/ngs/bwape/samfiles
                    var3=$HOME/ngs/bwape/bamfiles/merged
                    var4=$HOME/ngs/bwape/bamfiles/original
                    var5=$HOME/ngs/bwape/bamfiles/sorted
                    var6=$HOME/ngs/bwape/bamfiles/nodups
                    var7=$HOME/ngs/bwape/inputsequences/sangerfastq/read1
                    var8=$HOME/ngs/bwape/inputsequences/sangerfastq/read2
                    
                    #Checking if launch location is correct
                    
                    if [ "`pwd`" != "$var1" ] 
                    	then 
                    	echo " The script must be launched from the NGS directory "
                    	echo " The script was automatically killed due to a launch error - See Above Error Message" 
                    	exit 2                              
                    fi
                    echo "1) Launch Location is Correct ($HOME/ngs)"
                    
                    #Checking if analysis directories are empty
                    
                    
                    if [ `ls $var2 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/samfiles directory is not empty - Any data in this directory would be deleted by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "2) bwape/samfiles directory is empty as required"
                    if [ `ls $var3 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/bamfiles/merged directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/merged "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "3) bwape/bamfiles/merged directory is empty as required"
                    if [ `ls $var4 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/bamfiles/original directory is not empty - Any data in this directory would be deleted by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "4) bwape/bamfiles/original directory is empty as required"
                    if [ `ls $var5 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/bamfiles/sorted directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/sorted by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "5) bwape/bamfiles/sorted directory is empty as required"
                    if [ `ls $var6 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/bamfiles/nodups directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "6) bwape/bamfiles/nodups directory is empty as required"
                    if [ `ls $var7 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/illuminasequences/sangerfastq/read1 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "7) bwape/illuminasequences/sangerfastq/read1 directory is empty as required"
                    if [ `ls $var8 | wc -l` != 0 ]       
                    	then 
                    	echo " The bwape/illuminasequences/sangerfastq/read2 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo "8) bwape/illuminasequences/sangerfastq/read2 directory is empty as required"
                    
                    echo ***Pre Run Check Completed Successfully***
                    
                    #Current directory=ngs
                    echo ***Starting BWA SAMPE Analysis Batch***
                    date '+%m/%d/%y %H:%M:%S'
                    
                     #The following step creates the log file in the analysisnotes subdirectory the first time the script is run
                    #On subsequent runs the results are printed at the bottom of the pre-existing log file
                    
                    echo ***Starting BWA SAMPE Analysis Batch*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we convert the "Read1" illumina fastq files to sanger fastq files using the maq ill2sanger script
                    
                    echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger
                    date '+%m/%d/%y %H:%M:%S'
                     echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    cd bwape/inputsequences/illumina/read1
                    #Current directory = ngs/bwape/inputsequences/illumina/read1
                    echo Converting the following Illumina files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    maq ill2sanger $ligne ../../sangerfastq/read1/$ligne.fastq
                    done
                    
                    #In the next step we clean up the Illumina Read1 folder so it is ready for the next analysis batch
                    
                    echo Cleaning up Input Sequences Illumina Read1
                    date '+%m/%d/%y %H:%M:%S'
                    echo Cleaning up Input Sequences Illumina Read1 >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina:
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    done
                     echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../../finaloutputs/illumina
                    done
                    
                    #In the next step we rename the "Read1" sanger format fastq files from ".txt.fastq" extensions to ".fastq"
                    
                    cd ../../sangerfastq/read1
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read1
                    old_ext=txt.fastq
                    new_ext=fastq
                    find . -type f -name "*.$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done
                    echo Finished Step1a - Illumina to Sanger Fastq Conversion
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step1a - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we convert the "Read2" illumina fastq files to sanger fastq files using the maq ill2sanger script
                    
                    cd ../../illumina/read2
                    #Current directory = ngs/bwape/inputsequences/illumina/read2
                    echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger
                    date '+%m/%d/%y %H:%M:%S'
                     echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    echo Converting the following Illumina files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    maq ill2sanger $ligne ../../sangerfastq/read2/$ligne.fastq
                    done
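                    #What ill2sanger is doing, in essence: shifting quality characters from
                    #Phred+64 (Illumina 1.3+) down to Phred+33 (Sanger). A rough sketch of just the
                    #offset arithmetic on a quality string using tr (octal 100-150 = '@'..'h',
                    #octal 41-111 = '!'..'I'); maq converts whole FASTQ records, so this is illustration only:
                    echo "Quality offset demo: hhhh becomes $(echo 'hhhh' | tr '\100-\150' '\41-\111')"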
                    
                    #In the next step we clean up the Illumina Read2 folder so it is ready for the next analysis batch
                    
                    echo Cleaning up Input Sequences Illumina Read2
                    date '+%m/%d/%y %H:%M:%S'
                    echo Cleaning up Input Sequences Illumina Read2 >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina:
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    done
                    echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../../finaloutputs/illumina
                    done
                    
                    #In the next step we rename the "Read2" sanger format fastq files from ".txt.fastq" extensions to ".fastq"
                    
                    cd ../../sangerfastq/read2
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read2
                    old_ext=txt.fastq
                    new_ext=fastq
                    find . -type f -name "*.$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done
                    echo Finished Step1b - Illumina to Sanger Fastq Conversion
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step1b - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we will align the converted "Read1" sanger fastq format files to the reference genome
                    
                    echo Starting Step2a - Read1 bwa aln process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step2a - Read1 bwa aln process >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    cd ../read1
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read1
                    echo The following fastq files will be aligned:
                    for ligne in `ls *.fastq`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.fastq`
                    do
                    bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
                    done
                    
                    #In the next step we will rename the "Read1" alignment files
                    
                    old_ext=.fastq.sai
                    new_ext=_bwa.sai
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done 
                    echo Finished Step2a - bwa aln process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step2a - bwa aln process >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we will align the converted "Read2" sanger fastq format files to the reference genome
                    
                    echo Starting Step2b - Read2 bwa aln process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step2b - Read2 bwa aln process >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    cd ../read2
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read2
                    echo The following fastq files will be aligned:
                    for ligne in `ls *.fastq`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.fastq`
                    do
                    bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
                    done
                    
                    #In the next step we will rename the "Read2" alignment files
                    
                    old_ext=.fastq.sai
                    new_ext=_bwa.sai
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done 
                    echo Finished Step2b - bwa aln process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step2b - bwa aln process >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we will generate SAM files for the alignments using bwa sampe
                    
                    echo Starting Step3 - bwa sampe process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step3 - bwa sampe process >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    echo The following alignment files will be converted to SAM files:
                    
                    cd ../read1
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read1
                    #Note: these loops keep only the last filename listed, so each run of this script expects exactly one read-pair staged in the input folders
                    for ligne in `ls *.sai`
                    do
                    aln1=$ligne
                    done
                    echo $aln1
                    for ligne in `ls *.fastq`
                    do
                    read1=$ligne
                    done
                    echo $read1
                    
                    cd ../read2
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read2
                    for ligne in `ls *.sai`
                    do
                    aln2=$ligne
                    done
                    echo $aln2
                    for ligne in `ls *.fastq`
                    do
                    read2=$ligne
                    done
                    echo $read2
                    echo The following alignment files will be converted to SAM files: >> ../../../../analysisnotes/Analysis.log
                    echo $aln1 >> ../../../../analysisnotes/Analysis.log
                    echo $read1 >> ../../../../analysisnotes/Analysis.log
                    echo $aln2 >> ../../../../analysisnotes/Analysis.log
                    echo $read2 >> ../../../../analysisnotes/Analysis.log
                    cd ../../../samfiles
                    #Current directory = ngs/bwape/samfiles
                    #(bwa sampe <database.fasta> <aln1.sai> <aln2.sai> <input1.fastq> <input2.fastq> > aln.sam)
                    bwa sampe ../../refgenomes/bwa_indexed/hg18.fasta ../inputsequences/sangerfastq/read1/$aln1 ../inputsequences/sangerfastq/read2/$aln2 ../inputsequences/sangerfastq/read1/$read1 ../inputsequences/sangerfastq/read2/$read2 > $read1.sam
                    
                    #In the next step we will rename the SAM files generated by bwa sampe analysis of the "Read1" and "Read2" alignment files
                    
                    old_ext=_R1.fastq.sam
                    new_ext=_bwape.sam
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done 
                    echo Finished Step3 - bwa sampe process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step3 - bwa sampe process >> ../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
                    
                    #In the next step we will convert each SAM file to a BAM file
                    
                    echo Starting Step4 - samtools SAM to BAM conversion
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step4 - samtools SAM to BAM conversion >> ../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
                    echo The following SAM files will be converted to BAM files:
                    for ligne in `ls *.sam`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following SAM files will be converted to BAM files: >> ../../analysisnotes/Analysis.log
                    for ligne in `ls *.sam`
                    do                                                                     
                    echo $ligne >> ../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.sam`
                    do
                    samtools view -bS -o ../bamfiles/original/$ligne.bam $ligne
                    done
                    
                    #In the next step we will delete the SAM file to save disc space as the BAM file contains all the data in a binary format
                    
                    echo Deleting the following SAM Files from ngs/bwape/samfiles:
                    for ligne in `ls *.sam`
                    do
                    echo $ligne
                    done
                    echo Deleting the following SAM Files from ngs/bwape/samfiles: >> ../../analysisnotes/Analysis.log
                    for ligne in `ls *.sam`
                    do
                    echo $ligne >> ../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.sam`
                    do
                    rm $ligne
                    done
                    echo Deleting SAM Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Deleting SAM Files Complete >> ../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
                    
                    #In the next step we clean up the Sanger Fastq "Read1" folder so it is ready for the next analysis batch
                    
                    cd ../inputsequences/sangerfastq/read1
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read1
                    echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq:
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne
                    done
                    echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.fastq`
                    do
                    mv $ligne ../../../../finaloutputs/sangerfastq/
                    done
                    echo Moving Sanger Format Fastq Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa:
                    for ligne in `ls *.sai`
                    do
                    echo $ligne
                    done
                    echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.sai`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.sai`
                    do
                    mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
                    done
                    echo Moving Alignment Results Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we clean up the Sanger Fastq "Read2" folder so it is ready for the next analysis batch
                    
                    cd ../read2
                    #Current directory = ngs/bwape/inputsequences/sangerfastq/read2
                    echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq:
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne
                    done
                    echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.fastq`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.fastq`
                    do
                    mv $ligne ../../../../finaloutputs/sangerfastq/
                    done
                    echo Moving Sanger Format Fastq Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa:
                    for ligne in `ls *.sai`
                    do
                    echo $ligne
                    done
                    echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.sai`
                    do
                    echo $ligne >> ../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.sai`
                    do
                    mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
                    done
                    echo Moving Alignment Results Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
                    
                    #In the next step we will rename the BAM files created by the samtools SAM-to-BAM conversion process
                    
                    cd ../../../bamfiles/original
                    #Current directory = ngs/bwape/bamfiles/original
                    old_ext=sam.bam
                    new_ext=bam
                    find . -type f -name "*.$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done 
                    echo Finished Step4 - samtools SAM to BAM conversion
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step4 - samtools SAM to BAM conversion >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will sort the BAM file by chromosome coordinate
                    
                    echo Starting Step5 - samtools BAM sorting process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    echo The following BAM files will be sorted:
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following BAM files will be sorted: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    #Note: samtools sort takes an output prefix, so this writes ../sorted/$ligne.bam; the resulting ".bam.bam" extension is cleaned up in the rename step below
                    for ligne in `ls *.bam`
                    do                                                                     
                    samtools sort $ligne ../sorted/$ligne
                    done
                    
                    #In the next step we will delete the original unsorted BAM file to save disc space as the sorted BAM contains all the needed information
                    
                    echo Deleting the following BAM Files from ngs/bwape/bamfiles/original:
                    for ligne in `ls *.bam`
                    do
                    echo $ligne
                    done
                    echo Deleting the following BAM Files from ngs/bwape/bamfiles/original: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do
                    rm $ligne
                    done
                    echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will rename the sorted BAM files created by the samtools sort process
                    
                    cd ../sorted
                    #Current directory = ngs/bwape/bamfiles/sorted
                    old_ext=.bam.bam
                    new_ext=_sorted.bam
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done 
                    echo Finished Step5 - samtools BAM sorting process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will index the sorted BAM files for fast access and viewing in the IGV browser
                    
                    echo Starting Step6 - samtools BAM indexing process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    echo The following BAM files will be indexed:
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do                                                                     
                    samtools index $ligne
                    done
                    echo Finished Step6 - samtools BAM indexing process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will remove duplicate reads from the sorted BAM files
                    
                    echo Starting Step7 - picard markduplicates process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    echo Duplicate reads will be removed from the following sorted BAM files:
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne
                    done
                    echo Duplicate reads will be removed from the following sorted BAM files: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do                                                                     
                    java -Xmx2g -jar $HOME/local/bin/MarkDuplicates.jar INPUT=$ligne OUTPUT=../nodups/$ligne METRICS_FILE=../nodups/$ligne.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT
                    done
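                    #A quick sanity check one can run after the loop above: pull the
                    #PERCENT_DUPLICATION column out of a MarkDuplicates metrics file with awk.
                    #Sketch only - the inline printf stands in for a real metrics file, which also
                    #carries extra comment and histogram sections:
                    printf '## METRICS CLASS\tnet.sf.picard.sam.DuplicationMetrics\nLIBRARY\tPERCENT_DUPLICATION\nlib1\t0.042\n' \
                    | awk -F'\t' 'NR==2{for(i=1;i<=NF;i++) if($i=="PERCENT_DUPLICATION") c=i} NR==3{print "Duplication fraction demo: "$c}'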
                    
                    #In the next step we clean up the Sorted BAM files folder so it is ready for the next analysis batch
                    
                    echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
                    for ligne in `ls *.bam`
                    do
                    echo $ligne
                    done
                    echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do
                    mv $ligne ../../../finaloutputs/bamfiles/sorted/
                    done
                    echo Moving Sorted BAM Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Sorted BAM Files Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
                    for ligne in `ls *.bai`
                    do
                    echo $ligne
                    done
                    echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bai`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bai`
                    do
                    mv $ligne ../../../finaloutputs/bamfiles/sorted/
                    done
                    echo Moving Sorted BAM Index Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Sorted BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will rename the BAM files and Metrics files created after duplicate removal by picard
                    
                    cd ../nodups
                    #Current directory = ngs/bwape/bamfiles/nodups
                    old_ext=_sorted.bam
                    new_ext=_sorted_nodups.bam
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done
                    old_ext=_sorted.bam.txt
                    new_ext=_sorted_nodups_metrics.txt
                    find . -type f -name "*$old_ext" -print | while read file
                    do
                        mv $file ${file%${old_ext}}${new_ext}
                    done
                    echo Finished Step7 - picard markduplicates process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we will index the nodups BAM files for fast access and viewing in the IGV browser
                    
                    echo Starting Step8 - samtools BAM indexing process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Starting Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    echo The following BAM files will be indexed:
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne
                    done
                    echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do                                                                     
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do                                                                     
                    samtools index $ligne
                    done
                    
                    #In the next step we clean up the nodups BAM files folder so it is ready for the next analysis batch
                    
                    echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
                    for ligne in `ls *.bam`
                    do
                    echo $ligne
                    done
                    echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bam`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bam`
                    do
                    mv $ligne ../../../finaloutputs/bamfiles/nodups/
                    done
                    echo Moving NoDups BAM Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving NoDups BAM Files Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
                    for ligne in `ls *.bai`
                    do
                    echo $ligne
                    done
                    echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.bai`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.bai`
                    do
                    mv $ligne ../../../finaloutputs/bamfiles/nodups/
                    done
                    echo Moving NoDups BAM Index Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving NoDups BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    done
                    echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../finaloutputs/bamfiles/nodups/
                    done
                    echo Moving MarkDuplicates Metrics Files Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving MarkDuplicates Metrics Files Complete >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    echo Finished Step8 - samtools BAM indexing process
                    date '+%m/%d/%y %H:%M:%S'
                    echo Finished Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
                    
                    #In the next step we return to the launch folder $HOME/Documents/ngs
                    
                    cd ../../..
                    #Current directory = ngs/
                    echo ***Analysis Batch Complete***
                    echo ***Analysis Batch Complete*** >> analysisnotes/Analysis.log

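The driver script below repeats an essentially identical block once per lane, which makes it long and easy to get out of sync when editing. As a rough sketch, the same flow could be driven by one loop over lanes, assuming the same `ngs/` directory layout; the helper functions `log` and `check_single_file`, the guarded lane loop, and the lane list 1-8 are illustrative assumptions, not part of the original pipeline:

```shell
#!/bin/sh
# Sketch only: a loop-based alternative to the per-lane blocks in
# multi_bwape_analysis_v1.sh.  Helper names and the lane list are
# hypothetical; adjust paths to your own layout.

LOG=${LOG:-analysisnotes/Analysis.log}

# Print a message with a timestamp, and append both to the analysis log.
log() {
    echo "$1"
    date '+%m/%d/%y %H:%M:%S'
    echo "$1" >> "$LOG"
    date '+%m/%d/%y %H:%M:%S' >> "$LOG"
}

# Abort unless the given hold directory contains exactly one file.
check_single_file() {
    n=`ls "$1" | wc -l`
    if [ "$n" -ne 1 ]
    then
        echo " $2 does not contain the expected single file "
        exit 2
    fi
}

for lane in 1 2 3 4 5 6 7 8
do
    # Skip lanes that were never loaded into a hold folder.
    [ -d bwape/inputsequences/hold/lane$lane ] || continue
    for rd in 1 2
    do
        hold=bwape/inputsequences/hold/lane$lane/read$rd
        check_single_file "$hold" "The Lane$lane Read$rd hold folder"
        log "Moving Lane$lane Read$rd File to Read$rd Analysis Directory"
        mv "$hold"/*.txt bwape/inputsequences/illumina/read$rd/
    done
    BWApe_hg18_v1.sh
    log "***Lane$lane Analysis Complete***"
done
```

Each lane whose hold folder exists is processed in turn, so adding a lane means adding one number to the list rather than pasting another sixty-line block; a lane with no hold folder is simply skipped.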
                    Code:
                    #!/bin/sh
                    
                    # multi_bwape_analysis_v1.sh
                    # 
                    #
                    # Created by Jonathan Keats on 9/5/10.
                    # Translational Genomics Research Institute
                    
                    # This script is designed to allow multiple samples/lanes of paired-end Illumina data to be passed into the "BWApe_hg18_v1" pipeline
                    
                    ###############################################################################################################################
                    ## To facilitate its use you must put uniquely named Illumina 1.3+ files in ngs/bwape/inputsequences/hold/lane(X)/read(1-2)   #
                    ## It is essential that these file names are uniquely name or overwriting will occur										  #
                    ## These files MUST have the ".txt" extension characteristic of the Illumina V1.3+ output "s_x_sequences.txt"				  #
                    ###############################################################################################################################
                    
                    #In this step we check that the script is being launched from the correct location, in case it was invoked via a PATH directory
                    
                    echo ***Checking Current Directory is Correct***
                    
                    #List of directories to check
                    temp1=$HOME/ngs
                    
                    #Checking if launch location is correct
                    
                    if [ "`pwd`" != "$temp1" ] 
                    	then 
                    	echo " The script must be launched from the NGS directory "
                    	echo " The script was automatically killed due to a launch error - See Above Error Message" 
                    	exit 2                              
                    fi
                    echo ***Current Directory is Correct***
                    
                    #Check if files exist in the lane1 hold folder
                    
                    #List of directories to check
                    temp2=$HOME/ngs/bwape/inputsequences/hold/lane1/read1
                    temp3=$HOME/ngs/bwape/inputsequences/hold/lane1/read2
                    
                    echo ***Checking Lane1 Hold Folder***
                    if [ `ls $temp2 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane1 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp3 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane1 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    #Current directory=ngs
                    echo ***Starting The Analysis of Lane1***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane1*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane1" data from ngs/bwape/inputsequences/hold/lane1/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane1/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane1/read1
                    echo Moving Lane1 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane1 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane1/read2
                    echo Moving Lane1 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane1 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane1 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample2/lane2 data into the analysis directories
                    
                    #Check if files exist in the lane2 hold folder
                    
                    #List of directories to check
                    temp4=$HOME/ngs/bwape/inputsequences/hold/lane2/read1
                    temp5=$HOME/ngs/bwape/inputsequences/hold/lane2/read2
                    
                    echo ***Checking Lane2 Hold Folder***
                    if [ `ls $temp4 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane2 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp5 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane2 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    echo ***Starting The Analysis of Lane2***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane2*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane2" data from ngs/bwape/inputsequences/hold/lane2/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane2/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane2/read1
                    echo Moving Lane2 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane2 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane2/read2
                    echo Moving Lane2 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane2 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane2 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample3/lane3 data into the analysis directories
                    
                    #Check if files exist in the lane3 hold folder
                    
                    #List of directories to check
                    temp6=$HOME/ngs/bwape/inputsequences/hold/lane3/read1
                    temp7=$HOME/ngs/bwape/inputsequences/hold/lane3/read2
                    
                    echo ***Checking Lane3 Hold Folder***
                    if [ `ls $temp6 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane3 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp7 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane3 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    #Current directory=ngs
                    echo ***Starting The Analysis of Lane3***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane3*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane3" data from ngs/bwape/inputsequences/hold/lane3/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane3/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane3/read1
                    echo Moving Lane3 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane3 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane3/read2
                    echo Moving Lane3 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane3 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane3 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample4/lane4 data into the analysis directories
                    
                    #Check if files exist in the lane4 hold folder
                    
                    #List of directories to check
                    temp8=$HOME/ngs/bwape/inputsequences/hold/lane4/read1
                    temp9=$HOME/ngs/bwape/inputsequences/hold/lane4/read2
                    
                    echo ***Checking Lane4 Hold Folder***
                    if [ `ls $temp8 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane4 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp9 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane4 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    echo ***Starting The Analysis of Lane4***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane4*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane4" data from ngs/bwape/inputsequences/hold/lane4/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane4/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane4/read1
                    echo Moving Lane4 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane4 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane4/read2
                    echo Moving Lane4 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane4 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane4 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample5/lane5 data into the analysis directories
                    
                    #Check if files exist in the lane5 hold folder
                    
                    #List of directories to check
                    temp10=$HOME/ngs/bwape/inputsequences/hold/lane5/read1
                    temp11=$HOME/ngs/bwape/inputsequences/hold/lane5/read2
                    
                    echo ***Checking Lane5 Hold Folder***
                    if [ `ls $temp10 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane5 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp11 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane5 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    #Current directory=ngs
                    echo ***Starting The Analysis of Lane5***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane5*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane5" data from ngs/bwape/inputsequences/hold/lane5/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane5/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane5/read1
                    echo Moving Lane5 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane5 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane5/read2
                    echo Moving Lane5 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane5 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane5 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample6/lane6 data into the analysis directories
                    
                    #Check if files exist in the lane6 hold folder
                    
                    #List of directories to check
                    temp12=$HOME/ngs/bwape/inputsequences/hold/lane6/read1
                    temp13=$HOME/ngs/bwape/inputsequences/hold/lane6/read2
                    
                    echo ***Checking Lane6 Hold Folder***
                    if [ `ls $temp12 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane6 Read1 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp13 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane6 Read2 hold folder does not contain the expect single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    echo ***Starting The Analysis of Lane6***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane6*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane6" data from ngs/bwape/inputsequences/hold/lane6/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane6/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane6/read1
                    echo Moving Lane6 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane6 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    for ligne in `ls *.txt`
                    do                                                                     
                    echo $ligne
                    done
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    done
                    for ligne in `ls *.txt`
                    do
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane6/read2
                    echo Moving Lane6 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane6 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane6 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample7/lane7 data into the analysis directories
                    
                    #Check if files exist in the lane7 hold folder
                    
                    #List of directories to check
                    temp14=$HOME/ngs/bwape/inputsequences/hold/lane7/read1
                    temp15=$HOME/ngs/bwape/inputsequences/hold/lane7/read2
                    
                    echo ***Checking Lane7 Hold Folder***
                    if [ `ls $temp14 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane7 Read1 hold folder does not contain the expected single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp15 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane7 Read2 hold folder does not contain the expected single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    #Current directory=ngs
                    echo ***Starting The Analysis of Lane7***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane7*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane7" data from ngs/bwape/inputsequences/hold/lane7/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane7/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane7/read1
                    echo Moving Lane7 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane7 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane7/read2
                    echo Moving Lane7 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane7 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane7 Analysis Complete***
                    
                    # The analysis directory should now be empty and we can now load the sample8/lane8 data into the analysis directories
                    
                    #Check if files exist in the lane8 hold folder
                    
                    #List of directories to check
                    temp16=$HOME/ngs/bwape/inputsequences/hold/lane8/read1
                    temp17=$HOME/ngs/bwape/inputsequences/hold/lane8/read2
                    
                    echo ***Checking Lane8 Hold Folder***
                    if [ `ls $temp16 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane8 Read1 hold folder does not contain the expected single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    if [ `ls $temp17 | wc -l` != 1 ]       
                    	then 
                    	echo " The Lane8 Read2 hold folder does not contain the expected single file "
                    	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
                    	exit 2                     
                    fi
                    echo ***Found Expected Files***
                    echo ***Starting The Analysis of Lane8***
                    date '+%m/%d/%y %H:%M:%S'
                    echo ***Starting The Analysis of Lane8*** >> analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
                    
                    #In the next step we move the "Lane8" data from ngs/bwape/inputsequences/hold/lane8/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
                    
                    cd bwape/inputsequences/hold/lane8/read1
                    #Current Directory=ngs/bwape/inputsequences/hold/lane8/read1
                    echo Moving Lane8 Read1 File to Read1 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane8 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    mv $ligne ../../../illumina/read1/
                    done
                    echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../read2
                    #Current Directory=ngs/bwape/inputsequences/hold/lane8/read2
                    echo Moving Lane8 Read2 File to Read2 Analysis Directory
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane8 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    echo Moving the following files:
                    echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
                    for ligne in `ls *.txt`
                    do
                    echo $ligne
                    echo $ligne >> ../../../../../analysisnotes/Analysis.log
                    mv $ligne ../../../illumina/read2/
                    done
                    echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete
                    date '+%m/%d/%y %H:%M:%S'
                    echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
                    date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
                    
                    cd ../../../../../
                    #Current Directory=ngs/
                    
                    # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
                    
                    BWApe_hg18_v1.sh
                    
                    echo ***Lane8 Analysis Complete***
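
                    A hedged refactoring sketch of the script above: since the per-lane blocks are verbatim repeats, a single move_reads function (hypothetical name) can replace the six move-and-log blocks, and a count_files helper can replace the `ls | wc -l` hold-folder checks, which miscount when a directory is missing. The $HOME/ngs/... layout is the one assumed throughout this thread.

```shell
# Hedged refactoring sketch; move_reads and count_files are hypothetical
# helper names, and the directory layout ($HOME/ngs/...) is the one
# assumed in this thread.

# Count regular files in a directory; safer than `ls | wc -l` when the
# directory is missing or empty.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Move one lane/read's *.txt files into the analysis directory, echoing
# each filename to the terminal and appending it to the analysis log.
move_reads() {
    lane=$1; read=$2
    src="$HOME/ngs/bwape/inputsequences/hold/$lane/$read"
    dst="$HOME/ngs/bwape/inputsequences/illumina/$read"
    log="$HOME/ngs/analysisnotes/Analysis.log"
    echo "Moving $lane $read files to analysis directory" | tee -a "$log"
    for f in "$src"/*.txt; do
        [ -e "$f" ] || continue   # no matches: skip instead of moving a literal '*'
        echo "$f" | tee -a "$log"
        mv "$f" "$dst"/
    done
}

# Example, replacing one lane6 block:
# [ "$(count_files $HOME/ngs/bwape/inputsequences/hold/lane6/read1)" = 1 ] || exit 2
# move_reads lane6 read1
```

                    Each lane/read pair then collapses to a one-line check plus a one-line call, so a fix only has to be made once instead of six times.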
                    Last edited by Jon_Keats; 10-04-2010, 11:09 AM. Reason: Fixed bug in pipeline script

                    Comment


                    • #40
                      Best post

                      Glad I found this, so far it's the best post I've seen here. Thanks for the help!

                      Comment


                      • #41
                        Jon Keats,
                        Good job! I don't think you are alone; several researchers are in the same situation, and I am one of them.

                        Good luck!

                        Comment


                        • #42
                          Getting the Mens Formal Wear Packages Going

                          I've finally jumped into the TopHat-Cufflinks world for RNAseq analysis. Because most of the pre-compiled binaries are for Mac OS X 10.5, not 10.6, I've built all the binaries from the source code. As before, I've included detailed instructions on the install.

                          1) Install bowtie

                          - Download current version
                          (http://sourceforge.net/projects/bowtie-bio/files/bowtie)
                          - Move to applications folder (ngs/applications)
                          - Decompress
                          - Using terminal navigate to the unpacked bowtie folder
                          - To make the package type "make"
                          - Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory
                          - If you follow this thread I use $HOME/local/bin
                          - Thus type: "cp bowtie $HOME/local/bin"
                          "cp bowtie-build $HOME/local/bin"
                          "cp bowtie-inspect $HOME/local/bin"
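
                          Once the three binaries are copied, a quick way to confirm they actually resolve from $PATH (a hedged sketch; check_on_path is a hypothetical helper, not part of bowtie, and it assumes $HOME/local/bin is already on your PATH as set up earlier in this thread):

```shell
# Report whether each named tool is resolvable from $PATH.
check_on_path() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "MISSING: $tool"
        fi
    done
}
# Example: check_on_path bowtie bowtie-build bowtie-inspect
```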

                          2) Install Boost and Configure $PATH directory to support tophat and cufflinks install

                          *** If not installed, download and install Samtools and copy the binary to $PATH directory ($HOME/local/bin)***
                          *** See previous posts if you need instructions ***

                          - Download Boost version 1.45.0 (http://www.boost.org/) [boost_1_45_0.tar.bz2]
                          - Move to applications folder (ngs/applications)
                          - Decompress the package (double click)
                          - Using terminal navigate to the decompressed folder (ngs/applications/boost_1_45_0)
                          - Build the package
                          - Type "./bootstrap.sh"
                          - Type "./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install"
                          *** This will create "include" and "lib" subfolders in $HOME/local/ ***

                          - In the new "include" folder create a subfolder "bam"
                          - Using terminal navigate to the samtools folder in the ngs/applications folder
                          - Copy the "libbam.a" file in the samtools folder to $HOME/local/lib
                          - Type "cp libbam.a $HOME/local/lib"
                          - Copy the header files (files ending in .h) to $HOME/local/include/bam
                          - Type "cp *.h $HOME/local/include/bam"

                          3) Install tophat

                          - Download current version (http://tophat.cbcb.umd.edu/)
                          - Move to applications folder (ngs/applications)
                          - Using terminal navigate to the applications folder
                          - Decompress the package
                          - Type "tar zxvf tophat-1.2.0.tar.gz"
                          - Navigate into the decompressed folder
                          - Type "cd tophat-1.2.0"
                          - Build the package
                          - Type "./configure --prefix=$HOME/local --with-bam=$HOME/local"
                          - Type "make"
                          - Type "make install"
                          *** The executable is now available in your $PATH directory ***

                          4) Install Cufflinks

                          - Download current version (http://cufflinks.cbcb.umd.edu/tutorial.html)
                          - Move to applications folder (ngs/applications)
                          - Using terminal navigate to the applications folder
                          - Decompress the package
                          - Type "tar zxvf cufflinks-0.9.3.tar.gz"
                          - Navigate into the decompressed folder
                          - Type "cd cufflinks-0.9.3"
                          - Build the package
                          - Type "./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local"
                          - Type "make"
                          - Type "make install"
                          ***The executable is now available in your $PATH directory***
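
                          Steps 3 and 4 share the same configure/make/make install pattern; the sketch below just prints the command for each package rather than running it (a dry run under this thread's paths and versions; build_cmd is a hypothetical helper). Note that cufflinks additionally needs --with-boost.

```shell
# Print (not run) the shared build command for a package; extra configure
# flags can be passed after the package directory name.
build_cmd() {
    pkg=$1; shift
    echo "cd $pkg && ./configure --prefix=$HOME/local --with-bam=$HOME/local $* && make && make install"
}
# Examples:
# build_cmd tophat-1.2.0
# build_cmd cufflinks-0.9.3 --with-boost=$HOME/local
```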

                          5) Test the installs

                          - Navigate to the bowtie folder
                          - Type "cd $HOME/ngs/applications/bowtie-0.12.7"
                          - Test the bowtie install
                          - Type "bowtie indexes/e_coli reads/e_coli_1000.fq"
                          - Should spill a bunch to the terminal window ending with:
                          # reads processed: 1000
                          # reads with at least one reported alignment: 699 (69.90%)
                          # reads that failed to align: 301 (30.10%)
                          Reported 699 alignments to 1 output stream(s)

                          - Download the tophat test data (http://tophat.cbcb.umd.edu/tutorial.html)
                          - Decompress it and navigate into the downloaded folder "test_data"
                          - Test the tophat install
                          - Type "tophat -r 20 test_ref reads_1.fq reads_2.fq"
                          - Should create a subfolder called "tophat_out" with four files; accepted_hits.bam, deletions.bed, insertions.bed, junctions.bed
                          - Download the cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html)
                          - Navigate to the folder with the downloaded sam file
                          - Test the cufflinks install
                          - Type "cufflinks test_data.sam"
                          - Should create three files; genes.expr, transcripts.expr, and transcripts.gtf
                          Last edited by Jon_Keats; 10-04-2011, 09:41 AM. Reason: Found error in the boost install, Follow step by step, seems to make a difference

                          Comment


                          • #43
                            Analysis

                            Hi Jon,

                            Have you started the analysis? Make sure you have the right Ensembl GTF file, and if you can post how you linked the analysis files (I mean the Cuffdiff output with the tracking files, so the unique identifier of each file is clear), that would be great.

                            Best

                            Comment


                            • #44
                              Building Tophat-Cufflinks Compatible GTF files from Ensembl

                              I'll apologize in advance for the length of this post but I hope the verbosity is of some use to someone else other than myself should I go through these steps again. In TopHat, Cufflinks, Cuffcompare, Cuffdiff you often have the option to use a GTF file to define exon junctions to aid in junction detection, limit abundance calculations to a defined gene list, or exclude certain elements from the abundance calculations so things like mitochondrial transcripts or ribosomal transcripts don't make up the majority of your FPKM values.

                              So here were my steps to get files that seem to work as expected.

                              1) Download the bowtie index file for hg19 (http://bowtie-bio.sourceforge.net/index.shtml)
                              2) Move to /ngs/refgenomes/bowtie_indexed/
                              3) Decompress
                              4) Run bowtie-inspect to check the chromosome list and annotation format embedded in the file
                              Code:
                              bowtie-inspect -n hg19
                              Output:
                              chr1
                              chr2
                              chr3
                              chr4
                              chr5
                              chr6
                              chr7
                              chr8
                              chr9
                              chr10
                              chr11
                              chr12
                              chr13
                              chr14
                              chr15
                              chr16
                              chr17
                              chr18
                              chr19
                              chr20
                              chr21
                              chr22
                              chrX
                              chrY
                              chrM

                              5) Download the human GTF from Ensembl (http://uswest.ensembl.org/info/data/ftp/index.html)
                              6) Decompress and move to ngs/refgenomes/annotation_tracks (new folder)
                              7) Navigate to the location of the decompressed file
                              8) Generate a list of chromosomes in the GTF file
                              Code:
                              cut -f 1 Homo_sapiens.GRCh37.60.gtf | sort  | uniq > Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt
                              9) Check and modify the output file
                              Code:
                              nano Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt
                              Output:
                              1
                              10
                              11
                              12
                              13
                              14
                              15
                              16
                              17
                              18
                              19
                              2
                              20
                              21
                              22
                              3
                              4
                              5
                              6
                              7
                              8
                              9
                              GL000191.1
                              GL000192.1
                              GL000193.1
                              GL000194.1
                              GL000195.1
                              GL000197.1
                              GL000199.1
                              GL000200.1
                              GL000201.1
                              GL000204.1
                              GL000205.1
                              GL000209.1
                              GL000211.1
                              GL000212.1
                              GL000213.1
                              GL000214.1
                              GL000216.1
                              GL000218.1
                              GL000219.1
                              GL000220.1
                              GL000221.1
                              GL000222.1
                              GL000223.1
                              GL000224.1
                              GL000225.1
                              GL000227.1
                              GL000228.1
                              GL000229.1
                              GL000230.1
                              GL000233.1
                              GL000236.1
                              GL000237.1
                              GL000238.1
                              GL000239.1
                              GL000240.1
                              GL000241.1
                              GL000242.1
                              GL000243.1
                              GL000247.1
                              HSCHR17_1
                              HSCHR6_MHC_APD
                              HSCHR6_MHC_COX
                              HSCHR6_MHC_DBB
                              HSCHR6_MHC_MANN
                              HSCHR6_MHC_MCF
                              HSCHR6_MHC_QBL
                              HSCHR6_MHC_SSTO
                              MT
                              X
                              Y

                              - Delete the chromosome IDs present in the bowtie index file (i.e. delete 1-22, X, Y, MT)
                              - Save the file under a new name [control-O], change the file name to "Homo_sapiens.GRCh37.60_ChrToExclude.txt", save, and close the nano editor [control-X]
                              10) Generate a new GTF file with just the chromosomes in the bowtie index
                              Code:
                              grep -vf Homo_sapiens.GRCh37.60_ChrToExclude.txt Homo_sapiens.GRCh37.60.gtf > GRCh37_E60_BowtieIndexChr.gtf
                              11) Update the new GTF chromosome names from 1, 2, 3, ... to chr1, chr2, chr3, ... to match the bowtie index nomenclature
                              Code:
                              awk '{print "chr"$0}' GRCh37_E60_BowtieIndexChr.gtf | sed 's/chrMT/chrM/g' > GRCh37_E60_BowtieIndexCompatible.gtf
                              12) Check that the new output file chromosome IDs match the bowtie index
                              Code:
                              cut -f 1 GRCh37_E60_BowtieIndexCompatible.gtf | sort | uniq > GRCh37_E60_BowtieIndexCompatible_Check.txt
                              less GRCh37_E60_BowtieIndexCompatible_Check.txt
                              Output:
                              chr1
                              chr10
                              chr11
                              chr12
                              chr13
                              chr14
                              chr15
                              chr16
                              chr17
                              chr18
                              chr19
                              chr2
                              chr20
                              chr21
                              chr22
                              chr3
                              chr4
                              chr5
                              chr6
                              chr7
                              chr8
                              chr9
                              chrM
                              chrX
                              chrY

                              ***You now have a GTF file ready to use with TopHat and Cufflinks***
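
                              The grep/awk/sed filter-and-rename steps above can be sanity-checked on a couple of toy records before touching the real GTF (the file names here are throwaways; real GTF lines have nine tab-separated columns, but only the first field matters for this check):

```shell
# Build a three-record toy file and a one-pattern exclude list, then run
# the same filter-and-rename pipeline used on the real GTF.
printf 'GL000191.1\tsrc\texon\n1\tsrc\texon\nMT\tsrc\texon\n' > toy.gtf
printf 'GL000191.1\n' > exclude.txt
grep -vf exclude.txt toy.gtf | awk '{print "chr"$0}' | sed 's/chrMT/chrM/g'
# The GL contig line is dropped; the remaining lines come out as chr1 and chrM.
```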

                              Creating a GTF of regions to be excluded from FPKM calculations in cufflinks. Unfortunately, this will come down to personal choice I suspect. But abundance estimates from certain tissues and library prep methods could vary greatly due to differences in levels of mitochondrial RNA, ribosomal RNA, or tissue-specific transcripts like immunoglobulin in my case (sucks when 50% of your reads are from 1Mb of the genome...argh).

                              A) Get a list of RNA types from the second column of the GTF file
                              Code:
                              cut -f 2 Homo_sapiens.GRCh37.60.gtf | sort | uniq > Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt
                              less Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt
                              Output:
                              IG_C_gene
                              IG_C_pseudogene
                              IG_D_gene
                              IG_J_gene
                              IG_J_pseudogene
                              IG_V_gene
                              IG_V_pseudogene
                              lincRNA
                              miRNA
                              miRNA_pseudogene
                              misc_RNA
                              misc_RNA_pseudogene
                              Mt_rRNA
                              Mt_tRNA
                              Mt_tRNA_pseudogene
                              polymorphic_pseudogene
                              processed_transcript
                              protein_coding
                              pseudogene
                              rRNA
                              rRNA_pseudogene
                              scRNA_pseudogene
                              snoRNA
                              snoRNA_pseudogene
                              snRNA
                              snRNA_pseudogene
                              TR_C_gene
                              TR_J_gene
                              tRNA_pseudogene
                              TR_V_gene
                              TR_V_pseudogene

                              B) Again this is a personal choice but I'm getting rid of all transcripts from the mitochondrial genome (chrM, Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene), those from ribosomal genes (rRNA, rRNA_pseudogene), and those from immunoglobulin elements (IG_C_gene, IG_C_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_V_gene, IG_V_pseudogene).

                              - I modified the "Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt" file to create a list to select using grep called "Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt"
                              Looks like:
                              IG_C_gene
                              IG_C_pseudogene
                              IG_D_gene
                              IG_J_gene
                              IG_J_pseudogene
                              IG_V_gene
                              IG_V_pseudogene
                              Mt_rRNA
                              Mt_tRNA
                              Mt_tRNA_pseudogene
                              rRNA
                              rRNA_pseudogene
                              chrM

                              Code:
                              grep -f Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt GRCh37_E60_BowtieIndexCompatible.gtf > GRCh37_E60_CufflinksExcludedTranscripts.gtf
                              Obviously, check the final output file to see if it makes sense, but you should be ready to go.
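
                              As a final check (a small sketch; verify_types is a hypothetical helper), tally the exclusion GTF by its second column to confirm only the intended annotation types survived the grep:

```shell
# Count records per annotation type (column 2 of the Ensembl GTF).
verify_types() {
    cut -f 2 "$1" | sort | uniq -c
}
# Example: verify_types GRCh37_E60_CufflinksExcludedTranscripts.gtf
```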

                              Comment


                              • #45
                                Thanks for this great thread

                                 Just wanted to say thank you for this thread. It has been a great help. To everyone new to NGS: I recommend reading this thread and the book mentioned above, "Unix and Perl for Biologists".

                                Thanks for the help
                                --
                                Prakhar

                                Comment

                                Last Post seqadmin  
                                Working...
                                X