Originally posted by nilshomer:
You should definitely remove duplicates on single-end data if your coverage is not too high. The point is that if you have 200x coverage, you expect many reads to share the same start position, while at low coverage this happens by chance only infrequently.
In my case the data we have is paired-end, but the first alignment test I ran used bwa in single-end mode. Odd choice, I know, but this is actually an mRNA-seq dataset, and when aligned to the genome the paired-end mode produces a lot of artifacts, because it tries to pair reads across exons at distances that often exceed the typical insert size.
Regardless, I would always recommend removing duplicates, whether the data are single-end, paired-end, or mate-pair for that matter. A real-life example: in a whole-genome sequencing project (multiple runs, one library) where duplicates were removed per run but NOT across all runs, an interesting biological hit turned out to be a PCR artifact (the identical read appeared in multiple runs).
Remember that for single-end reads duplicate removal caps your coverage at a maximum of your read length x 2. Obviously it can be higher for paired-end reads, where one read may be identical but the other read is different.
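The coverage cap above can be checked with back-of-the-envelope arithmetic (the numbers here are hypothetical, just to illustrate the point):

```shell
# After single-end duplicate removal, each start position survives at most
# once per strand, so the maximum depth at any base is read length x 2.
READ_LEN=50     # hypothetical read length
STRANDS=2
MAX_SE_COV=$((READ_LEN * STRANDS))
echo "Max post-dedup single-end coverage for ${READ_LEN} bp reads: ${MAX_SE_COV}x"
```

So with 50 bp single-end reads you cannot exceed 100x after duplicate removal, no matter how deeply you sequence.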
I like your version, golharam - thanks for sharing.
Thanks heaps for your posts Jonathan, this is very useful.
I'm now in the same position you were a few months ago (on a SOLiD) and took the approach of first learning Linux and Perl before doing any analysis (mainly because the data aren't there yet...). Looking forward to more interesting posts from you soon, as it really helps newbies (at least me) get a better overview of the pipeline to implement.
Time to Git a Linux Machine
I'm slowly getting back up and running after moving from my post-doc to an independent position. Other than learning how damn expensive everything is, I'm slowly deciding that I should swear off the idea of a new Mac Pro workstation in favor of a Linux workstation, given all the issues I seem to run into with the "not quite so standard Mac OS X 10.6 implementation of Unix". But with the idea of sharing my ongoing experiences, and leaving a trail I can follow to build my next machine, I thought I'd update my thread. I hope some people have found it useful...
Some new ideas and an update
1) I'm becoming increasingly certain that I'm getting good enough at the command line to REALLY mess up my system
2) My list of used programs continues to grow as I try each new sequencing method
3) As per issue 1 - I'm also not reading instructions very well. New rule - if at first you don't succeed... go back and read the damn instructions again, because most likely you didn't follow them correctly!
New Applications to Install
As previously stated in this thread, if you are using a Mac OS environment you need to do a couple of special things:
A) Install Xcode on your system (See earlier post)
B) Install Fink on your system (See earlier post)
- Install the following Fink packages: md5deep and pkgconfig
- "fink install md5deep" (needed for bfast install)
- "fink install pkgconfig" (needed for fastx-toolkit install)
C) Install Git on your system (http://git-scm.com/)
D) Create a $PATH Directory and update this directory in your .profile (See earlier post for instructions)
- In my case "$HOME/local/bin"
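For anyone unsure what the .profile entry should look like, here is a minimal sketch (adjust the directory if yours differs from mine):

```shell
# Prepend $HOME/local/bin to the search path so locally installed
# tools are found before any system copies.
export PATH=$HOME/local/bin:$PATH
```

After adding this line, open a new Terminal window (or `source ~/.profile`) so the change takes effect.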
1) Install FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
***Why did I get this package***
Because I have some Illumina mate-pair data that I want to analyze with BWA using sampe. By my understanding the reads need to be reverse complemented to pair correctly, so I'm using the fastx_reverse_complement application from this package; it is very fast, and it correctly reverse complements the reads while also reversing the quality values.
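To make clear what that operation does to each record, here is a pure-shell illustration on a toy sequence/quality pair (this is just a sketch of the transformation, not a replacement for the tool, which streams whole FASTQ files; it assumes the common `rev` utility is available):

```shell
# Toy FASTQ record fields (made-up values)
SEQ="ACCTGG"
QUAL="IIHHGF"
# Reverse complement the sequence: reverse, then complement each base
RC_SEQ=$(printf '%s' "$SEQ" | rev | tr 'ACGTacgt' 'TGCAtgca')
# The quality string is simply reversed so it still lines up base-for-base
REV_QUAL=$(printf '%s' "$QUAL" | rev)
echo "$RC_SEQ $REV_QUAL"
```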
Instructions:
- Go to the download page and download the following:
a) fastx_toolkit-0.0.13.tar.bz2
b) libgtextutils-0.6.tar.bz2
- Move both to ngs/applications folder and unpack both packages
- In Terminal navigate to the libgtextutils folder "cd ngs/applications/libgtextutils-0.6"
- Install the package as follows:
./configure
make
sudo make install (this will ask for your password; you must have admin-level privileges)
- Move to fastx_toolkit folder "cd ../fastx_toolkit-0.0.13"
- Install the package as follows:
./configure --prefix=$HOME/local/bin
make
make install
- Test the install by typing "fastx_uncollapser -h"; this should print usage documentation for the app
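For the "unpack" steps above, the command-line equivalent of double-clicking the archive is `tar -xjf` for .tar.bz2 files. A self-contained demonstration on a throwaway archive (substitute e.g. fastx_toolkit-0.0.13.tar.bz2 for the real thing):

```shell
# Build a dummy package so the example is self-contained
mkdir -p demo_pkg && echo "hello" > demo_pkg/README
tar -cjf demo_pkg.tar.bz2 demo_pkg
rm -r demo_pkg
# The actual unpack step: -x extract, -j bzip2, -f archive name
tar -xjf demo_pkg.tar.bz2
cat demo_pkg/README
```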
2) Install Bfast, DNAA, and Breakway
*** Why these packages***
As you might guess from the install above, I now have some mate-pair data and want to try out the Breakway package from the UCLA group, but it depends on two of their other packages, Bfast and DNAA.
Bfast - This package seems to be the bane of my existence, but thankfully Nils and the help list have been amazingly helpful.
Mac Related Issues:
a) You must have Fink and have installed the md5deep package, otherwise "make check" will fail
b) The current SourceForge version (0.6.4e) does not install correctly, though the previous version does. However, this is a known issue and has been fixed in the master branch (if that's a new term to you, we are in the same boat), which means you need to use the Git repository version
c) Using "./configure --prefix=$HOME/local" works but makes DNAA mad when you install it, so use sudo (time to be superman again)
Instructions:
a) In Terminal navigate to ngs/applications
b) Get current Bfast version from Git (restart Terminal after git installation)
- type "git clone git://bfast.git.sourceforge.net/gitroot/bfast/bfast"
- this will create a folder called "bfast" in the current directory
- Move into the directory "cd bfast"
- Install bfast by typing the following:
sh autogen.sh
./configure
make
make check
sudo make install (requests a password with admin level privileges)
- Test install and check current version by typing "bfast" in Terminal
c) Navigate back to ngs directory by typing "cd ../"
d) Get current version of DNAA from Git
- type "git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa dnaa"
- this will create a directory named "dnaa" in the current directory (ngs/applications)
- move into the dnaa directory by typing "cd dnaa"
e) Because this package depends on both BFAST and SAMTOOLS you need to provide links to these application directories, even though you already have them in a $PATH directory (/usr/local/bin and $HOME/local/bin respectively)
- create a link to the BFAST package you just installed by typing "ln -s ../bfast bfast"
- create a link to your current SAMTOOLS package by typing "ln -s ../samtools-0.1.8 samtools"
f) Install DNAA by typing the following:
sh autogen.sh
./configure
make
sudo make install (requests a password with admin level privileges)
g) Download the current version of BREAKWAY from SourceForge (http://sourceforge.net/projects/breakway/), move it to the ngs/applications folder, and unpack it; you should then be ready to go
Building a Paired-End Pipeline
Up till now I've been frustrated because I could not automate a variety of pairing steps that occur as I process raw data to BAM files, usually either in the SAMPE step of BWA or when I want to merge multiple lanes into one BAM file. I've convinced myself that I can just use "cat" to merge the multiple lanes together before processing, which ends up being a simple solution as long as all the lanes are available at the same time. For the SAMPE pairing I spent some time with my Unix guru from France when he came over to visit his wife, and I now seem to have a workable solution, as long as a specific file tree structure is used in conjunction with two Unix scripts: one that processes each pair from raw data to two sorted BAM files (with and without duplicates), and a second that pulls each sample into the analysis framework and launches the first script. Since this requires a specific directory structure, I've updated my directory structure script to version 3.
Code:

#!/bin/sh
# Create_NGS_DirectoryStructure_V3.sh
#
# Created by Jonathan Keats on 9/3/10 based on a suggestion from Ryan Golhar on my Seqanswers thread.
# Translational Genomics Research Institute
#
#########################################################################
#  CREATES A DIRECTORY STRUCTURE TO SUPPORT A VARIETY OF NGS PIPELINES  #
#########################################################################
#
# Designed for a Mac OS environment and requires initiation from your home folder (/Users/You/)

# Check to confirm the current location is $HOME/ (ie. /Users/You/)
echo "Confirming Script Initiation Directory"
var1=$HOME
if [ "`pwd`" != "$var1" ]
then
    echo " The script must be launched from your home directory "
    echo " The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "1) Launch Location is Correct ($HOME/)"

# Create required directories to support pipelines (BWAse, BWApe, and others to come...)
echo "***Creating Pipeline Directory Structure***"
mkdir -p ngs/{analysisnotes,applications,scripts}
mkdir -p ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,genome_downloads}
mkdir -p ngs/refgenomes/genome_downloads/{hg18,hg19}
mkdir -p ngs/finaloutputs/{alignmentresults_bwa,illumina,sangerfastq}
mkdir -p ngs/finaloutputs/bamfiles/{merged,sorted,nodups}
mkdir -p ngs/bwase/inputsequences/{illumina,sangerfastq}
mkdir -p ngs/bwase/samfiles
mkdir -p ngs/bwase/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/samfiles
mkdir -p ngs/bwape/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/inputsequences/{illumina,sangerfastq,hold}
mkdir -p ngs/bwape/inputsequences/illumina/{read1,read2}
mkdir -p ngs/bwape/inputsequences/sangerfastq/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/{lane1,lane2,lane3,lane4,lane5,lane6,lane7,lane8}
mkdir -p ngs/bwape/inputsequences/hold/lane1/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane2/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane3/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane4/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane5/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane6/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane7/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane8/{read1,read2}
mv create_ngs_directorystructure_v3.sh ngs/scripts/
echo "***Pipeline Directory Structure Created***"
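The lane-merge idea mentioned above (using "cat" before processing) can be sketched with dummy files; the file names here are made up, and this works because the raw per-lane files are plain line-delimited text, so concatenation keeps every record intact (all lanes must of course come from the same sample/library):

```shell
# Create two tiny stand-ins for per-lane raw read files
printf 'lane1-read\n' > s_1_sequence.txt
printf 'lane2-read\n' > s_2_sequence.txt
# Concatenate the lanes into one input file before alignment
cat s_1_sequence.txt s_2_sequence.txt > MySample_R1.txt
wc -l < MySample_R1.txt
```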
Last edited by Jon_Keats; 09-13-2010, 04:58 PM.
BWA SAMPE Pipeline Version
As I mentioned before, it's taken a while to sort out a method that can automate a paired-end analysis using BWA, but it seems to work now. Feel free to use the scripts below in conjunction with the "create_ngs_directorystructure_v3.sh" script that creates the required directory structure.
The following two scripts automate a paired-end BWA analysis, taking the raw "s_x_sequence.txt" output files all the way to aligned, indexed, duplicate-removed BAM files. The design of the pipeline has a couple of requirements:
1) You need to have all the required applications in a $PATH directory. As detailed in this thread I personally use "$HOME/local/bin".
2) You will need MAQ (with the ill2sanger patch installed), BWA, SAMTOOLS, and PICARD's MarkDuplicates.jar in this path directory.
NOTE: If you use a different path directory you need to alter line 623 of BWApe_hg18_v1.sh, as MarkDuplicates.jar is called specifically from this directory while all the other tools are called through the $PATH directory. ****If you know how to put a directory in the Java path on a Mac, drop me a line****
3) Both shell scripts are designed to be in your $PATH directory so you can call them from the ngs directory using "BWApe_hg18_v1.sh" for a single-sample analysis or "multi_bwape_analysis_v1.sh" for a multiple-sample analysis. Alternatively, you can place them in the "/ngs" folder and call them directly using "./BWApe_hg18_v1.sh" or "./multi_bwape_analysis_v1.sh" (NOTE: if you do this you need to modify the lines that launch BWApe_hg18_v1.sh to include the direct launch indicator "./").
4) The input file names must be unique and end with a "_R1.txt"/"_R2.txt" read identifier, such as "YourSample_R1.txt" and "YourSample_R2.txt".
NOTE: The name BWApe_hg18_v1.sh only reflects the reference genome used while developing the script. You can easily change to whatever genome (mouse, human, etc.) you want; you just need to generate the bwa index and update the BWApe_hg18_v1.sh script as indicated in the script.
NOTE: If using "BWApe_hg18_v1.sh" you need to place the raw Illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2". If using "multi_bwape_analysis_v1.sh" you need to place the raw Illumina files in "ngs/bwape/inputsequences/hold/laneX/read1" and "ngs/bwape/inputsequences/hold/laneX/read2" as appropriate to your sample set. The script is only designed for 8 lanes/samples, so if you have more you need to copy/paste to extend the script. After completing each lane/sample it checks whether there is data in the next sequential lane/sample folder, processing it if available or ending the script if it is empty, so you need to fill the hold/lane1 through hold/lane8 read folders in order.
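The sequential lane check described above can be sketched as follows. This is a hedged, self-contained illustration of the control flow (not the author's multi_bwape_analysis_v1.sh), using a temporary directory and dummy files:

```shell
# Build a miniature hold/ tree: lanes 1 and 2 hold a read file, lane 3 is empty
BASE=$(mktemp -d)
mkdir -p "$BASE/hold/lane1/read1" "$BASE/hold/lane2/read1" "$BASE/hold/lane3/read1"
touch "$BASE/hold/lane1/read1/a_R1.txt" "$BASE/hold/lane2/read1/b_R1.txt"
PROCESSED=0
for n in 1 2 3 4 5 6 7 8
do
    dir="$BASE/hold/lane$n/read1"
    [ -d "$dir" ] || break
    if [ "$(ls "$dir" | wc -l)" -eq 0 ]
    then
        break    # the first empty lane folder ends the batch
    fi
    # ...here the real script would launch BWApe_hg18_v1.sh on this lane...
    PROCESSED=$((PROCESSED + 1))
done
echo "Lanes processed: $PROCESSED"
```

This is why the lanes must be filled in order: an empty lane3 stops the loop even if lane4 contains data.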
Code:

#!/bin/sh
# BWApe_hg18_V1.sh
# Created by Jonathan Keats
# Translational Genomics Research Institute
#
# This script is designed to take a batch of raw Illumina 1.3+ reads to sorted and indexed BAM files,
# with and without duplicates, using BWA in paired-end mode.
# It is designed to be initiated from a folder called "ngs" in your $HOME folder with a specific subdirectory structure.
# To create the directory structure launch "create_ngs_directorystructure_v3.sh" from your "$HOME" folder.
####################################################################################################
##  To Run This Script You Must Have The Following Applications In One Of Your $PATH Directories  ##
##  1) MAQ with ill2sanger patch installed                                                        ##
##  2) BWA                                                                                        ##
##  3) SAMTOOLS                                                                                   ##
##  4) PICARD - MarkDuplicates.jar (Must be in $HOME/local/bin)                                   ##
####################################################################################################
# To run this script you MUST first place your reference file in ngs/refgenomes/bwa_indexed and have
# run the "bwa index" command to create the BWT index files.
######################################################################################################
# WARNING - YOU MUST ENSURE THE NAME OF YOUR REFERENCE GENOME FILE MATCHES LINES (274, 310, and 367) #
######################################################################################################
# The script is based on having ***RENAMED*** Illumina files in "ngs/bwape/inputsequences/illumina/read1"
# and "ngs/bwape/inputsequences/illumina/read2".
# The renamed format ***MUST*** be "YourSampleName_R1.txt" and "YourSampleName_R2.txt" otherwise
# pairing and renaming will not occur correctly.
# Multiple lanes should be concatenated together before initiating the script, unless you want to
# manually merge in samtools.
# At each step it queries specific folders for available files and passes them to the next analysis module.
# After each step the filename extension of the output files is corrected
# (ie. "MySequenceFile_R1.txt.fastq" to "MySequenceFile_R1.fastq").
# Order of Embedded Steps:
# - Converts Illumina 1.3+ fastq files "s_1_sequence.txt" to Sanger fastq files "s_1_sequence.fastq" using "maq ill2sanger"
# - Aligns created fastq files to the reference genome using "bwa aln"
# - Generates SAM files from the alignment files using "bwa sampe"
# - Converts SAM files to BAM files using "samtools view"
# - Sorts BAM files using "samtools sort"
# - Indexes the sorted BAM files for use in the IGV browser using "samtools index"
# - Removes duplicates from the sorted BAM files using picard MarkDuplicates.jar
# - Indexes the no-duplicates BAM files for use in the IGV browser using "samtools index"
# - Final output files are archived, then the input and analysis directories are cleaned up and
#   readied for the next analysis batch
# The script creates a log file in ngs/analysisnotes to track the steps completed and the time each
# step started and finished. Some of the log events print to both the terminal and the log file so
# you can see what is going on.
# Much of this would not be possible without the help of a former colleague's husband who is a Unix
# programmer in France, so I've kept some French terms such as "ligne" instead of "line" in his
# honor (thanks Charabelle).

# Starting directory = $HOME/ngs
# In this step - We check that you are launching the script from the correct location in case you are using it from a path directory
#              - We check that the destination directories used by the script are empty to prevent
#                deleting erroneous files and unexpected analysis events
#              - Hope to add a check for available disk space
echo "***Checking Directory Structure***"
# List of directories to check
var1=$HOME/ngs
var2=$HOME/ngs/bwape/samfiles
var3=$HOME/ngs/bwape/bamfiles/merged
var4=$HOME/ngs/bwape/bamfiles/original
var5=$HOME/ngs/bwape/bamfiles/sorted
var6=$HOME/ngs/bwape/bamfiles/nodups
var7=$HOME/ngs/bwape/inputsequences/sangerfastq/read1
var8=$HOME/ngs/bwape/inputsequences/sangerfastq/read2
# Checking if launch location is correct
if [ "`pwd`" != "$var1" ]
then
    echo " The script must be launched from the NGS directory "
    echo " The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "1) Launch Location is Correct ($HOME/ngs)"
# Checking if analysis directories are empty
if [ `ls $var2 | wc -l` != 0 ]
then
    echo " The bwape/samfiles directory is not empty - Any data in this directory would be deleted by the script "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "2) bwape/samfiles directory is empty as required"
if [ `ls $var3 | wc -l` != 0 ]
then
    echo " The bwape/bamfiles/merged directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/merged "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "3) bwape/bamfiles/merged directory is empty as required"
if [ `ls $var4 | wc -l` != 0 ]
then
    echo " The bwape/bamfiles/original directory is not empty - Any data in this directory would be deleted by the script "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "4) bwape/bamfiles/original directory is empty as required"
if [ `ls $var5 | wc -l` != 0 ]
then
    echo " The bwape/bamfiles/sorted directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/sorted "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "5) bwape/bamfiles/sorted directory is empty as required"
if [ `ls $var6 | wc -l` != 0 ]
then
    echo " The bwape/bamfiles/nodups directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "6) bwape/bamfiles/nodups directory is empty as required"
if [ `ls $var7 | wc -l` != 0 ]
then
    echo " The bwape/inputsequences/sangerfastq/read1 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/sangerfastq "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "7) bwape/inputsequences/sangerfastq/read1 directory is empty as required"
if [ `ls $var8 | wc -l` != 0 ]
then
    echo " The bwape/inputsequences/sangerfastq/read2 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/sangerfastq "
    echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "8) bwape/inputsequences/sangerfastq/read2 directory is empty as required"
echo "***Pre Run Check Completed Successfully***"

# Current directory = ngs
echo "***Starting BWA SAMPE Analysis Batch***"
date '+%m/%d/%y %H:%M:%S'
# The following step creates the log file in the analysisnotes subdirectory the first time the
# script is run. On subsequent runs the results are appended to the pre-existing log file.
echo "***Starting BWA SAMPE Analysis Batch***" >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

# In the next step we convert the "Read1" Illumina fastq files to Sanger fastq files using maq ill2sanger
echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
cd bwape/inputsequences/illumina/read1
# Current directory = ngs/bwape/inputsequences/illumina/read1
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do
    echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
    maq ill2sanger $ligne ../../sangerfastq/read1/$ligne.fastq
done
# In the next step we clean up the Illumina Read1 folder so it is ready for the next analysis batch
echo Cleaning up Input Sequences Illumina Read1
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read1 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
    echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
    mv $ligne ../../../../finaloutputs/illumina
done
# In the next step we rename the "Read1" Sanger-format fastq files from ".txt.fastq" extensions to ".fastq"
cd ../../sangerfastq/read1
# Current directory = ngs/bwape/inputsequences/sangerfastq/read1
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1a - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1a - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

# In the next step we convert the "Read2" Illumina fastq files to Sanger fastq files using maq ill2sanger
cd ../../illumina/read2
# Current directory = ngs/bwape/inputsequences/illumina/read2
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do
    echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
    maq ill2sanger $ligne ../../sangerfastq/read2/$ligne.fastq
done
# In the next step we clean up the Illumina Read2 folder so it is ready for the next analysis batch
echo Cleaning up Input Sequences Illumina Read2
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read2 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
    echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
    mv $ligne ../../../../finaloutputs/illumina
done
# In the next step we rename the "Read2" Sanger-format fastq files from ".txt.fastq" extensions to ".fastq"
cd ../../sangerfastq/read2
# Current directory = ngs/bwape/inputsequences/sangerfastq/read2
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1b - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1b - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

# In the next step we align the converted "Read1" Sanger fastq files to the reference genome
echo Starting Step2a - Read1 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2a - Read1 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read1
# Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do
    echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
    bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai
done
# In the next step we rename the "Read1" alignment files
old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step2a - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2a - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

# In the next step we align the converted "Read2" Sanger fastq files to the reference genome
echo Starting Step2b - Read2 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2b - Read2 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read2
# Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do
    echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
    bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai
done
# In the next step we rename the "Read2" alignment files
old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step2b - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2b - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

# In the next step we generate SAM files for the alignments using bwa sampe
echo Starting Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step3 - bwa sampe process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo The following alignment files will be converted to SAM files:
cd ../read1
# Current directory = ngs/bwape/inputsequences/sangerfastq/read1
for ligne in `ls *.sai`
do
    aln1=`echo $ligne`
done
echo $aln1
for ligne in `ls *.fastq`
do
    read1=`echo $ligne`
done
echo $read1
cd ../read2
# Current directory = ngs/bwape/inputsequences/sangerfastq/read2
for ligne in `ls *.sai`
do
    aln2=`echo $ligne`
done
echo $aln2
for ligne in `ls *.fastq`
do
    read2=`echo $ligne`
done
echo $read2
echo The following alignment files will be converted to SAM files: >> ../../../../analysisnotes/Analysis.log
echo $aln1 >> ../../../../analysisnotes/Analysis.log
echo $read1 >> ../../../../analysisnotes/Analysis.log
echo $aln2 >> ../../../../analysisnotes/Analysis.log
echo $read2 >> ../../../../analysisnotes/Analysis.log
cd ../../../samfiles
# Current directory = ngs/bwape/samfiles
# (bwa sampe <database.fasta> <aln1.sai> <aln2.sai> <input1.fastq> <input2.fastq> > aln.sam)
bwa sampe ../../refgenomes/bwa_indexed/hg18.fasta ../inputsequences/sangerfastq/read1/$aln1 ../inputsequences/sangerfastq/read2/$aln2 ../inputsequences/sangerfastq/read1/$read1 ../inputsequences/sangerfastq/read2/$read2 > $read1.sam
# In the next step we rename the SAM files generated by bwa sampe from the "Read1" and "Read2" alignment files
old_ext=_R1.fastq.sam
new_ext=_bwape.sam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step3 - bwa sampe process >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log

# In the next step we convert each SAM file to a BAM file
echo Starting Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Starting Step4 - samtools SAM to BAM conversion >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
echo The following SAM files will be converted to BAM files:
for ligne in `ls *.sam`
do
    echo $ligne
done
echo The following SAM files will be converted to BAM files: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do
    echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
    samtools view -bS -o ../bamfiles/original/$ligne.bam $ligne
done
# In the next step we delete the SAM files to save disc space, as the BAM files contain all the data in a binary format
echo Deleting the following SAM Files from ngs/bwape/samfiles:
for ligne in `ls *.sam`
do
    echo $ligne
done
echo Deleting the following SAM Files from ngs/bwape/samfiles: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do
    echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
    rm $ligne
done
echo Deleting SAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting SAM Files Complete >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
# In the next step we clean up the Sanger fastq "Read1" folder so it is ready for the next analysis batch
cd ../inputsequences/sangerfastq/read1
# Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
    echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
    mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
    echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
    mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
# In the next step we clean up the Sanger fastq "Read2" folder so it is ready for the next analysis batch
cd ../read2
# Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
    echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
    mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
    echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
    echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
    mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
# In the next step we rename the BAM files created by the samtools SAM-to-BAM conversion
cd ../../../bamfiles/original
# Current directory = ngs/bwape/bamfiles/original
old_ext=sam.bam
new_ext=bam
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step4 - samtools SAM to BAM conversion >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

# In the next step we sort each BAM file by chromosome coordinate
echo Starting Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be sorted:
for ligne in `ls *.bam`
do
    echo $ligne
done
echo The following BAM files will be sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
    echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
    samtools sort $ligne ../sorted/$ligne
done
# In the next step we delete the original unsorted BAM files to save disc space, as the sorted BAMs contain all the needed information
echo Deleting the following BAM Files from ngs/bwape/bamfiles/original:
for ligne in `ls *.bam`
do
    echo $ligne
done
echo Deleting the following BAM Files from ngs/bwape/bamfiles/original: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
    echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
    rm $ligne
done
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
# In the next step we rename the sorted BAM files created by the samtools sort process
cd ../sorted
# Current directory = ngs/bwape/bamfiles/sorted
old_ext=.bam.bam
new_ext=_sorted.bam
find .
-type f -name "*$old_ext" -print | while read file do mv $file ${file%${old_ext}}${new_ext} done echo Finished Step5 - samtools BAM sorting process date '+%m/%d/%y %H:%M:%S' echo Finished Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log #In the next step we will index the sorted BAM files for fast access and viewing in the IGV browser echo Starting Step6 - samtools BAM indexing process date '+%m/%d/%y %H:%M:%S' echo Starting Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo The following BAM files will be indexed: for ligne in `ls *.bam` do echo $ligne done echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bam` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bam` do samtools index $ligne done echo Finished Step6 - samtools BAM indexing process date '+%m/%d/%y %H:%M:%S' echo Finished Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log #In the next step we will remove the duplicate reads from the sorted bam files echo Starting Step7 - picard markduplicates process date '+%m/%d/%y %H:%M:%S' echo Starting Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo Duplicate reads will be removed from the following sorted BAM files: for ligne in `ls *.bam` do echo $ligne done echo Duplicate reads will be removed from the following sorted BAM files: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bam` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bam` do java -Xmx2g -jar $HOME/local/bin/MarkDuplicates.jar INPUT=$ligne OUTPUT=../nodups/$ligne METRICS_FILE=../nodups/$ligne.txt 
REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT done #In the next step we clean up the Sorted BAM files folder so it is ready for the next analyis batch echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: for ligne in `ls *.bam` do echo $ligne done echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bam` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bam` do mv $ligne ../../../finaloutputs/bamfiles/sorted/ done echo Moving Sorted BAM Files Complete date '+%m/%d/%y %H:%M:%S' echo Moving Sorted BAM Files Complete >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: for ligne in `ls *.bai` do echo $ligne done echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bai` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bai` do mv $ligne ../../../finaloutputs/bamfiles/sorted/ done echo Moving Sorted BAM Index Files Complete date '+%m/%d/%y %H:%M:%S' echo Moving Sorted BAM Index Files Complete >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log #In the next step we will rename the BAM files and Metrics files created after duplicate removal by picard cd ../nodups #Current directory = ngs/bwape/bamfiles/nodups old_ext=_sorted.bam new_ext=_sorted_nodups.bam find . -type f -name "*$old_ext" -print | while read file do mv $file ${file%${old_ext}}${new_ext} done old_ext=_sorted.bam.txt new_ext=_sorted_nodups_metrics.txt find . 
-type f -name "*$old_ext" -print | while read file do mv $file ${file%${old_ext}}${new_ext} done echo Finished Step7 - picard markduplicates process date '+%m/%d/%y %H:%M:%S' echo Finished Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log #In the next step we will index the nodups BAM files for fast access and viewing in the IGV browser echo Starting Step8 - samtools BAM indexing process date '+%m/%d/%y %H:%M:%S' echo Starting Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo The following BAM files will be indexed: for ligne in `ls *.bam` do echo $ligne done echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bam` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bam` do samtools index $ligne done #In the next step we clean up the nodups BAM files folder so it is ready for the next analyis batch echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: for ligne in `ls *.bam` do echo $ligne done echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bam` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bam` do mv $ligne ../../../finaloutputs/bamfiles/nodups/ done echo Moving NoDups BAM Files Complete date '+%m/%d/%y %H:%M:%S' echo Moving NoDups BAM Files Complete >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: for ligne in `ls *.bai` do echo $ligne done echo Moving the following NoDups BAM Index .bai Files from 
ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.bai` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.bai` do mv $ligne ../../../finaloutputs/bamfiles/nodups/ done echo Moving NoDups BAM Index Files Complete date '+%m/%d/%y %H:%M:%S' echo Moving NoDups BAM Index Files Complete >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: for ligne in `ls *.txt` do echo $ligne done echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log for ligne in `ls *.txt` do echo $ligne >> ../../../analysisnotes/Analysis.log done for ligne in `ls *.txt` do mv $ligne ../../../finaloutputs/bamfiles/nodups/ done echo Moving MarkDuplicates Metrics Files Complete date '+%m/%d/%y %H:%M:%S' echo Moving MarkDuplicates Metrics Files Complete >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log echo Finished Step8 - samtools BAM indexing process date '+%m/%d/%y %H:%M:%S' echo Finished Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log #In the next step we return to the launch folder $HOME/Documents/ngs cd ../../.. #Current directory = ngs/ echo ***Analysis Batch Complete*** echo ***Analysis Batch Complete*** >> analysisnotes/Analysis.log
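The batch-rename steps in the pipeline all use the same shell trick: strip a known suffix with `${file%$old_ext}` and append the new one. A minimal self-contained sketch of that idiom, with illustrative file names in a temporary directory rather than the pipeline's folders:

```shell
# Demonstrate the suffix-swap rename used by the pipeline's old_ext/new_ext steps.
# ${file%$old_ext} removes old_ext from the end of $file; new_ext is then appended.
tmpdir=$(mktemp -d)
touch "$tmpdir/sampleA.sam.bam" "$tmpdir/sampleB.sam.bam"

old_ext=.sam.bam
new_ext=.bam
for file in "$tmpdir"/*"$old_ext"
do
    mv "$file" "${file%$old_ext}$new_ext"
done

ls "$tmpdir"   # sampleA.bam and sampleB.bam
```

Because `%` removes the shortest matching suffix, this is safe even when the base name itself contains dots, which is why it handles the double-extension files (`x.sam.bam`, `x.bam.bam`) that samtools produces here.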
Code:
#!/bin/sh
# multi_bwape_analysis_v1.sh
#
# Created by Jonathan Keats on 9/5/10.
# Translational Genomics Research Institute
# This script is designed to allow multiple samples/lanes of paired-end illumina data to be passed into the "BWApe_hg18_v1" pipeline
###############################################################################################################################
## To facilitate its use you must put uniquely named Illumina 1.3+ files in ngs/bwape/inputsequences/hold/lane(X)/read(1-2)  ##
## It is essential that these files are uniquely named or overwriting will occur                                             ##
## These files MUST have the ".txt" extension characteristic of the Illumina V1.3+ output "s_x_sequences.txt"                ##
###############################################################################################################################

#In this step we check that you are launching the script from the correct location in case you are using it from a path directory
echo "***Checking Current Directory is Correct***"
#Directory the script must be launched from
temp1=$HOME/ngs
if [ "`pwd`" != "$temp1" ]
then
    echo " The script must be launched from the NGS directory "
    echo " The script was automatically killed due to a launch error - See Above Error Message"
    exit 2
fi
echo "***Current Directory is Correct***"

#Each lane is checked, moved into the analysis directories, and passed through the "BWApe_hg18_v1.sh" pipeline in turn.
#The original script repeated this block verbatim for lanes 1 to 8; it is factored into a single function here, which also
#fixes two copy-paste slips (a "Lane1" message in the lane5 block, a stale lane3 comment in the lane7 block) and a log
#loop that appended the "Moving the following files:" header once per file instead of listing the file names.
process_lane()
{
    lane=$1
    read1=$HOME/ngs/bwape/inputsequences/hold/lane$lane/read1
    read2=$HOME/ngs/bwape/inputsequences/hold/lane$lane/read2

    #Check that each hold folder contains exactly one file
    echo "***Checking Lane$lane Hold Folder***"
    if [ `ls $read1 | wc -l` != 1 ]
    then
        echo " The Lane$lane Read1 hold folder does not contain the expected single file "
        echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
        exit 2
    fi
    if [ `ls $read2 | wc -l` != 1 ]
    then
        echo " The Lane$lane Read2 hold folder does not contain the expected single file "
        echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
        exit 2
    fi
    echo "***Found Expected Files***"

    #Current directory = ngs
    echo "***Starting The Analysis of Lane$lane***" | tee -a analysisnotes/Analysis.log
    date '+%m/%d/%y %H:%M:%S' | tee -a analysisnotes/Analysis.log

    #Move the lane data from ngs/bwape/inputsequences/hold/lane$lane/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)
    cd bwape/inputsequences/hold/lane$lane/read1
    #Current directory = ngs/bwape/inputsequences/hold/lane$lane/read1
    echo "Moving Lane$lane Read1 File to Read1 Analysis Directory" | tee -a ../../../../../analysisnotes/Analysis.log
    date '+%m/%d/%y %H:%M:%S' | tee -a ../../../../../analysisnotes/Analysis.log
    echo "Moving the following files:" | tee -a ../../../../../analysisnotes/Analysis.log
    for ligne in *.txt
    do
        echo "$ligne" | tee -a ../../../../../analysisnotes/Analysis.log
        mv "$ligne" ../../../illumina/read1/
    done
    echo "Moving Lane$lane Read1 File to Read1 Analysis Directory Complete" | tee -a ../../../../../analysisnotes/Analysis.log
    date '+%m/%d/%y %H:%M:%S' | tee -a ../../../../../analysisnotes/Analysis.log

    cd ../read2
    #Current directory = ngs/bwape/inputsequences/hold/lane$lane/read2
    echo "Moving Lane$lane Read2 File to Read2 Analysis Directory" | tee -a ../../../../../analysisnotes/Analysis.log
    date '+%m/%d/%y %H:%M:%S' | tee -a ../../../../../analysisnotes/Analysis.log
    echo "Moving the following files:" | tee -a ../../../../../analysisnotes/Analysis.log
    for ligne in *.txt
    do
        echo "$ligne" | tee -a ../../../../../analysisnotes/Analysis.log
        mv "$ligne" ../../../illumina/read2/
    done
    echo "Moving Lane$lane Read2 File to Read2 Analysis Directory Complete" | tee -a ../../../../../analysisnotes/Analysis.log
    date '+%m/%d/%y %H:%M:%S' | tee -a ../../../../../analysisnotes/Analysis.log

    cd ../../../../../
    #Current directory = ngs/
    #Call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe
    BWApe_hg18_v1.sh
    echo "***Lane$lane Analysis Complete***"
    #The analysis directories should now be empty and ready for the next lane
}

for lane in 1 2 3 4 5 6 7 8
do
    process_lane $lane
done
../../../../../ #Current Directory=ngs/ # Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe BWApe_hg18_v1.sh echo ***Lane8 Analysis Complete***
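Since the Lane6, Lane7, and Lane8 blocks above are identical apart from the lane number, the whole thing could be collapsed into a pair of nested loops. A minimal sketch of that idea, using throwaway directories created under mktemp rather than the real ngs/ tree (the file names and paths here are invented for the demo):

```shell
#!/bin/sh
# Sketch: one loop replaces the three near-identical per-lane blocks.
# All paths are stand-ins created under a temp dir, not the real ngs/ layout.
set -eu
NGS=$(mktemp -d)

# Fake the hold/ layout with exactly one .txt file per lane/read folder
for lane in lane6 lane7 lane8; do
    for read in read1 read2; do
        mkdir -p "$NGS/hold/$lane/$read" "$NGS/illumina/$read"
        echo dummy > "$NGS/hold/$lane/$read/${lane}_${read}.txt"
    done
done

# The actual work: check, move, and (in the real script) launch the aligner
for lane in lane6 lane7 lane8; do
    for read in read1 read2; do
        hold="$NGS/hold/$lane/$read"
        # same single-file check as the original script
        [ "$(ls "$hold" | wc -l)" -eq 1 ] || { echo "ERROR: $hold does not contain a single file"; exit 2; }
        mv "$hold"/*.txt "$NGS/illumina/$read/"
        echo "$lane $read moved"
    done
    # here the original would call BWApe_hg18_v1.sh for this lane
done
echo "files in illumina/read1: $(ls "$NGS/illumina/read1" | wc -l | tr -d ' ')"
# prints: files in illumina/read1: 3
```

The temp-dir scaffolding exists only so the sketch is self-contained; in the real pipeline the two inner loops would wrap the existing move/log/check commands.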
Getting the Men's Formal Wear Packages Going
I've finally jumped into the TopHat-Cufflinks world for RNAseq analysis. Because most of the pre-compiled binaries are built for Mac OSX10.5 rather than 10.6, I've built all the binaries from source. As before, I've included detailed instructions on the install.
1) Install bowtie
- Download current version
(http://sourceforge.net/projects/bowtie-bio/files/bowtie)
- Move to applications folder (ngs/applications)
- Decompress
- Using terminal navigate to the unpacked bowtie folder
- To make the package type "make"
- Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory
- If you've been following this thread, my path directory is $HOME/local/bin
- Thus type: "cp bowtie $HOME/local/bin"
"cp bowtie-build $HOME/local/bin"
"cp bowtie-inspect $HOME/local/bin"
2) Install Boost and Configure $PATH directory to support tophat and cufflinks install
*** If not installed, download and install Samtools and copy the binary to $PATH directory ($HOME/local/bin)***
*** See previous posts if you need instructions ***
- Download Boost version 1.45.0 (http://www.boost.org/) [boost_1_45_0.tar.bz2]
- Move to applications folder (ngs/applications)
- Decompress the package (double click)
- Using terminal navigate to the decompressed folder (ngs/applications/boost_1_45_0)
- Build the package
- Type "./bootstrap.sh"
- Type "./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install"
*** This will create "include" and "lib" subfolders in $HOME/local/ ***
- In the new "include" folder create a subfolder "bam"
- Using terminal navigate to the samtools folder in the ngs/applications folder
- Copy the "libbam.a" file in the samtools folder to $HOME/local/lib
- Type "cp libbam.a $HOME/local/lib"
- Copy the header files (files ending in .h) to $HOME/local/include/bam
- Type "cp *.h $HOME/local/include/bam"
3) Install tophat
- Download current version (http://tophat.cbcb.umd.edu/)
- Move to applications folder (ngs/applications)
- Using terminal navigate to the applications folder
- Decompress the package
- Type "tar zxvf tophat-1.2.0.tar.gz"
- Navigate into the decompressed folder
- Type "cd tophat-1.2.0"
- Build the package
- Type "./configure --prefix=$HOME/local --with-bam=$HOME/local"
- Type "make"
- Type "make install"
*** The executable is now available in your $PATH directory ***
4) Install Cufflinks
- Download current version (http://cufflinks.cbcb.umd.edu/tutorial.html)
- Move to applications folder (ngs/applications)
- Using terminal navigate to the applications folder
- Decompress the package
- Type "tar zxvf cufflinks-0.9.3.tar.gz"
- Navigate into the decompressed folder
- Type "cd cufflinks-0.9.3"
- Build the package
- Type "./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local"
- Type "make"
- Type "make install"
***The executable is now available in your $PATH directory***
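Once everything is built, a quick way to confirm the binaries really did land on your $PATH is a small loop over `command -v`. The sketch below demonstrates the idea with tools guaranteed to be present (sh, ls); in practice you would substitute bowtie, bowtie-build, tophat, cufflinks, and samtools:

```shell
#!/bin/sh
# Sanity check: does each required tool resolve on $PATH?
# Demonstrated with always-present tools; swap in the NGS tool names.
for tool in sh ls; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool OK"
    else
        echo "$tool MISSING - is \$HOME/local/bin on your PATH?"
    fi
done
```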
5) Test the installs
- Navigate to the bowtie folder
- Type "cd $HOME/ngs/applications/bowtie-0.12.7"
- Test the bowtie install
- Type "bowtie indexes/e_coli reads/e_coli_1000.fq"
- Should spill a bunch to the terminal window ending with:
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
- Download the tophat test data (http://tophat.cbcb.umd.edu/tutorial.html)
- Decompress it and navigate into the downloaded folder "test_data"
- Test the tophat install
- Type "tophat -r 20 test_ref reads_1.fq reads_2.fq"
- Should create a subfolder called "tophat_out" with four files; accepted_hits.bam, deletions.bed, insertions.bed, junctions.bed
- Download the cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html)
- Navigate to the folder with the downloaded sam file
- Test the cufflinks install
- Type "cufflinks test_data.sam"
- Should create three files; genes.expr, transcripts.expr, and transcripts.gtf
Last edited by Jon_Keats; 10-04-2011, 09:41 AM. Reason: Found error in the boost install, Follow step by step, seems to make a difference
Analysis
Hi Jon,
Have you started the analysis? Make sure you have the right Ensembl GTF file. Also, if you could post how you linked the analysis files (I mean the Cuffdiff output with the tracking files, so that each file gets a unique identifier), that would be great.
Best
Building Tophat-Cufflinks Compatible GTF files from Ensembl
I'll apologize in advance for the length of this post, but I hope the verbosity is of some use to someone other than myself, should I go through these steps again. In TopHat, Cufflinks, Cuffcompare, and Cuffdiff you often have the option to use a GTF file to define exon junctions (to aid junction detection), to limit abundance calculations to a defined gene list, or to exclude certain elements from the abundance calculations so that things like mitochondrial or ribosomal transcripts don't make up the majority of your FPKM values.
So here were my steps to get files that seem to work as expected.
1) Download the bowtie index file for hg19 (http://bowtie-bio.sourceforge.net/index.shtml)
2) Move to /ngs/refgenomes/bowtie_indexed/
3) Decompress
4) Run bowtie-inspect to check the chromosome list and annotation format embedded in the file
Code:bowtie-inspect -n hg19
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM
5) Download the human GTF from Ensembl (http://uswest.ensembl.org/info/data/ftp/index.html)
6) Decompress and move to ngs/refgenomes/annotation_tracks (new folder)
7) Navigate to the location of the decompressed file
8) Generate a list of chromosomes in the GTF file
Code:cut -f 1 Homo_sapiens.GRCh37.60.gtf | sort | uniq > Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt
Code:nano Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
GL000191.1
GL000192.1
GL000193.1
GL000194.1
GL000195.1
GL000197.1
GL000199.1
GL000200.1
GL000201.1
GL000204.1
GL000205.1
GL000209.1
GL000211.1
GL000212.1
GL000213.1
GL000214.1
GL000216.1
GL000218.1
GL000219.1
GL000220.1
GL000221.1
GL000222.1
GL000223.1
GL000224.1
GL000225.1
GL000227.1
GL000228.1
GL000229.1
GL000230.1
GL000233.1
GL000236.1
GL000237.1
GL000238.1
GL000239.1
GL000240.1
GL000241.1
GL000242.1
GL000243.1
GL000247.1
HSCHR17_1
HSCHR6_MHC_APD
HSCHR6_MHC_COX
HSCHR6_MHC_DBB
HSCHR6_MHC_MANN
HSCHR6_MHC_MCF
HSCHR6_MHC_QBL
HSCHR6_MHC_SSTO
MT
X
Y
- Delete the chromosome IDs present in the bowtie index file (ie. Delete 1-22, X, Y, MT)
- Save the file under a new name [control-O], change the file name to "Homo_sapiens.GRCh37.60_ChrToExclude.txt", save, and close the nano editor [control-X]
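The hand-editing in nano can also be scripted: keep only the names that are not a plain 1-22, X, Y, or MT, since those are exactly the contigs missing from the bowtie index. A minimal sketch on a toy chromosome list (the file contents below are invented; the file names mirror the ones above):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
# Toy stand-in for Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt
printf '%s\n' 1 2 22 X Y MT GL000191.1 HSCHR6_MHC_COX > chromlist.txt
# Drop 1-22, X, Y, MT; what survives is the exclude list
grep -v -E '^([0-9]+|X|Y|MT)$' chromlist.txt > ChrToExclude.txt
cat ChrToExclude.txt
# prints:
# GL000191.1
# HSCHR6_MHC_COX
```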
9) Generate a new GTF file with just the chromosomes in the bowtie index
Code:grep -vf Homo_sapiens.GRCh37.60_ChrToExclude.txt Homo_sapiens.GRCh37.60.gtf > GRCh37_E60_BowtieIndexChr.gtf
Code:awk '{print "chr"$0}' GRCh37_E60_BowtieIndexChr.gtf | sed 's/chrMT/chrM/g' > GRCh37_E60_BowtieIndexCompatible.gtf
Code:cut -f 1 GRCh37_E60_BowtieIndexCompatible.gtf | sort | uniq > GRCh37_E60_BowtieIndexCompatible_Check.txt
Code:less GRCh37_E60_BowtieIndexCompatible_Check.txt
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY
***You now have a GTF file ready to use with TopHat and Cufflinks***
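The chr-prefix renaming step above can be sanity-checked end to end on a couple of toy GTF lines; a sketch (the GTF content below is invented for the demo):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
# Two invented GTF-ish lines: one on chromosome 1, one on MT
printf '1\tprotein_coding\texon\t100\t200\n' >  toy.gtf
printf 'MT\tMt_rRNA\texon\t50\t90\n'         >> toy.gtf
# Same transformation as above: prefix every line with "chr", then rename chrMT -> chrM
awk '{print "chr"$0}' toy.gtf | sed 's/chrMT/chrM/g' > toy_compatible.gtf
cut -f 1 toy_compatible.gtf | sort | uniq
# prints:
# chr1
# chrM
```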
Creating a GTF of regions to exclude from FPKM calculations in Cufflinks. Unfortunately, this comes down to personal choice, I suspect, but abundance estimates can vary greatly between tissues and library prep methods due to differences in the levels of mitochondrial RNA, ribosomal RNA, or tissue-specific transcripts like immunoglobulin in my case (sucks when 50% of your reads come from 1 Mb of the genome... argh).
A) Get a list of RNA types from the second column of the GTF file
Code:cut -f 2 Homo_sapiens.GRCh37.60.gtf | sort | uniq > Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt
Code:less Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_J_gene
IG_J_pseudogene
IG_V_gene
IG_V_pseudogene
lincRNA
miRNA
miRNA_pseudogene
misc_RNA
misc_RNA_pseudogene
Mt_rRNA
Mt_tRNA
Mt_tRNA_pseudogene
polymorphic_pseudogene
processed_transcript
protein_coding
pseudogene
rRNA
rRNA_pseudogene
scRNA_pseudogene
snoRNA
snoRNA_pseudogene
snRNA
snRNA_pseudogene
TR_C_gene
TR_J_gene
tRNA_pseudogene
TR_V_gene
TR_V_pseudogene
B) Again this is a personal choice but I'm getting rid of all transcripts from the mitochondrial genome (chrM, Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene), those from ribosomal genes (rRNA, rRNA_pseudogene), and those from immunoglobulin elements (IG_C_gene, IG_C_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_V_gene, IG_V_pseudogene).
- I modified the "Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt" file to create a list to select using grep called "Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt"
Looks like:
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_J_gene
IG_J_pseudogene
IG_V_gene
IG_V_pseudogene
Mt_rRNA
Mt_tRNA
Mt_tRNA_pseudogene
rRNA
rRNA_pseudogene
chrM
Code:grep -f Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt GRCh37_E60_BowtieIndexCompatible.gtf > GRCh37_E60_CufflinksExcludedTranscripts.gtf
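The grep -f selection can be checked the same way on toy data before running it on the real GTF; a sketch with invented GTF lines and a two-entry exclude list (note grep -f matches each pattern anywhere on the line):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
printf 'Mt_rRNA\nIG_V_gene\n' > exclude.txt
# Three invented GTF-ish lines
printf 'chr1\tprotein_coding\texon\t1\t10\n' >  toy.gtf
printf 'chrM\tMt_rRNA\texon\t1\t10\n'        >> toy.gtf
printf 'chr14\tIG_V_gene\texon\t1\t10\n'     >> toy.gtf
# grep -f SELECTS lines matching any pattern (the transcripts to mask out) ...
grep -f exclude.txt toy.gtf > excluded.gtf
# ... while -v inverts it, keeping everything else
grep -vf exclude.txt toy.gtf > kept.gtf
grep -c . excluded.gtf   # prints 2
grep -c . kept.gtf       # prints 1
```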
Thanks for this great thread
Just wanted to say thank you for this thread. It has been a great help. To everyone new to NGS: I recommend reading this thread and the book mentioned earlier, "Unix and Perl for Biologists".
Thanks for the help
--
Prakhar