Seqanswers Leaderboard Ad

**nilshomer** · 05-07-2010, 10:52 AM

Originally posted by Kurt View Post

Noticed this a 2nd ago. It looks like you are working with single end sequencing data which you wouldn't want to remove duplicates.

SAM tools

http://sourceforge.net/apps/mediawiki/samtools/index.php?title=SAM_protocol#Generate_merged_alignment

Download SAM tools for free. SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignment. SAMtools provide efficient utilities on manipulating alignments in the SAM format.

(Item #6)

The duplicate removal across chromosomes would only apply for paired end data.

You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.

**Kurt** · 05-07-2010, 11:09 AM

Originally posted by nilshomer View Post

You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.

Would this still apply for a capture enrichment technology (say for Agilent's Sure Select platform or Rain Dance)? We haven't done those here for single end (and I'm not sure if we ever would), but I'm just wondering out loud at this point I guess. Sorry, I know that this doesn't necessarily apply to Keat's post.

**Jon_Keats** · 05-16-2010, 10:39 PM

In my case the data we have is paired-end but the first alignment test I've done was using bwa in single end mode. Odd choice I know, but this is actually an mRNAseq dataset and when aligned to genome the paired-end mode causes a lot of artifacts as it tries to pair reads between exons that often exceed the typical insert size.
Regardless, I would always recommend to remove duplicates single-end or paired-end or mate-pair for that matter. In a real life example; a whole genome seq (multiple runs), 1 library, duplicates removed per run NOT across all runs, interesting biological hit is PCR artifact (identical read in multiple runs).
Remember that for single-end reads duplicate removal limits your coverage to a max of your read length x2. Obviously it can be higher for paired-end reads were one read maybe identical but the other read is different.

I like your version golharam, thanks for sharing

**Fabrice ODEFREY** · 06-02-2010, 08:39 PM

Thanks heaps for your posts Jonathan, this is very useful.
I'm now in the same position you were a few month ago (on a SOLiD) and took the approach to first learn linux and Perl to then do some analysis (mainly because data are not there yet...). Looking forward to more interesting posts from you soon as it really helps newbies (at least me) to have a better overview of the pipeline to implement.

**Jon_Keats** · 09-08-2010, 09:44 AM

Time to Git a Linux Machine

I'm slowly getting back up and running after moving from my post-doc to an independent position. Other than learning how damn expensive everything is I'm slowly deciding that I should swear off the idea of a new MacPro workstation for a Linux workstation given all the issues I seem to run into with the "not quite so standard Mac OSX10.6 implementation of Unix". But with the idea of sharing my ongoing experiences and a trail I can follow to build my next machine I thought I'd update my thread. I hope some people have found it useful...

Some new idea's and an update

1) I'm becoming increasing certain that I'm getting good enough at command line issues to REALLY mess up my system
2) My list of used programs continues to increase as I try each new sequencing method
3) As per issue 1 - I'm also not reading instructions very well. New Rule - If at first you don't succeed...Go back and read the damn instructions again because most likely you didn't follow them correctly!

New Applications to Install

As previously stated in the post if you are using a Mac OS environment you need to do a couple of special things

A) Install Xcode on your system (See earlier post)
B) Install Fink on your system (See earlier post)
- install the following fink packages:md5deep and pkgconfig
- "fink install md5deep" (needed for bfast install)
- "fink install pkgconfig" (needed for fastx-toolkit install)
C) Install Git on your system (http://git-scm.com/)
D) Create a $PATH Directory and update this directory in your .profile (See earlier post for instructions)
- In my case "$HOME/local/bin"

1) Install FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)

***Why did I get this package***
Because I have some illumina mate-pair data that I want to analyze with BWA using sampe but the reads need to be reverse complemented to work correctly by my understanding so I'm using the fastx_reverse_complement application in the package that seems to be very fast and correctly reverse complements the reads and reverses the quality values

Instructions:
- Go to download page and download the following:
a) fastx_toolkit-0.0.13.tar.bz2
b) libgtextutils-0.6.tar.bz2
- Move both to ngs/applications folder and unpack both packages
- In Terminal navigate the libgtextutils folder "cd ngs/applications/libtestutils-0.6"
- Install the package as follows:
./configure
make
sudo make install (this will ask for your password, must be admin level privileged set)

- Move to fastx_toolkit folder "cd ../fastx_toolkit-0.0.13"
- Install the package as follows:
./configure --prefix=$HOME/local/bin
make
make install

- Test install by typing "fastx_uncollapser -h", this should pop up a usage documentation for this app

2) Install Bfast, DNAA, and Breakway

*** Why these packages***
As you might guess from the above install I now have some mate-pair data and want to try out the Breakway package from the UCLA group but it depends on two of their other packages Bfast and DNAA

Bfast - This package seems to be the vain of my existence but thankfully Nils and the helplist have been amazingly helpful

Mac Related Issues:
a) You must have fink and have installed the md5deep package otherwise "make check" will fail
b) The current sourceforge version (0.6.4e) does not install correctly thought the previous version does, however, this is a known issue and has been fixed in the master branch (if that's a new term to you we are in the same boat) but this mean you need to use the git repository version
c) Using ".configure -prefix=$HOME/local" works but makes DNAA mad when you install it so use sudo (time to be superman again)

Instructions:
a) In Terminal navigate to ngs/applications
b) Get current Bfast version from Git (restart Terminal after git installation)
- type "git clone git://bfast.git.sourceforge.net/gitroot/bfast/bfast"
- this will create a folder called "bfast" in the current directory
- Move into the directory "cd bfast"
- Install bfast by typing the following:
sh autogen.sh
./configure
make
make check
sudo make install (requests a password with admin level privileges)

- Test install and check current version by typing "bfast" in Terminal

c) Navigate back to ngs directory by typing "cd ../"
d) Get current version of DNAA from Git
- type "git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa dnaa"
- this will create a directory named "dnaa" in the current directory (ngs/applications)
- move into the the dnaa directory by typing "cd dnaa"
e) Because this package depends on both BFAST and SAMTOOLS you need to provide links to these application directories even though you already have them in a $PATH directory (/usr/local/bin and $HOME/local/bin repectively)
- create a link to the BFAST package you just installed by typing "ln -s ../bfast bfast"
- create a link to your current SAMTOOLS package by typing "ln -s ../samtools-0.1.8 samtools"
f) Install DNAA by typing the following:
sh autogen.sh
./configure
make
sudo make install (requests a password with admin level privileges)

g) Download current version of BREAKWAY from sourceforge (http://sourceforge.net/projects/breakway/), move it to the ngs/applications folder and unpack it and you should be ready to go

**Jon_Keats** · 09-13-2010, 04:55 PM

Building a Paired-End Pipeline

Up till now I've been frustrated because I could not automate a variety of pairing steps that occur as I process raw data to BAM files. This is usually either in the SAMPE step of BWA or when I wanted to merge multiple lanes into one BAM file. I think I convinced myself that I can just use the "cat" function to merge the multiple lanes together before processing, which ends up being a simple solution as long as all the lanes are available at the same time. For the SAMPE pairing I spent sometime with my Unix guru from France when he came over to visit his wife and I seem to have a workable solution as long as a specific file tree structure is used in conjunction with two unix scripts, one that processes each pair from raw data to two sort BAM files, one with and without duplicates, and a second that pulls each sample into the analysis framework and launches the aforementioned script. So since this requires a specific directory structure I've updated my directory structure script to version 3.

Code:

#!/bin/sh

# Create_NGS_DirectoryStructure_V3.sh
# 
#
# Created by Jonathan Keats on 9/3/10 based on suggestion from Ryan Golhar on my Seqanswers thread.
# Translational Genomics Research Institute
#
#########################################################################
#  CREATES A DIRECTORY STURCTURE TO SUPPORT A VARIETY OF NGS PIPELINES  #
#########################################################################
#
# Designed for a Mac OS enviroment and requires initiation from your home folder (/User/You/)

# Check to confirm current location is $HOME/ (ie. /User/You/)

echo "Confirming Script Initiation Directory"
var1=$HOME
if [ "`pwd`" != "$var1" ] 
	then 
	echo " The script must be launched from your home directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo "1) Launch Location is Correct ($HOME/)"

# Create required directories to support pipelines (BWAse, BWApe, and others to come...)

echo ***Creating Pipeline Directory Structure***
mkdir -p ngs/{analysisnotes,applications,scripts}
mkdir -p ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,genome_downloads}
mkdir -p ngs/refgenomes/genome_downloads/{hg18,hg19}
mkdir -p ngs/finaloutputs/{alignmentresults_bwa,illumina,sangerfastq}
mkdir -p ngs/finaloutputs/bamfiles/{merged,sorted,nodups}
mkdir -p ngs/bwase/inputsequences/{illumina,sangerfastq}
mkdir -p ngs/bwase/samfiles
mkdir -p ngs/bwase/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/samfiles
mkdir -p ngs/bwape/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/inputsequences/{illumina,sangerfastq,hold}
mkdir -p ngs/bwape/inputsequences/illumina/{read1,read2}
mkdir -p ngs/bwape/inputsequences/sangerfastq/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/{lane1,lane2,lane3,lane4,lane5,lane6,lane7,lane8}
mkdir -p ngs/bwape/inputsequences/hold/lane1/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane2/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane3/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane4/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane5/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane6/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane7/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane8/{read1,read2}

mv create_ngs_directorystructure_v3.sh ngs/scripts/
echo ***Pipeline Directory Structure Created***

**jdanderson** · 09-22-2010, 02:01 PM

Just wanted to chime in and affirm the sentiment that several people have already expressed; thank you for your posts, they are quite helpful. I am just starting out and this forum and your posts are eminently helpful. Thank you for taking the time to post!

**maverick123** · 09-26-2010, 01:20 AM

Hello i am ronnie .. i am from chandigath city... currently i am btech student..and its a nice informative thread...

**Jon_Keats** · 09-28-2010, 07:36 PM

BWA SAMPE Pipeline Version

As I mentioned before its taken a while to sort out a method that can automate a paired-end analysis using BWA but it seems to work now. Feel free to use the scripts below in conjunction with the "create_ngs_directorystructure_v3.sh" script that creates the required directory structure.
The following two scripts can be used to process files using BWA to automate a paired-end analysis from the output "s_x_sequence.txt" files to aligned, indexed, and duplicate removed BAM files. The design of the pipeline has a couple of requirements:

1) You need to have all the required applications in a $PATH directory. As detailed in this thread I personally use "$HOME/local/bin".
2) You will need; MAQ with ill2sanger patch installed, BWA, SAMTOOLS, and PICARD MarkDuplicates.jar in this path directory.
NOTE: If you use a different path directory you need to alter line 623 of BWApe_hg18_v1.sh as MarkDuplicates.jar is being called specifically from this directory while all others are being called through the $PATH directory. ****If you know how to put a directory in the JAVA path on a Mac drop me a line****
3) Both shell scripts are designed to be in your $PATH directory so you can call them from the ngs directory using "BWApe_hg18_v1.sh" for a single sample analysis or "multi_bwape_analysis_v1.sh" for a multiple sample analysis. Alternatively, you can place them in the "/ngs" folder and call them directly using "./BWApe_hg18_v1.sh" or "./multi_bwape_analysis_v1.sh" (NOTE: If you do this you need to modify the lines that launch BWApe_hg18_v1.sh to include the direct launch indicator "./"
4) The input file names must be unique and end with a "_R1.txt" read identifier such as "YourSample_R1.txt" and "YourSample_R2.txt"

NOTE: The name BWApe_hg18_v1.sh only reflects the reference genome used in the development of the script. You can easily change to what ever genome mouse, human you want to use you just need to generate the bwa index and update the BWApe_hg18_v1.sh script as indicated in the script.

NOTE: If using "BWApe_hg18_v1.sh" you need to place the raw illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2". If using "multi_bwape_analysis_v1.sh" you need to place the raw illumina files in "ngs/bwape/inputsequences/hold/laneX/read1" and "ngs/bwape/inputsequences/hold/laneX/read2" as appropriate to your sample set. The script is only designed for 8 lanes/samples so if you have more you needed to copy/paste to extend the script. After completing each lane/sample it checks to see if there is data for another lane/sample in the next sequential lane/sample folder and process it if available or ends the script if it is empty, so you need to put files in the hold/lane1, 2, 3, 4, 5, 6, 7, and 8 read folders in order.

Code:

#!/bin/sh

# BWApe_hg18_V1.sh
# Created by Jonathan Keats
# Translational Genomics Research Institute

# This script is designed to take a batch of raw Illumina 1.3+ reads to sorted and indexed BAM files with and without duplicates using BWA in paired end mode.
# It is designed to be initiated from a folder called "ngs" in your $HOME folder with a specific subdirectory structure
# To create the directory struture launch "create_ngs_directorystructure_v3.sh" from your "$Home" folder

####################################################################################################
##  To Run This Script You Must Have The Following Applications In One Of Your $PATH Directories  ##
##						1) MAQ with ill2sanger patch installed									  ##
##						2) BWA																	  ##
##						3) SAMTOOLS																  ##
##						4) PICARD - MarkDuplicates.jar (Must be in $HOME/local/bin)				  ##
####################################################################################################

# To run this script you MUST first place your reference file in ngs/refgenomes/bwa_indexed and have run the "bwa index" command to create the BWT index files

######################################################################################################
# WARNING - YOU MUST ENSURE THE NAME OF YOUR REFERENCE GENOME FILE MATCHES LINES (274, 310, and 367) #
###################################################################################################### 

# The script is based on having ***RENAMED*** Illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2"
# The renamed format ***MUST*** be "YourSampleName_R1.txt" and "YourSampleName_R2.txt" otherwise pairing and renaming will not occur correctly
# Multiple lanes should be concatinated together before initiating the script, unless you want to manually merge in samtools 
# At each step it queries specific folders for available files and passes them to the next analysis module
# After each step the filename extension of the output files are corrected. (ie. "MySequenceFile_R1.txt.fastq" to "MySequenceFile_R1.fastq")
# Order of Embedded Steps	- Converts Illumina 1.3+ fastq files "s_1_sequence.txt" to Sanger fastq files "s_1_sequence.fastq" using "maq ill2sanger" command
#							- Aligns created fastq files to reference genome using "bwa aln" command
#							- Generates SAM files from alignment files using "bwa sampe" command
#							- Converts SAM files to BAM files using "samtools view" command
#							- Sorts BAM files using "samtools sort" command
#							- Indexes the sorted BAM files for use in IGV browser using "samtools index" command
#							- Removes duplicates from the sorted bam files using "picard - MarkDuplicates.jar" command
#							- Indexes the no duplicates BAM files for use in IGV browser using "samtools index" command
#							- Final output files are archived then the input and analysis directories are cleaned-up and readied for the next analysis batch
# The script creates a log file in /ngs/analysisnotes to track the steps completed and the time each step started and finished
# Some of the log events will print to both the terminal screen and the log file so you can see what is going on
# Much of this would not be possible with out the help of a former colleagues husband who is a Unix programmer in France so I've kept some french terms such as linge instead of line in his honor (thanks Charabelle)

#Starting directory = $HOME/ngs

#In this step	- We check that you are lauching the script from the correct location in case you are using it from a path directory
#				- We check that the destination directories used by the script are empty to prevent deleting erroneous files and unexpected analysis events
#				- Hope to add a check for available disk space

echo ***Checking Diretory Structure***

#List of directoryies to check
var1=$HOME/ngs
var2=$HOME/ngs/bwape/samfiles
var3=$HOME/ngs/bwape/bamfiles/merged
var4=$HOME/ngs/bwape/bamfiles/original
var5=$HOME/ngs/bwape/bamfiles/sorted
var6=$HOME/ngs/bwape/bamfiles/nodups
var7=$HOME/ngs/bwape/inputsequences/sangerfastq/read1
var8=$HOME/ngs/bwape/inputsequences/sangerfastq/read2

#Checking if launch location is correct

if [ "`pwd`" != "$var1" ] 
	then 
	echo " The script must be launched from the NGS directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo "1) Launch Location is Correct ($HOME/ngs)"

#Checking if analysis directories are empty


if [ `ls $var2 | wc -l` != 0 ]       
	then 
	echo " The bwape/samfiles directory is not empty - Any data in this directory would be deleted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "2) bwape/samfiles directory is empty as required"
if [ `ls $var3 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/merged directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/merged "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "3) bwape/bamfiles/merged directory is empty as required"
if [ `ls $var4 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/original directory is not empty - Any data in this directory would be deleted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "4) bwape/bamfiles/original directory is empty as required"
if [ `ls $var5 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/sorted directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/sorted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "5) bwape/bamfiles/sorted directory is empty as required"
if [ `ls $var6 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/nodups directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "6) bwape/bamfiles/nodups directory is empty as required"
if [ `ls $var7 | wc -l` != 0 ]       
	then 
	echo " The bwape/illuminasequences/sangerfastq/read1 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "7) bwape/illuminasequences/sangerfastq/read1 directory is empty as required"
if [ `ls $var8 | wc -l` != 0 ]       
	then 
	echo " The bwape/illuminasequences/sangerfastq/read2 directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/nodups by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "8) bwape/illuminasequences/sangerfastq/read2 directory is empty as required"

echo ***Pre Run Check Completed Successfully***

#Current directory=ngs
echo ***Starting BWA SAMPE Analysis Batch***
date '+%m/%d/%y %H:%M:%S'

#The following step creates the log file in the AnalysisNotes subdirectory the first time the script is run
#On subsequent runs the results are printed at the bottom of the pre-existing log file

echo ***Starting BWA SAMPE Analysis Batch*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we convert the "Read1" illumina fastq files to sanger fastq files using the maq ill2sanger script

echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1a - Illumina to Sanger Fastq Conversion with maq ill2sanger >> AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
cd bwape/inputsequences/illumina/read1
#Current directory = ngs/bwape/inputsequences/illumina/read1
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../sangerfastq/read1/$ligne.fastq
done

#In the next step we clean up the Illumina Read1 folder so it is ready for the next analysis batch

echo Cleaning up Input Sequences Illumina Read1
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read1 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwase/inputsequences/illumina/read1 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../../finaloutputs/illumina
done

#In the next step we rename the "Read1" sanger format fastq files from ".txt.fastq" extensions to ".fastq"

cd ../../sangerfastq/read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1a - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1a - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we convert the "Read2" illumina fastq files to sanger fastq files using the maq ill2sanger script

cd ../../illumina/read2
#Current directory = ngs/bwape/inputsequences/illumina/read2
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger >> ../../../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../sangerfastq/read2/$ligne.fastq
done

#In the next step we clean up the Illumina Read2 folder so it is ready for the next analysis batch

echo Cleaning up Input Sequences Illumina Read2
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read2 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwase/inputsequences/illumina/read2 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../../finaloutputs/illumina
done

#In the next step we rename the "Read2" sanger format fastq files from ".txt.fastq" extensions to ".fastq"

cd ../../sangerfastq/read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1b - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1b - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will align the converted "Read1" sanger fastq format files to the reference genome

echo Starting Step2a - Read1 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2a - Read1 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do                                                                     
echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
done

#In the next step we will rename the "Read1" alignment files

old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step2a - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2a - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will align the converted "Read2" sanger fastq format files to the reference genome

echo Starting Step2b - Read2 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2b - Read2 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do                                                                     
echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
done

#In the next step we will rename the "Read2" alignment files

old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step2b - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2b - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will generate SAM files for the alignments using bwa sampe

echo Starting Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step3 - bwa sampe process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo The following alignment files will be converted to SAM files:

cd ../read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
for ligne in `ls *.sai`
do                                                                     
aln1=`echo $ligne`
done
echo $aln1
for ligne in `ls *.fastq`
do                                                                     
read1=`echo $ligne`
done
echo $read1

cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
for ligne in `ls *.sai`
do                                                                     
aln2=`echo $ligne`
done
echo $aln2
for ligne in `ls *.fastq`
do                                                                     
read2=`echo $ligne`
done
echo $read2
echo The following alignment files will be converted to SAM files: >> ../../../../analysisnotes/Analysis.log
echo $aln1 >> ../../../../analysisnotes/Analysis.log
echo $read1 >> ../../../../analysisnotes/Analysis.log
echo $aln2 >> ../../../../analysisnotes/Analysis.log
echo $read2 >> ../../../../analysisnotes/Analysis.log
cd ../../../samfiles
#Current directory = ngs/bwape/samfiles
#(bwa sampe <database.fasta> <aln1.sai> <aln2.sai> <input1.fastq> <input2.fastq> > aln.sam)
bwa sampe ../../refgenomes/bwa_indexed/hg18.fasta ../inputsequences/sangerfastq/read1/$aln1 ../inputsequences/sangerfastq/read2/$aln2 ../inputsequences/sangerfastq/read1/$read1 ../inputsequences/sangerfastq/read2/$read2 > $read1.sam

#In the next step we will rename the SAM files generated by bwa sampe analysis of the "Read1" and "Read2" alignment files

old_ext=_R1.fastq.sam
new_ext=_bwape.sam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step3 - bwa sampe process >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log

#In the next step we will convert each SAM file to a BAM file

echo Starting Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Starting Step4 - samtools SAM to BAM conversion >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
echo The following SAM files will be converted to BAM files:
for ligne in `ls *.sam`
do                                                                     
echo $ligne
done
echo The following SAM files will be converted to BAM files: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do                                                                     
echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
samtools view -bS -o ../bamfiles/original/$ligne.bam $ligne
done

#In the next step we will delete the SAM file to save disc space as the BAM file contains all the data in a binary format

echo Deleting the following SAM Files from ngs/bwape/samfiles:
for ligne in `ls *.sam`
do
echo $ligne
done
echo Deleting the following SAM Files from ngs/bwape/samfiles: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do
echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
rm $ligne
done
echo Deleting SAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting SAM Files Complete >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log

#In the next step we clean up the Sanger Fastq "Read1" folder so it is ready for the next analyis batch

cd ../inputsequences/sangerfastq/read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we clean up the Sanger Fastq "Read2" folder so it is ready for the next analyis batch

cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will rename the BAM files created by the samtools SAM-to-BAM conversion process

cd ../../../bamfiles/original
#Current directory = ngs/bwape/bamfiles/original
old_ext=sam.bam
new_ext=bam
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step4 - samtools SAM to BAM conversion >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will sort the BAM file by chromosome coordinate

echo Starting Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be sorted:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools sort $ligne ../Sorted/$ligne
done

#In the next step we will delete the original unsorted BAM file to save disc space as the sorted BAM contains all the needed information

echo Deleting the following BAM Files from ngs/bwape/bamfiles/original:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Deleting the following BAM Files from ngs/bwape/bamfiles/original: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
rm $ligne
done
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will rename the sort BAM files created by the samtools sort process

cd ../sorted
#Current directory = ngs/bwape/bamfiles/sorted
old_ext=.bam.bam
new_ext=_sorted.bam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will index the sorted BAM files for fast access and viewing in the IGV browser

echo Starting Step6 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be indexed:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools index $ligne
done
echo Finished Step6 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will remove the duplicate reads from the sorted bam files

echo Starting Step7 - picard markduplicates process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo Duplicate reads will be removed from the following sorted BAM files:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo Duplicate reads will be removed from the following sorted BAM files: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
java -Xmx2g -jar $HOME/local/bin/MarkDuplicates.jar INPUT=$ligne OUTPUT=../nodups/$ligne METRICS_FILE=../nodups/$ligne.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT
done

#In the next step we clean up the Sorted BAM files folder so it is ready for the next analyis batch

echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
mv $ligne ../../../finaloutputs/bamfiles/sorted/
done
echo Moving Sorted BAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sorted BAM Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
for ligne in `ls *.bai`
do
echo $ligne
done
echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bai`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bai`
do
mv $ligne ../../../finaloutputs/bamfiles/sorted/
done
echo Moving Sorted BAM Index Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sorted BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will rename the BAM files and Metrics files created after duplicate removal by picard

cd ../nodups
#Current directory = ngs/bwape/bamfiles/nodups
old_ext=_sorted.bam
new_ext=_sorted_nodups.bam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
old_ext=_sorted.bam.txt
new_ext=_sorted_nodups_metrics.txt
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step7 - picard markduplicates process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will index the nodups BAM files for fast access and viewing in the IGV browser

echo Starting Step8 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be indexed:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools index $ligne
done

#In the next step we clean up the nodups BAM files folder so it is ready for the next analyis batch

echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving NoDups BAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving NoDups BAM Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.bai`
do
echo $ligne
done
echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bai`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bai`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving NoDups BAM Index Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving NoDups BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving MarkDuplicates Metrics Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving MarkDuplicates Metrics Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Finished Step8 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we return to the launch folder $HOME/Documents/ngs

cd ../../..
#Current directory = ngs/
echo ***Analysis Batch Complete***
echo ***Analysis Batch Complete*** >> analysisnotes/Analysis.log

Code:

#!/bin/sh

# multi_bwape_analysis_v1.sh
# 
#
# Created by Jonathan Keats on 9/5/10.
# Translational Genomics Research Institute

# This script is designed to allow multiple samples/lanes of paired-end illumina data to be passed into the "BWApe_hg18_v1" pipeline

###############################################################################################################################
## To facilitate its use you must put uniquely named Illumina 1.3+ files in ngs/bwape/inputsequences/hold/lane(X)/read(1-2)   #
## It is essential that these file names are uniquely name or overwriting will occur										  #
## These files MUST have the ".txt" extension characteristic of the Illumina V1.3+ output "s_x_sequences.txt"				  #
###############################################################################################################################

#In this step we check that you are lauching the script from the correct location in case you are using it from a path directory

echo ***Checking Current Directory is Correct***

#List of directoryies to check
temp1=$HOME/ngs

#Checking if launch location is correct

if [ "`pwd`" != "$temp1" ] 
	then 
	echo " The script must be launched from the NGS directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo ***Current	Directory is Correct***

#Check if files exist in the lane1 hold folder

#List of directoryies to check
temp2=$HOME/ngs/bwape/inputsequences/hold/lane1/read1
temp3=$HOME/ngs/bwape/inputsequences/hold/lane1/read2

echo ***Checking Lane1 Hold Folder***
if [ `ls $temp2 | wc -l` != 1 ]       
	then 
	echo " The Lane1 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp3 | wc -l` != 1 ]       
	then 
	echo " The Lane1 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane1***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane1*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane1" data from ngs/bwape/inputsequences/hold/lane1/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane1/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane1/read1
echo Moving Lane1 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane1/read2
echo Moving Lane1 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane1 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample2/lane2 data into the analysis directories

#Check if files exist in the lane2 hold folder

#List of directoryies to check
temp4=$HOME/ngs/bwape/inputsequences/hold/lane2/read1
temp5=$HOME/ngs/bwape/inputsequences/hold/lane2/read2

echo ***Checking Lane2 Hold Folder***
if [ `ls $temp4 | wc -l` != 1 ]       
	then 
	echo " The Lane2 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp5 | wc -l` != 1 ]       
	then 
	echo " The Lane2 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane2***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane2*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane2" data from ngs/bwape/inputsequences/hold/lane2/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane2/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane2/read1
echo Moving Lane2 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane2/read2
echo Moving Lane2 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane2 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample3/lane3 data into the analysis directories

#Check if files exist in the lane3 hold folder

#List of directoryies to check
temp6=$HOME/ngs/bwape/inputsequences/hold/lane3/read1
temp7=$HOME/ngs/bwape/inputsequences/hold/lane3/read2

echo ***Checking Lane3 Hold Folder***
if [ `ls $temp6 | wc -l` != 1 ]       
	then 
	echo " The Lane3 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp7 | wc -l` != 1 ]       
	then 
	echo " The Lane3 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane3***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane3*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane3" data from ngs/bwape/inputsequences/hold/lane3/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane3/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane3/read1
echo Moving Lane3 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane3/read2
echo Moving Lane3 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane3 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample4/lane4 data into the analysis directories

#Check if files exist in the lane4 hold folder

#List of directoryies to check
temp8=$HOME/ngs/bwape/inputsequences/hold/lane4/read1
temp9=$HOME/ngs/bwape/inputsequences/hold/lane4/read2

echo ***Checking Lane4 Hold Folder***
if [ `ls $temp8 | wc -l` != 1 ]       
	then 
	echo " The Lane4 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp9 | wc -l` != 1 ]       
	then 
	echo " The Lane4 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane4***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane4*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane4" data from ngs/bwape/inputsequences/hold/lane4/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane4/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane4/read1
echo Moving Lane4 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane4/read2
echo Moving Lane4 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane4 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample5/lane5 data into the analysis directories

#Check if files exist in the lane1 hold folder

#List of directoryies to check
temp10=$HOME/ngs/bwape/inputsequences/hold/lane5/read1
temp11=$HOME/ngs/bwape/inputsequences/hold/lane5/read2

echo ***Checking Lane1 Hold Folder***
if [ `ls $temp10 | wc -l` != 1 ]       
	then 
	echo " The Lane5 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp11 | wc -l` != 1 ]       
	then 
	echo " The Lane5 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane5***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane5*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane5" data from ngs/bwape/inputsequences/hold/lane5/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane5/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane5/read1
echo Moving Lane5 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane5/read2
echo Moving Lane5 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane5 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample6/lane6 data into the analysis directories

#Check if files exist in the lane6 hold folder

#List of directoryies to check
temp12=$HOME/ngs/bwape/inputsequences/hold/lane6/read1
temp13=$HOME/ngs/bwape/inputsequences/hold/lane6/read2

echo ***Checking Lane6 Hold Folder***
if [ `ls $temp12 | wc -l` != 1 ]       
	then 
	echo " The Lane6 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp13 | wc -l` != 1 ]       
	then 
	echo " The Lane6 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane6***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane6*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane6" data from ngs/bwape/inputsequences/hold/lane6/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane6/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane6/read1
echo Moving Lane6 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane6/read2
echo Moving Lane6 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane6 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample7/lane7 data into the analysis directories

#Check if files exist in the lane7 hold folder

#List of directoryies to check
temp14=$HOME/ngs/bwape/inputsequences/hold/lane7/read1
temp15=$HOME/ngs/bwape/inputsequences/hold/lane7/read2

echo ***Checking Lane7 Hold Folder***
if [ `ls $temp14 | wc -l` != 1 ]       
	then 
	echo " The Lane7 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp15 | wc -l` != 1 ]       
	then 
	echo " The Lane7 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane7***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane7*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane7" data from ngs/bwape/inputsequences/hold/lane7/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane7/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane7/read1
echo Moving Lane7 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane3/read2
echo Moving Lane7 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane7 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample8/lane8 data into the analysis directories

#Check if files exist in the lane8 hold folder

#List of directoryies to check
temp16=$HOME/ngs/bwape/inputsequences/hold/lane8/read1
temp17=$HOME/ngs/bwape/inputsequences/hold/lane8/read2

echo ***Checking Lane8 Hold Folder***
if [ `ls $temp16 | wc -l` != 1 ]       
	then 
	echo " The Lane8 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp17 | wc -l` != 1 ]       
	then 
	echo " The Lane8 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane8***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane8*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane8" data from ngs/bwape/inputsequences/hold/lane8/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane8/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane8/read1
echo Moving Lane8 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane8/read2
echo Moving Lane8 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
for ligne in `ls *.txt`
do
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane8 Analysis Complete***

**JBuenrostro** · 01-04-2011, 09:24 PM

Best post

Glad I found this, so far its the best post I've seen here. Thanks for the help!

**honey** · 01-09-2011, 06:52 PM

Jon Keats,
Good job! I dont think you are alone there are several researchers in the same situation. I am also one.

Good luck!

**Jon_Keats** · 01-25-2011, 12:20 AM

Getting the Mens Formal Wear Packages Going

I've finally jumped into the TopHat-Cufflinks world for RNAseq analysis. Because most of the pre-compiled binaries are for Mac OSX10.5 not 10.6 I've built all the binaries from the source code. As previous I've included detailed instructions on the install.

1) Install bowtie

- Download current version
(http://sourceforge.net/projects/bowtie-bio/files/bowtie)
- Move to applications folder (ngs/applications)
- Decompress
- Using terminal navigate to the unpacked bowtie folder
- To make the package type "make"
- Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory
- If you follow this thread I use $HOME/local/bin
- Thus type: "cp bowtie $HOME/local/bin"
"cp bowtie-build $HOME/local/bin"
"cp bowtie-inspect $HOME/local/bin"

2) Install Boost and Configure $PATH directory to support tophat and cufflinks install

*** If not installed, download and install Samtools and copy the binary to $PATH directory ($HOME/local/bin)***
*** See previous posts if you need instructions ***

- Download Boost version 1.45.0 (http://www.boost.org/) [boost_1_45_0.tar.bz2]
- Move to applications folder (ngs/applications)
- Decompress the package (double click)
- Using terminal navigate to the decompressed folder (ngs/applications/boost_1_45_0)
- Build the package
- Type "./bootstrap.sh"
- Type "./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install"
*** This will create "include" and "lib" subfolders in $HOME/local/ ***

- In the new "include" folder create a subfolder "bam"
- Using terminal navigate to the samtools folder in the ngs/applications folder
- Copy the "libbam.a" file in the samtools folder to $HOME/local/lib
- Type "cp libbam.a $HOME/local/lib"
- Copy the header files (files ending in .h) to $HOME/include/bam
- Type "cp *.h $HOME/include/bam"

3) Install tophat

- Download current version (http://tophat.cbcb.umd.edu/)
- Move to applications folder (ngs/applications)
- Using terminal navigate to the applications folder
- Decompress the package
- Type "tar zxvf tophat-1.2.0.tar.gz"
- Navigate into the decompressed folder
- Type "cd tophat-1.2.0"
- Build the package
- Type "./configure --prefix=$HOME/local --with-bam=$HOME/local"
- Type "make"
- Type "make install"
*** The executable is now available in your $PATH directory ***

4) Install Cufflinks

- Download current version (http://cufflinks.cbcb.umd.edu/tutorial.html)
- Move to applications folder (ngs/applications)
- Using terminal navigate to the applications folder
- Decompress the package
- Type "tar zxvf cufflinks-0.9.3.tar.gz"
- Navigate into the decompressed folder
- Type "cd cufflinks-0.9.3"
- Build the package
- Type "./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local"
- Type "make"
- Type "make install"
***The executable is now available in your $PATH directory***

5) Test the installs

- Navigate to the bowtie folder
- Type "cd $HOME/ngs/applications/bowtie-0.12.7"
- Test the bowtie install
- Type "bowtie indexes/e_coli reads/e_coli_1000.fq"
- Should spill a bunch to the terminal window ending with:
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)

- Download to the tophat test data (http://tophat.cbcb.umd.edu/tutorial.html)
- Decompress it and navigate into the downloaded folder "test_data"
- Test the tophat install
- Type "tophat -r 20 test_ref reads_1.fq reads_2.fq"
- Should create a subfolder called "tophat_out" with four files; accepted_hits.bam, deletions.bed, insertions.bed, junctions.bed
- Download the cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html)
- Navigate to the folder with the downloaded sam file
- Test the cufflinks install
- Type "cufflinks test_data.sam"
- Should create three files; genes.expr, transcripts.expr, and transcripts.gtf

**honey** · 01-25-2011, 12:38 AM

Analysis

Hi Jon,

Have you started analysis. Make sure you have right Ensembl GTF file and if you can post how you linked the analysis files, I mean Cuffdiff output with tracking files so that unique identifier of each file that will be great.

Best

**Jon_Keats** · 01-26-2011, 10:40 PM

Building Tophat-Cufflinks Compatible GTF files from Ensembl

I'll apologize in advance for the length of this post but I hope the verbosity is of some use to someone else other than myself should I go through these steps again. In TopHat, Cufflinks, Cuffcompare, Cuffdiff you often have the option to use a GTF file to define exon junctions to aid in junction detection, limit abundance calculations to a defined gene list, or exclude certain elements from the abundance calculations so things like mitochondrial transcripts or ribosomal transcripts don't make up the majority of your FPKM values.

So here were my steps to get files that seem to work as expected.

1) Download the bowtie index fiile for hg19 (http://bowtie-bio.sourceforge.net/index.shtml)
2) Move to /ngs/refgenomes/bowtie_indexed/
3) Decompress
4) Run bowtie-inspect to check the chromosome list and annotation format embedded in the file

Code:

bowtie-inspect -n hg19

Output:
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
chrM

5) Download the human GTF from Ensembl (http://uswest.ensembl.org/info/data/ftp/index.html)
6) Decompress and move to ngs/refgenomes/annotation_tracks (new folder)
7) Navigate to the location of the decompressed file
6) Generate a list of chromosomes in the GTF file

Code:

cut -f 1 Homo_sapiens.GRCh37.60.gtf | sort  | uniq > Homo_sapiens.GRCh37.60_Unique_ChromosomeList.txt

7) Check and modify the file output file

Code:

nano Homo_sapiens.GRC37.60_Unique_ChromosomeList.txt

Output:
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9
GL000191.1
GL000192.1
GL000193.1
GL000194.1
GL000195.1
GL000197.1
GL000199.1
GL000200.1
GL000201.1
GL000204.1
GL000205.1
GL000209.1
GL000211.1
GL000212.1
GL000213.1
GL000214.1
GL000216.1
GL000218.1
GL000219.1
GL000220.1
GL000221.1
GL000222.1
GL000223.1
GL000224.1
GL000225.1
GL000227.1
GL000228.1
GL000229.1
GL000230.1
GL000233.1
GL000236.1
GL000237.1
GL000238.1
GL000239.1
GL000240.1
GL000241.1
GL000242.1
GL000243.1
GL000247.1
HSCHR17_1
HSCHR6_MHC_APD
HSCHR6_MHC_COX
HSCHR6_MHC_DBB
HSCHR6_MHC_MANN
HSCHR6_MHC_MCF
HSCHR6_MHC_QBL
HSCHR6_MHC_SSTO
MT
X
Y

- Delete the chromosome IDs present in the bowtie index file (ie. Delete 1-22, X, Y, MT)
- Save file with new name [control-O], change file name to "Homo_sapiens.GRCh37.60_ChrToExclude.txt, save, close nano editor [control-X]
8) Generate a new GTF file with just the chromosomes in the bowtie index

Code:

grep -vf Homo_sapiens.GRCh37.60_ChrToExclude.txt Homo_sapiens.GRCh37.60.gtf > GRCh37_E60_BowtieIndexChr.gtf

9) Update the new GTF chromosome names from 1, 2, 3, ... to chr1, chr2, chr3, ... to match the bowtie index nomenclature

Code:

awk '{print "chr"$0}' GRCh37_E60_BowtieIndexChr.gtf | sed 's/chrMT/chrM/g' > GRCh37_E60_BowtieIndexCompatible.gtf

10) Check that the new output file chromosome IDs match the bowtie index

Code:

cut -f 1 GRCh37_E60_BowtieIndexCompatible.gtf | sort | uniq > GRCh37_E60_BowtieIndexCompatible_Check.txt
less GRCh37_E60_BowtieIndexCompatible_Check.txt

Output:
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY

***You now have a GTF file ready to use with TopHat and Cufflinks***

Creating a GTF of regions to excluded from FPKM calculations in cufflinks. Unfortunately, this will come down to personal choice I suspect. But abundance estimates from certain tissues and library prep methods could vary greatly due to differences in levels of mitochondrial RNA, ribosomal RNA, or tissue specific transcripts like immunoglobulin in my case (sucks when 50% of your reads are from 1Mb of the genome...argh).

A) Get a list of RNA types from the second column of the GTF file

Code:

cut -f 2 Homo_sapiens.GRCh37.60.gtf | sort | uniq > Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt
less Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt

Output:
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_J_gene
IG_J_pseudogene
IG_V_gene
IG_V_pseudogene
lincRNA
miRNA
miRNA_pseudogene
misc_RNA
misc_RNA_pseudogene
Mt_rRNA
Mt_tRNA
Mt_tRNA_pseudogene
polymorphic_pseudogene
processed_transcript
protein_coding
pseudogene
rRNA
rRNA_pseudogene
scRNA_pseudogene
snoRNA
snoRNA_pseudogene
snRNA
snRNA_pseudogene
TR_C_gene
TR_J_gene
tRNA_pseudogene
TR_V_gene
TR_V_pseudogene

B) Again this is a personal choice but I'm getting rid of all transcripts from the mitochondrial genome (chrM, Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene), those from ribosomal genes (rRNA, rRNA_pseudogene), and those from immunoglobulin elements (IG_C_gene, IG_C_pseudogene, IG_D_gene, IG_J_gene, IG_J_pseudogene, IG_V_gene, IG_V_pseudogene).

- I modified the "Homo_sapiens.GRCh37.60_Unique_AnnotationType.txt" file to create a list to select using grep called "Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt"
Looks like:
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_J_gene
IG_J_pseudogene
IG_V_gene
IG_V_pseudogene
Mt_rRNA
Mt_tRNA
Mt_tRNA_pseudogene
rRNA
rRNA_pseudogene
chrM

Code:

grep -f Homo_sapiens.GRCh37.60_AnnotationsToExclude.txt GRCh37_E60_BowtieIndexCompatible.gtf > GRCh37_E60_CufflinksExcludedTranscripts.gtf

Obviously check the final output file to see if it makes sense but you should be ready to go

**gprakhar** · 01-27-2011, 01:57 PM

Thanks for this great thread

Just wanted to say thank you for this thread. Has been a great help. Now to everyone new to NGS, I recommend reading this thread and the book you mentioned about
"Unix and Perl for Biologists".

Thanks for the help
--
Prakhar

Topics	Statistics	Last Post
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, Yesterday, 06:55 AM	0 responses 12 views 0 likes	Last Post by seqadmin Yesterday, 06:55 AM
Genetic Mosaicism More Prevalent Than Previously Thought by seqadmin Started by seqadmin, 05-30-2024, 03:16 PM	0 responses 24 views 0 likes	Last Post by seqadmin 05-30-2024, 03:16 PM
Comprehensive Sequencing of Great Ape Sex Chromosomes Yields Insights into Evolution and Genetic Variability by seqadmin Started by seqadmin, 05-29-2024, 01:32 PM	0 responses 29 views 0 likes	Last Post by seqadmin 05-29-2024, 01:32 PM
New Toolkit Enhances Plant Mitochondrial Genome Research by seqadmin Started by seqadmin, 05-24-2024, 07:15 AM	0 responses 215 views 0 likes	Last Post by seqadmin 05-24-2024, 07:15 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News