Hi
I have reads from one individual's genome, sequenced across 17 lanes, with each lane split into 10 FASTQ files. In other words, I have 17 folders, each corresponding to one lane and containing 10 FASTQ files, for 170 FASTQ files in total. I plan to map each of the 170 FASTQ files to hg19 and merge all the resulting SAM files into one big SAM, but I am not sure this is the right approach.
1) For duplicate removal, should I merge the 10 SAMs that correspond to one lane, remove duplicates from that merged SAM, repeat this for every lane (giving 17 lane-level merged SAMs), and then merge those into one big SAM? Or should I remove duplicates only after merging all 170 SAMs into one big SAM? In other words, should duplicates be removed per lane, or does it not matter?
2) For the @RG tag, and specifically the ID field, should the ID be different for each of the 170 SAMs, or should all SAMs that come from one lane (the 10 SAMs in one folder) share the same ID?
3) After merging (either the SAMs from one specific lane or all 170 SAMs into one big SAM), how should I write the @RG header lines and IDs?
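To make questions 2 and 3 concrete, here is a rough Python sketch of the two ID schemes I am considering. It only builds the @RG header lines that a merged SAM would carry under each scheme. It does not say which scheme is correct. The sample name, library name, platform, and the `laneN` / `laneN.partM` naming are placeholders I made up for illustration, not values from my data.

```python
# Sketch of the two @RG ID schemes asked about above.
# SAMPLE, LIBRARY, PLATFORM and the ID naming are hypothetical placeholders.
N_LANES = 17
FILES_PER_LANE = 10
SAMPLE = "sample1"
LIBRARY = "lib1"
PLATFORM = "ILLUMINA"


def rg_line(rg_id, lane):
    """Build one tab-separated @RG header line for the given ID and lane."""
    fields = [
        "@RG",
        f"ID:{rg_id}",
        f"SM:{SAMPLE}",
        f"LB:{LIBRARY}",
        f"PL:{PLATFORM}",
        f"PU:lane{lane}",
    ]
    return "\t".join(fields)


def per_lane_header():
    """Option A: the 10 SAMs of a lane share one ID (17 @RG lines in total)."""
    return [rg_line(f"lane{lane}", lane) for lane in range(1, N_LANES + 1)]


def per_file_header():
    """Option B: every SAM gets its own ID (170 @RG lines in total)."""
    return [
        rg_line(f"lane{lane}.part{part}", lane)
        for lane in range(1, N_LANES + 1)
        for part in range(1, FILES_PER_LANE + 1)
    ]


if __name__ == "__main__":
    print(len(per_lane_header()), "read groups with one ID per lane")
    print(len(per_file_header()), "read groups with one ID per file")
    print(per_lane_header()[0])
```

In both cases every read would also need an RG:Z: tag matching one of these IDs, so the choice affects the per-read tags as well as the header.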
Thanks