Sorry if this is a really obvious question, but I am new to the analysis of sequencing data. (Also sorry if this is posted in the wrong forum.) We are aligning an Illumina paired-end RNA-Seq run to the hg19 human genome. I want to know how many of the reads aligned to the genome, and I am not sure I am looking in the right place. For each lane, there are 5 files in the logs directory that I think might be helpful:
bowtie.left_kept_reads.fixmap.log
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
Reported 38111199 alignments to 1 output stream(s)
bowtie.left_kept_reads_seg1.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4892206 (23.14%)
# reads that failed to align: 16209674 (76.66%)
# reads with alignments suppressed due to -m: 42170 (0.20%)
Reported 8921307 alignments to 1 output stream(s)
bowtie.left_kept_reads_seg2.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 5032085 (23.80%)
# reads that failed to align: 16048266 (75.90%)
# reads with alignments suppressed due to -m: 63699 (0.30%)
Reported 9308464 alignments to 1 output stream(s)
bowtie.left_kept_reads_seg3.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 4938783 (23.36%)
# reads that failed to align: 16146855 (76.37%)
# reads with alignments suppressed due to -m: 58412 (0.28%)
Reported 9092214 alignments to 1 output stream(s)
bowtie.left_kept_reads_seg4.fixmap.log
# reads processed: 21144050
# reads with at least one reported alignment: 3500454 (16.56%)
# reads that failed to align: 17621660 (83.34%)
# reads with alignments suppressed due to -m: 21936 (0.10%)
Reported 5527529 alignments to 1 output stream(s)
(There are matching files for the right kept reads, which I know should be dealt with in the same way.)
Obviously the first file is the initial alignment. The next 4 seem to be mapping the reads that were unmapped during the first pass (the number of reads processed in each equals the number of reads that failed to align in the first file). From the run data, I am also assuming that these originally unmapped reads are being mapped to junctions?
58% alignment isn't very good, but if I add the reads aligned in the 4 seg files, the total alignment is 94%. Is it actually correct to add them like this, though?
I also want to know how many of the reads map to junction sites. Am I correct in thinking the 4 seg files are mapping reads to junctions? If so, 35% seems like a really high proportion of reads mapping to junction sites. If not, is there somewhere else I can find this data?
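For reference, the arithmetic behind the 94% and 35% figures can be sketched as below, using only the counts from the logs above. The function name `parse_bowtie_log` and the variable names are made up for illustration. The sketch simply sums the four seg counts against the original read total; it does not check whether the same read is counted in more than one seg file, which is exactly the open question.

```python
import re

def parse_bowtie_log(text):
    """Pull the read counts out of a bowtie summary log (hypothetical helper)."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
        "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts

# Contents of bowtie.left_kept_reads.fixmap.log, copied from the post.
initial_log = """\
# reads processed: 51079805
# reads with at least one reported alignment: 29895367 (58.53%)
# reads that failed to align: 21144050 (41.39%)
# reads with alignments suppressed due to -m: 40388 (0.08%)
Reported 38111199 alignments to 1 output stream(s)
"""
initial = parse_bowtie_log(initial_log)

# "At least one reported alignment" counts from the seg1..seg4 logs.
seg_aligned = [4892206, 5032085, 4938783, 3500454]

total_reads = initial["processed"]
initial_pct = 100 * initial["aligned"] / total_reads
# Naive sum: assumes each seg count refers to a distinct read, which may not hold.
seg_pct = 100 * sum(seg_aligned) / total_reads

print(f"initial pass:  {initial_pct:.2f}%")
print(f"seg files sum: {seg_pct:.2f}%")
print(f"naive total:   {initial_pct + seg_pct:.2f}%")
```

If the seg files can report the same read more than once, the naive total is an overestimate, so it is best read as an upper bound until that is confirmed.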
Thanks for any help/advice you all can give me!