Hi!
I made a test BAM (shown below) and ran CollectInsertSizeMetrics on it, but I don't understand how the insert size was calculated. After searching online I am still confused, so I am asking for help here. Any suggestions would be appreciated!
The BAM is:
1: ST-2047 2195 chr10 15766308 60 111H39M = 15767443 1098 AGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:0 MD:Z:39 AS:i:39 XS:i:19 SA:Z:chr10,15767753,-,69M81S,60,1;
2: ST-2047 99 chr10 15767443 60 150M = 15767753 379 CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFK NM:i:1 MD:Z:11A138 AS:i:145 XS:i:19
3: ST-2047 147 chr10 15767753 60 69M81S = 15767443 -379 GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KFFFKF<KKK<KKKFKKAKFKFKFF<AFKFFKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:1 MD:Z:32A36 AS:i:64 XS:i:21 SA:Z:chr10,15766308,-,111S39M,60,0;
4: ST-25745 145 chr14 38147141 42 6S133M11S chrX 42742816 0 TATTACGGTGAATAGGAGTATGGCTAGACAGAAGACAGTAGGGATGATAGTTTTTGGGGTGCAGTCCAAGCTGGTCTGGTGTCTGGAATGAGACTGGGACCTAATAAAAAGGAGTGTCCACACAGGAACTCAAATGGGCTGGAACCTGTA FAKKKKKFKKKKFKKAF<KKKFFFKKFA,KKFKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFAA NM:i:1 MD:Z:41A91 AS:i:128 XS:i:109 XA:Z:chr15,-103273055,4S45M2D101M,8;chrX,-42741252,4S47M1D99M,8;
5: ST-49513 129 chr14 66949070 0 64M86S chr9 46824220 0 CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA AAFFFKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK77AKKFAFAK<,F<FAK,,AFFFAFAF7<<<7,,A,<,<FAKK,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:1 MD:Z:11G52 AS:i:59 XS:i:59 SA:Z:chr3,34012850,+,106S44M,0,3;
6: ST-43730 83 chr18 49543551 60 43S107M = 49545057 1401 GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC <<A7FKKKKKA7KFFFKFKFKAAKKKFFKKKFAKFKKKKKKKKFKKKKFAFKKKKFKKKAFKKKFF,KAKKKKKKKKKKKFKKKKKKKKKKKKKA7KKKKKKKFAKKKKKKFKKKKKKKKKKKKFAKKKKKKKFKFKFKKKKFKKFFAAA NM:i:0 MD:Z:107 AS:i:107 XS:i:19
7: ST-43730 163 chr18 49545057 60 150M = 49543551 -1401 AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG AAAFFKAFFFFKKFFKF<FAFFFKKFFKKKKKKKKKKFKKKF7KFKKAKKF<FKK<KKKKKKAKKKKK<7FAFAAFKF,,A<FFFKKKAKF7,,AFFKFKA7AAAKA7AFKK<FF<<FKKK,A,<<KFAAKFKFFFA,7,7AFFKKAF7A NM:i:1 MD:Z:107T42 AS:i:145 XS:i:20
8: ST-49513 2177 chr3 34012850 0 106H44M chr9 46824220 0 TCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA K,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:3 MD:Z:5G29A3G4 AS:i:30 XS:i:29 SA:Z:chr14,66949070,+,64M86S,0,1;
9:ST-49513 65 chr9 46824220 0 150M chr14 66949070 0 CCCAAATATCCCTTTGCCAATTCCACAAGAACTGTCTTAGCGAAAGGCTTCTTGAAGGGAAAGCTGTAACTCTGTGAGTTGATATCACAGAACACAAAGAAGTTTCTCAGAAAGCTTCTTTCTCTTTGTTATCGGAGGATATTTCCTTTG AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKK<AFKFKKKKKFKFKKK NM:i:4 MD:Z:55G22A4T15C50 AS:i:130 XS:i:129
10:ST-25745 97 chrX 42742816 0 4S146M chr14 38147141 0 GGGGTGGATAGGCAAGACAATTTGGTTGACAAGGCACAGATCTTGAACTAACCTGTAAGCCTTGTCTGGTTTTTGGACAGGTAAAATGGGGGAATTGTAAGGAGAGTTTATAGGTTTTAAAAGGCCATGCTGTAGCAGGTGAGTGATAAC AAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKFKKKKKKKKKKKKKAKFKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKK<FAKKKFKK NM:i:7 MD:Z:11A13T5G15T7A31A47C10 AS:i:111 XS:i:110
After running CollectInsertSizeMetrics, I got:
insert_size All_Reads.fr_count All_Reads.rf_count
379 1 0
1401 0 1
My questions are as follows:
1) I think the ST-25745 and ST-49513 reads were discarded because they are chimeric and their mates map to different chromosomes. Is that right?
2) I confirmed that 379 is the insert size of ST-2047 by running CollectInsertSizeMetrics on those reads alone. I assume the first alignment (flag 2195) was discarded, so I expected the insert size to be 15767553+69-15767443=179. Where does 379 come from?
3) For ST-43730, I think it should be 49545057+150-49543551=1656, or 1656+43=1699 if the 43S soft clip is included. How was 1401 produced?
4) Regarding read orientation: for ST-2047, the flag of the second read is 147 (128+16+2+1); 16 means SEQ is reverse complemented, so the orientation is FR. For ST-43730, the flag of the second read is 163 (128+32+2+1); 32 means the mate (the first read) is reverse complemented, so the orientation is RF. Am I right?
5) My library actually has a 2 kb insert size, but after mapping with bwa and running CollectInsertSizeMetrics I got an insert size of about 270~300 bp with FR orientation. I suspect the experiment failed, i.e. I failed to join reads 2 kb apart into a single fragment before sequencing. That is why I checked the BAM and ran into the questions above. Any suggestions about why I got the wrong insert size for the 2 kb library would be appreciated.
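To make the arithmetic in questions 2 to 4 easier to check, here is a minimal Python sketch that works from the FLAG, POS and CIGAR values in the records above. It rests on an assumption, not on a confirmed description of the tool: that the reported insert size is the distance between the 5' ends of the two primary alignments (alignment end for a reverse-strand read, POS for a forward-strand read), and that a pair is FR when the forward read's 5' end lies to the left of the reverse read's 5' end. The function names (`decode_flag`, `ref_len`, `five_prime`, `pair_summary`) are hypothetical helpers written for this post.

```python
import re

# Standard SAM FLAG bits (values from the SAM specification).
FLAG_BITS = {
    1: "paired", 2: "proper_pair", 4: "unmapped", 8: "mate_unmapped",
    16: "reverse", 32: "mate_reverse", 64: "read1", 128: "read2",
    256: "secondary", 512: "qc_fail", 1024: "duplicate", 2048: "supplementary",
}

def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def ref_len(cigar):
    """Reference bases covered by a CIGAR string (M, D, N, = and X consume reference)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")

def five_prime(flag, pos, cigar):
    """5' coordinate: alignment end if the read is on the reverse strand, else POS."""
    return pos + ref_len(cigar) - 1 if flag & 16 else pos

def pair_summary(a, b):
    """a and b are (flag, pos, cigar) for the two primary alignments of a pair.

    Assumes exactly one of the two reads is on the reverse strand.
    """
    fwd, rev = (a, b) if b[0] & 16 else (b, a)
    f5 = five_prime(*fwd)
    r5 = five_prime(*rev)
    size = abs(r5 - f5) + 1                  # 5'-to-5' distance, inclusive
    orientation = "FR" if f5 < r5 else "RF"  # assumed orientation convention
    return size, orientation

# Primary alignments of the two pairs from the test BAM above.
st_2047 = pair_summary((99, 15767443, "150M"), (147, 15767753, "69M81S"))
st_43730 = pair_summary((83, 49543551, "43S107M"), (163, 49545057, "150M"))

print(decode_flag(147))   # ['paired', 'proper_pair', 'reverse', 'read2']
print(decode_flag(163))   # ['paired', 'proper_pair', 'mate_reverse', 'read2']
print(st_2047)            # (379, 'FR')
print(st_43730)           # (1401, 'RF')
```

Under these assumptions the sketch gives 379/FR for ST-2047 and 1401/RF for ST-43730, which matches the histogram above. Soft-clipped bases (43S, 81S) are not counted because they do not consume reference positions.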
Thanks in advance!
Best wishes!