  • Problem with the insert size calculated by Picard

    Hi!

    I made the test BAM below and ran CollectInsertSizeMetrics on it, but I don't understand how the insert size was calculated. I'm still confused after searching Google, so I'm asking for help here. Any suggestions would be appreciated!

    The BAM records are:
    1: ST-2047 2195 chr10 15766308 60 111H39M = 15767443 1098 AGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:0 MD:Z:39 AS:i:39 XS:i:19 SA:Z:chr10,15767753,-,69M81S,60,1;

    2: ST-2047 99 chr10 15767443 60 150M = 15767753 379 CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFK NM:i:1 MD:Z:11A138 AS:i:145 XS:i:19

    3: ST-2047 147 chr10 15767753 60 69M81S = 15767443 -379 GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KFFFKF<KKK<KKKFKKAKFKFKFF<AFKFFKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:1 MD:Z:32A36 AS:i:64 XS:i:21 SA:Z:chr10,15766308,-,111S39M,60,0;

    4: ST-25745 145 chr14 38147141 42 6S133M11S chrX 42742816 0 TATTACGGTGAATAGGAGTATGGCTAGACAGAAGACAGTAGGGATGATAGTTTTTGGGGTGCAGTCCAAGCTGGTCTGGTGTCTGGAATGAGACTGGGACCTAATAAAAAGGAGTGTCCACACAGGAACTCAAATGGGCTGGAACCTGTA FAKKKKKFKKKKFKKAF<KKKFFFKKFA,KKFKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFAA NM:i:1 MD:Z:41A91 AS:i:128 XS:i:109 XA:Z:chr15,-103273055,4S45M2D101M,8;chrX,-42741252,4S47M1D99M,8;

    5: ST-49513 129 chr14 66949070 0 64M86S chr9 46824220 0 CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA AAFFFKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK77AKKFAFAK<,F<FAK,,AFFFAFAF7<<<7,,A,<,<FAKK,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:1 MD:Z:11G52 AS:i:59 XS:i:59 SA:Z:chr3,34012850,+,106S44M,0,3;

    6: ST-43730 83 chr18 49543551 60 43S107M = 49545057 1401 GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC <<A7FKKKKKA7KFFFKFKFKAAKKKFFKKKFAKFKKKKKKKKFKKKKFAFKKKKFKKKAFKKKFF,KAKKKKKKKKKKKFKKKKKKKKKKKKKA7KKKKKKKFAKKKKKKFKKKKKKKKKKKKFAKKKKKKKFKFKFKKKKFKKFFAAA NM:i:0 MD:Z:107 AS:i:107 XS:i:19

    7: ST-43730 163 chr18 49545057 60 150M = 49543551 -1401 AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG AAAFFKAFFFFKKFFKF<FAFFFKKFFKKKKKKKKKKFKKKF7KFKKAKKF<FKK<KKKKKKAKKKKK<7FAFAAFKF,,A<FFFKKKAKF7,,AFFKFKA7AAAKA7AFKK<FF<<FKKK,A,<<KFAAKFKFFFA,7,7AFFKKAF7A NM:i:1 MD:Z:107T42 AS:i:145 XS:i:20

    8: ST-49513 2177 chr3 34012850 0 106H44M chr9 46824220 0 TCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA K,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:3 MD:Z:5G29A3G4 AS:i:30 XS:i:29 SA:Z:chr14,66949070,+,64M86S,0,1;

    9:ST-49513 65 chr9 46824220 0 150M chr14 66949070 0 CCCAAATATCCCTTTGCCAATTCCACAAGAACTGTCTTAGCGAAAGGCTTCTTGAAGGGAAAGCTGTAACTCTGTGAGTTGATATCACAGAACACAAAGAAGTTTCTCAGAAAGCTTCTTTCTCTTTGTTATCGGAGGATATTTCCTTTG AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKK<AFKFKKKKKFKFKKK NM:i:4 MD:Z:55G22A4T15C50 AS:i:130 XS:i:129

    10:ST-25745 97 chrX 42742816 0 4S146M chr14 38147141 0 GGGGTGGATAGGCAAGACAATTTGGTTGACAAGGCACAGATCTTGAACTAACCTGTAAGCCTTGTCTGGTTTTTGGACAGGTAAAATGGGGGAATTGTAAGGAGAGTTTATAGGTTTTAAAAGGCCATGCTGTAGCAGGTGAGTGATAAC AAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKFKKKKKKKKKKKKKAKFKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKK<FAKKKFKK NM:i:7 MD:Z:11A13T5G15T7A31A47C10 AS:i:111 XS:i:110
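The clipped parts of records 3, 5 and 6 all contain the same 34-bp sequence, ATAACTTCGTATAGCATACATTATACGAAGTTAT, which is the loxP site. If this library was built with a loxP-containing junction adapter (an assumption; the post does not describe the protocol), reads that run across the junction would be expected to align in two pieces, which would be consistent with the soft clips here and the SA tags on records 3 and 5. The minimal sketch below scans SEQ fields for that site on either strand. `LOXP` and `find_junction` are hypothetical names made up for this example, and only five of the ten records are included.

```python
# Illustrative sketch only: LOXP and find_junction are hypothetical names, not part of any tool.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_junction(seq: str):
    """Return (offset, strand) of the first loxP match in SEQ, or None if it is absent."""
    for motif, strand in ((LOXP, "+"), (revcomp(LOXP), "-")):
        offset = seq.find(motif)
        if offset != -1:
            return offset, strand
    return None


# SEQ fields of records 2, 3, 5, 6 and 7 from the example BAM.
READS = {
    2: "CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC",
    3: "GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG",
    5: "CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA",
    6: "GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC",
    7: "AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG",
}

hits = {record: find_junction(seq) for record, seq in READS.items()}
for record, hit in hits.items():
    print(record, hit)
```

On these five sequences it should report the site on the forward strand in records 3, 5 and 6 (at offset 6 in record 6, right after the first six bases of its 43-base soft clip) and nothing in records 2 and 7.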

    After running CollectInsertSizeMetrics, I got:
    insert_size All_Reads.fr_count All_Reads.rf_count
    379 1 0
    1401 0 1
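The two rows of this table can be reproduced with a small approximation of the tool's record selection. The sketch below is my reading of how CollectInsertSizeMetrics behaves, not Picard's actual code, and it assumes that the tool: counts each pair once, from its second read; ignores unmapped, secondary, supplementary and duplicate records as well as pairs whose TLEN is 0; takes |TLEN| as the insert size; and labels a pair FR when the forward read's 5' end lies to the left of the reverse read's 5' end. Under those assumptions the ST-25745 and ST-49513 records drop out because their TLEN is 0 (the mates are on different chromosomes), and the flag-2195 record drops out because bit 0x800 marks it as supplementary. The function names are hypothetical.

```python
import re
from collections import Counter

# SAM FLAG bits.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
REVERSE, MATE_REVERSE, FIRST_OF_PAIR = 0x10, 0x20, 0x40
SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x100, 0x400, 0x800

# FLAG, POS, CIGAR, PNEXT and TLEN of the ten example records, in the order listed.
RECORDS = [
    (2195, 15766308, "111H39M", 15767443, 1098),
    (99, 15767443, "150M", 15767753, 379),
    (147, 15767753, "69M81S", 15767443, -379),
    (145, 38147141, "6S133M11S", 42742816, 0),
    (129, 66949070, "64M86S", 46824220, 0),
    (83, 49543551, "43S107M", 49545057, 1401),
    (163, 49545057, "150M", 49543551, -1401),
    (2177, 34012850, "106H44M", 46824220, 0),
    (65, 46824220, "150M", 66949070, 0),
    (97, 42742816, "4S146M", 38147141, 0),
]


def ref_length(cigar: str) -> int:
    """Reference bases spanned by an alignment (M, D, N, = and X consume the reference)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MDN=X")


def is_counted(flag: int, tlen: int) -> bool:
    """Assumed selection: second read of a mapped pair, primary, not a duplicate, TLEN != 0."""
    if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    if flag & (FIRST_OF_PAIR | SECONDARY | SUPPLEMENTARY | DUPLICATE):
        return False
    return tlen != 0


def pair_orientation(flag: int, pos: int, cigar: str, pnext: int, tlen: int) -> str:
    """Compare the 5' end of the forward-strand read with that of the reverse-strand read."""
    if bool(flag & REVERSE) == bool(flag & MATE_REVERSE):
        return "TANDEM"
    if flag & REVERSE:
        forward_5p = pnext                          # the mate is the forward read
        reverse_5p = pos + ref_length(cigar) - 1    # this read's 5' end is its rightmost base
    else:
        forward_5p = pos                            # this read is the forward read
        reverse_5p = pos + tlen                     # mate's 5' end, approximated from TLEN
    return "FR" if forward_5p < reverse_5p else "RF"


histogram = Counter()
for flag, pos, cigar, pnext, tlen in RECORDS:
    if is_counted(flag, tlen):
        histogram[(abs(tlen), pair_orientation(flag, pos, cigar, pnext, tlen))] += 1

for (size, orient), n in sorted(histogram.items()):
    print(size, orient, n)
# 379 FR 1
# 1401 RF 1
```

Only records 3 and 7 (the second reads of ST-2047 and ST-43730) pass the filter, which matches the two non-zero rows reported above.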

    My questions are:
    1) I think the reads of ST-25745 and ST-49513 were discarded because they are chimeric and the two mates map to different chromosomes. Am I right?
    2) I then confirmed that 379 is the insert size of ST-2047 by running CollectInsertSizeMetrics on just these reads. I guess the first alignment, with flag 2195, was discarded, so the insert size should be 15767553+69-15767443=179. I have no idea where 379 comes from.
    3) For ST-43730, I think it should be 49545057+150-49543551=1656, or 1656+43=1699 if the 43S is included. How was 1401 produced?
    4) On read orientation: for ST-2047, the flag of the second read is 147 (128+16+2+1), and 16 means SEQ is reverse-complemented, so the orientation is FR. For ST-43730, the flag of the second read is 163 (128+32+2+1), and 32 means its mate (the first read) is reverse-complemented, so the orientation is RF. Am I right?
    5) In fact, I have a library with a 2 kb insert size, but after mapping with BWA and running CollectInsertSizeMetrics I got insert sizes of about 270-300 bp, in FR orientation. I suspect the experiment failed, i.e. I did not manage to bring sequences 2 kb apart together into a single fragment before sequencing. I checked the BAM to find out and ran into the problems above. Any suggestions about why the 2 kb library gives the wrong insert size would be appreciated.
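For questions 2 and 3: the insert_size values are exactly the absolute TLEN values (SAM column 9) that BWA wrote for records 2/3 and 6/7, so the real question is how TLEN was computed. One formula that reproduces both numbers, assumed here to match BWA-MEM's behaviour, measures from the 5' end of one read to the 5' end of its mate, where the 5' end of a reverse-strand read is its rightmost aligned base and soft-clipped bases are ignored. Using record 3's POS of 15767753, the ST-2047 pair (an FR pair) gives 379, which for an FR pair equals the outer span. In the ST-43730 pair the reverse-strand read lies on the left, so the 5' ends are the inner ends and the result is 1401 rather than the 1656-base outer span. The function names are hypothetical.

```python
import re


def ref_length(cigar: str) -> int:
    """Reference bases spanned by an alignment (M, D, N, = and X consume the reference)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MDN=X")


def five_prime(pos: int, cigar: str, reverse: bool) -> int:
    """1-based reference position of a read's 5' end (its rightmost base if on the reverse strand)."""
    return pos + ref_length(cigar) - 1 if reverse else pos


def tlen_5p_to_5p(read: tuple, mate: tuple) -> int:
    """Signed 5'-to-5' distance, inclusive of both ends (assumed BWA-MEM-style TLEN)."""
    p0, p1 = five_prime(*read), five_prime(*mate)
    return p1 - p0 + (1 if p1 > p0 else -1 if p1 < p0 else 0)


def outer_span(read: tuple, mate: tuple) -> int:
    """Leftmost to rightmost aligned base of the two reads, inclusive."""
    starts = [r[0] for r in (read, mate)]
    ends = [r[0] + ref_length(r[1]) - 1 for r in (read, mate)]
    return max(ends) - min(starts) + 1


# (POS, CIGAR, is_reverse) for the primary alignments of each pair.
PAIRS = {
    "ST-2047": ((15767443, "150M", False), (15767753, "69M81S", True)),    # records 2 and 3
    "ST-43730": ((49543551, "43S107M", True), (49545057, "150M", False)),  # records 6 and 7
}

for name, (read, mate) in PAIRS.items():
    print(name, tlen_5p_to_5p(read, mate), outer_span(read, mate))
# ST-2047 379 379
# ST-43730 1401 1656
```

Soft clips such as the 43S enter neither number, so adding them cannot recover 1401 in question 3.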

    Thanks in advance!
    Best wishes!
