  • Problem with the insert size calculated by Picard, thanks!

    Hi!

    I made a test BAM, shown below, and ran CollectInsertSizeMetrics on it. However, I don't understand how the insert size was calculated. After searching on Google I am still confused, so I am asking for help here. Any suggestions would be appreciated!

    The BAM is:
    1: ST-2047 2195 chr10 15766308 60 111H39M = 15767443 1098 AGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:0 MD:Z:39 AS:i:39 XS:i:19 SA:Z:chr10,15767753,-,69M81S,60,1;

    2: ST-2047 99 chr10 15767443 60 150M = 15767753 379 CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFK NM:i:1 MD:Z:11A138 AS:i:145 XS:i:19

    3: ST-2047 147 chr10 15767753 60 69M81S = 15767443 -379 GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KFFFKF<KKK<KKKFKKAKFKFKFF<AFKFFKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:1 MD:Z:32A36 AS:i:64 XS:i:21 SA:Z:chr10,15766308,-,111S39M,60,0;

    4: ST-25745 145 chr14 38147141 42 6S133M11S chrX 42742816 0 TATTACGGTGAATAGGAGTATGGCTAGACAGAAGACAGTAGGGATGATAGTTTTTGGGGTGCAGTCCAAGCTGGTCTGGTGTCTGGAATGAGACTGGGACCTAATAAAAAGGAGTGTCCACACAGGAACTCAAATGGGCTGGAACCTGTA FAKKKKKFKKKKFKKAF<KKKFFFKKFA,KKFKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFAA NM:i:1 MD:Z:41A91 AS:i:128 XS:i:109 XA:Z:chr15,-103273055,4S45M2D101M,8;chrX,-42741252,4S47M1D99M,8;

    5: ST-49513 129 chr14 66949070 0 64M86S chr9 46824220 0 CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA AAFFFKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK77AKKFAFAK<,F<FAK,,AFFFAFAF7<<<7,,A,<,<FAKK,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:1 MD:Z:11G52 AS:i:59 XS:i:59 SA:Z:chr3,34012850,+,106S44M,0,3;

    6: ST-43730 83 chr18 49543551 60 43S107M = 49545057 1401 GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC <<A7FKKKKKA7KFFFKFKFKAAKKKFFKKKFAKFKKKKKKKKFKKKKFAFKKKKFKKKAFKKKFF,KAKKKKKKKKKKKFKKKKKKKKKKKKKA7KKKKKKKFAKKKKKKFKKKKKKKKKKKKFAKKKKKKKFKFKFKKKKFKKFFAAA NM:i:0 MD:Z:107 AS:i:107 XS:i:19

    7: ST-43730 163 chr18 49545057 60 150M = 49543551 -1401 AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG AAAFFKAFFFFKKFFKF<FAFFFKKFFKKKKKKKKKKFKKKF7KFKKAKKF<FKK<KKKKKKAKKKKK<7FAFAAFKF,,A<FFFKKKAKF7,,AFFKFKA7AAAKA7AFKK<FF<<FKKK,A,<<KFAAKFKFFFA,7,7AFFKKAF7A NM:i:1 MD:Z:107T42 AS:i:145 XS:i:20

    8: ST-49513 2177 chr3 34012850 0 106H44M chr9 46824220 0 TCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA K,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:3 MD:Z:5G29A3G4 AS:i:30 XS:i:29 SA:Z:chr14,66949070,+,64M86S,0,1;

    9:ST-49513 65 chr9 46824220 0 150M chr14 66949070 0 CCCAAATATCCCTTTGCCAATTCCACAAGAACTGTCTTAGCGAAAGGCTTCTTGAAGGGAAAGCTGTAACTCTGTGAGTTGATATCACAGAACACAAAGAAGTTTCTCAGAAAGCTTCTTTCTCTTTGTTATCGGAGGATATTTCCTTTG AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKK<AFKFKKKKKFKFKKK NM:i:4 MD:Z:55G22A4T15C50 AS:i:130 XS:i:129

    10:ST-25745 97 chrX 42742816 0 4S146M chr14 38147141 0 GGGGTGGATAGGCAAGACAATTTGGTTGACAAGGCACAGATCTTGAACTAACCTGTAAGCCTTGTCTGGTTTTTGGACAGGTAAAATGGGGGAATTGTAAGGAGAGTTTATAGGTTTTAAAAGGCCATGCTGTAGCAGGTGAGTGATAAC AAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKFKKKKKKKKKKKKKAKFKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKK<FAKKKFKK NM:i:7 MD:Z:11A13T5G15T7A31A47C10 AS:i:111 XS:i:110
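Since the questions below hinge on what each FLAG value means, here is a minimal sketch that decodes the ten flags above into their bits. The bit meanings are those of the SAM specification; the names `FLAG_BITS` and `decode_flag` are only illustrative.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The ten FLAG values from the records above, in record order.
for flag in (2195, 99, 147, 145, 129, 83, 163, 2177, 65, 97):
    print(flag, decode_flag(flag))
```

Decoded this way, records 1 and 8 (flags 2195 and 2177) carry the supplementary bit, and bit 16 describes the strand of the record itself while bit 32 describes the strand of its mate.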

    After running CollectInsertSizeMetrics, I got:
    insert_size All_Reads.fr_count All_Reads.rf_count
    379 1 0
    1401 0 1
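A rough sketch of one way this table could arise from the records, under assumptions that are mine and not confirmed by the Picard documentation quoted here: supplementary and secondary records are skipped, only one read per pair is counted (the second-in-pair is used here), records with TLEN 0 are skipped, and the insert size is the absolute value of the TLEN column. The orientation rule is also a simplification: the pair is called FR when the leftmost read is on the forward strand and RF when it is on the reverse strand.

```python
from collections import Counter

# (name, flag, rname, pos, rnext, pnext, tlen) copied from the ten records above;
# "=" in RNEXT means "same reference as RNAME".
RECORDS = [
    ("ST-2047", 2195, "chr10", 15766308, "=", 15767443, 1098),
    ("ST-2047", 99, "chr10", 15767443, "=", 15767753, 379),
    ("ST-2047", 147, "chr10", 15767753, "=", 15767443, -379),
    ("ST-25745", 145, "chr14", 38147141, "chrX", 42742816, 0),
    ("ST-49513", 129, "chr14", 66949070, "chr9", 46824220, 0),
    ("ST-43730", 83, "chr18", 49543551, "=", 49545057, 1401),
    ("ST-43730", 163, "chr18", 49545057, "=", 49543551, -1401),
    ("ST-49513", 2177, "chr3", 34012850, "chr9", 46824220, 0),
    ("ST-49513", 65, "chr9", 46824220, "chr14", 66949070, 0),
    ("ST-25745", 97, "chrX", 42742816, "chr14", 38147141, 0),
]


def keep(flag, tlen):
    """Assumed filter: one primary, mapped record per pair with a non-zero TLEN."""
    if not flag & 0x1 or flag & 0x4 or flag & 0x8:  # unpaired, unmapped, mate unmapped
        return False
    if flag & 0x100 or flag & 0x800:  # secondary or supplementary
        return False
    if flag & 0x40:  # count each pair once, via its second-in-pair record
        return False
    return tlen != 0


def orientation(flag, pos, pnext):
    """Simplified rule: look at the strand of whichever read starts leftmost."""
    read_rev = bool(flag & 0x10)
    mate_rev = bool(flag & 0x20)
    if read_rev == mate_rev:
        return "tandem"
    leftmost_rev = read_rev if pos <= pnext else mate_rev
    return "rf" if leftmost_rev else "fr"


hist = Counter()
for name, flag, rname, pos, rnext, pnext, tlen in RECORDS:
    if keep(flag, tlen):
        hist[(abs(tlen), orientation(flag, pos, pnext))] += 1

for (size, orient), count in sorted(hist.items()):
    print(size, orient, count)
```

With these assumptions only records 3 and 7 survive the filter, giving one FR pair at 379 and one RF pair at 1401, which agrees with the table. Counting the first-in-pair records instead would give the same two rows for this file.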

    My questions are as follows:
    1) I think the reads of ST-25745 and ST-49513 were discarded because they are chimeric and their mates map to different chromosomes. Am I right?
    2) I confirmed that 379 is the insert size of ST-2047 by running CollectInsertSizeMetrics on these reads alone. I guess the first alignment, with flag 2195, was discarded, so the insert size should be 15767553+69-15767443=179. Where does 379 come from?
    3) For ST-43730, I think it should be 49545057+150-49543551=1656, and adding the 43S gives 1656+43=1699. How was 1401 produced?
    4) Regarding read orientation: for ST-2047, the flag of the second read is 147 (128+16+2+1), and the 16 means SEQ is reverse-complemented, so the orientation is FR. For ST-43730, the flag of the second read is 163 (128+32+2+1), and the 32 means its mate (the first read) is reverse-complemented, so the orientation is RF. Am I right?
    5) In fact, I prepared a library with a 2 kb insert size, but after mapping with bwa and running CollectInsertSizeMetrics I got an insert size of about 270~300 bp with FR orientation. I think the experiment failed, that is, I failed to link reads 2 kb apart into a single fragment before sequencing. So I checked the BAM and ran into the problem above. Any suggestions about why I got the wrong insert size for the 2 kb library would be appreciated.
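For the arithmetic in questions 2 and 3, the sketch below recomputes the span of each pair directly from POS and CIGAR, using the values exactly as they appear in the records (record 3 has POS 15767753). It assumes the TLEN column is the inclusive distance between the 5' ends of the two mates, counting only reference-consuming CIGAR operations and ignoring soft or hard clips. That assumption reproduces the TLEN values in these records, but it is not taken from the bwa or Picard source; the helper names are illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span(cigar):
    """Reference bases covered by an alignment: only M, D, N, = and X consume the reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def five_prime(pos, cigar, reverse):
    """1-based reference coordinate of the aligned 5' end; clipped bases are ignored."""
    return pos + ref_span(cigar) - 1 if reverse else pos


def template_span(read, mate):
    """Inclusive distance between the two 5' ends; each argument is (pos, cigar, reverse)."""
    return abs(five_prime(*read) - five_prime(*mate)) + 1


# ST-2047: record 2 (forward, 150M) and record 3 (reverse, 69M81S).
st_2047 = template_span((15767443, "150M", False), (15767753, "69M81S", True))

# ST-43730: record 6 (reverse, 43S107M) and record 7 (forward, 150M).
st_43730 = template_span((49543551, "43S107M", True), (49545057, "150M", False))

print(st_2047, st_43730)
```

For ST-2047 the reverse read ends at 15767753+69-1=15767821, and 15767821-15767443+1=379. For ST-43730 the 5' end of the reverse read is 49543551+107-1=49543657, and 49545057-49543657+1=1401; the 43 soft-clipped bases and the full 150 bp of the forward read do not enter the calculation under this assumption.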

    Thanks in advance!
    Best wishes!
