
  • Question about the insert size calculated by Picard

    Hi!

    I made a test BAM, shown below, and ran CollectInsertSizeMetrics on it. However, I don't understand how the insert size was calculated, and searching online has not cleared it up, so I am asking for help here. Any suggestions would be appreciated!

    The BAM records are:
    1: ST-2047 2195 chr10 15766308 60 111H39M = 15767443 1098 AGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:0 MD:Z:39 AS:i:39 XS:i:19 SA:Z:chr10,15767753,-,69M81S,60,1;

    2: ST-2047 99 chr10 15767443 60 150M = 15767753 379 CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFK NM:i:1 MD:Z:11A138 AS:i:145 XS:i:19

    3: ST-2047 147 chr10 15767753 60 69M81S = 15767443 -379 GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KFFFKF<KKK<KKKFKKAKFKFKFF<AFKFFKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:1 MD:Z:32A36 AS:i:64 XS:i:21 SA:Z:chr10,15766308,-,111S39M,60,0;

    4: ST-25745 145 chr14 38147141 42 6S133M11S chrX 42742816 0 TATTACGGTGAATAGGAGTATGGCTAGACAGAAGACAGTAGGGATGATAGTTTTTGGGGTGCAGTCCAAGCTGGTCTGGTGTCTGGAATGAGACTGGGACCTAATAAAAAGGAGTGTCCACACAGGAACTCAAATGGGCTGGAACCTGTA FAKKKKKFKKKKFKKAF<KKKFFFKKFA,KKFKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFAA NM:i:1 MD:Z:41A91 AS:i:128 XS:i:109 XA:Z:chr15,-103273055,4S45M2D101M,8;chrX,-42741252,4S47M1D99M,8;

    5: ST-49513 129 chr14 66949070 0 64M86S chr9 46824220 0 CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA AAFFFKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK77AKKFAFAK<,F<FAK,,AFFFAFAF7<<<7,,A,<,<FAKK,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:1 MD:Z:11G52 AS:i:59 XS:i:59 SA:Z:chr3,34012850,+,106S44M,0,3;

    6: ST-43730 83 chr18 49543551 60 43S107M = 49545057 1401 GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC <<A7FKKKKKA7KFFFKFKFKAAKKKFFKKKFAKFKKKKKKKKFKKKKFAFKKKKFKKKAFKKKFF,KAKKKKKKKKKKKFKKKKKKKKKKKKKA7KKKKKKKFAKKKKKKFKKKKKKKKKKKKFAKKKKKKKFKFKFKKKKFKKFFAAA NM:i:0 MD:Z:107 AS:i:107 XS:i:19

    7: ST-43730 163 chr18 49545057 60 150M = 49543551 -1401 AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG AAAFFKAFFFFKKFFKF<FAFFFKKFFKKKKKKKKKKFKKKF7KFKKAKKF<FKK<KKKKKKAKKKKK<7FAFAAFKF,,A<FFFKKKAKF7,,AFFKFKA7AAAKA7AFKK<FF<<FKKK,A,<<KFAAKFKFFFA,7,7AFFKKAF7A NM:i:1 MD:Z:107T42 AS:i:145 XS:i:20

    8: ST-49513 2177 chr3 34012850 0 106H44M chr9 46824220 0 TCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA K,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:3 MD:Z:5G29A3G4 AS:i:30 XS:i:29 SA:Z:chr14,66949070,+,64M86S,0,1;

    9:ST-49513 65 chr9 46824220 0 150M chr14 66949070 0 CCCAAATATCCCTTTGCCAATTCCACAAGAACTGTCTTAGCGAAAGGCTTCTTGAAGGGAAAGCTGTAACTCTGTGAGTTGATATCACAGAACACAAAGAAGTTTCTCAGAAAGCTTCTTTCTCTTTGTTATCGGAGGATATTTCCTTTG AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKK<AFKFKKKKKFKFKKK NM:i:4 MD:Z:55G22A4T15C50 AS:i:130 XS:i:129

    10:ST-25745 97 chrX 42742816 0 4S146M chr14 38147141 0 GGGGTGGATAGGCAAGACAATTTGGTTGACAAGGCACAGATCTTGAACTAACCTGTAAGCCTTGTCTGGTTTTTGGACAGGTAAAATGGGGGAATTGTAAGGAGAGTTTATAGGTTTTAAAAGGCCATGCTGTAGCAGGTGAGTGATAAC AAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKFKKKKKKKKKKKKKAKFKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKK<FAKKKFKK NM:i:7 MD:Z:11A13T5G15T7A31A47C10 AS:i:111 XS:i:110
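The FLAG column (second field) shows which of these records are primary alignments and which pairs are flagged as properly paired. The following is a minimal sketch that decodes the ten flags using the bit definitions from the SAM specification; `decode_flag` is a hypothetical helper name, not part of any tool.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Flags of the ten records listed above, in order.
flags = [2195, 99, 147, 145, 129, 83, 163, 2177, 65, 97]
for number, flag in enumerate(flags, start=1):
    print(number, flag, ",".join(decode_flag(flag)))
```

Records 1 and 8 carry the supplementary bit (0x800), and records 4, 5, 9 and 10 lack the proper-pair bit and have a mate on another chromosome with TLEN 0. Insert-size tools typically skip both kinds of record, which would leave only ST-2047 and ST-43730.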

    After running CollectInsertSizeMetrics, I got:
    insert_size All_Reads.fr_count All_Reads.rf_count
    379 1 0
    1401 0 1
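Both reported values match the TLEN column of the primary records, and they can be reproduced by hand. The following is a minimal sketch, assuming the insert size is the inclusive distance between the 5' ends of the two reads, where the 5' end of a reverse-strand read is its last aligned reference base. It is not taken from Picard's source, and the helper names `ref_length`, `five_prime` and `insert_size` are hypothetical.

```python
import re


def ref_length(cigar):
    """Number of reference bases covered by a CIGAR string (M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(count) for count, op in ops if op in "MDN=X")


def five_prime(pos, cigar, reverse):
    """1-based reference coordinate of the read's 5' end."""
    return pos + ref_length(cigar) - 1 if reverse else pos


def insert_size(first, second):
    """Signed distance between the two 5' ends, counting both end bases."""
    a = five_prime(*first)
    b = five_prime(*second)
    return b - a + (1 if b >= a else -1)


# (POS, CIGAR, is_reverse) copied from the primary records above.
st2047_read1 = (15767443, "150M", False)    # flag 99
st2047_read2 = (15767753, "69M81S", True)   # flag 147
st43730_read1 = (49543551, "43S107M", True)  # flag 83
st43730_read2 = (49545057, "150M", False)    # flag 163

print(insert_size(st2047_read1, st2047_read2))    # 379
print(insert_size(st43730_read1, st43730_read2))  # 1401
```

Under that assumption, ST-2047 gives 15767821 - 15767443 + 1 = 379 and ST-43730 gives 49545057 - 49543657 + 1 = 1401. Soft-clipped bases (81S, 43S) do not count because they are not aligned to the reference.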

    My questions are:
    1) I think the reads of ST-25745 and ST-49513 were discarded because they are chimeric and their mates map to different chromosomes. Am I right?
    2) I confirmed that 379 is the insert size of ST-2047 by running CollectInsertSizeMetrics on only these reads. I assume the first alignment, with flag 2195, was discarded, so the insert size should be 15767553+69-15767443=179. Where does 379 come from?
    3) For ST-43730, I think it should be 49545057+150-49543551=1656, or 1656+43=1699 if the 43S is included. How was 1401 produced?
    4) Regarding read orientation: for ST-2047, the flag of the second read is 147 (128+16+2+1); the 16 means SEQ is reverse-complemented, so the orientation is FR. For ST-43730, the flag of the second read is 163 (128+32+2+1); the 32 means its mate (the first read) is reverse-complemented, so the orientation is RF. Am I right?
    5) My library actually has a 2 kb insert size, but after mapping with bwa and running CollectInsertSizeMetrics I got an insert size of about 270~300 bp with FR orientation. I suspect the experiment failed, that is, I failed to join reads 2 kb apart into a single fragment before sequencing. I then checked the BAM and ran into the problems above. Any suggestions about why the 2 kb library gives the wrong insert size would be appreciated.
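On the orientation question, the strand flags alone say only that the two reads are on opposite strands; whether a pair is FR or RF also depends on where the reads sit relative to each other. The following is a minimal sketch, assuming a pair is FR when the forward-strand read's 5' end lies to the left of the reverse-strand read's 5' end, and RF otherwise. `pair_orientation` is a hypothetical helper, not Picard's implementation.

```python
def pair_orientation(read_reverse, mate_reverse, fwd_5prime, rev_5prime):
    """Classify a mapped pair as FR, RF or TANDEM.

    fwd_5prime and rev_5prime are the 5' coordinates of the forward-strand
    and reverse-strand read (ignored when both reads share a strand).
    """
    if read_reverse == mate_reverse:
        return "TANDEM"
    return "FR" if fwd_5prime < rev_5prime else "RF"


# ST-2047: forward read starts at 15767443; reverse read spans 69M from 15767753.
print(pair_orientation(False, True, 15767443, 15767753 + 69 - 1))  # FR

# ST-43730: reverse read spans 107M from 49543551; forward read starts at 49545057.
print(pair_orientation(True, False, 49545057, 49543551 + 107 - 1))  # RF
```

Under this assumption, ST-43730 is RF because its reverse-strand read maps upstream of its forward-strand read, which agrees with the fr_count and rf_count columns in the output above.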

    Thanks in advance!
    Best wishes!
