  • Problem with the insert size calculated by Picard, thanks!

    Hi!

    I made a test BAM, shown below, and ran CollectInsertSizeMetrics on it. However, I don't understand how the insert size was calculated, and after searching online I am still confused, so I am asking for help here. Any suggestions would be appreciated!

    The bam is:
    1: ST-2047 2195 chr10 15766308 60 111H39M = 15767443 1098 AGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:0 MD:Z:39 AS:i:39 XS:i:19 SA:Z:chr10,15767753,-,69M81S,60,1;

    2: ST-2047 99 chr10 15767443 60 150M = 15767753 379 CATTAGTGGGCGTGAATCTATCATTGATACCTCTATTGATGGGGAACTTACTACCTTACAAGGTAGCCCCCTCTCTTGTGAGAAAGCTCCAAGTGGTGTAAGAATGGATTAATCCAAACAGTGGTCTCTTGCACAGATCCCGTAGGACTC AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFK NM:i:1 MD:Z:11A138 AS:i:145 XS:i:19

    3: ST-2047 147 chr10 15767753 60 69M81S = 15767443 -379 GTTTTCAGTACCATAGTATGTCTCTTTTGAACGTGACTCTATTCTAATTTATTAGGACAGTCTGTTCAGCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAAGTCCTCTCCTGGGCCTTGGGTTGAGGCTGAGTGATCTG KFFFKF<KKK<KKKFKKAKFKFKFF<AFKFFKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKFFKFKKKKKKKKKKKKKKKKKFKKKKKFFFAA NM:i:1 MD:Z:32A36 AS:i:64 XS:i:21 SA:Z:chr10,15766308,-,111S39M,60,0;

    4: ST-25745 145 chr14 38147141 42 6S133M11S chrX 42742816 0 TATTACGGTGAATAGGAGTATGGCTAGACAGAAGACAGTAGGGATGATAGTTTTTGGGGTGCAGTCCAAGCTGGTCTGGTGTCTGGAATGAGACTGGGACCTAATAAAAAGGAGTGTCCACACAGGAACTCAAATGGGCTGGAACCTGTA FAKKKKKFKKKKFKKAF<KKKFFFKKFA,KKFKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFAA NM:i:1 MD:Z:41A91 AS:i:128 XS:i:109 XA:Z:chr15,-103273055,4S45M2D101M,8;chrX,-42741252,4S47M1D99M,8;

    5: ST-49513 129 chr14 66949070 0 64M86S chr9 46824220 0 CAGATATTTCGAATCCCTTTGAAAACTATAGGGCCAAAGGAAATATCCTCCGATAACAAAGAGACGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA AAFFFKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK77AKKFAFAK<,F<FAK,,AFFFAFAF7<<<7,,A,<,<FAKK,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:1 MD:Z:11G52 AS:i:59 XS:i:59 SA:Z:chr3,34012850,+,106S44M,0,3;

    6: ST-43730 83 chr18 49543551 60 43S107M = 49545057 1401 GTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGAGACCTAGGACACAAGTGGTCTTTCTCCCATAGCAAAGAAACAATAAATATTGCTCTAACTTCCGGGTTTCTGATGATTAGATCCTGTTTTCTCTCCAATATTCTCC <<A7FKKKKKA7KFFFKFKFKAAKKKFFKKKFAKFKKKKKKKKFKKKKFAFKKKKFKKKAFKKKFF,KAKKKKKKKKKKKFKKKKKKKKKKKKKA7KKKKKKKFAKKKKKKFKKKKKKKKKKKKFAKKKKKKKFKFKFKKKKFKKFFAAA NM:i:0 MD:Z:107 AS:i:107 XS:i:19

    7: ST-43730 163 chr18 49545057 60 150M = 49543551 -1401 AACGAGATAGGTTCATGACAGAATTCACTATTTCTAGCACACCATGTCAGTATGTCATTAAGTGGAGGCTTTGTCAGACCTACTGGTAAAGTCTTATAGGCATGAACCGCTGCGTCCAGCCCTCCTGTCTGCTGAGAGCCCCACTCCAAG AAAFFKAFFFFKKFFKF<FAFFFKKFFKKKKKKKKKKFKKKF7KFKKAKKF<FKK<KKKKKKAKKKKK<7FAFAAFKF,,A<FFFKKKAKF7,,AFFKFKA7AAAKA7AFKK<FF<<FKKK,A,<<KFAAKFKFFFA,7,7AFFKKAF7A NM:i:1 MD:Z:107T42 AS:i:145 XS:i:20

    8: ST-49513 2177 chr3 34012850 0 106H44M chr9 46824220 0 TCTGAAAACAGATATTTCGGATCTCTTTGAAGATTTTAGTGCCA K,A7<,<F7<<FAFA,A,,,,<,,,<<A<FAA7,,,,7,,,,,< NM:i:3 MD:Z:5G29A3G4 AS:i:30 XS:i:29 SA:Z:chr14,66949070,+,64M86S,0,1;

    9:ST-49513 65 chr9 46824220 0 150M chr14 66949070 0 CCCAAATATCCCTTTGCCAATTCCACAAGAACTGTCTTAGCGAAAGGCTTCTTGAAGGGAAAGCTGTAACTCTGTGAGTTGATATCACAGAACACAAAGAAGTTTCTCAGAAAGCTTCTTTCTCTTTGTTATCGGAGGATATTTCCTTTG AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKK<AFKFKKKKKFKFKKK NM:i:4 MD:Z:55G22A4T15C50 AS:i:130 XS:i:129

    10:ST-25745 97 chrX 42742816 0 4S146M chr14 38147141 0 GGGGTGGATAGGCAAGACAATTTGGTTGACAAGGCACAGATCTTGAACTAACCTGTAAGCCTTGTCTGGTTTTTGGACAGGTAAAATGGGGGAATTGTAAGGAGAGTTTATAGGTTTTAAAAGGCCATGCTGTAGCAGGTGAGTGATAAC AAFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKKFKKKKKKKKKKKKKAKFKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKK<FAKKKFKK NM:i:7 MD:Z:11A13T5G15T7A31A47C10 AS:i:111 XS:i:110
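For reference, the FLAG column of the records above can be unpacked bit by bit. The sketch below is a minimal decoder that uses the bit meanings from the SAM specification; the names in `FLAG_BITS` and the helper `decode_flag` are illustrative and not part of any tool.

```python
# SAM flag bits as defined in the SAM specification.
# The short names are illustrative labels chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Flags of the ten records above, in order.
for flag in (2195, 99, 147, 145, 129, 83, 163, 2177, 65, 97):
    print(flag, ", ".join(decode_flag(flag)))
```

Decoded this way, records 1 and 8 (flags 2195 and 2177) carry the supplementary bit, and only records 3 and 7 (flags 147 and 163) are primary, second-in-pair alignments with the proper-pair bit set.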

    After running CollectInsertSizeMetrics, I got:
    insert_size All_Reads.fr_count All_Reads.rf_count
    379 1 0
    1401 0 1
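Both reported values match the TLEN column of the records above (379 and 1401 in absolute value). One way to reproduce them is to measure the distance between the 5' ends of the two mates, counting both end bases. The sketch below assumes that this is how the aligner filled TLEN and that the tool reports the absolute TLEN; both are assumptions rather than something the output confirms. The helpers `ref_length` and `five_prime_span` are hypothetical names, and the positions and CIGAR strings are copied from the records.

```python
import re


def ref_length(cigar):
    """Number of reference bases covered by a CIGAR string (ops M, D, N, =, X)."""
    return sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "MDN=X"
    )


def five_prime_span(fwd_pos, rev_pos, rev_cigar):
    """Inclusive distance between the 5' ends of a forward read and its reverse mate.

    The 5' end of a forward read is its leftmost aligned base (POS);
    the 5' end of a reverse read is its rightmost aligned base.
    """
    rev_five_prime = rev_pos + ref_length(rev_cigar) - 1
    return abs(rev_five_prime - fwd_pos) + 1


# ST-2047: forward mate at 15767443, reverse mate at 15767753 with CIGAR 69M81S
print(five_prime_span(15767443, 15767753, "69M81S"))   # 379

# ST-43730: forward mate at 49545057, reverse mate at 49543551 with CIGAR 43S107M
print(five_prime_span(49545057, 49543551, "43S107M"))  # 1401
```

Under these assumptions, soft-clipped bases do not enter the calculation, because only aligned reference positions are used.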

    My questions are:
    1) I think the reads of ST-25745 and ST-49513 were discarded because they are chimeric and their mates map to different chromosomes. Am I right?
    2) I then confirmed that 379 is the insert size of ST-2047 by running CollectInsertSizeMetrics on those reads alone. I guess the first alignment, with flag 2195, was discarded, so the insert size should be 15767553+69-15767443=179. Where does 379 come from?
    3) For ST-43730, I think it should be 49545057+150-49543551=1656, or 1656+43=1699 if the 43S soft clip is included. How was 1401 produced?
    4) Regarding read orientation: for ST-2047, the flag of the second read is 147 (128+16+2+1); the 16 means SEQ is reverse-complemented, so the orientation is FR. For ST-43730, the flag of the second read is 163 (128+32+2+1); the 32 means the mate (the first read) is reverse-complemented, so the orientation is RF. Am I right?
    5) My library actually has a 2 kb insert size, but after mapping with bwa and running CollectInsertSizeMetrics I got an insert size of about 270-300 bp with FR orientation. I suspect the experiment failed, that is, I failed to link reads 2 kb apart into a single fragment before sequencing. That is why I checked the BAM and ran into the problems above. Any suggestions about why I got the wrong insert size for the 2 kb library would be appreciated.
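Questions 1 and 4 can be checked against the records with a short sketch. It rests on two assumptions about the tool's behaviour as I understand it, neither confirmed by the output above: a record is counted only if it is a primary, non-duplicate, second-in-pair alignment with both mates mapped and a nonzero TLEN, and a pair is FR when the 5' end of the forward-strand read lies left of the 5' end of the reverse-strand read, RF otherwise. The function names `counted` and `pair_orientation` are hypothetical.

```python
def counted(flag, tlen):
    """Assumed filter: primary, non-duplicate, second-in-pair, both mates mapped, TLEN != 0."""
    paired = flag & 0x1
    any_unmapped = flag & (0x4 | 0x8)
    first_in_pair = flag & 0x40
    non_primary = flag & (0x100 | 0x800)
    duplicate = flag & 0x400
    return bool(paired) and not (any_unmapped or first_in_pair or non_primary or duplicate) and tlen != 0


def pair_orientation(reverse, pos, end, mate_pos, tlen):
    """FR or RF for a record whose mate is on the opposite strand (assumed rule)."""
    if reverse:
        plus_five_prime, minus_five_prime = mate_pos, end
    else:
        plus_five_prime, minus_five_prime = pos, pos + tlen
    return "FR" if plus_five_prime < minus_five_prime else "RF"


# (name, flag, TLEN) for the ten records, in order.
all_records = [
    ("ST-2047", 2195, 1098), ("ST-2047", 99, 379), ("ST-2047", 147, -379),
    ("ST-25745", 145, 0), ("ST-49513", 129, 0), ("ST-43730", 83, 1401),
    ("ST-43730", 163, -1401), ("ST-49513", 2177, 0), ("ST-49513", 65, 0),
    ("ST-25745", 97, 0),
]
kept = [(name, flag, abs(tlen)) for name, flag, tlen in all_records if counted(flag, tlen)]
print(kept)

# The two kept records: (reverse strand?, POS, end = POS + reference length - 1, mate POS, TLEN).
print(pair_orientation(True, 15767753, 15767821, 15767443, -379))    # ST-2047, flag 147
print(pair_orientation(False, 49545057, 49545206, 49543551, -1401))  # ST-43730, flag 163
```

Under these assumptions only records 3 and 7 are counted, giving 379 as FR and 1401 as RF, which agrees with the two rows of the output. The flag bits alone do not decide the orientation: the relative positions of the two 5' ends matter as well.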

    Thanks in advance!
    Best wishes!
