  • cburke04
    Junior Member
    • Jan 2015
    • 2

    Breakpointer, Predicting Structural Variation from SE Illumina Reads + Reference

    Hello all,

    I am having some difficulty predicting where the breakpoints of a known large-scale inversion (~4-10 Mb, inferred from population genetics and QTL mapping) lie, using part of a reference genome and single-end 100 bp Illumina reads from a population sample. One complication is that the reference scaffold I am using is incomplete and may not contain both inversion breakpoints. My questions are:

    1) I am currently using the Breakpointer script (https://github.com/ruping/Breakpointer). Is there a better program for predicting structural variants from single-end reads?

    2) Does anyone have experience with the output file that Breakpointer produces? Which portions of this output are most informative for identifying inversion breakpoints? I have 17,468 candidate regions and need a way to narrow them down (e.g., regions where fewer reads map and the reads are split). The first few lines of the output look like this:

    chrscaffold935|size76515 Breakpointer Depth-Skewed 63195 63245 56.233 + . ID=chrscaffold935|size76515:63195;SIZE=51;DEPTH=29;EndsRatio=0.662;StartsRatio=1;BinomialScore=3.004;MIS=7;realMIS=2;MISRATE=0.364621;seedseq=AAAAGTGTTACCTTTAAACCCCCCT;MismatchScore=0.0161;SU=13;rank_SB=62.810;rank_SM=49.657
    chrscaffold348|size845032 Breakpointer Depth-Skewed 126749 126805 26.985 + . ID=chrscaffold348|size845032:126749;SIZE=57;DEPTH=24;EndsRatio=0.577;StartsRatio=0.448;BinomialScore=1.903;MIS=5;realMIS=1;MISRATE=0.361063;seedseq=TCAGTTGATGGACGAAACCAATTTA;MismatchScore=0.00158;SU=9;rank_SB=35.475;rank_SM=18.494
    chrscaffold600|size417131 Breakpointer Depth-Skewed 146682 146753 77.008 + . ID=chrscaffold600|size417131:146682;SIZE=72;DEPTH=117;EndsRatio=0.68;StartsRatio=0.235;BinomialScore=4.775;MIS=20;realMIS=0;MISRATE=0.251383;seedseq=CCACTTGATTTTAGCGATTCTGCGG;MismatchScore=0.097;SU=24;rank_SB=82.567;rank_SM=71.449
    chrscaffold641|size112415 Breakpointer Depth-Skewed 10518 10545 49.758 + . ID=chrscaffold641|size112415:10518;SIZE=28;DEPTH=6;EndsRatio=1;StartsRatio=0.238;BinomialScore=9.119;MIS=2;realMIS=0;MISRATE=0.333333;seedseq=TGTTGTGACGTGTTGTTTCCGCGGC;MismatchScore=2e-09;SU=1;rank_SB=99.474;rank_SM=0.041
    chrscaffold150|size493303 Breakpointer Depth-Skewed 179029 179104 60.675 + . ID=chrscaffold150|size493303:179029;SIZE=76;DEPTH=595;EndsRatio=0.499;StartsRatio=0.471;BinomialScore=1.611;MIS=90;realMIS=3;MISRATE=0.303127;seedseq=TTATTGCTAATTTAAATAAGGTTTT;MismatchScore=2.67;SU=1;rank_SB=22.620;rank_SM=98.729
    chrscaffold218|size453784 Breakpointer Depth-Skewed 285154 285184 26.029 + . ID=chrscaffold218|size453784:285154;SIZE=31;DEPTH=40;EndsRatio=0.475;StartsRatio=0.476;BinomialScore=1.146;MIS=3;realMIS=0;MISRATE=0.157895;seedseq=ACCTTTAAAACTGTTTTTCTCTTAA;MismatchScore=0.0158;SU=13;rank_SB=2.611;rank_SM=49.447
    chrscaffold35|size722381 Breakpointer Depth-Skewed 659136 659214 63.288 + . ID=chrscaffold35|size722381:659136;SIZE=79;DEPTH=26;EndsRatio=0.736;StartsRatio=0.195;BinomialScore=2.9;MIS=9;realMIS=0;MISRATE=0.470318;seedseq=TACCTATACATTTCCTAGGATATGT;MismatchScore=0.0567;SU=4;rank_SB=61.264;rank_SM=65.311
    chrscaffold621|size428421 Breakpointer Depth-Skewed 119257 119308 24.223 + . ID=chrscaffold621|size428421:119257;SIZE=52;DEPTH=34;EndsRatio=0.487;StartsRatio=0.546;BinomialScore=1.381;MIS=5;realMIS=1;MISRATE=0.301969;seedseq=ATCGAAAAAGCTAAGGCTAAAAACC;MismatchScore=0.00676;SU=5;rank_SB=11.607;rank_SM=36.838
    chrscaffold48|size965936 Breakpointer Depth-Skewed 18878 18964 22.985 + . ID=chrscaffold48|size965936:18878;SIZE=87;DEPTH=29;EndsRatio=0.662;StartsRatio=0.429;BinomialScore=2.195;MIS=4;realMIS=1;MISRATE=0.208355;seedseq=CAATTATTTTGTAAATGTTTACACG;MismatchScore=2e-09;SU=4;rank_SB=45.925;rank_SM=0.046
    chrscaffold621|size428421 Breakpointer Depth-Skewed 143393 143427 70.720 + . ID=chrscaffold621|size428421:143393;SIZE=35;DEPTH=20;EndsRatio=0.852;StartsRatio=0.908;BinomialScore=5.829;MIS=3;realMIS=1;MISRATE=0.176056;seedseq=AATTTAATACAGGTACGACTGTACC;MismatchScore=0.0177;SU=24;rank_SB=90.552;rank_SM=50.887
    chrscaffold348|size845032 Breakpointer Depth-Skewed 24245 24294 78.876 + . ID=chrscaffold348|size845032:24245;SIZE=50;DEPTH=29;EndsRatio=0.818;StartsRatio=0.466;BinomialScore=5.002;MIS=14;realMIS=3;MISRATE=0.590169;seedseq=TCTATATTTTGGTGCAGTCCTGTTG;MismatchScore=0.113;SU=1;rank_SB=84.565;rank_SM=73.187
    chrscaffold50|size611115 Breakpointer Depth-Skewed 570718 570765 46.314 + . ID=chrscaffold50|size611115:570718;SIZE=48;DEPTH=9;EndsRatio=0.739;StartsRatio=0.805;BinomialScore=3.418;MIS=3;realMIS=0;MISRATE=0.45106;seedseq=CCTAATCCTATGTCCTTCTCCTGGC;MismatchScore=0.00275;SU=5;rank_SB=68.357;rank_SM=24.271
    chrscaffold502|size248588 Breakpointer Depth-Skewed 100735 100761 54.273 + . ID=chrscaffold502|size248588:100735;SIZE=27;DEPTH=8;EndsRatio=0.785;StartsRatio=1;BinomialScore=3.581;MIS=3;realMIS=1;MISRATE=0.477707;seedseq=TGGTTCTAGGCCCTAAATCGTTAAT;MismatchScore=0.00748;SU=4;rank_SB=70.241;rank_SM=38.306
    chrscaffold546|size598492 Breakpointer Depth-Skewed 475284 475375 30.852 + . ID=chrscaffold546|size598492:475284;SIZE=92;DEPTH=44;EndsRatio=0.549;StartsRatio=0.414;BinomialScore=1.776;MIS=6;realMIS=1;MISRATE=0.248385;seedseq=AAACATGTTTACATTATTATGGTAC;MismatchScore=0.00474;SU=5;rank_SB=29.983;rank_SM=31.720
    chrscaffold15|size673834 Breakpointer Depth-Skewed 313308 313396 73.223 + . ID=chrscaffold15|size673834:313308;SIZE=89;DEPTH=112;EndsRatio=0.688;StartsRatio=0.275;BinomialScore=4.532;MIS=11;realMIS=2;MISRATE=0.142753;seedseq=GCCTTAATCCACGCGAATTCGATGG;MismatchScore=0.0603;SU=19;rank_SB=80.435;rank_SM=66.011
    chrscaffold621|size428421 Breakpointer Depth-Skewed 380187 380235 36.980 + . ID=chrscaffold621|size428421:380187;SIZE=49;DEPTH=64;EndsRatio=0.5;StartsRatio=0.574;BinomialScore=1.609;MIS=3;realMIS=0;MISRATE=0.09375;seedseq=AATGGTTTAATGCCCGTTTTCACCA;MismatchScore=0.0185;SU=3;rank_SB=22.510;rank_SM=51.450
    chrscaffold189|size410278 Breakpointer Depth-Skewed 264356 264391 83.797 + . ID=chrscaffold189|size410278:264356;SIZE=36;DEPTH=52;EndsRatio=0.666;StartsRatio=0.958;BinomialScore=3.636;MIS=22;realMIS=5;MISRATE=0.635251;seedseq=TGTAAGACTAGCGGCCGCCCGCGAC;MismatchScore=1.5;SU=14;rank_SB=70.978;rank_SM=96.616
    chrscaffold155|size350779 Breakpointer Depth-Skewed 191704 191757 11.213 + . ID=chrscaffold155|size350779:191704;SIZE=54;DEPTH=18;EndsRatio=0.533;StartsRatio=0.433;BinomialScore=1.285;MIS=2;realMIS=0;MISRATE=0.208464;seedseq=GCGTAAGTCCGTTGATTGGGATCAT;MismatchScore=0.00115;SU=3;rank_SB=7.363;rank_SM=15.064
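
    To narrow down the candidates, the output can at least be loaded into a form that is easy to sort and filter. Below is a minimal Python sketch, assuming the file is whitespace-delimited and GFF-like (nine columns, with column 6 a numeric score and column 9 a list of `key=value` pairs separated by semicolons), as the lines above suggest. The function names, the `min_su` cutoff, and the choice to filter on `SU` and sort on column 6 are placeholders for illustration only; they do not come from the Breakpointer documentation, and I do not know which fields are actually the most informative.

```python
from typing import Any, Dict, Iterable, List

# Attributes that should stay as text even if they happen to look numeric.
TEXT_KEYS = {"ID", "seedseq"}


def parse_line(line: str) -> Dict[str, Any]:
    """Parse one GFF-like record into a dict (column layout is assumed)."""
    seqid, _source, kind, start, end, score, strand, _phase, attrs = line.split(None, 8)
    record: Dict[str, Any] = {
        "seqid": seqid,
        "type": kind,
        "start": int(start),
        "end": int(end),
        "score": float(score),
        "strand": strand,
    }
    for pair in attrs.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        if key in TEXT_KEYS:
            record[key] = value
            continue
        try:
            record[key] = float(value)
        except ValueError:
            record[key] = value  # keep non-numeric values as text
    return record


def top_candidates(lines: Iterable[str], min_su: float = 10.0, limit: int = 5) -> List[Dict[str, Any]]:
    """Keep records with SU >= min_su, sorted by column-6 score, highest first.

    Both the SU cutoff and the sort key are arbitrary placeholders.
    """
    records = [parse_line(l) for l in lines if l.strip() and not l.startswith("#")]
    kept = [r for r in records if r.get("SU", 0.0) >= min_su]
    return sorted(kept, key=lambda r: r["score"], reverse=True)[:limit]


# Three records copied from the output above.
SAMPLE = [
    "chrscaffold935|size76515 Breakpointer Depth-Skewed 63195 63245 56.233 + . ID=chrscaffold935|size76515:63195;SIZE=51;DEPTH=29;EndsRatio=0.662;StartsRatio=1;BinomialScore=3.004;MIS=7;realMIS=2;MISRATE=0.364621;seedseq=AAAAGTGTTACCTTTAAACCCCCCT;MismatchScore=0.0161;SU=13;rank_SB=62.810;rank_SM=49.657",
    "chrscaffold600|size417131 Breakpointer Depth-Skewed 146682 146753 77.008 + . ID=chrscaffold600|size417131:146682;SIZE=72;DEPTH=117;EndsRatio=0.68;StartsRatio=0.235;BinomialScore=4.775;MIS=20;realMIS=0;MISRATE=0.251383;seedseq=CCACTTGATTTTAGCGATTCTGCGG;MismatchScore=0.097;SU=24;rank_SB=82.567;rank_SM=71.449",
    "chrscaffold641|size112415 Breakpointer Depth-Skewed 10518 10545 49.758 + . ID=chrscaffold641|size112415:10518;SIZE=28;DEPTH=6;EndsRatio=1;StartsRatio=0.238;BinomialScore=9.119;MIS=2;realMIS=0;MISRATE=0.333333;seedseq=TGTTGTGACGTGTTGTTTCCGCGGC;MismatchScore=2e-09;SU=1;rank_SB=99.474;rank_SM=0.041",
]

if __name__ == "__main__":
    for rec in top_candidates(SAMPLE):
        print(rec["seqid"], rec["start"], rec["end"], rec["score"], rec["SU"])
```

    In practice the same functions could be fed the whole file (for example `top_candidates(open(path))`, where `path` is the output file), and the filter swapped for whichever attributes turn out to matter.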

    Any help would be greatly appreciated!
