
  • Misunderstanding breakdancer output

    I am running breakdancer for my samples and I get some results that I don't understand.
    For example I get these:

    1. scaffold_4 1084242 29+0- scaffold_4 13700966 1+16- DEL 12616730 77 11 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|11 1.992.
    2. scaffold_4 5580548 60+0- scaffold_4 11167464 0+23- DEL 5586874 99 23 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|23 1.88
    3. scaffold_4 6439582 8+0- scaffold_4 10304779 0+8- DEL 3864989 93 8 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|8 1.81
    4. scaffold_4 13059872 40+1- scaffold_4 14799329 0+38- DEL 1694903 99 39 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|39 1.35
    5. scaffold_4 17963169 23+9- scaffold_4 19018791 17+47- DEL 1055818 99 23 /scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam|23 2.16

    In the first result I get a deletion of 12616730 bp, which is half of scaffold_4. From viewing the BAM file itself (/scratch/ralonso/ivia_130/ivia_130_mapped_singlehit_sorted.bam) in IGV and Savant Genome Browser, I know that this "deletion" region is fully covered and that there is no such deletion. I guess I am misunderstanding something about the output, because the score is quite high.
    The same happens with the others; for example, result 4 has a score of 99 and 39 reads supporting it.

    I am quite new to this field, so could anyone help me?


  • #2
    I believe Breakdancer determines deletions by inspecting the insert sizes of paired reads in an alignment (in the case of deletions, a greater distance between read pairs) rather than looking at the read depth along the alignment. So, in these cases where you aren't seeing any changes in coverage I'm guessing that perhaps something weird has happened in the alignment.

    I think the Breakdancer quality score comes from the mapping and quality scores for the reads in the bam file.

    Perhaps the reference genome is not correctly assembled at scaffold 4...?

    You could try changing the upper and lower bounds of the 'acceptable' insert size and see if that helps?
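
    The insert-size idea described above can be illustrated with a minimal sketch. This is not BreakDancer's actual algorithm: the function name, the cutoff of 3 standard deviations, and all insert sizes below are hypothetical and only show how a read pair spanning a deletion stands out from normally spaced pairs.

```python
from statistics import mean, stdev

def flag_deletion_pairs(insert_sizes, reference_sizes, n_sd=3.0):
    """Return the insert sizes that exceed mean + n_sd * sd of reference_sizes.

    reference_sizes: insert sizes of normally mapped pairs (hypothetical data).
    n_sd: assumed cutoff in standard deviations, not a BreakDancer default.
    """
    upper = mean(reference_sizes) + n_sd * stdev(reference_sizes)
    return [size for size in insert_sizes if size > upper]

# Hypothetical library with a mean insert size of about 300 bp.
normal = [295, 300, 305, 310, 290, 300, 298, 302]
# Hypothetical pairs near a candidate deletion; two map much further apart.
observed = [301, 297, 1750, 1800, 299]

flagged = flag_deletion_pairs(observed, normal)
print(flagged)
```

    Widening or narrowing the cutoff changes which pairs count as discordant, which is why adjusting the acceptable insert-size bounds can change the calls.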


    • #3
      Hi, I am also having problems interpreting the output from Breakdancer. I hope someone can help me. The breakdancer mailing list seems to be silent. I am using version 1.1_2011_02_21

      I found it strange that, in most of my results, columns 3 and 6 are equal, yet column 10 shows only a very small number of read pairs compared with columns 3 and 6. Please see the following examples:

      chr10 15253864 104+105- chr10 15256621 104+105- INV -84 73 4
      chr10 15257851 10+14- chr10 15258198 10+14- INV -170 49 2
      chr10 15561154 11+10- chr10 15561442 11+10- INV -188 51 2
      chr10 15614060 11+14- chr10 15614763 11+14- INV -131 44 2
      chr10 15645913 10+18- chr10 15646639 10+18- INV -127 95 4
      chr10 15649499 31+22- chr10 15650496 31+22- INV -209 99 5
      chr10 15685568 9+15- chr10 15686258 9+15- DEL 426 46 2
      chr10 24831499 103+64- chr10 24833457 103+64- DEL 460 99 7

      Is row 1, for example, saying that a total of 209 reads align at chr10:15253864 and 209 reads align at chr10:15256621, but only 4 pairs support the inversion? Is this because the SV is a short inversion and the reads detected are the same for both positions?

      Also, the row at chr10:15649499 has a confidence score of 99, yet only 5 pairs support the inversion while 53 reads align on each side of the SV. Am I interpreting it correctly?

      For the last row, a 460 bp deletion with a confidence score of 99, the same thing happens: only 7 pairs support the deletion. Is this correct?
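
      The comparison being made here can be written down as a small sketch. It assumes the first ten columns are chr1, pos1, orientation1, chr2, pos2, orientation2, type, size, score and number of supporting read pairs, and that a field such as 104+105- gives the read counts on the plus and minus strands. The column names and helper functions are hypothetical and are not part of BreakDancer.

```python
import re

# Assumed column order for the first ten fields of a BreakDancer output row.
COLUMNS = ["chr1", "pos1", "orient1", "chr2", "pos2", "orient2",
           "type", "size", "score", "num_reads"]

def parse_orientation(field):
    """Split a field such as '104+105-' into (plus_count, minus_count)."""
    match = re.fullmatch(r"(\d+)\+(\d+)-", field)
    if match is None:
        raise ValueError(f"unexpected orientation field: {field!r}")
    return int(match.group(1)), int(match.group(2))

def parse_row(line):
    """Parse one whitespace-separated output row into a dict."""
    row = dict(zip(COLUMNS, line.split()[:10]))
    for key in ("pos1", "pos2", "size", "score", "num_reads"):
        row[key] = int(row[key])
    for key in ("orient1", "orient2"):
        row[key] = parse_orientation(row[key])
    return row

row = parse_row("chr10 15253864 104+105- chr10 15256621 104+105- INV -84 73 4")
reads_side1 = sum(row["orient1"])  # all reads counted in the first anchoring region
reads_side2 = sum(row["orient2"])  # all reads counted in the second anchoring region
print(reads_side1, reads_side2, row["num_reads"])
```

      For row 1 this prints the totals for columns 3 and 6 next to the column 10 value, which makes the discrepancy asked about above explicit.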

      Any comments would be greatly appreciated.



      • #4
        I have encountered the same issue in the output files, using the same breakdancer version 1.1_2011_02_21, and am curious to know why there is such a big discrepancy in the number of supporting reads given in different columns of the output file.

        This is the first time I've run breakdancer, on a human genome sequenced paired-end on a single lane of HiSeq2000, from a patient with an inter-chromosomal translocation identified through cytogenetics. I identified the breakpoint easily because I knew where to look, so I ran breakdancer to see whether we could have identified it without prior cytogenetic work. Thanks to the previous posts on this forum it only took an hour or so to tweak the cpp and perl files and get it all running, and another 20 min to generate the results (for trans-chromosomal rearrangements only)!

        My results: I got 140-odd CTX calls, including the real translocation, but nothing in the output file suggests that it's any more real than most of the other calls: 3/0 supporting reads on the +/- strands (although there are actually 7 supporting reads in total, 4/3 on the +/- strands, displayed nicely in IGV with different colours for read pairs on discordant chromosomes), and a confidence score of 43 (one of the lowest scores generated):

        chrA posXXX 3+0- chrB posYYY 3+0- CTX -364 43 2 myfile.bam|2

        So, in addition to the above question, does anyone have an idea why the output file does not include all 7 supporting reads for this translocation? Are there any options that I should change? Also, as in the previous post, how should we interpret the confidence score?

        Many thanks.

