  • tez
    Junior Member
    • Jul 2011
    • 4

    Structural variation detection using BreakDancer on Whole Genome SOLiD data

    Hello,

    I have been struggling for the last few weeks to get BreakDancer to run across some whole-genome data. The data was sequenced on SOLiD machines and aligned using Bioscope.

    I have been able to get BreakDancer to build a configuration file using the SOLiD setting (the -C option, which switches bam2cfg.pl from its Illumina default to SOLiD). The actual command looks like:

    bam2cfg.pl -n 1000000 -g -h -C normal.bam tumor.bam > breakdancer.cfg

    I am then able to run breakdancer_max using that config file:

    breakdancer_max breakdancer.cfg -g output.GBrowse -d fast_q_evidence.o

    This command runs.. and runs.. and runs... and finally either runs out of memory or computation time.

    The last run I did ran for 100 hours, using 48GB of memory before the job was cancelled for running too long. The output of this was about 6.7 million "detected" structural variations. And it only just got up to chromosome 3!

    This leads me to believe a full run would need 1,000 hours or so of computation time, which is not feasible at the moment (42 days!). At that rate it would also find 67 million SVs, which doesn't quite seem right!
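The 1,000-hour and 67-million figures come from a simple linear extrapolation. A minimal sketch, assuming run time and call count grow in proportion to the fraction of the genome processed; the 10% value (`fraction_done`) is a rough stand-in for "only got up to chromosome 3", not a measured quantity:

```python
def extrapolate(done_value: float, fraction_done: float) -> float:
    """Linearly scale a quantity from a partial run up to the whole genome."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return done_value / fraction_done


# Figures from the partial run; fraction_done is an assumed rough estimate.
hours_so_far = 100
calls_so_far = 6.7e6
fraction_done = 0.1

total_hours = extrapolate(hours_so_far, fraction_done)
total_calls = extrapolate(calls_so_far, fraction_done)
print(f"{total_hours:.0f} h = {total_hours / 24:.1f} days, "
      f"~{total_calls / 1e6:.0f} million calls")
```

Because chromosomes differ in size and repeat content, the real scaling is unlikely to be exactly linear; the sketch only reproduces the back-of-the-envelope estimate.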

    Is this in line with anyone else's experience?

    The tumor and normal files are 120GB and 180GB respectively, so I don't expect it to be a fast process, but 40 days seems excessive.

    I have also attempted to run Breakdancer in single chromosome mode, but this fails with a segmentation fault immediately.

    Has anyone been able to get the single chromosome version to work? Or know why it would segfault?


    Thank you.
  • tez
    Junior Member
    • Jul 2011
    • 4

    #2
    I have now also seen that there is a "-r" option for setting the minimum number of read-pairs required to call an SV.

    There isn't much mention of this in the manual, but looking through the source code I see it defaults to 2, which would explain the huge number of results, the long run time and the high memory usage.

    Does anyone have any experience with this parameter? Our data is supposed to be at ~30x depth. I am now giving it a try at min_read_pair=10, and I'll let you know how it goes.
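If a run with the default -r 2 has already produced output, filtering that output by read-pair support gives a quick preview of what a stricter threshold would leave. A minimal sketch, assuming BreakDancer's tab-delimited output with `#` header lines and the total number of supporting read pairs (`num_Reads`) in the 10th column; check this against the header of your own file. This post-hoc filter is only an approximation and may not give identical results to rerunning with a higher -r:

```python
import io
from typing import Iterable, Iterator

# Assumed 0-based index of the "num_Reads" column (total supporting read
# pairs) in BreakDancer's tab-delimited output; verify against your header.
NUM_READS_COL = 9


def filter_by_support(lines: Iterable[str], min_pairs: int) -> Iterator[str]:
    """Keep header lines and calls supported by at least min_pairs read pairs."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # pass header/comment lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[NUM_READS_COL]) >= min_pairs:
            yield line


# Made-up two-call example in the assumed layout, for illustration only.
example = (
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads\n"
    "1\t1000\t10+0-\t1\t5000\t0+10-\tDEL\t3900\t99\t10\n"
    "1\t8000\t2+0-\t1\t9000\t0+2-\tDEL\t850\t45\t2\n"
)
kept = list(filter_by_support(io.StringIO(example), min_pairs=10))
print("".join(kept), end="")
```

For a real run, pass an open file handle instead of the `io.StringIO` example.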

    Cheers


    • aquinom85
      Research Bioinformaticist
      • Dec 2011
      • 19

      #3
      How did things turn out after tweaking the parameters? I'm looking into BreakDancer too, but there is no FAQ and it's rather hard to get a clear picture of the software's limitations. Do you know whether BreakDancer calls samples jointly, or whether you have to run it on each sample separately and then cross-validate the results?


      • tez
        Junior Member
        • Jul 2011
        • 4

        #4
        Hello,

        The results did not look good at all. Basically it called about 10,000 structural variations in the "normal" sample, and about 1,300 in the "tumour" sample.

        The only way I could get these results was to run BreakDancer with the -r 10 option, and then to break each whole genome down into chromosomes and run each chromosome separately. Even then it was still a 3-4 day process, running them all in parallel on a fairly powerful cluster.
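One way to script per-chromosome runs is to build one breakdancer_max command per chromosome, using its -o (single-chromosome) option together with -r 10. This is a sketch, not necessarily how the data was split here: -o is the mode that segfaulted earlier in the thread, so splitting the BAM files by chromosome and building one config per chromosome may be needed instead. The chromosome names and the output file naming (`breakdancer.<chrom>.ctx`) are assumptions:

```python
import shlex
from typing import List, Sequence

# Assumed chromosome naming ("1".."22", "X", "Y"); match it to the names
# used in your BAM headers (e.g. "chr1" for some references).
HUMAN_CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def per_chromosome_commands(config: str, chromosomes: Sequence[str],
                            min_pairs: int = 10) -> List[str]:
    """Build one breakdancer_max command line per chromosome.

    -o restricts the run to one chromosome and -r sets the minimum number
    of supporting read pairs; the output file name is a hypothetical choice.
    """
    commands = []
    for chrom in chromosomes:
        args = ["breakdancer_max", "-o", chrom, "-r", str(min_pairs), config]
        out_file = f"breakdancer.{chrom}.ctx"
        commands.append(f"{shlex.join(args)} > {shlex.quote(out_file)}")
    return commands


if __name__ == "__main__":
    for cmd in per_chromosome_commands("breakdancer.cfg", HUMAN_CHROMOSOMES):
        print(cmd)
```

The printed lines can be written to a file and submitted through a cluster scheduler's array-job mechanism or a tool such as GNU parallel, so each chromosome runs as an independent job.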

        Looks like the biggest issue is data quality. The alignment / mapping was not done by us, and it looks like it may contain quite a lot of noise. So we are now experimenting with different ways to "clean" up the data.

        Cheers


        • P-Richmond
          Member
          • Oct 2010
          • 13

          #5
          Any luck in "cleaning up the data"? I have a similar problem, but I'm working in S. cerevisiae and keep running into artifacts of the alignments I'm using: read pairs that map to familial genes (genes with very high sequence identity on different chromosomes).

          One possible methodology would be to generate reads from a perfect genome (i.e. simulated reads with no real variation), run them through BreakDancer and treat the resulting calls as the noise model. I have a system in place for this read generation if you are interested in trying that. Then, by intersecting the noise-model calls with the calls from your data and discarding the overlaps, you could produce a set that is more likely to contain real structural variations rather than artifacts of the alignment or the underlying sequence.
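The subtraction step of this noise-model idea can be sketched as follows, assuming each call is reduced to its two breakpoints plus its SV type, and that a sample call counts as an artifact when both breakpoints lie within a chosen window (here a hypothetical 500 bp) of a noise-model call of the same type on the same chromosome pair. The `SVCall` type, the window size and the matching rule are all assumptions for illustration, not part of BreakDancer:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SVCall(NamedTuple):
    """Minimal breakpoint-pair representation of one SV call (hypothetical)."""
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    sv_type: str


def remove_noise(sample: Iterable[SVCall], noise: Iterable[SVCall],
                 window: int = 500) -> List[SVCall]:
    """Drop sample calls whose breakpoints both lie within `window` bp of a
    noise-model call of the same type on the same chromosome pair."""
    # Group noise calls by (chr1, chr2, type) so each sample call is only
    # compared against noise calls that could possibly match.
    index: Dict[Tuple[str, str, str], List[SVCall]] = defaultdict(list)
    for n in noise:
        index[(n.chr1, n.chr2, n.sv_type)].append(n)

    kept = []
    for call in sample:
        candidates = index.get((call.chr1, call.chr2, call.sv_type), [])
        is_artifact = any(
            abs(call.pos1 - n.pos1) <= window and abs(call.pos2 - n.pos2) <= window
            for n in candidates
        )
        if not is_artifact:
            kept.append(call)
    return kept


# Made-up calls for illustration only.
sample_calls = [
    SVCall("chrI", 1000, "chrI", 5000, "DEL"),
    SVCall("chrII", 200, "chrV", 900, "CTX"),
]
noise_calls = [SVCall("chrI", 1200, "chrI", 4900, "DEL")]
print(remove_noise(sample_calls, noise_calls))
```

The window trades sensitivity for specificity: a wider window removes more alignment artifacts but also risks discarding real events that happen to sit near a noisy region.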

          -Phil


          • aquinom85
            Research Bioinformaticist
            • Dec 2011
            • 19

            #6
            I just ran BreakDancer on one human genome sample and got 29,500 SVs called; in my naive opinion this seems outrageously high. I think I'll try raising the -r value. Does anyone know what the normal range of SV counts is in humans, for comparison? Also, how should the confidence score be interpreted in general?
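Before settling on thresholds, it can help to see how a large call set breaks down by SV type at a few score and read-pair cutoffs. A minimal sketch, assuming BreakDancer's tab-delimited output with `#` header lines, the SV type in the 7th column, the confidence score in the 9th and the total supporting read pairs (`num_Reads`) in the 10th; the cutoffs below are arbitrary examples, not recommended values:

```python
from collections import Counter
from typing import Iterable

# Assumed 0-based columns in BreakDancer's tab-delimited output:
# 7th = Type, 9th = Score, 10th = num_Reads (total supporting read pairs).
TYPE_COL, SCORE_COL, READS_COL = 6, 8, 9


def tally(lines: Iterable[str], min_score: float, min_pairs: int) -> Counter:
    """Count calls per SV type that pass both a score and a read-pair cutoff."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[SCORE_COL]) >= min_score and int(fields[READS_COL]) >= min_pairs:
            counts[fields[TYPE_COL]] += 1
    return counts


# Made-up calls in the assumed layout, for illustration only.
rows = [
    "1\t1000\t10+0-\t1\t5000\t0+10-\tDEL\t3900\t99\t12\n",
    "1\t8000\t2+0-\t1\t9000\t0+2-\tDEL\t850\t45\t2\n",
    "3\t500\t3+3-\t3\t7000\t3+3-\tINV\t6400\t60\t3\n",
]
for min_score, min_pairs in [(0, 2), (50, 5), (90, 10)]:
    print(min_score, min_pairs, dict(tally(rows, min_score, min_pairs)))
```

If most of the calls vanish at modest cutoffs, the bulk of the set rests on weakly supported evidence, which is the pattern described earlier in the thread.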
