Hello,
I am trying to use BreakDancer for the first time, and I am wondering about sensible strategies for filtering the calls down to the most likely candidates. I am hampered by not fully understanding some of the output, so I would appreciate any general advice people have. Failing that, I would appreciate help understanding the read-support information in the output (more below), since it is not well explained.
More specifically, I currently have low-pass (10x coverage) WGS data on 5 samples (1 normal and 4 other samples from the same patient), and I ran BreakDancer on all of them together as a pooled analysis. I also have SNP array data for these samples, so I'd like to use that as a check if possible.
An obvious filter is the number of supporting reads: reject calls with very few or very many. Another thing I have been looking at is keeping only calls whose two positions are fairly far apart. I've also seen masking repeat regions suggested.
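For concreteness, here is a minimal sketch of the first two filters (read-support range and breakpoint distance). It assumes the tab-separated BreakDancer output begins with the columns `Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score num_Reads`; please check that against your own header. The names `SVCall`, `read_calls` and `passes_basic_filters` and all threshold values are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


# Assumed layout of a BreakDancer output data line (tab-separated):
# Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score num_Reads ...
@dataclass
class SVCall:
    chr1: str
    pos1: int
    ori1: str
    chr2: str
    pos2: int
    ori2: str
    sv_type: str
    size: int
    score: float
    num_reads: int


def read_calls(lines: Iterable[str]) -> Iterator[SVCall]:
    """Parse data lines, skipping '#' header/comment lines and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield SVCall(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5],
                     f[6], int(f[7]), float(f[8]), int(f[9]))


def passes_basic_filters(call: SVCall, min_reads: int = 3,
                         max_reads: int = 50, min_span: int = 1000) -> bool:
    """Hypothetical thresholds; tune them to your coverage."""
    # Drop calls with very weak support, and calls with implausibly high
    # support (these may come from repeats or collapsed regions).
    if not (min_reads <= call.num_reads <= max_reads):
        return False
    # For same-chromosome events, keep only breakpoints far enough apart.
    if call.chr1 == call.chr2 and abs(call.pos2 - call.pos1) < min_span:
        return False
    return True
```

Typical use would be `kept = [c for c in read_calls(open("calls.ctx")) if passes_basic_filters(c)]`, where `calls.ctx` is a placeholder file name. Masking repeats would be a separate step, e.g. dropping calls whose breakpoints overlap a repeat annotation.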
In addition, I would like to make use of the orientation information ("The orientation is a string that records the number of reads mapped to the plus (+) or the minus (-) strand in the anchoring regions."). For example, a call might be suspect if the number of reads supporting it is much smaller than the number of reads counted in the orientation string, or if the +/- counts are far from equal. Both are hard to judge at low coverage because the numbers are small, but I would still like to know whether these are reasonable ideas for filters. I don't know enough about these numbers to tell: for example, do the "reads mapped" include all reads in the anchoring region, or only the aberrant reads identified by the program? Does anyone have any insight into how these numbers are determined?
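A minimal sketch of how the orientation strings could be turned into the two quantities proposed above (strand balance, and num_Reads relative to the reads counted in an anchoring region). It assumes strings of the form `10+8-`, and the function names are hypothetical. Whether `support_ratio` means anything depends on exactly the open question of which reads the orientation counts include. Also, whether a strand imbalance is a problem may depend on the SV type: with standard paired-end libraries, reads supporting a deletion would be expected to sit mostly on one strand at each anchor. So treat these values as things to inspect rather than ready-made filters.

```python
import re
from typing import Tuple

# Assumed orientation format, e.g. "10+8-": 10 reads on plus, 8 on minus.
_ORI_RE = re.compile(r"(\d+)([+-])")


def strand_counts(orientation: str) -> Tuple[int, int]:
    """Return (plus, minus) read counts parsed from an orientation string."""
    plus = minus = 0
    for count, strand in _ORI_RE.findall(orientation):
        if strand == "+":
            plus += int(count)
        else:
            minus += int(count)
    return plus, minus


def strand_balance(orientation: str) -> float:
    """Minor-strand fraction: 0.5 is perfectly balanced, 0.0 is one strand only."""
    plus, minus = strand_counts(orientation)
    total = plus + minus
    return min(plus, minus) / total if total else 0.0


def support_ratio(num_reads: int, orientation: str) -> float:
    """num_Reads divided by the reads counted in one anchoring region.

    Hypothetical metric: a value well below 1 would mean the call is
    supported by only a small share of the counted reads, but how to read
    it depends on whether the orientation counts include all reads or only
    the aberrant pairs.
    """
    total = sum(strand_counts(orientation))
    return num_reads / total if total else 0.0
```

For example, `strand_counts("10+8-")` gives `(10, 8)`, and `strand_balance("12+0-")` gives `0.0`.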
Thank you very much.