Hi all,
I am currently looking for tools that extract phase information from NextGen sequencing data. So far I have found the GATK ReadBackedPhasing algorithm, which performs well on SNPs but not on indels. There is also samtools phase, but it produces two files that have to be merged again, which is rather cumbersome (and I don't know yet whether it handles indels).
In the Complete Genomics data file format description I read about a "HapLink" column (denoting phase information) in the masterVAR file, but I did not find any tool in the cga tool suite that extracts phase information. Does anybody know how it works? Is it integrated in the generateMASTERvar routine?
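To make the question more concrete, here is a rough sketch of what I imagine doing by hand if no tool exists: group loci that share a HapLink ID. This is only my guess at the layout. The column names (allele1HapLink, allele2HapLink, chromosome, begin), the ">" in front of the header line, the "#" comment lines, and the sample rows are all assumptions made up for illustration, not taken from a real file, and the function group_by_haplink is hypothetical.

```python
import csv
from collections import defaultdict


def group_by_haplink(lines, link_columns=("allele1HapLink", "allele2HapLink")):
    """Group (chromosome, begin, allele index) tuples by shared HapLink ID.

    Assumes a tab-separated file whose header line starts with '>' and whose
    metadata lines start with '#'. Column names are assumptions.
    """
    # Drop blank lines and '#' metadata lines; keep header plus data rows.
    data = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    if not data:
        return {}
    data[0] = data[0].lstrip(">")  # header is assumed to be prefixed with '>'
    groups = defaultdict(list)
    for row in csv.DictReader(data, delimiter="\t"):
        for allele_idx, col in enumerate(link_columns, start=1):
            link = (row.get(col) or "").strip()
            if link:  # an empty field is taken to mean "no phase information"
                groups[link].append((row["chromosome"], int(row["begin"]), allele_idx))
    return dict(groups)


# Made-up rows, only to show the intended grouping.
SAMPLE = (
    "#ASSEMBLY_ID\tmade-up-example\n"
    "\n"
    ">locus\tchromosome\tbegin\tend\tallele1Seq\tallele2Seq\tallele1HapLink\tallele2HapLink\n"
    "1\tchr1\t100\t101\tA\tG\t17\t18\n"
    "2\tchr1\t150\t151\tC\tT\t17\t18\n"
    "3\tchr1\t900\t901\tG\tG\t\t\n"
)

if __name__ == "__main__":
    for link, members in group_by_haplink(SAMPLE.splitlines()).items():
        print(link, members)
```

If the column really works this way, alleles carrying the same ID would sit on the same haplotype, but I would like confirmation that this reading is correct.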
A second question arose from the first one: how is the Complete Genomics VAR file generated? I did not find any routine for this either, yet the file is needed as input to the generateMASTERvar routine. I have read that the VAR file is delivered when the company sequences a genome for you. Is there any way to generate a Complete Genomics VAR file on my own? (And one more question: does Complete Genomics use BAM or SAM files at any stage of their data analysis pipeline?)
I have also read the User Guide and the data format description PDFs (to some extent) but still have not figured out the very beginning of their pipeline, i.e. how they do the alignment (or mapping) and how the variant calling is done.
Thanks in advance!