Hi All.

I'm a mathematician hoping to do a PhD on the statistical analysis of NGS data at the University of Ghent (Roche / Illumina).

Unfortunately, the exact topic has not yet been specified, so it is not clear to me what kind of data I will be presented with (ChIP, de novo, ...).

This also means that, so far, I have no data to work on and no clear view of what will be expected.

I've simply been reading up on NGS and statistics (finding strangely few articles linking them). On top of that, I am quite new to biotechnology, so it is not easy to find a focus.

So here's my question: I would like to prepare myself somewhat for when the 'real' questions come (I expect these within the next few months), so I'd like to practise on some data analysis. Do any of you have pointers on:

* which type of analysis would be a good starter?

* where could I find sample data (ideally with a matching article on how somebody else analysed it)?

* what are the statistical challenges brought on by NGS (as opposed to classical sequencing), apart from sheer volume?

* which 'general' statistical subjects would be a good read (books/subjects welcome)? E.g. would the bootstrap do me any good (and why)?

Thanks in advance for any suggestions!

## Comment