UNC Lineberger is at the forefront of cancer research and is closely involved with The Cancer Genome Atlas (TCGA) project. This cutting-edge, nationwide research project provides an extraordinary opportunity to study the molecular basis of cancer. Specifically, Lineberger faculty participate in TCGA as both a Genome Characterization Center and a Genome Data Analysis Center. This participation offers opportunities to analyze highly relevant next-generation sequencing cancer datasets and to interact with talented scientists both at UNC and across the nation.

30%: Apply data analysis, modeling, and common next-generation sequencing tools (aligners, assemblers, variant callers, annotation tools, etc.) to sequence data to answer questions about the samples being sequenced, as directed by the project PIs. This may include more analytical tasks such as tuning existing software tools for optimal performance based on test data, developing novel algorithms, and/or collaborating with other computational biologists, statisticians, and computer scientists on methods development and testing.
35%: Work with developers and scientists to build robust systems for analyzing thousands of samples with sequencing technologies. This includes building “modules” based on the Java language in our group's workflow tool (the open source SeqWare project) and other technologies that “scale up” the analysis being done in the group.
20%: Import, analyze, and apply biologically relevant annotation information and sequence data from sources such as NCBI, UCSC, the Short Read Archive, and the TCGA Data Coordination Center. This includes bridging various datasets so they can be leveraged by a common workflow and query engine/database backend.
5%: Interact with various organizations outside of UNC to ensure our internal infrastructure is in sync with various data repositories that are part of, or work with, the TCGA project.
10%: Follow excellent coding practices for open source development, including managing code in source control, writing unit tests, providing documentation on both public and private wikis, working with other developers on and off site, and both writing and following software/system specifications. Work will be performed in a Linux environment.

Ph.D. degree in an appropriate area.

Required Skills
Proficiency in both written and spoken English with excellent communication skills (presentations, reports, etc.).

A solid understanding of biology (genes, pathways, genomics data).

Experience analyzing next-generation sequence data or other large biological datasets.

Strong analytical skills and a keen interest in interpreting biological data and learning about new fields.

Excellent interpersonal skills and the ability to work with a diverse group of individuals both locally and remotely.

Excellent programming skills and the desire to learn new technologies, such as Hadoop/HBase/MapReduce, and to participate in large, open source projects.

A dedication to following software development best practices and methods (testing, code reviews, etc.).

Must be detail-oriented and focused.

2+ years of experience working as a Bioinformatics Researcher.

Mastery of at least one programming language: Java, C, C++, Perl, Python, etc. Perl and Java experience are highly desired.

A familiarity with the Linux operating system and the Bash shell.

Working knowledge of a statistical tool such as R.

Familiarity with SQL database query language and relational databases.

Demonstrated experience developing IT solutions for genes, pathways, or -omics datasets.

Familiarity with annotation sources and data repositories commonly used in bioinformatics, such as the UCSC genome database, NCBI, Ensembl, etc.

To apply for this position please go to: