This job has expired

Data Engineer

Employer
University of Texas MD Anderson Cancer Center
Location
Houston, Texas
Salary
Competitive
Closing date
Apr 15, 2023

Discipline
Health Sciences
Organization Type
Healthcare/Hospital

Job Details

We seek a driven and collaborative data engineer to help build the data infrastructure of our flagship platform, A3D3a: Adaptive, AI-augmented, Drug Discovery and Development. Drawing on expertise in data architecture, the Data Engineer will contribute directly to our mission of discovering novel therapies for cancer patients.

Led by Prof. Bissan Al-Lazikani, Director of Therapeutics Data Science, the intelligent and ever-learning A3D3a platform is part of the new initiative in Therapeutics Data Science and of our ambitious Institute for Data Science in Oncology at MD Anderson. Through the development of novel machine learning and AI technologies, A3D3a will accelerate the discovery and impact of new cancer therapies and open opportunities for optimized treatment, with a focus on rare and hard-to-treat cancers.

Central to this vision, the Data Engineer will build and maintain the data infrastructure that enables the discovery of hidden therapeutic opportunities in integrated patient data, working closely with data scientists, data engineers, bioinformaticians, and molecular modelers. The candidate must hold a bachelor's degree in Computer Science or a related field; experience in computing related to the natural sciences would be ideal.

JOB RESPONSIBILITIES
  • Work with the lead data engineer to establish an architectural plan encompassing local, hybrid, and/or cloud infrastructure
  • Use a variety of tools (e.g., Spark, KNIME, Airflow, SQL) to extract and merge data from multiple sources and environments
  • Create data pipelines that validate and enrich data for use in ML models
  • Generate and maintain metadata for all stages of the data pipeline
  • Work with a multidisciplinary team and stakeholders to define data requirements
  • Establish and maintain interfaces to the data (APIs)
  • Utilize industry standards for creating, storing, and documenting code

EXPECTED SKILLS
  • Programming
    • Strong, demonstrated Python programming skills are required
    • Experience with Spark (PySpark) is preferred
    • Solid understanding of CI/CD practices
    • Experience building and querying both relational and graph databases
    • Familiarity with NoSQL databases is a plus
  • Data Engineering
    • Solid knowledge of metadata creation and management
    • Experience with Airflow, Argo or equivalent workflow orchestration is required
    • Must have demonstrated experience working with APIs
    • Good understanding of container-based architectures (e.g., Docker/Kubernetes)
    • Demonstrated experience performing data engineering tasks on one of the major cloud platforms is required; experience with Microsoft Azure is preferred
    • Demonstrated skill in building and deploying ML models is preferred
  • Other
    • Self-motivated and able to work independently
    • Strong written and oral communication skills
    • Ability to work in a multidisciplinary team


EDUCATION:
Required: Bachelor's degree in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.

EXPERIENCE:
Required: Three years of experience in scientific software development/analysis. With a master's degree, one year of experience is required. With a PhD, no experience is required.

Preferred: Experience in computing related to the natural sciences.

It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law. http://www.mdanderson.org/about-us/legal-and-policy/legal-statements/eeo-affirmative-action.html

Company

The University of Texas MD Anderson Cancer Center in Houston is one of the world's most respected centers focused on cancer patient care, research, education and prevention. It was named the nation's No. 1 hospital for cancer care in U.S. News & World Report's 2023 rankings. It is one of the nation's original three comprehensive cancer centers designated by the National Cancer Institute.
