Your tasks
Key responsibilities
- Design, implement, and operate scalable data pipelines for structured and unstructured data (batch and, where needed, streaming)
- Develop cloud-native architectures for data and AI workloads (primarily on Microsoft Azure)
- Build and evolve a materials data ecosystem linking physics-based modeling/simulation data and experimental laboratory data
- Handle large-scale scientific datasets (e.g., atomistic simulations, DFT/MD, high-throughput campaigns), including efficient storage, metadata, and performant access patterns
- Integrate data from HPC/simulation workflows and laboratory systems (instrument exports, LIMS/ELN where applicable) into curated, analysis-ready datasets
- Define and implement data models, metadata standards, and provenance to ensure traceability, reproducibility, and auditability across simulations and experiments
- Establish robust data quality practices (validation rules, unit consistency, schema controls) and data quality monitoring aligned with operational SLAs/SLOs
- Implement data governance foundations (cataloging, access control, lineage) and enable policy-driven data sharing across teams
- Enable production deployment of ML and NLP applications (including LLMs, VLMs, and VLAs) through strong MLOps/DataOps practices
- Integrate Azure OpenAI and LLM-based services into enterprise applications in a secure and maintainable way
- Implement Infrastructure-as-Code, CI/CD pipelines, and automation for data and AI systems
- Ensure reliability, security, GDPR compliance, monitoring/observability (logging, metrics, alerting), and cost efficiency of cloud platforms
- For LLM/RAG systems: manage the lifecycle of embeddings and vector retrieval, introduce evaluation and monitoring, and apply prompt/config versioning for repeatable releases
- Provide technical leadership through design reviews, documentation of standards, and mentoring where appropriate