data pipelines, retrieval infrastructure, and ML/LLMOps foundations that power our AI
initiatives. This role focuses on turning reference architectures and data contracts
into robust, production-grade implementations that serve conversational AI assistants,
dashboard copilots, autonomous agents, RAG applications, and predictive ML models.
Key Responsibilities:
Qualifications:
- 7+ years of data engineering experience using cloud services.
- 2+ years of experience with production AI/ML or LLM-era data infrastructure, including proven experience building production pipelines at scale (batch and streaming) on Snowflake and AWS/Azure.
- Deep expertise in Python, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming.
- Hands-on with vector stores, embedding pipelines, and retrieval infrastructure in production RAG environments.
- Working knowledge of MLOps: MLflow, CI/CD for AI, automated evaluation, and production monitoring.
- Strong grounding in data governance, quality frameworks, and compliance-aligned engineering.
Technical Skills:
- Primary skills: Python, SQL, PySpark, Kafka, Snowflake/Databricks, Delta Lake, AWS (S3, Glue, Kinesis, EKS, Redshift), Docker, Kubernetes, GitHub Actions.
- Secondary skills: LangChain, LlamaIndex, LLM APIs (OpenAI, Bedrock, Claude, Hugging Face), Pinecone, FAISS, ChromaDB, OpenSearch, MLflow, FastAPI, Neo4j, LangGraph, prompt engineering, RLHF dataset preparation, LLM fine-tuning workflows.