Description:
In this role, you'll lead the design and delivery of production-grade ML and generative AI systems that solve complex product and business problems at scale. You'll own architecture and implementation across data pipelines, feature stores, training workflows, model-serving infrastructure, online experimentation, and observability.
You'll drive advanced use cases across deep learning, NLP, ranking, recommendation, forecasting, semantic retrieval, and LLM applications, including RAG, tool use, evaluation harnesses, and safety controls. Your work will directly shape performance, reliability, latency, and cost efficiency in live environments. You'll guide technical direction on topics such as model selection, distributed training and inference, GPU utilization, model compression, prompt and retrieval optimization, drift detection, retraining strategy, and responsible AI controls. You'll mentor other engineers and turn best practices into reusable patterns and platform capabilities.
What You Bring
- You bring expert-level programming skills in Python, along with strong software engineering expertise in languages such as Java or Go, enabling you to build scalable, production-grade systems
- You have deep knowledge of machine learning, deep learning, and optimization techniques across structured data, NLP, search, ranking, and recommendation problems
- You have extensive experience designing and operating end-to-end ML systems, from data ingestion and experimentation to deployment, observability, and lifecycle management
- You bring strong hands-on experience with modern ML and LLM tooling (e.g., PyTorch, TensorFlow, scikit-learn), including fine-tuning, evaluation, orchestration, and model serving
- You have practical experience building generative AI applications using embeddings, vector databases, RAG pipelines, agent workflows, prompt engineering, and guardrails
- You bring deep expertise in MLOps and platform engineering, including model registries, feature stores, CI/CD, infrastructure as code, experiment tracking, and automated validation
- You have a strong architectural understanding of distributed systems, event-driven services, streaming data, and cloud-native ML platforms, with the ability to optimize for performance, scalability, reliability, and cost
- You define robust evaluation and governance strategies, including offline benchmarking, online experimentation, hallucination analysis, model risk assessment, and responsible AI practices
About You
- You bring 7+ years of experience in software engineering and machine learning, with a track record of leading and delivering complex, production-scale AI systems
- You combine strong product judgment with technical depth, translating ambiguous business problems into scalable, high-impact AI solutions
- You thrive in complex, fast-moving environments and bring clarity, ownership, and strategic thinking to drive long-term platform success