Python ML Engineer at Activeloop (S18)
$120K - $200K
Database for AI
Mountain View, CA, US / Remote (CA, US)
Full-time
6+ years
About Activeloop

We provide a simple API for creating, storing, versioning, and collaborating on multi-modal AI datasets of any size. With Activeloop's open-core stack, you can rapidly transform and stream data while training models at scale. Deep Lake powers foundational model training by acting as a vector database with significant benefits, such as (1) the ability to use multi-modal datasets to fine-tune your own LLMs, (2) storage of both the embeddings and the original data with automatic version control, so no embedding re-computation is needed, and (3) a truly serverless service with no vendor lock-in. How cool is that?

GitHub loves us - we're one of the fastest-growing libraries there, and we're used by little-known companies like Google, Waymo, and Intel. No big deal.

Our founding team hails from places like Princeton, Stanford, Google, and Tesla, and we're backed by Y Combinator & other Silicon Valley heavyweights.

Activeloop is hiring, and we want you! Check out our open roles on our YC page and join the fun.

10-min demo: https://activeloop.wistia.com/medias/aibvo0dst2
Whitepaper: https://www.deeplake.ai/whitepaper

About the role
Skills: Python

At Activeloop, we are transforming the way organizations harness their data for AI with Deep Lake and Multi-modal AI Search. Whether you're answering critical clinical questions or searching across vast repositories of scientific papers, we empower you to index, search, and organize billions of documents, images, and videos intuitively, using natural-language queries powered by Large Language Models. Join us in making data more accessible and actionable than ever before.

We are looking for a Python ML Engineer with a strong foundation in machine learning, large-scale data systems, and deep learning. The ideal candidate will have expertise in developing and optimizing ML pipelines, implementing efficient indexing techniques, and integrating state-of-the-art retrieval and organization methods. You will collaborate with software engineers, customers, and business stakeholders to develop ML solutions that deliver significant value to the organization and our clients.

Key Responsibilities

Machine Learning Pipeline Development: Design, implement, and optimize robust machine learning pipelines for large-scale datasets.

Algorithm Optimization: Develop and refine algorithms for semantic understanding, retrieval performance, and relevance ranking.

Data Integration: Integrate data storage and retrieval solutions within Deep Lake to support efficient data access for ML models.

Query Understanding and Processing: Build advanced pipelines for query processing, including contextual interpretation and intent recognition, to improve data interaction.

Model Development: Fine-tune and deploy machine learning models tailored to data organization and retrieval tasks.

Performance Evaluation: Establish metrics and testing frameworks to continuously evaluate and improve system performance.

Scalability and Efficiency: Optimize ML systems for high throughput, low latency, and large-scale dataset handling.

What We Need to See

  • A Master’s or PhD degree in Computer Science, Machine Learning, Statistics, or a related field.
  • Strong programming skills in Python and experience with ML libraries such as TensorFlow or PyTorch.
  • Proven experience in deploying machine learning models in production environments.
  • Knowledge of advanced machine learning algorithms, including deep learning, reinforcement learning, and ensemble methods.
  • Solid understanding of data pre-processing, feature engineering, and data quality assurance.
  • Excellent problem-solving skills and the ability to extract meaningful insights from complex datasets.

Ways to Stand Out from the Crowd

  • Familiarity with Retrieval-Augmented Generation (RAG) techniques.
  • Experience with distributed training of deep learning models.
  • Publications in top-tier ML and AI conferences like ICML, NeurIPS, or CVPR.
  • You’re a highly motivated and curious individual with a builder mindset.
  • You thrive in fast-paced environments and enjoy working on meaningful, impactful projects.
  • You have a passion for building scalable, impactful ML tools in a startup environment.
  • You’re excited to contribute to the startup journey of building an enduring, scalable business.

Why Join Activeloop?

Activeloop Deep Lake is at the forefront of the transition from traditional software to AI, accelerating AI deployment across various industries. Our products empower advanced LLMs, generative models, and computer vision models. Trusted by industry leaders, we are expanding our team to further advance AI applications. We pride ourselves on being an inclusive, equal opportunity workplace, committed to diversity and accessibility for all applicants.

Technology

We are building Deep Lake, the Data Lake for Deep Learning: https://github.com/activeloopai/deeplake

The landscape of compute resources across specialized hardware and cloud providers is becoming increasingly fragmented.

We're building a platform that unifies and abstracts away infrastructure for easier and highly efficient machine learning and deep learning.
