At Encord, we're building the AI infrastructure of the future. Today, the biggest challenge companies face in getting an AI product to market is actually not half as glamorous as it may seem: it's all about data quality. In fact, the success of any AI application today relies on the quality of a model's training data — and for 95% of teams, this essential step is both the most costly and the most time-consuming.

As ex-computer scientists, physicists, and quants, we felt first-hand how the lack of tools to prepare quality training data was impeding the progress of building AI. AI today resembles the early days of computing or the internet: the potential of the technology is clear, but the tools and processes surrounding it are still primitive, holding back the next generation of applications. This is why we started Encord.

We’re a team of 60 working at the cutting edge of computer vision and deep learning, backed by top investors, including CRV and Y Combinator, leading industry executives like Luc Vincent, former VP of AI at Meta, and other prominent leaders in AI. We are one of the fastest-growing companies in our space, and consistently rated as the best tool in the market by our customers.

About Us

At Encord, we're building the AI infrastructure of the future. One of the biggest challenges AI companies face today is data quality. The success of any AI application relies heavily on the quality of its training data, yet for most teams, this crucial step is both the most costly and time-consuming. We’re here to change that.

As former computer scientists, physicists, and quants, we’ve experienced firsthand how a lack of tools to prepare quality training data impedes progress in building AI. We believe AI is at a stage similar to the early days of computing or the internet—where the potential is clear, but the surrounding tools and processes are still catching up. That's why we started Encord.

We are a talented and ambitious team of 60, working at the cutting edge of computer vision and deep learning. Backed by $30M in Series B funding from top investors like CRV and Y Combinator, we’re one of the fastest-growing companies in our space. Our platform is consistently rated the best by our customers, and we have big plans ahead. We’re looking for a Research Scientist to help our customers get the right data faster, easier, and cheaper.

The Role

As a Research Scientist focusing on synthetic data generation at Encord, you'll play a critical role in helping customers expand their datasets with ease. Although you'll start with a single domain in mind, you'll progressively work across a variety of industries and domains such as healthcare, geospatial, sports analytics, and surveillance, ensuring that customers can efficiently harness synthetic data to improve their AI models. Example tasks range from building easily adaptable diffusion models that generate new data with particular properties to developing novel ways of conditioning generative models, all to mitigate customers’ data problems.

You'll follow the latest research and push state-of-the-art technologies forward to empower customers in their data journeys. This role offers a great growth opportunity, with the potential to lead a team of scientists over time in our efforts to provide high-quality synthetic data.

What you will be doing:

Building, fine-tuning, and experimenting with deep learning-based approaches for (conditional) synthetic data generation, like Stable Diffusion and GANs.
Developing scalable and novel ways to condition data generation based on information from our data development platform.
Following the latest machine learning research to identify and apply new methods that improve outcomes.
Working on cutting-edge generative models, starting with text-to-image models and potentially expanding into more domains like video and audio.
Ensuring our customers have the world’s best platform for expanding datasets synthetically.

Skills for the job:

A PhD or similarly strong academic background in machine learning with 2+ years of hands-on experience in synthetic data generation for images or video (e.g., Stable Diffusion, GANs, Normalizing Flows, etc.).
Proficiency with frameworks and libraries like PyTorch, TensorFlow, JAX, Pandas, and OpenCV.
A quick learner with a structured, organized approach to problem-solving.
Excellent communication skills with an ability to uncover use cases and solve problems efficiently.
Ambitious and self-motivated, with a proven track record of top performance in academic or professional settings.

Bonus skills:

Experience working with datasets numbering in the millions.
Experience with PEFT techniques like QLoRA.
Familiarity with cloud-based model training and inference.

What We Offer

Competitive salary, commission, and equity in a high-growth business.
A collaborative, in-person culture with most of the team working in the office 3+ days a week (engineers typically work on-site Wednesdays).
25 days annual leave + public holidays.
An annual learning and development budget to help you grow your skills.
Company lunches twice a week and regular socials, including bi-annual off-sites.

At Encord, you’ll have the unique opportunity to be part of a fast-growing startup with a clear mission and vision. You’ll work on real-world AI use cases across a variety of industry verticals and get hands-on experience with cutting-edge computer vision and deep learning technologies. This is a role where you'll grow quickly, take ownership of projects, and help shape the future of our company.

In this role, you'll be exposed to a broad tech stack (e.g., ReactJS, Python, REST & GraphQL, OpenCV, PyTorch, GCP, AWS, CUDA, and Kubernetes) and to the cutting edge of computer vision and deep learning.
