Member of Technical Staff, Full-stack/Product Engineer at sync. (W24)

$130K - $200K • 0.30% - 1.00%

AI lipsync tool for video content creators

San Francisco, CA, US / Remote

Full-time

6+ years

About sync.

at sync. we're making video as fluid and editable as a word document.

how much time would you save if you could record every video in a single take?

no more re-recording yourself because you didn't like what you said, or how you said it.

just shoot once, revise yourself to do exactly what you want, and post. that's all.

this is the future of video: AI modified >> AI generated

we're playing at the edge of science + fiction.

our team is young, hungry, uniquely experienced, and advised by some of the greatest research minds + startup operators in the world. we're driven to solve impossible problems, impossibly fast.

our founders are the original team behind the open-sourced wav2lip, the most prolific lip-sync model to date w/ 9k+ GitHub stars.

computer vision today is where NLP was two years ago: back then NLP had a bunch of disparate, specialized models (e.g. sentiment classification, translation, summarization), but LLMs (generalized large language models) displaced them.

we’re taking the same approach – curating high quality datasets + training a series of specialized models to accomplish specific tasks, while building up towards a more generalized approach for one model to rule them all.

post batch our growth is e^x – we need help asap to scale up our infra, training, and product velocity.

we look for the following: [1] raw intelligence [2] boundless curiosity [3] exceptional resolve [4] high agency [5] outlier hustle

About the role

Skills: React, TypeScript, Amazon Web Services (AWS)

About sync.

We’re a team of artists, engineers, and researchers building tools to understand and modify people in video — in the last year we graduated at the top of our YC batch (W24), raised a $5.5M seed backed by GV, won the AI grant from Nat Friedman and Daniel Gross, and scaled to millions in ARR from $0.

We’re building a zero-shot generalized video model to understand, generate, and gain fine-grained control over any human in any video. We’ve already released a state-of-the-art generalized lip-syncing model for content translation and word-level video editing — you can try our models for free through our developer playground and API.

We had a research breakthrough: we learned a highly accurate generalized representation of a human face. tl;dr: this unlocks many new editing tasks that have never been possible before.

As we build out these model capabilities, we need to expand our product team to bring this magic into the hands of users.

What are we working on?

We live in an extraordinary time.

Video generation is world modeling. Deep learning unlocked the ability to decipher the world around us, to understand it, to compress its data and information into a 70b param network plus weights.

By simply changing these underlying numbers – these latent representations — we can reimagine and reconstruct reality in any way we see fit.

This is profound. A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost 10x faster. Every video can be distributed worldwide in any language with perfect preservation of meaning, instantly. Video becomes as malleable as text.

But we have two fundamental problems to tackle before this is a reality:

[1] Large models are great at generating entirely new scenes and worlds, but struggle with precise control and fine-grained edits. The ability to make subtle, intentional adjustments – the kind that separates good content from great content – doesn’t exist.

[2] If video generation is world modeling, each human is a world unto themselves. We each have our idiosyncrasies that make us unique — creating primitives to capture, express, and modify them is the key to breaking through the uncanny valley.

While our focus in research is to push the boundary on what’s possible through new models and pipelines, our focus in product is to design and ship intuitive experiences that simply delight users and provide maximal utility.

Key Responsibilities

Architect and build intuitive experiences to create and edit video with AI – from magical UX to scalable APIs
Own complete user journeys: ideation, prototyping, shipping, and rapid iteration based on user data
Interface seamlessly between model capabilities and intuitive user workflows
Design and implement product features that become industry standards
Champion performance, reliability and developer experience as we scale

Required Skills and Experience

Exceptional senior+ level full-stack engineer who has built consumer products users love
Deep expertise in React ecosystem, modern API design, and real-time systems
Strong product and design sensibilities - you know what makes an experience feel like magic
Track record of shipping and owning 0 to 1 features that drove massive impact
Experience with video manipulation, creative tools, or ML interfaces

Preferred Skills

Built and scaled systems handling millions of daily active users
Background implementing complex billing systems and user monetization
Strong opinions on developer tooling and engineering productivity
Experience with WebGL, Canvas, or video processing
Comfort with ambiguity and rapid iteration

Outcomes

Build breakthrough features that define the future of AI video creation
Create abstractions and APIs that accelerate the entire team's velocity
Drive 10x improvements in key metrics through technical innovation
Set new standards for performance and reliability at scale
Help us grow exponentially by building things users can't live without

Our goal is to keep the team lean, hungry, and shipping fast.

These are the qualities we embody and look for:

[1] Raw intelligence: we tackle complex problems and push the boundaries of what's possible.

[2] Boundless curiosity: we're always learning, exploring new technologies, and questioning assumptions.

[3] Exceptional resolve: we persevere through challenges and never lose sight of our goals.

[4] High agency: we take ownership of our work and drive initiatives forward autonomously.

[5] Outlier hustle: we work smart and hard, going above and beyond to achieve extraordinary results.

[6] Obsessively data-driven: we base our decisions on solid data and measurable outcomes.

[7] Radical candor: we communicate openly and honestly, providing direct feedback to help each other grow.

Technology

next.js, nest.js, python, pytorch, aws/gcp/azure, kubernetes

Interview Process

We’re a small team that works hard to create outsized impact.

We expect whoever we hire into this role to have a high degree of agency and maniacal urgency.

Our hiring process is grounded in reality: it's really hard to get a sense of how well we'd work together from a simple interview question or take-home test.

Here is our process:

[1] 30 mins, intro call to understand goals, evaluate mutual fit, and set up next steps.

[2] 3 hrs, technical assessment and interview loop. You can solve it p2p live or offline, you choose.

We understand everyone is different, and solving a real-world problem in an environment and at a pace similar to what you'd have in the role is the best way to give you a chance to succeed.

[3] 4 hrs, irl on-site interview. We’ll fly you out to SF where we’ll work on a problem together.

From there, you’ll get an offer or decision in <24 hrs.

We want to set expectations – we work hard, work fast, and do a lot with very little. You'd be joining an outlier team at the ground floor, and a culture of curious obsession is what we care about most.
