Researcher: Audio (Data)

Company: Cartesia

Job Type: Full Time

San Francisco, California

Job Description - Researcher: Audio (Data)

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.

The Role

• Lead the design and creation of high-quality datasets tailored for training cutting-edge audio models, focusing on tasks such as speech recognition, enhancement, separation, synthesis, and speech-to-speech systems.

• Develop strategies for curating, augmenting, and labeling audio datasets to address challenges like noise, variability, and diverse use cases.

• Design innovative data augmentation and synthetic data generation techniques to enrich training datasets and improve model robustness.

• Create datasets specifically for speech-to-speech systems, focusing on alignment, phonetic variability, and cross-linguistic considerations.

• Collaborate closely with researchers and engineers to understand model requirements and ensure datasets are optimized for specific architecture and task needs.

• Build tools and pipelines for scalable data processing, labeling, and validation to support both research and production workflows.

What We’re Looking For

• Deep expertise in audio data processing, with a strong understanding of the challenges involved in creating datasets for tasks like ASR, TTS, or speech-to-speech modeling.

• Experience with audio processing libraries and tools, such as librosa, torchaudio, or custom pipelines for large-scale audio data handling.

• Familiarity with data augmentation techniques for audio, including time-stretching, pitch-shifting, noise addition, and domain-specific methods.

• Strong understanding of dataset quality metrics and techniques to ensure data sufficiency, coverage, and relevance to target tasks.

• Programming skills in Python and experience with frameworks like PyTorch or TensorFlow for integrating data pipelines with model training workflows.

• Comfort with large-scale data processing and distributed file systems for audio data storage and processing.

• A collaborative mindset, with the ability to work closely with researchers and engineers to align data design with model objectives.

Nice-to-Haves

• Experience in creating synthetic datasets using generative models or simulation frameworks.

• Background in multimodal data curation, integrating audio with text, video, or other modalities.

• Early-stage startup experience or experience building datasets for cutting-edge research.

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our perks

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.

Original job Researcher: Audio (Data) posted on GrabJobs ©.
