Research Intern Multimodal Foundation Model for Vision

Salary: $50 per hour

Company: Sony
Job Type: Internship

Job Description - Research Intern Multimodal Foundation Model for Vision



Sony AI America, a branch of Sony AI, is a remotely distributed organization spread across the U.S. and Canada. Sony AI is Sony’s new research organization pursuing the mission to use AI to unleash human creativity. Sony AI works closely with Sony’s other business units, including Sony Interactive Entertainment LLC., Sony Pictures Entertainment Inc., and Sony Music Entertainment. With some 900 million Sony devices in hands and homes worldwide today, a vast array of Sony movies, television shows and music, and the PlayStation Network, Sony creates and delivers more entertainment experiences to more people than anyone else on earth. To learn more: https://ai.sony/

Research Intern – Multimodal Foundation Model for Vision 

Sony AI is seeking research interns to join us. Our team conducts fundamental and applied research, with a focus on building next-generation foundation models for vision in a responsible manner. The role of a research intern is to develop efficient and effective methodologies and prototype solutions. You will work with a productive team of world-class scientists and engineers to tackle the most challenging problems in foundation models and generative AI, including low-cost yet powerful vision foundation models (VFM), vision-language models (VLM), unified models, automatic model compression, and optimization and deployment on cloud and edge. Your ideas will not only be published in papers but also improve the experience of billions of customers.

Roles and Responsibilities 

  • Conduct fundamental and innovative research and development in low-cost yet powerful vision-language models (VLM), unified models, automatic model compression, and optimization and deployment on cloud and edge.

  • Design or implement state-of-the-art techniques for model compression, inference speedup, hardware deployment, and tool automation.

  • Build proofs of concept (PoCs) for various vision-language and generation tasks (e.g., VQA, captioning, understanding) on a range of hardware.

  • Contribute to library and tool development that supports the business, or publish influential research in top-tier conferences and journals.

Required Qualifications and Skills 

  • Currently holding, or in the process of obtaining, a master's or PhD degree in computer science or a related field.

  • Highly self-motivated and capable of proposing and implementing innovative ideas.

  • Solid presentation and communication skills for internal and external audiences.

  • Publications or expertise in compact foundation model development and deployment, such as influential open-source projects or papers at top conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ACL).

  • Front-end development experience is a plus.

  • Solid coding skills in Python, PyTorch, etc.

 

Working Location 

Location flexible (Tokyo, Europe, or US)

 

The target hourly rate for this internship is $50.00 per hour. The individual will be paid hourly and eligible for overtime.   

All qualified applicants will receive consideration for employment without regard to any basis protected by applicable federal, state, or local law, ordinance, or regulation.

Disability Accommodation for Applicants to Sony Corporation of America

Sony Corporation of America provides reasonable accommodation for qualified individuals with disabilities and disabled veterans in job application procedures. For reasonable accommodation requests, please contact us by email at [email protected] or by mail to: Sony Corporation of America, Human Resources Department, 25 Madison Avenue, New York, NY 10010. Please indicate the position you are applying for.

We are aware that unauthorized individuals or organizations may attempt to solicit personal information or payments from job applicants by impersonating our company through fraudulent job postings.  We take these matters seriously but cannot control third-party websites. To protect your personal information, please verify that any job posting you respond to also appears on our official Careers page: www.sonyjobs.com.  Please also be advised that we never request personal identifying information (such as Social Security numbers, bank details, or copies of identification documents) during the initial stages of our application process.  If you have any doubts about the authenticity of a job posting or communication, please contact [email protected] before submitting any information.

Original job Research Intern Multimodal Foundation Model for Vision posted on GrabJobs.