Internship: AI-assisted matcher to classify Namespaces/Packages to Technologies
Duration: 6 months
Keywords: AI-assisted Classification, Semantic Matching, Data Processing
About CAST
CAST is the market leader in Software Intelligence. Its unique technology thoroughly examines the structure of complex software systems with MRI-like precision. It delivers accurate, actionable, and automated views of software architecture, critical flaws, quality grades, sizing metrics, open-source usage, and cloud readiness levels. Hundreds of companies rely on CAST for greater objectivity in crucial business decisions, faster application modernization for the cloud, and higher quality and security in their custom software.
Founded more than 25 years ago and backed by nearly $200 million in R&D, CAST’s rapid analysis technology and its advanced ‘MRI for Software’ drive IT transformation and enable automation at the world’s largest systems integrators, hundreds of Global 2000 enterprises, and government agencies in North America, Europe, India, and China.
CAST provides two product lines that differ in technology, implementation model, and usage:
- CAST Highlight is a SaaS product capable of performing rapid application portfolio analysis. It analyzes application source code to measure cloud readiness and to assess composition, resiliency, and technical debt.
- CAST Imaging is an on-premises product (also available as a cloud version since 2025) that reverse-engineers all database structures, code components, and interdependencies in custom-built applications. It provides interactive and accurate architecture blueprints, data-call graphs, and end-to-end transaction flows.
We work at the intersection of Software Intelligence and Artificial Intelligence, helping organizations understand which technologies compose their applications. Our mission is to transform raw software artifacts (packages, namespaces, SBOMs, repositories) into structured insights that reveal how technologies are related, used, and evolving.
The project aims to design and implement an AI-assisted classifier that maps incoming software packages and namespaces to the correct technology labels using semantic similarity, embeddings, and learned heuristics.
The goal is to ensure high-quality matching even when input data is inconsistent, incomplete, or ambiguous.
Proposed work
This project invites you to explore how AI can bring structure to messy software ecosystems. Your milestones will include:
- Exploring and understanding the existing reference database (technologies, packages, namespaces).
- Building an ingestion pipeline to process external inputs (SBOMs, package managers, Git repositories).
- Designing embedding-based or LLM-based matching strategies to classify items into known technologies.
- Handling ambiguous or low-confidence matches via AI + human-in-the-loop workflows.
- Improving accuracy through feedback loops, fine-tuning, and evaluation metrics.
- Prototyping visualizations or tools to assist validation and reporting.
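To give a flavor of the matching and review-routing milestones above, here is a toy sketch. It is an illustration only, not CAST's actual method: the `REFERENCE` data, the `classify` function, and the 0.5 threshold are all hypothetical, and character-trigram cosine similarity stands in for the learned embeddings the project would actually explore.

```python
from collections import Counter
from math import sqrt

# Hypothetical reference data: technology label -> known package/namespace names.
REFERENCE = {
    "Spring Framework": ["org.springframework.core", "org.springframework.web"],
    "pandas": ["pandas", "pandas.core"],
    "React": ["react", "react-dom"],
}

def trigrams(text):
    """Bag of character trigrams; a crude stand-in for a real embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(name, threshold=0.5):
    """Return (best label, score, status) for an incoming package or namespace.

    The threshold is an arbitrary assumption; anything below it is flagged
    for human review instead of being labeled automatically.
    """
    query = trigrams(name)
    best_label, best_score = None, 0.0
    for label, known_names in REFERENCE.items():
        for known in known_names:
            score = cosine(query, trigrams(known))
            if score > best_score:
                best_label, best_score = label, score
    status = "auto" if best_score >= threshold else "needs_review"
    return best_label, best_score, status

if __name__ == "__main__":
    for item in ["org.springframework.web.servlet", "zlib"]:
        print(item, classify(item))
```

In a real setting, the trigram vectors would be replaced by embeddings from a trained model, and the review queue would feed validated labels back into the reference database.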
Team:
Interns will be joining the CAST R&D team, a dynamic and innovative group of professionals specializing in software research and development. The team consists of experienced software engineers, data scientists, and industry experts who are passionate about pushing the boundaries of software technology. Working in this team offers:
- Exposure to cutting-edge research and development in software technology.
- Opportunities for mentorship and learning from seasoned professionals.
- A collaborative environment where creativity and innovation are encouraged.
- Involvement in projects that have a tangible impact on the industry.
Required Skills
- A Bachelor’s degree (completed or in progress) in Computer Science, Data Science, Engineering, or a related field.
- Good knowledge of Python and common libraries (pandas, numpy, scikit-learn).
- Familiarity with machine learning, NLP, or semantic similarity.
- Strong analytical and problem-solving skills.
- Interest in software technologies, package ecosystems, and software intelligence.
- Ability to communicate findings clearly, both in writing and verbally.
Main Technologies
The internship will involve working with various technologies, including but not limited to:
- Python
- LLMs and embeddings (OpenAI, Hugging Face)
- Similarity search / vector databases
- Data ingestion pipelines (package managers, Git, SBOM)
- Hybrid AI + human validation workflows
When: Flexible
Where:
The position is located at CAST’s French office in Meudon, Île-de-France: 3 Rue Marcel Allégot, 92190 Meudon
What we offer you
Lunch - Each employee benefits from a Swile card and access to FoodChéri.
Remote - Remote work is possible up to 3 days a week.
An exceptional working environment - We are based in a former mansion with a beautiful garden, ideally located in Meudon (10 min by train from Montparnasse).
Feedback-Friendly Culture - At CAST we believe in effective feedback. Since day one, we have normalized feedback by making it part of our routine and creating a safe space for employees to debate what is and isn’t working.
Career prospects - In addition to our internal mobility policy, which encourages employees to move between teams and subsidiaries, CAST encourages employees to take on more and more responsibilities during their journey.
We are always looking for talented people who want to grow together with us. Would you like to join a truly entrepreneurial company and to be a part of our exciting journey? Apply today!