Job Description

Join a leading AI lab's cutting-edge GenAI team to be at the core of the AI revolution, where your expertise fuels the development of the most advanced Large Language Models.

Overview

Professors and PhD students across academic disciplines, including STEM (ML, Coding, Data Science, CS, Physics, Mathematics, Engineering, Statistics) as well as professional and quantitative domains (Finance, Accounting, Economics, Law, Business), are invited to contribute to a frontier-model evaluation project focused on coding and agentic workflows. You will design and validate challenging benchmark tasks that surface and diagnose reasoning and problem-solving gaps in a target model. The work centers on building robust, real-world tasks with executable Python tests and on analyzing model and agent behavior. All applicants are expected to have working proficiency in Python.

This is a W-2 employment position with Cincinnatus LLC, with the opportunity to be placed at a leading AI lab as part of its extended workforce. You will join a team of domain experts, and together you will guide the next generation of frontier AI tools.

Key Responsibilities

Task Design and Development: Design challenging, real-world domain-specific problems drawn from your area of expertise (e.g., financial modeling, legal reasoning, econometrics, ML, coding, scientific computation) that serve as the foundation for agentic tasks. Problems should be constructed to target specific core capability loss failures identified in a frontier AI model.
Spec & Golden Solution Generation: Integrate the problems into an agentic development environment, preparing all necessary components using Python.
Evaluation and Analysis: Evaluate the target model's performance on the tasks.
Headroom Identification: Identify tasks where the target model fails to pass all tests, specifically classifying the failure as a logical reasoning failure.

Core Qualifications

Current or retired professor, OR PhD student, in any of the following areas:

STEM: ML, Coding, Data Science, CS, Physics, Mathematics, Engineering, Statistics, Biology, Chemistry
Professional / Quantitative: Finance, Accounting, Economics, Law, Business

Degree (or PhD in progress) from a top university in your field.
Working proficiency in Python, demonstrated through research, industry work, GitHub projects, or coursework (not just theoretical familiarity).
Ability to engage reliably for at least 30 hours/week, i.e., at least 6 hours/day on weekdays.
Past experience in AI training, model evaluation, and data annotation is preferred.
Basic ability to work independently and manage one's time.
Verbal and written communication skills, problem-solving skills, and interpersonal skills.

About Cincinnatus LLC

Cincinnatus LLC is an enterprise staffing company that partners with leading technology companies to source and employ highly skilled professionals for contingent and contract-based opportunities. Cincinnatus serves as the employer of record for these engagements, providing W-2 employment, payroll, benefits, and compliance, while placing employees directly within client teams to work on high-impact initiatives.

Roles hired through Cincinnatus are not project-based or freelance engagements. They are structured, role-based positions that typically involve part-time or full-time commitments, close collaboration with a client's internal teams, and integration into standard enterprise workflows.

Cincinnatus is a legal entity separate from any platform. While opportunities may be discovered through various channels, employment, onboarding, payroll, and benefits for these roles are administered by Cincinnatus LLC.

Equal Employment Opportunity

Cincinnatus is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy), age, disability, sexual orientation, gender identity, or any other characteristic protected by applicable law.

Job Tags

Full time, Contract work, Part time, Freelance, Work at office, Weekday work

Academic Research Collaborator for AI Model Evaluation Job at SaidGig, United States
