AI/ML Security and Safety

Model integrity & adversarial testing

Overview

We assess AI/ML systems end to end: training data and MLOps pipelines, model artifacts, inference hardware, and deployed agent loops. We red-team deployed models, evaluate model capabilities against expert baselines, threat-model novel AI architectures, and stress-test the seams between disciplines, where threat models routinely fall apart.

Our team combines ML researchers with application-security, systems, and cryptography engineers. AI failure modes live in the seams between these disciplines — a single-track review misses them.

Why work with Trail of Bits

01

Multidisciplinary experts on every project

Every AI/ML engagement pulls in cryptographers, application-security engineers, and systems specialists alongside our ML team, so an issue that spans disciplines is reviewed by someone who knows each side of it.

02

We publish everything

Methodologies, capability benchmarks, and findings end up in public reports, papers, and open-source repos. Our DARPA AIxCC work (Buttercup), LeftoverLocals, and the Hugging Face safetensors review are all open, so your team inherits the reasoning, not just the fix.

03

Deliverables your team can run with

Every engagement ships artifacts your team can put to work right away: Semgrep and CodeQL rules you can drop into CI, tuned to your model-serving stack; fuzzing and capability-evaluation harnesses; and short- and long-term SDLC recommendations your team can act on after we leave.

Read our assessment of Hugging Face safetensors

Services & deliverables

Security & Safety Training

We build custom training around each client's needs. Our courses teach teams to understand and evaluate the risks of AI-based systems, covering AI failure modes, adversarial attacks, AI safety, data provenance, pipeline threats, and risk mitigation.

Learn more about our training

MLOps and Pipeline Assessment

Our assessments cover the entire AI/ML pipeline. Machine learning operations (MLOps) introduce attack vectors that differ from traditional software backdoors and vulnerabilities, and that affect ML-based systems and the way they are operated. This service uncovers classes of vulnerabilities that lead to ML-specific failure modes and degraded model performance, or that allow implicit or explicit access to, and modification of, data, model parameters, and intellectual property. Each of these widens the system's overall attack surface.

01: Software & ML architecture components (e.g., PyTorch)
02: CI/CD processes & data provenance analysis
03: Hardware stack security assessment (e.g., GPUs)
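
One concrete example of an attack vector at the model-artifact layer is a checkpoint serialized with Python's pickle format, which can import and call arbitrary objects when it is loaded. The sketch below is illustrative only and is not a description of our assessment tooling: it uses the standard-library `pickletools` module to list the opcodes in a pickle stream that can import or call objects. The `SUSPICIOUS_OPCODES` set, the `suspicious_opcodes` helper, and the `Exploit` class are hypothetical names chosen for this example, and the opcode set is an assumption that will flag some legitimate pickles as well.

```python
import pickle
import pickletools

# Assumption for this sketch: opcodes that import or call objects while a
# pickle is being loaded. Legitimate pickles of custom classes use them too,
# so a hit means "inspect further", not "malicious".
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def suspicious_opcodes(data: bytes) -> list[str]:
    """Return the import/call opcode names found in a pickle stream, in order.

    pickletools.genops only parses the stream; it never executes it.
    """
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in SUSPICIOUS_OPCODES
    ]


class Exploit:
    """Hypothetical payload: asks the unpickler to call print() on load."""

    def __reduce__(self):
        return (print, ("code ran during unpickling",))


# Plain containers and floats need no imports or calls to reconstruct.
benign = pickle.dumps({"weights": [0.1, 0.2]})
# Serializing the payload does not run it; only loading would.
malicious = pickle.dumps(Exploit())

print("benign:", suspicious_opcodes(benign))
print("malicious:", suspicious_opcodes(malicious))
```

A static opcode listing like this is only a first-pass triage signal. Formats that store raw tensors and metadata without executable content, such as safetensors, avoid this class of issue by design.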

AI Risk Assessment

We threat-model AI systems, define their operational design domains (the conditions a system is designed to operate under), and analyze scenarios to identify functional risks. We also evaluate the risk frameworks an organization already uses for AI adoption.

01: Comprehensive threat modeling for AI systems
02: Operational design domain analysis
03: Risk framework evaluation for AI adoption

Model Capabilities Evaluation

We help organizations measure and validate the capabilities of the AI models their systems rely on, both first- and third-party. We specialize in assessing models' offensive and defensive cyber capabilities by benchmarking them against three baselines: human experts, state-of-the-art tools, and novices using AI/ML tools.

Our services are informed by our first-hand experience assessing cybersecurity threats posed by models (AI red teaming) and building automated, AI-based systems for detecting and patching software vulnerabilities (as part of DARPA's AI Cyber Challenge). We help our customers integrate only the most effective AI tools into their internal software security processes.

01: AI model performance benchmarking & validation
02: Offensive & defensive capabilities assessment
03: AI red teaming & security threat analysis
04: Integration guidance for AI security tools

What ships with every engagement

Most pen-test firms hand you a PDF and walk away. Every Trail of Bits engagement ships a deliverable set your engineering team can plug into their workflow on day one and keep using long after we're gone.

Deliverable	Description	Trail of Bits	Status quo
Written findings report	Severity, difficulty, and exploit scenario for every finding.	✓	✓
Short- and long-term SDLC recommendations	Not just bug fixes: process changes that prevent the next class of bug.	✓	✗
Codebase + pipeline maturity evaluation	Structured review of MLOps, data provenance, testing, and supply-chain hygiene.	✓	✗
Exploit PoCs + code artifacts	Runnable demonstrations for each finding so your engineers can reproduce and verify fixes.	✓	Sometimes
CI-ready Semgrep / CodeQL rules	Custom static-analysis rules tuned to the model-serving and agent code we reviewed.	✓	✗
Capability-evaluation + adversarial harnesses	Drop-in benchmark and red-team harnesses your team keeps running after we leave.	✓	✗
LLM and Claude-skill harnesses	Agent skills and prompts to help your team triage findings and pre-flight the next review.	✓	✗
Live walkthrough + fix-review retest	We read out findings in person and re-test patches when they land.	✓	Sometimes
Open publication of generalizable findings	Novel issues turn into public research so the whole industry benefits.	✓	✗

Comparison based on the standard published deliverables of the major application-security firms as of May 2026.

Public work

Public AI/ML assessments

Public engagements: 2
Person-weeks logged: 6
Distinct groups: 1
With effort reported: 2

Recent public engagements

Date	Engagement	Client / group	Effort
Oct 2023	YOLOv7	AI/ML Reviews	4 wks
Mar 2023	SafeTensors	AI/ML Reviews	2 wks

Get in touch

Book a technical office hours session

Book a complimentary one-hour meeting with one of our engineers to dig into a challenging technical issue, explore tooling options, and get advice directly from our experts. The session is purely technical: no sales talk, just a focused discussion of your problem.

Book a Session

How we approach an AI/ML engagement

01: Scope and threat model
02: Pipeline and artifact review
03: Adversarial testing and capability evaluation
04: Root-cause analysis
05: Reporting and remediation
