Trail of Bits

AI/ML Security and Safety

Model integrity & adversarial testing

Overview

We assess AI/ML systems end-to-end — training data and MLOps pipelines, model artifacts, inference hardware, and deployed agent loops. We red-team deployed models, evaluate model capabilities against expert baselines, threat-model novel AI architectures, and stress-test the joins between disciplines where threat models routinely fall apart.

Our team combines ML researchers with application-security, systems, and cryptography engineers. AI failure modes live in the seams between these disciplines — a single-track review misses them.

Why work with Trail of Bits

  • 01

    Multidisciplinary experts on every project

    Every AI/ML engagement pulls in cryptographers, application security engineers, and systems specialists alongside our ML team. The seams between disciplines are where real failures hide — and where a single-track review misses them.

  • 02

    We publish everything

    Methodologies, capability benchmarks, and findings end up in public reports, papers, and open-source repos. Our DARPA AIxCC work (Buttercup), LeftoverLocals, and the Hugging Face safetensors review are all open, so your team inherits the reasoning, not just the fix.

  • 03

    Deliverables your team can run with

    Every engagement ships fixes you can drop into CI — Semgrep and CodeQL rules tuned to your model-serving stack, fuzzing and capability-evaluation harnesses, and short- and long-term SDLC recommendations your team can act on after we leave.

Services & deliverables

Security & Safety Training

Service

We offer custom training tailored to each client's needs. Our courses teach teams to understand and evaluate the risks of AI-based systems, covering AI failure modes, adversarial attacks, AI safety, data provenance, pipeline threats, and risk mitigation.

MLOps and Pipeline Assessment

Service

Our assessments cover the entire AI/ML pipeline. Machine learning operations (MLOps) introduce novel attack vectors, including backdoors and vulnerabilities that differ from those in traditional software and that affect ML-based systems and their operations. This service uncovers classes of vulnerabilities that can cause ML-specific failure modes and degraded model performance, or that allow implicit or explicit access to, and modification of, data, model parameters, and intellectual property, each of which adds to the system's overall attack surface.

01 Software & ML architecture components (e.g., PyTorch)
02 CI/CD processes & data provenance analysis
03 Hardware stack security assessment (e.g., GPUs)

AI Risk Assessment

Service

Our offerings include threat modeling, applying operational design domains, and analyzing scenarios to identify functional risks. We also assess existing risk frameworks associated with AI adoption.

01 Comprehensive threat modeling for AI systems
02 Operational design domain analysis
03 Risk framework evaluation for AI adoption

Model Capabilities Evaluation

Service

We help organizations measure and validate the capabilities of the AI models their systems employ, both first- and third-party. We specialize in assessing models' offensive and defensive cyber capabilities by benchmarking their performance against experts, state-of-the-art tools, and novices using AI/ML tools.

Our services are informed by our first-hand experience assessing cybersecurity threats posed by models (AI red teaming) and building automated, AI-based systems for detecting and patching software vulnerabilities (as part of DARPA's AI Cyber Challenge). We help our customers integrate only the most effective AI tools into their internal software security processes.

01 AI model performance benchmarking & validation
02 Offensive & defensive capabilities assessment
03 AI red teaming & security threat analysis
04 Integration guidance for AI security tools

What ships with every engagement

Most pen-test firms hand you a PDF and walk away. Every Trail of Bits engagement ships a deliverable set your engineering team can plug into their workflow on day one and keep using long after we're gone.

Deliverable | Trail of Bits | Status Quo

Written findings report

Severity, difficulty, and exploit scenario for every finding.

Short- and long-term SDLC recommendations

Not just bug fixes — process changes that prevent the next class of bug.

Codebase + pipeline maturity evaluation

Structured review of MLOps, data provenance, testing, and supply-chain hygiene.

Exploit PoCs + code artifacts

Runnable demonstrations for each finding so your engineers can reproduce and verify fixes.

Status quo: sometimes

CI-ready Semgrep / CodeQL rules

Custom static-analysis rules tuned to the model-serving and agent code we reviewed.

Capability-evaluation + adversarial harnesses

Drop-in benchmark and red-team harnesses your team keeps running after we leave.

LLM and Claude-skill harnesses

Agent skills and prompts to help your team triage findings and pre-flight the next review.

Live walkthrough + fix-review retest

We read out findings in person and re-test patches when they land.

Status quo: sometimes

Open publication of generalizable findings

Novel issues turn into public research so the whole industry benefits.

Comparison based on the standard published deliverables of the major application-security firms as of May 2026.

Public work

Public AI/ML assessments

Public engagements: 2
Person-weeks logged: 6
Distinct groups: 1
With effort reported: 2

Recent public engagements

Date | Engagement | Client / group | Effort
Oct 2023 | YOLOv7 | AI/ML Reviews | 4 wks
Mar 2023 | SafeTensors | AI/ML Reviews | 2 wks

Get in touch

Book a technical office hours session

Book a complimentary one-hour meeting with one of our engineers to dive into a challenging technical issue, explore tooling options, and gain valuable insights directly from our experts. This session is purely technical — no sales talk, just a focused discussion that showcases our depth, talent, and capabilities.