Disclosure: Independent directory. Not a CPA firm. Nothing here is legal, audit, or tax advice.

SOC 2 for AI Companies: MLOps Controls and Auditor Picks

By Editorial team · Published 2026-04-22 · Last updated 2026-04-22

AI and ML companies face non-standard SOC 2 evidence requirements around model versioning, training data, and inference access controls. Here's how to handle them.

A standard SOC 2 program addresses the controls most SaaS companies run — cloud infrastructure security, access management, change management, vendor management. AI and ML companies have additional operational surfaces that don't map cleanly to the Trust Services Criteria: model training pipelines, training data custody, model versioning and rollback, prompt injection risk in LLM-based products, and the access control boundary between inference APIs and production data. Auditors without AI/ML experience will either miss these surfaces or apply generic controls that don't reflect your actual risk.

Where standard SOC 2 controls fall short for AI systems

The AICPA's Trust Services Criteria (TSC) were designed for general-purpose software systems and include no AI-specific criteria. However, the existing common criteria, especially CC3 (Risk Assessment), CC6 (Logical Access), CC7 (System Operations), CC8 (Change Management), and CC9 (Risk Mitigation), can be applied to AI-specific surfaces if the auditor understands them. The issue is that most auditors frame these controls around traditional software artifacts (code deploys, admin user lists) rather than ML artifacts (training runs, model registry, dataset versions).

CC6.1 — Access to training data: Who can read, modify, or delete training datasets? This is a logical access control question, but the answer involves data pipeline permissions (e.g., S3 bucket policies, Databricks workspace ACLs) rather than just application-level roles.
CC8.1 — Change management for model versions: Model promotion from development to production is a change management event. Auditors need to see a model registry (MLflow, Weights & Biases, SageMaker Model Registry) with approval gates and rollback capability.
CC7.2 — Logging and monitoring for inference: Inference API calls that process sensitive user data should be logged. LLM prompt and completion logging raises its own privacy questions — this interplay needs to be scoped explicitly in the system description.
CC3.2 — Risk assessment for model behavior: Adversarial inputs, prompt injection, and model output unpredictability are novel risk vectors. Document these in your risk assessment as AI-specific risks with mitigating controls.
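The CC7.2 item above says inference logging and privacy have to be scoped together. One way to picture that is a log record that redacts obvious identifiers before it is written. This is a minimal sketch, not a complete PII detector: the two regular expressions, the field names, and the function names `redact` and `build_inference_log` are all assumptions made for illustration.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Assumed patterns for illustration only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and US SSN-shaped strings with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def build_inference_log(user_id: str, model_version: str,
                        prompt: str, completion: str) -> str:
    """Return one JSON log line with redacted text and a pseudonymous user key."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        # Unsalted hash: a pseudonym for joining log lines, not anonymization.
        "user": hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16],
        "model_version": model_version,
        "prompt": redact(prompt),
        "completion": redact(completion),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(build_inference_log("u-1", "v3",
                              "Email jane@example.com, SSN 123-45-6789", "ok"))
```

Recording the model version on every line lets an auditor tie a logged request back to a specific registry entry, which connects the CC7.2 evidence to the CC8.1 evidence.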

Controls AI companies should implement before fieldwork

Training data access controls: Implement and document least-privilege access to training datasets. Separate raw data, processed data, and training-ready datasets into distinct access tiers with separate IAM policies.
Model versioning and registry: Use a formal model registry with versioned artifacts and a documented promotion process (dev → staging → production). Model registry commits should have the same change control rigor as code deploys.
Inference API logging: Log all inference requests and responses that process personal or confidential data. Define retention periods and access controls for inference logs separately from application logs.
Data lineage documentation: For models trained on customer data or third-party datasets, document data provenance, licensing terms, and any data deletion/correction obligations.
Model performance monitoring: Deploy drift detection and anomaly monitoring on production models. Undetected model degradation is a Processing Integrity risk under the PI1 criteria, which apply when the Processing Integrity category is in scope for your report.
Third-party AI service inventory: If you use OpenAI, Anthropic, Google Vertex AI, or Amazon Bedrock in production, include these as subservice organizations in your SOC 2 system description. Decide with your auditor whether each one is presented under the carve-out method or the inclusive method.
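The drift-detection control in the list above can be illustrated with the Population Stability Index (PSI), one common way to compare a production feature distribution against its training baseline. This is a sketch under stated assumptions: the function names are hypothetical, both samples are assumed non-empty, and the 0.1 and 0.2 cut-offs are a widely used rule of thumb, not a SOC 2 requirement.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two non-empty samples of one feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the baseline range land in an edge bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    base_frac, live_frac = fractions(baseline), fractions(live)
    return sum((lf - bf) * math.log(lf / bf)
               for bf, lf in zip(base_frac, live_frac))


def drift_status(score: float) -> str:
    """Map a PSI score to an action using assumed rule-of-thumb thresholds."""
    if score < 0.1:
        return "stable"
    if score < 0.2:
        return "investigate"
    return "alert"


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print(drift_status(psi(baseline, baseline)), drift_status(psi(baseline, shifted)))
```

For audit purposes the score itself matters less than the trail around it: a scheduled job that computes it, a documented threshold, and a ticket opened whenever the status is not "stable".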

Finding an auditor with AI/ML experience

The key interview question is: 'How have you scoped the system description for a client with a machine learning training pipeline? What control evidence did you request for model deployment?' If the auditor gives a generic answer about change management without referencing ML-specific artifacts, they haven't done it before.

Auditors who have published public guidance on AI controls or served AI-native clients include Schellman (has published AI security assessment frameworks), A-LIGN, and Prescient Assurance. When evaluating boutique firms, ask for a reference from an AI or ML company they've audited — not just a SaaS client. The underlying technical knowledge needs to be in the engagement team, not just the firm brand.

The system description is the hardest part

For AI companies, the SOC 2 system description (typically Section 3 of the report, though section numbering varies by audit firm) requires more care than for a standard SaaS product. The system boundary must include the training pipeline, model serving infrastructure, and any third-party AI APIs in scope. If your system description omits the training pipeline, your report has a gap that sophisticated buyers will identify. Work with your auditor to draft the system description before fieldwork begins, not after.

What enterprise buyers ask AI companies specifically

Enterprise security teams reviewing AI vendors' SOC 2 reports are beginning to ask questions that go beyond the standard vendor risk questionnaire. Being prepared for these is part of running an AI-aware SOC 2 program.

'Is customer data used to train your models?' — This is a data use question, not strictly a security question, but it belongs in your system description and your data processing agreement. If the answer is no, document the technical controls (model serving without training data feedback loops) that enforce this. If the answer is yes, describe the consent mechanism and data segregation.
'Do your models process customer PII?' — Inference systems that process names, email addresses, financial data, or medical information have different logging and data minimization obligations than models processing aggregated or anonymized inputs. Your SOC 2 system description should reflect this boundary.
'What is your model rollback procedure if a model behaves unexpectedly in production?' — This maps to CC8.1 (change management) and CC7.3 (response to identified anomalies). Your model registry should have a documented rollback process with a defined time objective.
'Have you had any AI-specific security incidents (prompt injection, model exfiltration, training data poisoning)?' — Your incident log should be reviewed for AI-specific events before fieldwork. Document your detection and response capability for these vectors explicitly.
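The rollback question above is easier to answer when promotions and rollbacks live in the same append-only record. The sketch below is an in-memory stand-in for a model registry, not the API of MLflow, Weights & Biases, or SageMaker; the class names, the 30-minute objective, and the rule that the approver must differ from the requester are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Promotion:
    version: str
    author: str
    approver: str
    at: datetime


@dataclass
class ProductionAlias:
    """Append-only history of which model version served production."""
    history: List[Promotion] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        return self.history[-1].version if self.history else None

    def promote(self, version: str, author: str, approver: str,
                at: Optional[datetime] = None) -> None:
        # Separation of duties: nobody approves their own change.
        if approver == author:
            raise PermissionError("approver must differ from author")
        when = at or datetime.now(timezone.utc)
        self.history.append(Promotion(version, author, approver, when))

    def rollback(self, target: str, requested_by: str, approver: str,
                 detected_at: datetime, at: Optional[datetime] = None,
                 objective: timedelta = timedelta(minutes=30)) -> bool:
        """Re-promote a previously served version; return True if the objective was met."""
        if target not in {p.version for p in self.history}:
            raise ValueError(f"{target} was never promoted to production")
        when = at or datetime.now(timezone.utc)
        # The rollback is recorded as a new entry, so history is never rewritten.
        self.promote(target, requested_by, approver, when)
        return (when - detected_at) <= objective


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
    alias = ProductionAlias()
    alias.promote("v1", "alice", "bob", t0)
    alias.promote("v2", "alice", "bob", t0 + timedelta(days=1))
    met = alias.rollback("v1", "carol", "bob",
                         detected_at=t0 + timedelta(days=2),
                         at=t0 + timedelta(days=2, minutes=10))
    print(alias.current, met)
```

Because a rollback is stored as another promotion, the same history answers both the CC8.1 question (who approved what, and when) and the time-objective question (how long from detection to restored service).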