This two-part series examines how enterprise-grade artificial intelligence is reshaping commercial litigation, focusing on how existing legal doctrines must evolve when AI performance is at the center of a dispute. It originally appeared in The PLI Current: The Journal of PLI Press, Vol. 10 (2026), available at plus.pli.edu.
Corporations across the globe are incorporating artificial intelligence into every aspect of their enterprise. IBM forecasts that AI will drive up to “USD 4.4 trillion in annual productivity gains” by 2030. Forbes estimates that the global AI market will grow at an annual rate of over 36% by 2027. And McKinsey reports that 78% of global companies use AI. Others currently project 99% use and adoption of AI among the Fortune 500.
Although expected, these statistics reflect a seismic change in the way business operates today. The rush to incorporate AI anywhere and everywhere suggests that AI will be deeply and broadly folded into core business workflows sooner rather than later. According to some, this will make profitability skyrocket. Recent reporting that AI “will replace literally half of all white-collar workers” suggests that this deep and broad adoption is already happening.
Despite this progress, some have decried a lack of “vertical adoption” of AI into core business functions like supply chain management, contract negotiation, and “general [automation of] enterprise-wide workflows.” McKinsey argues that this lack of deep adoption is responsible for AI’s failure “to translate into tangible economic results.” Hence McKinsey and others have called for “deep” adoption of AI into core business functions, urging that AI use move beyond simple chatbots to enterprise-affecting deployments such as corporate strategy development, marketing drives, and financial decision making.
Deep adoption of AI raises a related question: What impact, if any, will this use of AI have on future litigation? It is not difficult to imagine deep AI adoption into core workflows generating the lawsuits of the future—particularly when those deployments fail to perform. Yet the issue of AI-driven litigation has not been addressed in any meaningful way: An internet search for “How will AI impact business litigation?” yields articles and papers on how AI will affect the practice of law, guides for law firms on how to use AI, and prognostications on the future of the legal profession. Tellingly, there is barely any content analyzing how artificial intelligence will affect the way the lawsuits of the future are litigated.
The law’s substantive treatment of artificial intelligence is a nascent field. As with any technological innovation, the law is slow to catch up, and fifty-state surveys confirm as much. As of September 2025, a Westlaw search for “breach of contract” in the same paragraph as “artificial intelligence” across all state and federal jurisdictions yielded sixteen cases. A search for “warranty” in the same paragraph as “artificial intelligence” yielded only five cases. Fraud cases are more prevalent, but still yielded a mere fifty-seven cases nationwide.
Deep, enterprise-grade adoption of AI—the type that will manage and guide core workflows—will undoubtedly make the number of reported decisions skyrocket, forcing courts to grapple with how legal principles interact with AI. This series is intended to assist that process. Specifically, it attempts to articulate how commercial litigation might adapt to the challenges posed when AI models and their performance lie at the center of a commercial dispute. The intent is to express how claims should be conceived, pled, discovered, and prosecuted with an eye toward the novelties and complexities created by enterprise-grade AI usage.
We do wish to highlight the risk of overselling the impact AI may have on the prosecution and defense of commercial claims. It is certainly in vogue to hyperbolize the effect AI will have on the business world. While AI will no doubt revolutionize the ways in which we communicate, create, and work, that does not mean AI will completely upend the practice of law or longstanding jurisprudence. We still believe that, regardless of what AI can do, litigation over its performance will follow this longstanding framework: 1) a product or service was offered; 2) it did not perform as promised; and 3) the failure caused damages.
One could go even further and say that litigation over AI models is no different from software performance disputes—which have been litigated for three decades, if not more. We do not entirely agree with this view. Litigation over AI performance in commercial contexts will require a rearticulation of how breaches are defined, theories of causation are formed, and evidence is collected. While AI might not upend centuries-old legal doctrines, it will require longstanding legal frameworks to learn to speak a new language. We anticipate a reformation, not a revolution.
To that end, this series will begin by discussing some of the relevant features of AI that will directly impact core legal principles like causation and intent. It will then discuss a series of claims usually pled in commercial litigation—breach of contract, warranty, and fraud. It will conclude with brief remarks on what to expect in future lawsuits involving issues of AI performance. Again, the hope behind and purpose of this series is to offer ideas and theories on how to speak that language in ways that will be intelligible to judges, juries, and arbitrators.
Despite some overlap, there are meaningful differences in how AI models and traditional software operate. These differences could considerably impact the ways in which AI claims are pled, discovered, and ultimately tried. We provide a brief primer on some of these differences to illustrate how litigation over AI may have to be altered.
AI models are stochastic: they generate outputs by calculating probabilities across the interaction of many parameters rather than by executing fixed instructions like traditional computer code. This stochasticity makes AI models nondeterministic, generating “different outputs even when given the same input.” That is a critical difference from traditional software, whose deterministic algorithms “always yield the same [output] given identical inputs.”
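To make the distinction concrete, consider the following minimal sketch (the function names, candidate outputs, and probabilities are invented for illustration and do not reflect any actual model):

```python
import random

def deterministic_total(line_items):
    # Traditional code: identical inputs always produce the identical output.
    return sum(line_items)

def stochastic_reply(prompt, temperature=1.0, seed=None):
    # Toy stand-in for a generative model: the output is sampled from a
    # probability distribution, so repeated calls with the same prompt can differ.
    rng = random.Random(seed)
    candidates = ["approve", "escalate", "reject"]
    weights = [0.6, 0.3, 0.1]
    if temperature == 0:
        # "Greedy" decoding: always pick the most probable option (deterministic).
        return candidates[weights.index(max(weights))]
    return rng.choices(candidates, weights=weights, k=1)[0]

print(deterministic_total([100, 250]))                      # 350, every time
print([stochastic_reply("invoice #17") for _ in range(5)])  # may vary run to run
print(stochastic_reply("invoice #17", seed=42))             # fixed seed: reproducible
```

The deterministic function returns 350 no matter how many times it runs; the sampled replies can differ from run to run unless the seed is pinned.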
AI’s stochasticity enables “creative and diverse” outputs, driving adaptability: AI models can both 1) handle a wide variety of prompts and 2) go “off script,” potentially generating novel and innovative outputs in response to the same type of input. This ability has led commentators to call the development of AI one of the most significant paradigm shifts in the history of computer science.
Yet it is not difficult to imagine the business challenges created by AI’s stochasticity. As one commentator noted, “[T]his very unpredictability also presents challenges, especially in scenarios demanding consistency, transparency, and efficiency.” Many commercial applications, like compliance, supply chain management, and critical systems safety, require predictable and repeatable outcomes, and stochasticity threatens those functions. Several commentators have already noted the risks AI poses to these types of business operations. That is to say nothing of the challenges model stochasticity poses to litigation. For example, how can and will parties collect and preserve evidence from models that are varied and probabilistic rather than static and deterministic?
But there is an even deeper problem here. How does one qualitatively evaluate model performance when the whole purpose of the model is to generate different responses to the same prompts? As one Google working paper explains:
ML (Machine Learning) system testing is also more complex a challenge than testing manually coded systems, due to the fact that ML system behavior depends strongly on data and models that cannot be strongly specified a priori. One way to see this is to consider ML training as analogous to compilation, where the source is both code and training data. By that analogy, training data needs testing like code, and a trained ML model needs production practices like a binary does, such as debuggability, rollbacks and monitoring. So, what should be tested and how much is enough?
Because AI models produce a range of acceptable outputs, a static standard for benchmarking performance may well be unusable. A fixed standard may not adequately grade performance because models are expected to produce a spread of responses that can exceed, meet, or fall below any set mark. These models, which also evolve over time, may instead require an evaluative range, not an unchanging standard, capable of grading an acceptable band of outputs.
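What such range-based grading might look like is sketched below (the rubric and thresholds are assumptions invented for illustration, not an established evaluation standard):

```python
from statistics import mean

# Illustrative thresholds (assumptions, not drawn from any standard or contract):
MEAN_FLOOR = 0.85    # average quality required across sampled outputs
WORST_FLOOR = 0.60   # no single output may fall below this score

def grade(output: str, required_terms: set[str]) -> float:
    # Toy rubric: the fraction of required terms the output actually mentions.
    words = set(output.lower().split())
    return len(required_terms & words) / len(required_terms)

def within_band(outputs: list[str], required_terms: set[str]) -> bool:
    # Grade a sample of outputs and test the distribution against the band,
    # rather than demanding one canonical answer.
    scores = [grade(o, required_terms) for o in outputs]
    return mean(scores) >= MEAN_FLOOR and min(scores) >= WORST_FLOOR

samples = [
    "vendor acme invoice 17 total 1250 tax 100",
    "invoice 17 from acme total 1250 tax 100",
    "acme invoice total 1250",
]
print(within_band(samples, {"acme", "invoice", "total", "tax"}))  # True
```

The point is the shape of the inquiry: performance is judged against a band of acceptable scores across many outputs, not against a single expected answer.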
Further complicating this problem is the desire to scale models. Developers are eager to scale because larger models perform better. But scaling creates a related complexity: as models grow, they are expected to produce additional output variety. It is therefore not difficult to envision how model updates could further complicate, or even obsolesce, pre-existing benchmarks.
Compounding the problems created by model stochasticity is the black box nature of AI. AI is thought to be a black box because the specific calculations that lead to a specific output are not readily ascertainable. That is because AI models operate through the interaction of a variety of parameters and contextual factors. These include the seed (a number that determines the sequence of random choices the model makes during generation), temperature (which governs how freely the model samples from lower-probability options), model version (the specific iteration of the model in use), top-p and top-k (which control how wide a range of possible words the model considers when generating text, dictating whether it follows likely choices or allows more variety) and, in retrieval-augmented systems, the retrieval index (the search layer that determines which external documents or data the model can “see” when generating a response). Because many of these parameters interact in complex and often undocumented ways, it is nearly impossible to trace the exact steps that produce any specific output. As a result, the internal processes behind a given result function like a black box. And when the same prompt can produce different responses based on the minutiae of how the model is calibrated, no one—neither the model proprietor nor the end user—can fully trace why a particular answer emerged. In practical terms, that means the model’s reasoning process cannot be audited the way a human decision can. The parameters that shape its “thought process” are scattered across codebases, servers, and dynamic indexes, many of which change over time. Without complete logs of 1) those variables and 2) the state of the retrieval layer (the component that dictates what the model looks at before generating an output) captured at the moment of generation, the output becomes effectively unexplainable.
We do not intend to overstate the stochastic or black box nature of AI models. Model randomness can be reduced, or even eliminated, by lowering the temperature and constraining sampling. But at that point, the system behaves less like a generative model and more like traditional rule-based software. The promise of AI lies precisely in its ability to generate creative, non-obvious solutions—an ability that diminishes as the model becomes more deterministic. The black box problem in turn can be mitigated through comprehensive logging or recording of the variables that shape a model’s behavior at a given moment, including model weights, retrieval index states, prompts and responses, runtime parameters, and tool interactions. But whether enterprises will invest in resources to preserve that level of traceability—which comes at a cost and requires a complicated process of storing such logs for every inference—remains an open question.
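A sketch of what that logging might involve appears below (the field names, identifiers, and values are hypothetical; a production system would capture far more, including tool calls and content hashes of model weights and index state):

```python
import hashlib
import json
import time
import uuid

def log_inference(prompt: str, response: str, *, model_version: str, seed: int,
                  temperature: float, top_p: float, retrieval_snapshot_id: str) -> dict:
    # Capture, at generation time, the variables that shaped this one output.
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,   # exact model iteration used
        "seed": seed,                     # fixes the sequence of random choices
        "temperature": temperature,       # how freely the model samples
        "top_p": top_p,                   # breadth of candidate tokens considered
        # Identifier for the state of the retrieval layer, so the documents the
        # model "saw" can later be reconstructed.
        "retrieval_snapshot_id": retrieval_snapshot_id,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }
    print(json.dumps(record))  # in practice: append to durable, tamper-evident storage
    return record

log_inference("Summarize invoice 17", "Total due: $1,250",
              model_version="acme-llm-2026.01", seed=42, temperature=0.2,
              top_p=0.9, retrieval_snapshot_id="idx-2026-01-30")
```

Even this modest record illustrates the burden: one such entry per inference, retained for as long as a limitations period might run.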
Every law student is familiar with Summers v. Tice, which involved injuries suffered by Charles Summers while quail hunting with two friends, Harold Tice and Ernest Simonson. As the three hunters approached a quail covey in a triangular formation, Simonson and Tice fired at a quail at the same time and struck Summers, who stood in the bird’s path. Summers suffered serious injuries to his eyes and lip. In the lawsuit that followed, the court dealt with a difficult causation problem: How could the cause of Summers’ injuries be proven when it would be impossible to disentangle the shotgun pellets from Simonson and Tice’s guns and thus determine who caused the injuries and to what extent?
While quail hunting is far afield from AI, the two share the same problem: apportioning causation when two or more actors are responsible for harm but were not acting jointly. As in Summers, establishing the cause of harms flowing from model performance will require apportioning responsibility among multiple actors acting separately.
Model indeterminacy will further complicate this analysis. Traditional apportionment of responsibility presumes that the causal chain of events can be accurately reconstructed, allowing the trier of fact to assign percentages of fault. However, such reconstruction may not be possible with AI models. Aside from the difficulty of recording the interplay of the various parameters that go into generating an output, there is the additional problem of ascribing responsibility to others, like human prompters. Because of model indeterminacy, it is possible that one prompt might get the answer “right” and another might get it “wrong.”
The foregoing issues highlight some but not all of the doctrinal complications created by AI. There will no doubt be more in the future. Nevertheless, these factors implicate the type of claims that will be asserted in commercial litigation growing out of AI performance. The following section outlines what those claims will look like in terms of theory and evidence.
At the outset, we provide the following survey of the case law as of this writing to illustrate the dearth of reported decisions that directly evaluate breach of contract claims involving AI performance. While each of the following decisions involved AI companies or AI technology, none dealt with whether the model’s performance breached a contractual obligation:
• Bees360 Inc. v. Cai, which involved an AI startup, only held that the plaintiff’s claims failed for failure to join indispensable parties and abide by a contractual forum selection clause.
• Estate of Lokken v. UnitedHealth Group, Inc. involved a claim for breach of contract growing out of the use of AI in making post-acute care coverage decisions and dealt solely with an exhaustion of remedies defense.
• Estate of Barrows v. Humana, Inc. dealt with a similar preemption issue, stating “[T]he question is not whether the use of AI to make coverage opinions is prohibited under the Medicare Act, but whether insurance companies use of AI is in violation of its contract with insureds. Therefore, Plaintiffs' claim for breach of contract and breach of implied covenant of good faith and fair dealing are not preempted.”
• Nuance Communications, Inc. v. Int'l Bus. Machines Corp. only resolved a statute of limitations accrual issue.
• Andersen v. Stability AI Ltd. held no contract existed in a class action growing out of Stable Diffusion’s use of “billions of copyrighted images.”
• Stratesphere LLC v. Kognetics Inc. only dealt with whether the plaintiff’s pleadings survived application of the economic loss rule, stating, “Defendant seeks similar monetary damages on its fraudulent inducement claim; however, Defendant also alleges the fraud caused a ‘loss in value of the Artificial Intelligence Platform and its associated business.’ Discovery may show that this loss in value is not separate from the breach-of-contract damages, but Defendant's allegations are sufficient at this stage.”
• Lehrman v. Lovo, Inc. dealt with the same issue as Kognetics.
• Chemimage Corp. v. Johnson & Johnson dealt with whether claims against J&J for breach or as a third-party beneficiary properly stated a claim under Rule 12.
• MillerKing, LLC v. DoNotPay, Inc. involved an AI model (that claimed to be the “World’s First Robot Lawyer”) but only evaluated whether the plaintiff had Article III standing to bring a Lanham Act claim.
• Hangzhou Jicai Procurement Co., Ltd v. Anaconda, Inc. similarly only evaluated whether the plaintiff had Article III standing.
• Doe 1 v. GitHub, Inc. revolved around the question of ownership of source code and not the operation of that code in conjunction with a business transaction.
Taken together, these cases show that there is a near-complete absence of reported decisions that evaluate whether a model’s performance breached a contractual obligation. This absence underscores the need to address the nuances and complexities growing out of breach of contract claims dealing with the operation of AI in business.
As detailed below, however, we do not believe that the adoption of and reliance on AI models in business will affect theories of breach as significantly as it will other types of commercial claims. Contractual claims involving traditional business software are usually generic, alleging that the software did not perform or function as contractually promised, in accord with “the contract documents,” or consistent with industry standards. And claims that do involve more detail are usually tied to demonstrable and objective milestones, such as generating specific types of reports, performing certain types of calculations, running over budget, or failing to implement within a specified timeframe.
Regardless of the nuances and differences between traditional software and AI models, alleging a claim for breach of contract need not delve into those nuances and differences. In other words, a plaintiff could plausibly plead that a defendant breached a contract by failing to deliver an AI model that conformed to the contract documents and provided the promised functionality, or by failing to implement the system on the agreed-upon timeline or within the agreed-upon budget.
Furthermore, a well-pleaded contractual claim does not turn on the technical nuances of “old” software versus “new” AI. Indeed, where the breach of contract claim revolves around an evaluation of how an AI model performed, it is difficult to see a qualitative difference between these claims and, for example, those arising from a failed software implementation. Critically, as we detail below, we do see a distinction between breach of contract and breach of warranty claims involving AI performance. Nevertheless, where AI models are used to improve other goods and services, the main contractual issues will revolve around whether those goods and services met a contractually defined standard. This inquiry would not necessarily focus on the AI model itself. Thus, while claims regarding the benefits AI provides to those goods and services will certainly be made and could be actionable, the more appropriate framework for evaluating them is fraud, which turns on whether AI-related representations induced the contract in the first place. We discuss this issue at length later in this series.
This is not to say that breach of contract claims involving AI will not involve complicated questions of model performance vis-à-vis contractual promises. Indeed, the main challenge in these cases will be assessing model performance against a contractually defined standard to determine whether a breach occurred. But the core question will be how those contracts define the applicable standards, frameworks, and benchmarks for model performance.
While relevant to the breach of contract context, these issues of evaluating model performance are put into the spotlight with warranty claims. That is because warranty claims require a qualitative assessment of model functionality or performance. Indeed, as the US Supreme Court has noted, the essence of a warranty action is a qualitative assessment of the product or service’s functionality. We discuss this further in the following section.
Warranties are contractual obligations guaranteeing that a good or service will be of a particular standard, character, or quality. In the words of the US Supreme Court, the “maintenance of product value and quality is precisely the purpose of express and implied warranties.”
Guarantees of quality make sense for goods and services that should have a uniform and predictable character: A car is expected to be capable of taking a person from point A to point B and otherwise function in all the ways that a car normally should. But a qualitative assessment of what an AI model should be capable of doing may not be so straightforward. Such an assessment requires articulating a standard or benchmark against which to evaluate model performance. Qualitatively benchmarking AI performance will pressure-test how warranties are drafted and enforced because of the difficulty of warranting a model’s qualitatively consistent performance. AI models’ stochastic characteristics (especially when coupled with model drift, i.e., changes in performance over time due to updates, user interactions, and newly incorporated data) make establishing stable and measurable performance frameworks difficult. And without such frameworks, it is difficult, if not impossible, to evaluate a model’s quality and value—the whole purpose behind contractual warranties.
A contractual warranty in a software license or software maintenance agreement typically reads as follows:
1. Software Warranty. Seller warrants for a period of ninety (90) days from date of delivery of the Software, as well as for any period during which Maintenance is provided by Seller, that the Software shall be free from material program errors and defects in materials and workmanship and that the Software shall function substantially in accordance with the Software manuals. Seller does not warrant that the Software is completely error free and Seller does not warrant that the Software conforms to or satisfies any federal, national, state or local laws. The warranty applies to the standard Software only. If the Software is modified in any way, then the warranty applies only to the unmodified Software as distributed by Seller. Other than as provided in this paragraph, all other warranties, express or implied, are disclaimed and the Software is sold to buyer “AS-IS,” except as otherwise stated in this paragraph.
The evolving nature and use of AI models make these sorts of traditional performance warranties problematic. Along with their stochasticity, AI models evolve and therefore may, over time, meet, exceed, or fall short of a warranty’s performance benchmarks. Deep questions arise when this performance variability must be reconciled with contractual language warranting that the model will “function substantially in accordance with” manuals and other contractual documents. The obvious question here is: What does “function substantially in accordance with contract documents” mean when that functioning is variable and prone to change? How do parties measure “material program errors” when model outputs vary even with identical prompting? What is a “material program error” when models are expected to be fine-tuned over time to improve performance? Moreover, warranties stating that AI performance will “substantially conform to the contract documents or specifications,” “be out-of-the-box ready,” “be fully functional,” or “perform in accordance with the customer’s needs” appear to be so devoid of specificity as to provide little, if any, actual value. With software contracts, this language is already problematic as to meaning and scope; with AI, it is useless.
Notably, contractual provisions regarding material errors and defects in software have been heavily litigated already, although they have generated more complication than clarity. And these preexisting issues are magnified in light of the AI black box problem. As discussed above, AI models are black boxes because “the deep learning systems that power these models are so complex that even the creators themselves do not understand exactly what happens inside them.” Indeed, as IBM has noted:
AI developers broadly know how data moves through each layer of the [neural] network, and they have a general sense of what the models do with the data they ingest. But they don’t know all the specifics. For example, they might not know what it means when a certain combination of neurons activates, or exactly how the model finds and combines vector embeddings to respond to a prompt. Even open-source AI models that share their underlying code are ultimately black boxes because users still cannot interpret what happens within each layer of the model when it’s active.
It is not difficult to see the fundamental challenges to establishing causation and warranting performance when model proprietors cannot say with certainty what their models are actually doing. And this is to say nothing of implied warranties, with concepts such as “fitness for a particular purpose” and the attendant evaluative problems those concepts engender. But the overall complexity remains: Given how AI models function, how does one evaluate whether the AI “worked” as promised?
These issues raise the question: How can parties draft meaningful warranties that provide a benchmark for evaluating the quality of the model itself? Because warranties are guarantees of a particular standard, character, or quality, warranty provisions themselves must have a qualitative dimension. There are several ways for contract drafters and litigators to achieve these goals.
Outcome-based warranties offer perhaps the most intuitive framework for aligning AI performance with real-world business expectations—especially in commercial environments where the model’s purpose is to streamline operational workflows or replace human-driven processes. These warranties create qualitative expectations as to the model’s outputs. That quality can be defined in terms of timing (how quickly outputs are generated), accuracy (how reliably they reflect correct or expected information), completeness (whether the output contains all required elements), or even robustness (how well the model performs across a range of scenarios or inputs). These provisions should be tied to tangible business results or outputs.
These warranties can be variegated—tied to stated expectations for different classes of business outputs or functions. Thus, these provisions will almost always require bespoke drafting and may need to evolve over time. The challenge of implementing these types of warranties will be one of drafting: Parties must use terms broad enough to capture what the model should be providing to the business, yet specific enough to be intelligible and actionable. But at bottom, tying warranty obligations to clearly defined outcomes—rather than the model’s inner workings—is the most practical way to manage risk. After all, if the outputs consistently meet the agreed-upon quality benchmarks, then the variability in how the model gets there becomes far less relevant.
Outcome-based warranties also provide the benefit of being tied to defined business metrics. Consider a model used to automate invoice processing. A warranty that specifies, “The model shall extract vendor name, invoice number, invoice date, line-item totals, and applicable taxes with a minimum 98.5% field-level accuracy over a 30-day rolling period” provides a framework that clearly articulates the parties’ expectations through an enforceable standard. In the event of litigation, this allows for a targeted analysis of system logs, benchmark reports, and exception-handling data to determine whether the model met or breached the contractual threshold.
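To show how mechanically such a provision could be tested, whether for compliance monitoring or in discovery, we offer the following sketch (the log schema and figures are hypothetical):

```python
from datetime import date, timedelta

# Hypothetical extraction log: (processing date, fields correct, fields total).
LOG = [
    (date(2026, 1, 5), 49, 50),
    (date(2026, 1, 20), 198, 200),
    (date(2026, 2, 1), 97, 100),
]

THRESHOLD = 0.985           # the warranty's 98.5% field-level accuracy floor
WINDOW = timedelta(days=30)

def rolling_accuracy(as_of: date) -> float:
    # Field-level accuracy over the 30-day window ending on `as_of`.
    rows = [(c, t) for d, c, t in LOG if as_of - WINDOW < d <= as_of]
    correct = sum(c for c, _ in rows)
    total = sum(t for _, t in rows)
    return correct / total if total else 1.0

acc = rolling_accuracy(date(2026, 2, 1))
print(f"30-day field-level accuracy: {acc:.4f}; breach: {acc < THRESHOLD}")
```

On these invented figures, 344 of 350 fields (98.29%) were extracted correctly and the provision would be breached; the dispute then reduces to the integrity of the underlying logs rather than to the model’s inner workings.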
Drafters should consider including the following provisions in outcome-based warranties:
• Scope: The specific tasks, workflows, and data categories covered by the warranty.
• Measurement Protocols: Rolling evaluation windows, benchmark data sets, and validation rubrics.
• Drift Management: Procedures for handling model updates, retraining events, or performance regression.
• Exclusions: Clear carveouts for data quality issues, integration failures, or other externalities.
• Observability: Logging and auditability requirements, including access to prompt templates, model versioning, and retrieval indices.
• Remedies: Tiered service credits, cure rights, fee adjustments, or termination rights based on the severity and persistence of performance shortfalls.
In addition, robust change-control provisions are essential. If the model operator materially modifies the model (e.g., through a version update, prompt schema change, or retraining of the retrieval corpus), the contract should require notice, rebenchmarking, and formal acceptance by the purchaser. This ensures that the warranty remains aligned with the model’s actual behavior over time.
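A simplified sketch of such a gate follows (the version identifiers and the acceptance workflow are assumptions for illustration):

```python
# Model versions that have completed rebenchmarking and been formally accepted.
ACCEPTED_VERSIONS = {"acme-llm-2026.01"}

def gate(model_version: str, rebenchmark_passed: bool) -> str:
    # Outputs from an unaccepted model version are quarantined until the
    # purchaser re-runs the agreed benchmark and records formal acceptance.
    if model_version in ACCEPTED_VERSIONS:
        return "serve"
    if rebenchmark_passed:
        ACCEPTED_VERSIONS.add(model_version)  # formal acceptance on record
        return "serve"
    return "quarantine: notice and rebenchmarking required"

print(gate("acme-llm-2026.01", rebenchmark_passed=False))  # serve
print(gate("acme-llm-2026.02", rebenchmark_passed=False))  # quarantine
```

The contractual point is simply that a version change, without re-acceptance, suspends the warranty’s benchmark rather than silently redefining it.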
Process-oriented warranties offer a different approach by focusing not on the results generated by the model, but rather on the mechanism or workflow through which those results are produced. Instead of promising a specific outcome—such as a correct answer, accurate classification, or timely response—these warranties guarantee that the model will follow a defined process, such as responding to categories of inputs or routing data through a prescribed pipeline.
The appeal of this approach lies in its relative simplicity: The focus is on whether the model is functioning as designed, rather than whether the outputs meet qualitative standards. Further, this warranty framework is closely related to those already employed in SaaS contracts. However, this structure may offer limited value to users, particularly when the usefulness of the model is tied directly to the quality or reliability of its outputs. If a model generates consistently poor results while still technically adhering to its prescribed process, the user may be left without meaningful recourse. Thus, while process-oriented warranties can provide a contractual foothold in otherwise uncertain terrain, they may be best used in conjunction with other warranty frameworks—especially in high-stakes implementations where output quality has operational or legal significance.
Moreover, use of process-oriented warranties may run directly into the black box problem. The conceptual problem is this: How can vendors, proprietors, and enterprises promoting, selling, and using AI models to perform core business functions warrant that performance will meet a process-based warranty when those same vendors, proprietors, and enterprises cannot say for sure what that process even is? What does it mean to warrant a process hidden behind a black box, and would anyone feel comfortable doing so?
Industry-standard warranties offer a framework for shifting focus away from specific outputs or internal mechanics by tying performance obligations to compliance with recognized evaluation frameworks or development norms. Rather than guaranteeing that an AI model will produce a particular result, these warranties instead promise that the model has been built, tested, or validated in accordance with externally accepted benchmarks—such as Holistic Evaluation of Language Models (HELM) or other technical assessment tools designed to measure factors like accuracy, bias, or calibration. This approach can be useful in complex or evolving implementation environments, where it is difficult to define performance in absolute terms and where buyers may rely on a model for multiple use cases. But problems abound. Many of these “standards” are still in early development stages, often rooted in academic contexts that do not cleanly map onto real-world commercial risk. Moreover, a model’s strong performance on a general-purpose benchmark may say very little about its reliability when applied to tasks like contract analysis, invoice reconciliation, or inventory optimization.
Nevertheless, for enterprise buyers conducting pre-sale diligence—or for lawyers drafting warranty clauses—this kind of warranty can serve as a baseline assurance that the model was not developed in an ad hoc or otherwise substandard manner. It also offers procedural clarity, which is especially valuable in negotiations where the technical asymmetry between vendor and purchaser is steep. While an industry-standard warranty will rarely be sufficient on its own to appropriately allocate performance risk, it can serve as one tier of a layered warranty strategy—bolstering more specific outcome- or process-based terms and offering purchasers a minimum level of transparency and structure in an otherwise unpredictable landscape.
We do not express an ultimate opinion as to which type of warranty or warranties should be used for enterprise applications of AI models. As the foregoing shows, each type of warranty has its own advantages and disadvantages. Nor does there appear to be a “one size fits all” solution; different models performing different operations supporting different functions will require different types of warranties. Moreover, where models are used across business functions or perform a multitude of tasks, a tiered or multi-part warranty might be appropriate. Much will depend on the model’s uses and the parties’ expectations.
Ultimately, the success (or failure) of deep, enterprise-grade AI adoption will depend on whether contracting parties are able to tie AI functionality to clear, yet broad, contractual language that provides adequate remedies should the model fail to perform as intended. Drafting these provisions will likely prove to be a difficult iterative balancing act.
And when litigation arises, the enforceability of AI-related warranties—and the ability to effectively assess breach—will depend almost entirely on how warranties are drafted. If the contract lacks specificity, the resulting dispute will likely involve complex, expert-driven debates over vague terms and industry custom and practice. But with well-crafted language operationally grounded in the real world, parties can significantly reduce ambiguity, streamline discovery, and improve predictability in the event of a dispute.
Part II of this series turns to how these challenges surface in specific commercial claims and the evidentiary issues they present.
Footnotes
1 IBM, Artificial Intelligence (AI) Consulting Services (last visited Jan. 30, 2026).
2 Rieva Lesonsky, 2025 AI Predictions for Small Businesses, FORBES (Feb. 11, 2025).
3 The State of Generative AI Report by McKinsey – Summary & Insights, PMWARES (June 21, 2025).
4 Naveen Kumar, How Many Companies Use AI? (Dec. 15, 2025).
5 PMWARES, supra note 3.
6 Samantha Subin, AI Is Already Taking White-Collar Jobs. Economists Warn There’s ‘Much More in the Tank’, CNBC (Oct. 22, 2025).
7 Alexander Sukharevsky et al., Seizing the Agentic AI Advantage, MCKINSEY & CO. (June 13, 2025).
8 Id.
9 IBM, Think: Topics (last visited Feb. 18, 2026) (detailing specific core business operations anticipated to be governed by AI).
10 Jason Brownlee, What Does Stochastic Mean in Machine Learning, MACHINE LEARNING MASTERY (July 24, 2020).
11 Sitation, LLC, Output from AI LLMs Is Non-Deterministic. What That Means and Why You Should Care (May 12, 2023).
12 Statsig, LLC, What Are Non-Deterministic AI Outputs? (Jan. 30, 2024).
13 Id.
14 Michael Lanham, From Fixed to Fluid: The Evolution from Deterministic Code to Stochastic AI, MEDIUM (Apr. 7, 2025) (explaining that the move from deterministic code to stochastic AI represents a major paradigm shift in computing).
15 Sean Beard, Managing the Non-Deterministic Nature of Generative AI, PARIVEDA (Mar. 27, 2024).
16 Id. (noting that non-deterministic AI systems introduce unacceptable risk in contexts requiring consistent, predictable outcomes).
17 Eric Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, IEEE INTERNATIONAL CONFERENCE ON BIG DATA (2017).
18 Statsig, LLC, supra note 12 (explaining that non‑deterministic AI outputs require quality‑assurance and testing approaches that evaluate a range of acceptable results rather than a single expected outcome).
19 Ashish Vaswani et al., Attention Is All You Need, ARXIV (originally submitted June 12, 2017; revised Aug. 2, 2023).
20 Jason Wei et al., Emergent Abilities of Large Language Models, ARXIV (originally submitted June 15, 2022; revised Oct. 26, 2022) (observing that emergent abilities hinge on data and scaling conditions and that present training may be suboptimal as methods continue to evolve).
21 Id.
22 Matthew Kosinski, What Is Black Box Artificial Intelligence (AI)?, IBM (Oct. 29, 2024).
23 Summers v. Tice, 33 Cal. 2d 80, 82 (1948).
24 Id.
25 Id.
26 Id. at 83.
27 See, e.g., TEX. CIV. PRAC. & REM. CODE § 33.003(a).
28 See infra Sec. II.B.
29 Perhaps the clearest expression of this issue is seen in the debates over who owns AI-generated works: the user or the model proprietor? See Logan Kugler, Who Owns AI’s Output?, COMMUNICATIONS OF THE ACM (Oct. 4, 2024) (surveying unsettled questions around ownership of AI‑generated works and inventions, with differing national approaches and evolving rules on output protection and training‑data use).
30 Bees360 Inc. v. Cai, No. 4:22-CV-01035, 2025 WL 963082, at *6 (S.D. Tex. Mar. 31, 2025).
31 Estate of Lokken v. UnitedHealth Group, Inc., 766 F. Supp. 3d 835, 843 (D. Minn. 2025).
32 Estate of Barrows v. Humana, Inc., No. 3:23-CV-654-RGJ, 2025 WL 2375645, at *11 (W.D. Ky. Aug. 15, 2025).
33 Nuance Commc’ns, Inc. v. Int'l Bus. Machines Corp., 544 F. Supp. 3d 353, 378 (S.D.N.Y. 2021), aff'd, No. 21-1758-CV, 2022 WL 17747782 (2d Cir. Dec. 19, 2022).
34 Andersen v. Stability AI Ltd., 700 F. Supp. 3d 853, 877 (N.D. Cal. 2023).
35 Stratesphere LLC v. Kognetics Inc., No. 2:20-CV-2972, 2021 WL 11457420, at *2 (S.D. Ohio May 4, 2021).
36 Lehrman v. Lovo, Inc., No. 24-CV-3770 (JPO), 2025 WL 1902547, at *27 (S.D.N.Y. July 10, 2025).
37 Chemimage Corp. v. Johnson & Johnson, No. 24-CV-2646 (JMF), 2024 WL 3758814, at *1 (S.D.N.Y. Aug. 12, 2024).
38 MillerKing, LLC v. DoNotPay, Inc., 702 F. Supp. 3d 762, 767 (S.D. Ill. 2023).
39 Hangzhou Jicai Procurement Co., Ltd v. Anaconda, Inc., No. CV 24-1092-CFC, 2025 WL 2403147, at *7 (D. Del. Aug. 19, 2025).
40 Doe 1 v. GitHub, Inc., 672 F. Supp. 3d 837, 847 (N.D. Cal. 2023).
41 See, e.g., Nasuni Corp. v. ownCloud GmbH, 607 F. Supp. 3d 82, 97 (D. Mass. 2022) (“Nasuni contends that ownCloud breached the PBA by failing to supply software that operated in accordance with the documentation and specifications set forth therein and failing to integrate MS 365 and MS Teams within the required [sic] timeframes.”); Challenge Printing Co., Inc. v. Elecs. for Imaging Inc., 2021 WL 3616766, at *1 (N.D. Cal. Aug. 16, 2021) (“The breach of contract claim is based on, among other things, an allegation that EFI failed to provide Professional Services in ‘good and workmanlike manner consistent with generally accepted industry standards.’”); O'Keeffe's Inc. v. Access Info. Techs. Inc., 2015 WL 6089418, at *1 (N.D. Cal. Oct. 16, 2015) (“O'Keeffe's alleges that AIT failed to provide fully functional software and charged for license fees for a product that Plaintiff ‘never used, and was not able to use.’”); Hodell-Natco Indus., Inc. v. SAP Am., Inc., 13 F. Supp. 3d 786, 797 (N.D. Ohio 2014) (construing breach of contract claim involving provision stating, “SAP warrants that the Software will substantially conform to the functional specifications contained in the documentation for six months following delivery.”); BHC Dev., L.C. v. Bally Gaming, Inc., 985 F. Supp. 2d 1276, 1285 (D. Kan. 2013) (“The pretrial order sets forth plaintiffs' claim that Bally breached its contract ‘by failing to deliver the Software in good working order and by failing to use reasonable efforts to repair errors and defects to restore the Software to good working order’”); W. Dermatology v. Vital Works, Inc., 2012 WL 2334567, at *3 (Conn. Super. Ct. May 18, 2012) (“Pages 7 through 12 of the MOD set forth the factual and legal conclusions of the court regarding breach of contract by Vital Works. Although defendants refused to accept responsibility for the nonfunctional software programs and blamed plaintiff, they did not present any facts to support application of the contractual provisions of limitation of liability and limited contract remedies.”); Dena' Nena' Henash, Inc. v. Oracle Corp., 2006 WL 8438428, at *2 (D. Alaska Dec. 15, 2006) (“TCC ties its claimed contract damages directly to these alleged functional defects in the software product it received pursuant to the License Agreement: ‘Because of the defects noted above and because the underlying software release licensed to TCC was incapable of providing the required functionality, TCC has spent and will need to spend considerable time and resources in upgrading its system’”).
42 See Grouse River Outfitters Ltd v. NetSuite, Inc., 2016 WL 5930273, at *1 (N.D. Cal. Oct. 12, 2016) (“According to Grouse River, the software was not installed on time, costs overran substantially, and the system never became fully capable of performing even the ‘core’ functions described in the contracts.”); Ronpak, Inc. v. Elecs. for Imaging, Inc., 2015 WL 179560, at *6 (N.D. Cal. Jan. 14, 2015) (“Specifically, Plaintiff has not received and Defendant has not implemented for Plaintiff in a reasonable and timely fashion functional software licensed by Plaintiff.”); Dena' Nena' Henash, Inc. v. Oracle Corp., 2006 WL 8438428, at *2 (“TCC alleges that Oracle misrepresented that its ‘out of the box’ software had the required functionality, including the ability to provide requested reports, to calculate and properly apply the proper indirect rates, to exchange data between the purchased modules, and to otherwise manage and control TCC's organization.”).
43 See supra notes
44 See infra Section C, Fraud.
45 BHC Dev., L.C. v. Bally Gaming, Inc., 985 F. Supp. 2d 1276, 1286 (D. Kan. 2013) (treating claims that software failed to function as warranted as warranty claims seeking the benefit of the bargain).
46 Id.
47 As mentioned above, a Westlaw search for “warranty” and “artificial intelligence” yields only five results: NewWave Telecom & Techs., Inc. v. Jiang, No. N20C-09-215 VLM CCLD, 2024 WL 4564150, at *1 (Del. Super. Ct. Oct. 24, 2024), reviewed the appropriate amount of fees owed to the prevailing party. Al-Hamim v. Star Hearthstone, LLC, 2024 COA 128, ¶ 13, 564 P.3d 1117, 1121 (Colo. App. 2024), and Ferris v. Amazon.com Servs., LLC, No. 3:24-CV-304-MPM-JMV, 2025 WL 1091939, at *3 (N.D. Miss. Apr. 7, 2025), dealt with requests for sanctions due to AI hallucinations of caselaw. Nuance Commc’ns, 544 F. Supp. 3d at 378, dealt with a statute of limitations issue. And Perrine v. Sega of Am., Inc., No. 13-CV-01962-JD, 2015 WL 2227846, at *2 (N.D. Cal. May 12, 2015), only resolved a class certification issue. Again, none of the reported cases construed the intricacies of warranties as they pertain to AI model performance.
48 E. River S.S. Corp. v. Transamerica Delaval, Inc., 476 U.S. 858, 872, 106 S. Ct. 2295, 2303, 90 L. Ed. 2d 865 (1986) (emphasizing that warranty law concerns a product’s failure to meet promised quality or performance) (emphasis added); see also Ellis v. Riddick, 34 Tex. Civ. App. 256, 261–62, 78 S.W. 719, 722–23 (1904) (explaining that when goods are not yet in existence, contractual quality specifications constitute enforceable promises rather than mere opinion).
49 E. River, 476 U.S. at 872.
50 While dealing with implied warranties, the Northern District of California has addressed the substantive issues in crafting warranties that deal with the vagaries of performance when the product or service is performing some, but perhaps not all, of the functions originally promised. In re Sony PS3 Litig., 2010 WL 3324941, at *2 (N.D. Cal. Aug. 23, 2010) (holding that partial or degraded performance may support an implied‑warranty claim only if the defects are sufficiently severe to render the product effectively unusable, even if not totally inoperable).
51 NewWave Telecom & Techs., Inc. v. Jiang, No. N20C-09-215 VLM CCLD, 2024 WL 4564150, at *1 (Del. Super. Ct. Oct. 24, 2024) (claim for breach of “fully functional” representation in a warranty provided by software purchase agreement).
52 Courts have already faced questions surrounding what standards to employ in qualitatively assessing whether a software defect violates contractual promises. See, e.g., IFS N. Am., Inc. v. EMCOR Facilities Servs., Inc., 2024 WL 4300348, at *6 (N.D. Ill. Sept. 26, 2024). These cases turn on highly specific determinations of the degree to which a defect impairs the plaintiff’s ability to use the software. Id. (addressing how to assess whether software defects excuse performance, and recognizing that disputes over whether alleged defects amount to nonperformance are often fact‑intensive and better resolved at summary judgment). The courts appear to be at some undefined location on the continuum between completely free from defect and completely non-functioning software. AI will only complicate these already difficult determinations.
53 Kosinski, supra note 22.
54 Id.
55 Yavar Bathaee, The Artificial Intelligence Black Box and the Failure of Intent and Causation, 31 HARVARD J. OF LAW & TECH. 890, 892-93 (2018) (arguing that modern ML systems make autonomous, opaque decisions based on patterns in data, undermining traditional intent‑and‑causation analyses); see also GIBRALTAR SOLUTIONS, Navigating the AI Black Box Problem (June 11, 2024) (explaining that the opacity of deep-learning systems makes it difficult to diagnose errors and remediate unwanted outcomes).
56 And while these articles are correct in identifying the problems created by opaque AI processes as they apply to business, these concerns have been articulated in general terms. See Backwell Tech Corp., The Explainability Imperative: Why Black Box AI Is Dead in Enterprise (July 18, 2025) (“Opaque, unexplainable artificial intelligence systems, which are commonly called ‘Black Box’ systems, present a serious challenge for enterprises, especially for those operating in regulated, high-stakes environments. From healthcare and insurance to logistics and finance, decision-making processes must be accurate, auditable, and interpretable.”); see also Emmanuel Raj, AI Has a Black Box Problem, Here’s How to Avoid It, INBOUND LOGISTICS (Oct. 2023) (noting that black‑box AI can complicate routine operational decision‑making across business functions despite delivering rapid outputs). As these models continue to develop and be used by business, further attention must be paid to the granular problems of reconciling AI performance and operability with contractual concepts such as those recited in the warranty provisions discussed above.
57 Further evidence of the natural fit between outcome-based warranties and AI models is seen in the enterprise software industry’s shift from traditional, subscription-based pricing models to “outcome based pricing.” Monetizely, How is Outcome-Based Pricing Revolutionizing Enterprise Software? (Aug. 12, 2025) (describing a shift from subscription-based pricing to models that tie payment to measurable outcomes and delivered value).
58 Ignacio Graglia, Software Warranty: What is it and Why is it Essential for Reliable Service Management, INVGATE (Oct. 17, 2024) (explaining that in SaaS models, warranties focus on uptime and service reliability rather than static defects, with remedies often structured as service credits for missed performance benchmarks).
59 At least one commentator has noted the attendant problems of proving causation engendered by the black box issue. See supra note 55.
60 See, e.g., Percy Liang et al., Holistic Evaluation of Language Models, ARXIV (originally submitted Nov. 16, 2022; revised Oct. 1, 2023).