California's AB 2013: Challenges and Opportunities in Generative AI Compliance

California’s AB 2013, the “Generative Artificial Intelligence Training Data Transparency Act,” is poised to reshape the landscape for developers of Generative AI (GenAI) systems. Signed into law on September 28, 2024, it sets forth comprehensive requirements for transparency in AI training datasets, reflecting growing public demand for accountability in artificial intelligence. As organizations grapple with compliance, the law presents both challenges and strategic opportunities.

The Law
AB 2013 requires developers of GenAI systems to post on their websites detailed information about the datasets used to develop those systems. The law’s transparency mandate applies to all GenAI systems and services made available to Californians, regardless of whether compensation is involved, provided the systems were released on or after January 1, 2022. Compliance becomes mandatory by January 1, 2026, and the posted documentation must be updated whenever an existing system is substantially modified.

The statute defines GenAI broadly, encompassing systems that generate synthetic content, such as text, images, or audio, modeled after training data. Documentation to be posted on the developer’s website must include (among other things):

  • A high-level summary of datasets used in the system’s development.
  • Information on dataset sources, ownership, and intended purpose.
  • Descriptions of data types, including whether they include data protected by copyright, trademark, or patents, and whether the data includes personal information or aggregated consumer data.
  • Details on dataset cleaning or processing, and whether datasets were purchased or licensed.
  • The timeframes for data collection and the dates datasets were first used.
  • Whether synthetic data generation was used in developing the GenAI system or service.

The Challenges
Complying with AB 2013 will not be straightforward. For many developers, the law introduces significant interpretive and logistical hurdles. The requirement to disclose datasets “used in the development” of GenAI systems is a prime example of ambiguity. Developers often work iteratively, testing multiple datasets before selecting the most effective ones for training final versions. The law does not make clear whether datasets tried and then excluded during development must also be disclosed, creating uncertainty. Section 3111(a)(11), which requires information about the dates datasets were “first used during development,” suggests that the scope of disclosure could extend beyond final training datasets. However this interpretive question is resolved, the statute’s focus on datasets used in development underscores the immediate need to begin documenting the datasets used to develop AI models in preparation for compliance.

Another challenge lies in the IP implications of dataset disclosures. The law mandates the identification of datasets containing data protected by copyright, trademark, or patent rights. While copyrighted material has been the focus of much public discussion and litigation, trademark and patent protections introduce additional complexity. It is unclear how trademarks (which protect the identification of the source of a good or service) or patent rights (intended to protect inventions) would protect datasets.

Moreover, disclosing the datasets used to train GenAI systems could expose developers to substantial litigation risk. As unresolved legal battles over AI and copyright proceed, disclosing the contents of training data may invite claims from rights holders. Numerous lawsuits remain pending and are unlikely to be fully resolved by the January 1, 2026 compliance deadline. These compelled disclosures may also raise questions regarding admissions of wrongdoing, which could give rise to constitutional challenges to AB 2013 and, theoretically, delay its enforcement.

Finally, at present, the contents of the datasets used to train GenAI systems are widely considered among the most sensitive proprietary and trade secret information held by AI developers. Preserving that trade secret protection will be a significant challenge in view of these disclosure requirements. Careful consideration will need to be given to how the disclosures are made to minimize the risk of unnecessarily revealing trade secret information.

The Opportunities
Despite its challenges, AB 2013 also offers opportunities for developers to gain a competitive edge. Transparency requirements can serve as a platform to showcase the robustness, ethical foundations, and exclusivity of a developer’s training data. By highlighting the use of licensed, high-quality datasets, developers can differentiate their systems from competitors, building consumer trust and market credibility.

However, seizing this opportunity may be difficult in view of the risks discussed above. The key task for AI developers will be crafting disclosures that satisfy the law’s requirements and highlight any competitive advantages in their training datasets, while minimizing the risk that those disclosures reveal trade secrets or invite new litigation.

Limited Exclusions
Exemptions from the obligation to post documentation regarding the data used to train a generative artificial intelligence system are available for the following GenAI systems or services:

  • those developed for national security, military, or defense purposes that are made available only to a federal entity;
  • those whose sole purpose is the operation of aircraft in the national airspace; or
  • those whose sole purpose is to help ensure security and integrity, meaning the ability of networks or information systems to detect security incidents that compromise the availability, authenticity, integrity, and confidentiality of stored or transmitted personal information; the ability of businesses to detect security incidents, to resist malicious, deceptive, fraudulent, or illegal actions, and to help prosecute those responsible for such actions; or the ability of businesses to ensure the physical safety of natural persons.

Conclusion
AB 2013 reflects California’s leadership in AI regulation, pushing the industry toward greater transparency and accountability. For developers of AI systems, the law presents a dual imperative: ensuring compliance while leveraging its requirements as an opportunity to build consumer trust and differentiate in a competitive market.

The path to compliance is fraught with interpretive challenges. Nonetheless, with strategic planning, developers can use transparency as a tool to demonstrate the quality and integrity of their systems, positioning themselves as leaders in the responsible development of generative AI.

As the January 1, 2026 compliance deadline approaches, organizations should begin now to document the datasets used in training so that they are positioned to prepare appropriate disclosures in time. Doing so will not only mitigate legal risk but also enable developers to turn transparency into a strategic advantage.

In an industry where innovation and trust are paramount, those who navigate these complexities successfully will be well-positioned to thrive in the evolving AI ecosystem. Your team at Baker Botts is ready to assist in planning for compliance and in drafting appropriate disclosures ahead of the enforcement deadline.

ABOUT BAKER BOTTS L.L.P.
Baker Botts is an international law firm whose lawyers practice throughout a network of offices around the globe. Based on our experience and knowledge of our clients' industries, we are recognized as a leading firm in the energy, technology and life sciences sectors. Since 1840, we have provided creative and effective legal solutions for our clients while demonstrating an unrelenting commitment to excellence. For more information, please visit bakerbotts.com.