The CMA’s Initial Report (Report), published on 18 September 2023, considers three levels of the AI Foundation Model (FM) value chain (although it focuses predominantly on the first two levels):
- Development of FMs
- How FMs are used in user applications and other markets
- The consumer experience when using AI tools (whether standalone or incorporated into other services).
The Report characterises FMs as large, general machine learning models trained on vast amounts of data, noting that approximately 160 FMs have been released to date (although some have already become obsolete and been replaced). It also acknowledges that, while some FMs have been released by established players (Google, Meta, Microsoft and NVIDIA), new entrants include OpenAI, Anthropic, Stability AI and Midjourney.
It goes on to address training, fine-tuning and deployment of FMs that are deployed in user-facing applications (including direct deployment, access through APIs (AI-as-a-service) or building plug-ins to work with FM applications). In the context of deployment, the Report differentiates between “open source” FM models (that are freely shared, enabling other FM developers to build on them, such as the UAE Technology Innovation Institute’s Falcon model), models that are available but restricted (in that licensing restrictions limit commercial use, like the licence required for Llama-2 if the app/service into which it is deployed has more than 700 million monthly users), and “closed source” models that are not publicly shared and are either only used by the developer itself (e.g., BloombergGPT) or licensed for use through APIs (e.g., OpenAI’s GPT-3).
The CMA views FM development, training and deployment as being structured as follows:
Development of FMs
The CMA identifies four key issues that it believes will determine whether upstream development of FMs will remain competitive or whether only a handful of leading models will be created and maintained.
Access to Data. The CMA notes that, while some FMs were pre-trained using only publicly available data (e.g., Meta’s Llama 2 and Stability AI’s Stable Diffusion), others have used unpublished data (e.g., academic journals, image repositories and content websites). However, it then extrapolates from this to conclude that “freely available data may be fully exploited” (or grow at a slow rate), such that FM developers without access to proprietary data that they produce themselves (i.e., because they are not vertically integrated or active in related markets) will have to purchase it, thereby increasing their costs. It is not clear how the CMA has concluded that, despite the exponential growth of the Internet, scraping will not produce a broad and deep data set for FM training, nor has it considered that sources like image repositories are particularly relevant for Image Generative AI models (rather than their use reflecting a dearth of public data for FM training). Nor does it consider the potential impact of data portability rights (under the EU Digital Markets and Data Acts, for example) that have the potential to increase the amount of user-generated data that might be available to FM developers. As a result, it is not clear why the CMA is suggesting that the viability of synthetic data is central to training of new FMs or that proprietary data would become less available or more expensive, thereby risking tipping of FM development to vertically integrated developers.
Access to Computing Power. The CMA rightly notes that popular open-source FMs have tens of billions of parameters, such that they need access to large, distributed computing. Having referenced compute costs that range between $4 million (Meta’s LLaMA) and $100 million (Megatron-Turing NLG), the CMA states that it has heard that FM developers with existing arrangements with computing providers are more likely to get access to the computing power that they need. However, it does not suggest that cloud infrastructure providers have refused (or are likely to refuse) to supply access to computing – the Ofcom cloud report which it references considered issues relating to switching and multi-homing, not excessive pricing or refusals to supply. As a result, while the CMA is correct that competitive FM development requires access to computing power on fair commercial terms, without undue restrictions, there does not appear to be any evidence that such access is not being provided.
First mover advantage. While early movers in certain markets have enjoyed advantages, it is not yet clear whether prominence and brand recognition (both cited by the CMA) are crucial in relation to FMs – we don’t yet know whether or how the success of directly deployed FMs or FMs deployed to improve existing or new services will turn on prominence and brand recognition. As the CMA itself notes, a number of the 160 FMs developed to date are now obsolete, and it seems unlikely that an FM used as an input will be successful largely as a result of prominence or brand recognition. Further, it is not clear that new entrant FM developers will be unable to access funding – as recently as July 2023 start-up AI developers were raising hundreds of millions in funding in a matter of weeks. Finally, the CMA notes that “feedback loops” have the potential to benefit FM developers that are active in other markets. However, this point does not appear to be related to the timing of entry; rather, it appears to be another limb of the “access to data” concerns addressed above.
The CMA’s analysis of FM development concludes by identifying five “key uncertainties” framed as having the potential to make it harder for “some firms” to compete, thereby stifling innovation and limiting diversity:
- The need for proprietary data for training FMs
- The need for larger FMs
- The need for cutting-edge performance
- The funding, data, technical expertise and resources advantages of large technology companies
- The challenges facing open-source FM models, including licensing restrictions and funding uncertainty.
However, the Report provides no evidence that any of these uncertainties is likely to materialise. It does, at least, make clear what the CMA will explore further in its post-Report engagement with market players.
Impact of FMs in Other Markets
The CMA notes that FM deployment is still at an early stage, but it is clear that FMs can be deployed both to improve existing services (whether at the user-facing level or further up the value chain) and in new services. Interestingly, it characterises FMs deployed in search and virtual assistants as “new” (rather than “existing”) services, but notes that such services have the potential to either undermine or reinforce existing market positions (i.e., to compete with existing services). This has the potential to create disconnects between ex ante regulation of digital services, on the one hand, and ex post application of competition law, on the other.
The CMA has once again identified four key issues that it believes will determine whether the deployment of FMs in other markets will potentially have a “concerning market outcome”.
Effective Choice and the Ability to Switch. The CMA’s investigation has made it clear that there is currently a broad range of models for deploying FMs: beyond developing an FM from scratch (e.g., Bloomberg, Pfizer and Adobe), service providers can partner with an FM provider to enhance an existing FM, buy access through an API to third-party models and deployment tools (e.g., ChatGPT API, Google Bard API, NVIDIA AI Foundation and IBM watsonx), or use a third-party plug-in (e.g., Shutterstock, Expedia and Duolingo). FM service developers told the CMA that it is relatively easy to switch between models (and FMs). Despite this, the CMA states that it is uncertain whether such switching will remain easy and affordable, largely on the basis that it is not clear that the market for development of FMs is competitive. Displaying a rather circular logic, the Report then notes that competition in FM development will be more effective if FM service developers are able to easily switch between FMs.
Customer Preferences. The Report focuses on the potential for consumers to prefer to be able to access an “ecosystem” of FM services and non-FM services at once (e.g., integrating productivity software and operating systems), enjoying the convenience of having a single, integrated ecosystem that can learn about the user from engagement with other services to customise services integrating FMs. While the CMA acknowledges that such customisation will improve the user experience, it notes that consumers will only be able to switch to a rival provider if they do not lose the customisation benefit, creating the potential for lock-in. In this context, the Report notes the importance of data portability. Of course, since regulation is being introduced in a number of jurisdictions that will enable such portability (e.g., the DMCC in the UK, Digital Markets Act in the EU), data portability is something of a red herring. The CMA’s concern seems to be that it doubts that FM services will have access to the necessary FM inputs (despite the market feedback to the contrary). Beyond that, it appears that the CMA is hinting that conventional “interoperability” will not be sufficient to enable competition between and with “portfolios” (or “ecosystems”) of FM services, although it does not tease this out of the various references in the Report to ecosystems, the impact of vertical integration and leveraging of market power in related markets.
Vertical Integration and Partnerships. The CMA notes that there is already significant vertical integration, from the cloud layer through FMs to FM services, and a number of partnerships. Strangely, it characterises platform services like Amazon Bedrock (offering customers access to Amazon’s own FMs and Anthropic and Stability AI FMs) as “integrated”, rather than as an illustration of the broad access being provided to FMs (which the Report itself notes is the case in its earlier discussion of FMs). The Report goes on to note the potential for a vertically integrated FM player to impose restrictions that impede competition downstream in FM services, despite noting (in the next section) the strong incentives of FM developers to have their models used widely, to facilitate ongoing training of those models.
Data Feedback Effects. The CMA takes the uncontroversial view that the greater the feedback effects, the faster FM services will be able to improve their services. However, it then blurs the distinction between improvements to FMs and downstream FM services to imply that vertically integrated FM and FM service providers will enjoy a competitive advantage, ignoring the incentives of all FM providers to have their models deployed into a broad cross section of FM services, precisely to increase the relevant feedback data. It may be that the CMA’s real concern is “cross-use” (to borrow from the DMA) of data between FMs and multiple FM services, but it does not say that.
The CMA’s preliminary view appears to be that significant data feedback effects, a lack of real consumer choice, and vertical integration and partnerships create a risk of anticompetitive effects in FM services markets that warrants further consideration.
Conclusions and Next Steps
The CMA has proposed the following “guiding principles” to ensure competitive development and deployment of FMs, noting that they could be undermined by M&A activity, “leading” players blocking innovation, restrictions on switching between and multi-homing on FM providers, ecosystems unduly restricting choice and interoperability, and tying or bundling:
The CMA is now starting a significant programme of global engagement with leading FM developers, major deployers of FMs, innovators, challengers and new entrants to discuss its proposed guiding principles and the issues flagged for further consideration.