Europe is scrambling to be relevant in the Age of Artificial Intelligence
Sliding-Scale Openness in Artificial Intelligence: Many Models Billed as Open or Open Source are just ‘Open Weight’
The researchers found that many models that claim to be open or open source, including Llama from Meta and Google DeepMind’s Gemma, are in fact just ‘open weight’. This means that outside researchers can access and use the trained models, but cannot inspect or customize them. Nor can they see how the models were fine-tuned for specific tasks using human feedback. “You don’t give a lot away … then you get to claim openness credits,” says Dingemanse.
It is not yet clear how many of these models will fit the EU’s definition of open source. That definition covers models released with a “free and open” licence that allows users to modify a model, but it says nothing about access to training data. As the definition is refined, it is likely to become a single pressure point targeted by corporate lobbies and big companies.
The researchers created a league table that identifies the most and least open models. Dingemanse says that small players, with relatively few resources, went the extra mile. The findings were published in conference proceedings.
An adviser on Artificial Intelligence accountability to a non-profit organization based in Mountain View, California, says that the study cuts through a lot of the hype and fluff around the open-source debate.
This sliding-scale approach to analysing openness is a useful and practical one, says Amanda Brock, chief executive officer of OpenUK, a London-based not-for-profit company that focuses on open technology.
The authors are worried that model makers are not open about the data their models were trained on. Around half of the models that they analysed do not provide any details about data sets beyond generic descriptors, they say.
Scientific papers detailing the models are very rare, the pair found. Peer review has almost completely fallen out of fashion, replaced by cherry-picked examples or corporate preprints that are low on detail. “A lot of companies have websites that look very technical and may release a nice paper on their website. But if you pore over it, there is no specification whatsoever of what data went into that system,” says Dingemanse.
And openness matters for science, says Dingemanse, because it is essential for reproducibility. “If you can’t reproduce it, it’s a hard sell to call it science,” he says. The only way for researchers to innovate is by tinkering with models, and to do this they need enough information to build their own versions. Not only that, but models must be open to scrutiny. If we can’t see how the sausage is made, then we don’t know whether we’ll like it. For example, it might not be an achievement for a model to pass a particular exam if it was trained on many examples of the test. And without data accountability, no one knows whether the training data is copyrighted.
The pair hope to help fellow scientists avoid falling into the same traps they did when looking for models to use in teaching and research.
The European tech landscape is not so different from the US: How European tech companies operate their products and their influence on the Internet and mobile phone networks
The founder and CEO of Europe’s largest independent artificial intelligence lab worries that the social nuances of Europe will start to disappear as a result of the use of Artificial Intelligence. As chatbots and large language models mostly derived from North American data become ubiquitous, the understanding of what normal conversation looks like “converges toward one,” he says.
European preoccupations with the power of American tech aren’t new. The products of big US companies have become embedded in Europe’s social and economic infrastructure, as generation after generation of their technology has dominated. Businesses in Europe use Microsoft Office and Amazon Web Services, as well as Apple and the app stores. European politics happens on WhatsApp, and its news media happens on Facebook, Instagram, and Twitter. Even the French watch Netflix. And US tech companies operate on a different scale. Only two of the 10 most valuable public European corporations are in tech: German business software provider SAP and Dutch semiconductor equipment maker ASML. US tech companies, by contrast, are among the most valuable in the world. Microsoft and NVIDIA are both worth more than 15 times the company’s annual revenue.
Many in Europe think of “AI sovereignty” as making sure that the core “digital infrastructure” behind the machine learning and artificial intelligence revolution isn’t controlled by private companies outside of the continent. Europe is spending a lot of money to try to catch up with the US and create domestic champions. But Europe’s contenders are starting from a very long way behind. The continent lags well behind the US and China in the availability of capital and computing power. It also lacks some of the big tech companies that are vital for linking artificial intelligence products to users.
“What is sovereignty when you don’t have any champions?” says Raluca Csernatoni, a research fellow specializing in emerging technologies at Carnegie Europe, a think tank.