If the king of Sweden wants help drafting his annual Christmas speech this year, he can ask the same AI model that is available to the country’s 10 million citizens.
As a test, researchers asked the model, called GPT-SW3, to draft one of the royal messages, and it did quite well, according to Magnus Sahlgren, head of natural language understanding research at AI Sweden, a consortium that is kickstarting the country’s journey into the age of machine learning.
“Later, our minister of digitization visited us and asked the model to generate arguments for political positions and it came up with some really smart ones — and he intuitively understood how to get the model to generate good text,” Sahlgren said.
Those early successes inspired work on an even bigger and more powerful version of the language model, one the researchers hope will serve any citizen, business or government agency in Scandinavia.
A multilingual model
The current version has 3.6 billion parameters and is smart enough to do some cool things in Swedish. Sahlgren’s team aims to train a state-of-the-art model with no fewer than 175 billion parameters that can handle a variety of language tasks in the Nordic languages of Swedish, Danish, Norwegian and, he hopes, Icelandic as well.
For example, a startup can use it to automatically generate product descriptions for an e-commerce website using only the names of the products. Government agencies can use it to quickly classify and route questions from citizens.
Companies can ask it to summarize reports so that they can respond quickly. Hospitals can run distilled versions of the model privately on their own systems to improve patient care.
“It’s a foundational model that we will provide as a service for all the tasks that people want to solve,” said Sahlgren, who completed his Ph.D. in computational linguistics in 2006.
Permission to speak freely
A capability like this is increasingly seen as a strategic asset, a cornerstone of digital sovereignty in a world that speaks thousands of languages across nearly 200 countries.
Most language services today focus on Chinese or English, the two most widely spoken languages in the world. They are usually made in China or the USA, and they are not free.
“It’s important for us to have models built in Sweden for Sweden,” Sahlgren said.
Small team, super system
“We are a small country and a core team of about six people, yet we can build a state-of-the-art tool like this that people can use,” he added.
That’s because Sweden has a powerful engine in BerzeLiUs, a 300-petaflops AI supercomputer at Linköping University. The team trained the first GPT-SW3 model using only 16 of the 60 nodes in the NVIDIA DGX SuperPOD.
The next model may exercise all the nodes in the system. Such super-sized jobs require super software like the NVIDIA NeMo Megatron framework.
“It allows us to scale our training to the full supercomputer, and we’ve been lucky enough to have access to experts on the NeMo development team. Without NVIDIA it would have been so much more complicated to get this far,” he said.
A workflow for every language
NVIDIA engineers have created a recipe, based on NeMo and an emerging process called p-tuning, that optimizes massive models quickly and is tailored to work with any language.
In an early test, a model nearly doubled its accuracy after NVIDIA engineers applied the techniques.
In addition, the recipe requires a tenth of the data, reducing the need for tens of thousands of hand-labeled records. That opens the door for users to fine-tune a model with the relatively small, industry-specific data sets they have at hand.
“We hope to inspire a lot of entrepreneurship in industry, startups and the public, using our technology to develop their own apps and services,” said Sahlgren.
Writing the next chapter
Meanwhile, NVIDIA developers are already working on ways to improve the supporting software.
One test shows promise for using widely available English datasets to train new capabilities into models designed for any language. In another effort, they are applying the p-tuning techniques to inference tasks so that models can learn on the fly.
Zenodia Charpy, senior solution architect at NVIDIA in Gothenburg, shares the enthusiasm of the AI Sweden team she supports. “We’ve only just started trying out new and better methods of addressing these major language challenges — there’s a lot more to come,” she said.
The GPT-SW3 model will be made available through an early access program by the end of the year. Please contact [email protected] to apply.