Thomas Wolf, co-founder and Chief Science Officer at Hugging Face, thinks we may need new ways to measure AI models.
Speaking at Brainstorm AI in London, Wolf told the audience that as AI models become more advanced, it is getting increasingly difficult to know which one is best.
“It’s hard to know what is the best model,” he said, noting the nominal differences between OpenAI’s recent launches and Google’s. “Everyone seems to be, in fact, very close.”
“The world of benchmarks has evolved a lot. We used to have these academic benchmarks where we mostly measured the knowledge of the model; I think the most famous was MMLU (Massive Multitask Language Understanding), which was basically a set of graduate- or PhD-level questions that the model had to answer,” he said. “These benchmarks are mostly saturated right now.”
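Benchmarks like the one Wolf describes boil down to a simple score: the fraction of fixed multiple-choice questions a model answers correctly. The sketch below is a generic illustration of that scoring, not any benchmark's real harness; `model_answer` is a hypothetical placeholder for an actual LLM call.

```python
# Minimal sketch of multiple-choice benchmark scoring: each item holds a
# question, candidate answers, and the index of the gold answer; accuracy
# is the fraction of items the model gets right.

def model_answer(question, choices):
    # Hypothetical placeholder: a real evaluation would query an LLM here.
    # This stub always picks the first choice.
    return 0

def benchmark_accuracy(items, answer_fn):
    # Count items where the model's chosen index matches the gold index.
    correct = sum(1 for q, choices, gold in items if answer_fn(q, choices) == gold)
    return correct / len(items)

items = [
    ("What is 2 + 2?", ["3", "4", "5"], 1),
    ("Which gas do plants absorb?", ["CO2", "O2", "N2"], 0),
]
print(benchmark_accuracy(items, model_answer))  # → 0.5
```

Once strong models all cluster near the top of such a fixed question set, the score stops separating them, which is what "saturated" means here.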
Over the last year, a growing chorus of voices in academia, industry, and policy has argued that common AI benchmarks such as MMLU, GLUE, and HellaSwag have become saturated, can be gamed, and no longer reflect real-world utility.
In February, researchers at the European Commission’s Joint Research Centre published a paper called “Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation,” which found “systemic flaws in current benchmarking practices,” including misaligned incentives, construct validity failures, gaming of results, and data contamination.
Going forward, Wolf said, the AI industry should rely on two main types of benchmarks in 2025: one to evaluate models’ agentic capabilities, where the LLM is expected to carry out tasks, and another tailored to each company’s specific use case for the models.
Hugging Face is already working on the latter.
The company’s new program, YourBench, aims to help users determine which model to use for a specific task. Users feed a few documents into the program, which automatically generates a benchmark specific to that type of work; users can then run it against different models to see which one is best for their use case.
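The workflow described above can be sketched in a few lines: documents go in, a task-specific question set comes out, and candidate models are ranked on it. This is a hedged illustration of the idea, not Hugging Face's actual YourBench implementation; `draft_questions` and the toy answer functions are hypothetical placeholders for LLM calls.

```python
# Sketch of a document-to-benchmark pipeline, assuming an LLM would
# normally generate the questions and produce the answers.

def draft_questions(documents):
    # Placeholder: a real system would use an LLM to generate questions
    # and reference answers grounded in each document.
    return [(f"What is the main point of: {d[:40]}...", d) for d in documents]

def evaluate(models, questions):
    # Score each model by the fraction of questions whose answer appears
    # in the reference text (placeholder containment check).
    scores = {}
    for name, answer_fn in models.items():
        hits = sum(1 for q, ref in questions if answer_fn(q) in ref)
        scores[name] = hits / len(questions)
    return scores

def best_model(scores):
    # Pick the model that scored highest on the custom benchmark.
    return max(scores, key=scores.get)

docs = ["AI benchmarks are saturating, so teams need custom evaluations."]
questions = draft_questions(docs)
models = {
    "echo": lambda q: "custom evaluations",  # toy model that answers well
    "noise": lambda q: "zzz",                # toy model that answers badly
}
scores = evaluate(models, questions)
print(best_model(scores))  # → echo
```

The key design point is that the benchmark is derived from the user's own documents, so the ranking reflects the user's task rather than a generic academic question set.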
“Just because these models perform the same on this academic benchmark doesn’t mean they are exactly the same,” Wolf said.
Founded by Wolf, Clément Delangue, and Julien Chaumond in 2016, Hugging Face has long been an open-source AI champion.
Often described as the GitHub of machine learning, the company provides an open-source platform that allows developers, researchers, and companies to create, share, and deploy models, datasets, and machine learning applications. Users can also browse models and datasets that others have uploaded.
Wolf told the Brainstorm AI audience that Hugging Face’s “business model is really aligned with open source” and that the company aims to have “the maximum number of people participating in these kinds of open communities and sharing models.”
Wolf predicted that open-source AI would continue to prosper, especially after DeepSeek’s success earlier this year.
Following its launch at the end of last year, DeepSeek’s R1 model sent shockwaves through the AI world when testers found that it matched or even exceeded American-made AI models.
Wolf said DeepSeek was a “ChatGPT moment” for open-source AI.
“Just like ChatGPT was the moment everyone discovered AI, DeepSeek was the moment everyone discovered that there is this kind of open-source alternative,” he said.
This story was originally featured on Fortune.com