Since the launch of ChatGPT at the end of November 2022, LLMs (Large Language Models) have practically become a household name.
There's good reason for this: their success lies in their architecture, particularly the attention mechanism, which allows the model to compare every word it processes to every other word.
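To make that concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the variant introduced in the original Transformer paper. The function name, projection matrices, and toy dimensions are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    """Minimal self-attention sketch: every token attends to every other token.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices (random here)
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # (seq_len, seq_len) score matrix: each token scored against all others
    scores = Q @ K.T / np.sqrt(d_k)
    # row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output is a weighted sum of all value vectors
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Note that the score matrix holds one entry per pair of tokens, which is why attention's compute and memory grow quadratically with sequence length.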
This gives LLMs the extraordinary capabilities in understanding and generating human-like text that we're all familiar with.
However, these models are not without flaws. They demand immense computational resources to train: Meta's Llama 3 model, for example, took 7.7 million GPU hours to train[1]. Moreover, their reliance on enormous datasets, spanning trillions of tokens, raises questions about scalability, accessibility, and environmental impact.
Despite these challenges, ever since the paper 'Attention Is All You Need' in mid-2017, much of the recent progress in AI has focused on scaling attention mechanisms further, rather than on exploring fundamentally new architectures.