Earlier than an organization or a developer adopts generative synthetic intelligence (GenAI), they typically marvel the best way to get enterprise worth from the mixing of AI into their enterprise. With this in thoughts, a basic query arises: Which strategy will ship one of the best worth on funding — a big all-encompassing proprietary mannequin or an open supply AI mannequin that may be molded and fine-tuned for a corporation’s wants? AI adoption methods fall inside a large spectrum, from accessing a cloud service from a big proprietary frontier mannequin like OpenAI’s GPT-4o to constructing an inner answer within the firm’s compute setting with an open supply small mannequin utilizing listed firm knowledge for a focused set of duties. Present AI options go effectively past the mannequin itself, with a complete ecosystem of retrieval programs, brokers, and different practical elements similar to AI accelerators, that are useful for each massive and small fashions. Emergence of cross-industry collaborations just like the Open Platform for Enterprise AI (OPEA) additional the promise of streamlining the entry and structuring of end-to-end open supply options.
This primary selection between the open supply ecosystem and a proprietary setting impacts numerous enterprise and technical choices, making it “the AI developer’s dilemma.” I imagine that for many enterprise and different enterprise deployments, it is sensible to initially use proprietary fashions to study AI’s potential and reduce early capital expenditure (CapEx). Nevertheless, for broad sustained deployment, in lots of circumstances corporations would use ecosystem-based open supply focused options, which permits for a cheap, adaptable technique that aligns with evolving enterprise wants and {industry} developments.
GenAI Transition from Client to Enterprise Deployment
When GenAI burst onto the scene in late 2022 with Open AI’s GPT-3 and ChatGPT 3.5, it primarily garnered client curiosity. As companies started investigating GenAI, two approaches to deploying GenAI shortly emerged in 2023 — utilizing large frontier fashions like ChatGPT vs. the newly launched small, open supply fashions initially impressed by Meta’s LLaMa mannequin. By early 2024, two primary approaches have solidified, as proven within the columns in Determine 1. With the proprietary AI strategy, the corporate depends on a big closed mannequin to supply all of the wanted expertise worth. For instance, taking GPT-4o as a proxy for the left column, AI builders would use OpenAI expertise for the mannequin, knowledge, safety, and compute. With the open supply ecosystem AI strategy, the corporate or developer could go for the right-sized open supply mannequin, utilizing company or personal knowledge, custom-made performance, and the mandatory compute and safety.
Each instructions are legitimate and have benefits and drawbacks. It isn’t an absolute partition and builders can select elements from both strategy, however taking both a proprietary or ecosystem-based open supply AI path supplies the corporate with a method with excessive inner consistency. Whereas it’s anticipated that each approaches can be broadly deployed, I imagine that after an preliminary studying and transition interval, most corporations will comply with the open supply strategy. Relying on the utilization and setting, open supply inner AI could present vital advantages, together with the flexibility to fine-tune the mannequin and drive deployment utilizing the corporate’s present infrastructure to run the mannequin on the edge, on the shopper, within the knowledge middle, or as a devoted service. With new AI fine-tuning instruments, deep experience is much less of a barrier.
Throughout all industries, AI builders are utilizing GenAI for quite a lot of functions. An October 2023 ballot by Gartner discovered that 55% of organizations reported rising funding in GenAI since early 2023, and lots of corporations are in pilot or manufacturing mode for the rising expertise. As of the time of the survey, corporations have been primarily investing in utilizing GenAI for software program growth, adopted intently by advertising and marketing and customer support features. Clearly, the vary of AI functions is rising quickly.
Massive Proprietary Fashions vs. Small and Massive Open Supply Fashions
In my weblog Survival of the Fittest: Compact Generative AI Fashions Are the Future for Value-Efficient AI at Scale, I present an in depth analysis of huge fashions vs. small fashions. In essence, following the introduction of Meta’s LLaMa open supply mannequin in February 2023, there was a virtuous cycle of innovation and fast enchancment the place the academia and broad-base ecosystem are creating extremely efficient fashions which might be 10x to 100x smaller than the massive frontier fashions. A crop of small fashions, which in 2024 have been largely lower than 30 billion parameters, may intently match the capabilities of ChatGPT-style massive fashions containing effectively over 100B parameters, particularly when focused for explicit domains. Whereas GenAI is already being deployed all through industries for a variety of enterprise usages, using compact fashions is rising.
As well as, open supply fashions are largely lagging solely six to 12 months behind the efficiency of proprietary fashions. Utilizing the broad language benchmark MMLU, the development tempo of the open supply fashions is quicker and the hole appears to be closing with proprietary fashions. For instance, OpenAI’s GPT-4o got here out this yr on Might 13 with main multimodal options whereas Microsoft’s small open supply Phi-3-vision was launched only a week in a while Might 21. In rudimentary comparisons executed on visible recognition and understanding, the fashions confirmed some related competencies, with a number of exams even favoring the Phi-3-vision mannequin. Preliminary evaluations of Meta’s Llama 3.2 open supply launch counsel that its “imaginative and prescient fashions are aggressive with main basis fashions, Claude 3 Haiku and GPT4o-mini on picture recognition and a spread of visible understanding duties.”
Massive fashions have unbelievable all-in-one versatility. Builders can select from quite a lot of massive commercially obtainable proprietary GenAI fashions, together with OpenAI’s GPT-4o multimodal mannequin. Google’s Gemini 1.5 natively multimodal mannequin is obtainable in 4 sizes: Nano for cell gadget app growth, Flash small mannequin for particular duties, Professional for a variety of duties, and Extremely for extremely advanced duties. And Anthropic’s Claude 3 Opus, rumored to have roughly 2 trillion parameters, has a 200K token context window, permitting customers to add massive quantities of data. There’s additionally one other class of out-of-the-box massive GenAI fashions that companies can use for worker productiveness and artistic growth. Microsoft 365 Copilot integrates the Microsoft 365 Apps suite, Microsoft Graph (content material and context from emails, information, conferences, chats, calendars, and contacts), and GPT-4.
Most massive and small open supply fashions are sometimes extra clear about utility frameworks, instrument ecosystem, coaching knowledge, and analysis platforms. Mannequin structure, hyperparameters, response high quality, enter modalities, context window measurement, and inference value are partially or absolutely disclosed. These fashions typically present info on the dataset in order that builders can decide if it meets copyright or high quality expectations. This transparency permits builders to simply interchange fashions for future variations. Among the many rising variety of small commercially obtainable open supply fashions, Meta’s Llama 3 and three.1 are primarily based on transformer structure and obtainable in 8B, 70B, and 405B parameters. Llama 3.2 multimodal mannequin has 11B and 90B, with smaller variations at 1B and 3B parameters. Inbuilt collaboration with NVIDIA, Mistral AI’s Mistral NeMo is a 12B mannequin that options a big 128k context window whereas Microsoft’s Phi-3 (3.8B, 7B, and 14B) presents Transformer fashions for reasoning and language understanding duties. Microsoft highlights Phi fashions for instance of “the stunning energy of small language fashions” whereas investing closely in OpenAI’s very massive fashions. Microsoft’s numerous curiosity in GenAI signifies that it’s not a one-size-fits-all market.
Mannequin-Integrated Information (with RAG) vs. Retrieval-Centric Technology (RCG)
The following key query that AI builders want to deal with is the place to seek out the information used throughout inference — inside the mannequin parametric reminiscence or outdoors the mannequin (accessible by retrieval). It may be exhausting to imagine, however the first ChatGPT launched in November 2022 didn’t have any entry to knowledge outdoors the mannequin. It was skilled on September 21, 2022 and notoriously had no inclination of occasions and knowledge previous its coaching date. This main oversight was addressed in 2023 when retrieval plug-ins the place added. At present, most fashions are coupled with a retrieval front-end with exceptions in circumstances the place there is no such thing as a expectation of accessing massive or constantly updating info, similar to devoted programming fashions.
Present fashions have made vital progress on this problem by enhancing the answer platforms with a retrieval-augmented technology (RAG) front-end to permit for extracting info exterior to the mannequin. An environment friendly and safe RAG is a requirement in enterprise GenAI deployment, as proven by Microsoft’s introduction of GPT-RAG in late 2023. Moreover, within the weblog Information Retrieval Takes Heart Stage, I cowl how within the transition from client to enterprise deployment for GenAI, options ought to be constructed primarily round info exterior to the mannequin utilizing retrieval-centric technology (RCG).
RCG fashions might be outlined as a particular case of RAG GenAI options designed for programs the place the overwhelming majority of knowledge resides outdoors the mannequin parametric reminiscence and is generally not seen in pre-training or fine-tuning. With RCG, the first position of the GenAI mannequin is to interpret wealthy retrieved info from an organization’s listed knowledge corpus or different curated content material. Somewhat than memorizing knowledge, the mannequin focuses on fine-tuning for focused constructs, relationships, and performance. The standard of knowledge in generated output is predicted to strategy 100% accuracy and timeliness.
OPEA is a cross-ecosystem effort to ease the adoption and tuning of GenAI programs. Utilizing this composable framework, builders can create and consider “open, multi-provider, sturdy, and composable GenAI options that harness one of the best innovation throughout the ecosystem.” OPEA is predicted to simplify the implementation of enterprise-grade composite GenAI options, together with RAG, brokers, and reminiscence programs.
All-in-One Common Goal vs. Focused Personalized Fashions
Fashions like GPT-4o, Claude 3, and Gemini 1.5 are basic goal all-in-one basis fashions. They’re designed to carry out a broad vary of GenAI from coding to talk to summarization. The newest fashions have quickly expanded to carry out imaginative and prescient/picture duties, altering their perform from simply massive language fashions to massive multimodal fashions or imaginative and prescient language fashions (VLMs). Open supply basis fashions are headed in the identical course as built-in multimodalities.
Nevertheless, somewhat than adopting the primary wave of consumer-oriented GenAI fashions on this general-purpose kind, most companies are electing to make use of some type of specialization. When a healthcare firm deploys GenAI expertise, they might not use one basic mannequin for managing the availability chain, coding within the IT division, and deep medical analytics for managing affected person care. Companies deploy extra specialised variations of the expertise for every use case. There are a number of completely different ways in which corporations can construct specialised GenAI options, together with domain-specific fashions, focused fashions, custom-made fashions, and optimized fashions.
Area-specific fashions are specialised for a specific subject of enterprise or an space of curiosity. There are each proprietary and open supply domain-specific fashions. For instance, BloombergGPT, a 50B parameter proprietary massive language mannequin specialised for finance, beats the bigger GPT-3 175B parameter mannequin on numerous monetary benchmarks. Nevertheless, small open supply domain-specific fashions can present a wonderful different, as demonstrated by FinGPT, which supplies accessible and clear assets to develop FinLLMs. FinGPT 3.3 makes use of Llama 2 13B as a base mannequin focused for the monetary sector. In latest benchmarks, FinGPT surpassed BloombergGPT on quite a lot of duties and beat GPT-4 handily on monetary benchmark duties like FPB, FiQA-SA, and TFNS. To grasp the large potential of this small open supply mannequin, it ought to be famous that FinGPT might be fine-tuned to include new knowledge for lower than $300 per fine-tuning.
Focused fashions concentrate on a household of duties or features, similar to separate focused fashions for coding, picture technology, query answering, or sentiment evaluation. A latest instance of a focused mannequin is SetFit from Intel Labs, Hugging Face, and the UKP Lab. This few-shot textual content classification strategy for fine-tuning Sentence Transformers is quicker at inference and coaching, reaching excessive accuracy with a small variety of labeled coaching knowledge, similar to solely eight labeled examples per class on the Buyer Critiques (CR) sentiment dataset. This small 355M parameter mannequin can finest the GPT-3 175B parameter mannequin on the varied RAFT benchmark.
It’s vital to notice that focused fashions are unbiased from domain-specific fashions. For instance, a sentiment evaluation answer like SetFitABSA has focused performance and might be utilized to numerous domains like industrial, leisure, or hospitality. Nevertheless, fashions which might be each focused and area specialised might be more practical.
Personalized fashions are additional fine-tuned and refined to satisfy explicit wants and preferences of corporations, organizations, or people. By indexing explicit content material for retrieval, the ensuing system turns into extremely particular and efficient on duties associated to this knowledge (personal or public). The open supply subject presents an array of choices to customise the mannequin. For instance, Intel Labs used direct choice optimization (DPO) to enhance on a Mistral 7B mannequin to create the open supply Intel NeuralChat. Builders can also fine-tune and customise fashions through the use of low-rank adaptation of huge language (LoRA) fashions and its extra memory-efficient model, QLoRA.
Optimization capabilities can be found for open supply fashions. The target of optimization is to retain the performance and accuracy of a mannequin whereas considerably lowering its execution footprint, which might considerably enhance value, latency, and optimum execution of an supposed platform. Some methods used for mannequin optimization embody distillation, pruning, compression, and quantization (to 8-bit and even 4-bit). Some strategies like combination of specialists (MoE) and speculative decoding might be thought-about as types of execution optimization. For instance, GPT-4 is reportedly comprised of eight smaller MoE fashions with 220B parameters. The execution solely prompts components of the mannequin, permitting for way more economical inference.
Generative-as-a-Service Cloud Execution vs. Managed Execution Setting for Inference
One other key selection for builders to contemplate is the execution setting. If the corporate chooses a proprietary mannequin course, inference execution is completed by way of API or question calls to an abstracted and obscured picture of the mannequin working within the cloud. The dimensions of the mannequin and different implementation particulars are insignificant, besides when translated to availability and the fee charged by some key (per token, per question, or limitless compute license). This strategy, typically known as a generative-as-a-service (GaaS) cloud providing, is the precept approach for corporations to eat very massive proprietary fashions like GPT-4o, Gemini Extremely, and Claude 3. Nevertheless, GaaS will also be provided for smaller fashions like Llama 3.2.
There are clear constructive features to utilizing GaaS for the outsourced intelligence strategy. For instance, the entry is normally instantaneous and simple to make use of out-of-the-box, assuaging in-house growth efforts. There’s additionally the implied promise that when the fashions or their setting get upgraded, the AI answer builders have entry to the newest updates with out substantial effort or modifications to their setup. Additionally, the prices are nearly fully operational expenditures (OpEx), which is most well-liked if the workload is preliminary or restricted. For early-stage adoption and intermittent use, GaaS presents extra assist.
In distinction, when corporations select an inner intelligence strategy, the mannequin inference cycle is included and managed inside the compute setting and the prevailing enterprise software program setting. This can be a viable answer for comparatively small fashions (roughly 30B parameters or much less in 2024) and doubtlessly even medium fashions (50B to 70B parameters in 2024) on a shopper gadget, community, on-prem knowledge middle, or on-cloud cycles in an setting set with a service supplier similar to a digital personal cloud (VPC).
Fashions like Llama 3.1 8B or related can run on the developer’s native machine (Mac or PC). Utilizing optimization methods like quantization, the wanted person expertise might be achieved whereas working inside the native setting. Utilizing a instrument and framework like Ollama, builders can handle inference execution regionally. Inference cycles might be run on legacy GPUs, Intel Xeon, or Intel Gaudi AI accelerators within the firm’s knowledge middle. If inference is run on the mannequin at a service supplier, it is going to be billed as infrastructure-as-a-service (IaaS), utilizing the corporate’s personal setting and execution selections.
When inference execution is completed within the firm compute setting (shopper, edge, on-prem, or IaaS), there’s a greater requirement for CapEx for possession of the pc gear if it goes past including a workload to current {hardware}. Whereas the comparability of OpEx vs. CapEx is advanced and is dependent upon many variables, CapEx is preferable when deployment requires broad, steady, steady utilization. That is very true as smaller fashions and optimization applied sciences permit for working superior open supply fashions on mainstream gadgets and processors and even native notebooks/desktops.
Operating inference within the firm compute setting permits for tighter management over features of safety and privateness. Decreasing knowledge motion and publicity might be invaluable in preserving privateness. Moreover, a retrieval-based AI answer run in a neighborhood setting might be supported with tremendous controls to deal with potential privateness considerations by giving user-controlled entry to info. Safety is ceaselessly talked about as one of many high considerations of corporations deploying GenAI and confidential computing is a major ask. Confidential computing protects knowledge in use by computing in an attested hardware-based Trusted Execution Setting (TEE).
Smaller, open supply fashions can run inside an organization’s most safe utility setting. For instance, a mannequin working on Xeon might be absolutely executed inside a TEE with restricted overhead. As proven in Determine 8, encrypted knowledge stays protected whereas not in compute. The mannequin is checked for provenance and integrity to guard in opposition to tampering. The precise execution is protected against any breach, together with by the working system or different functions, stopping viewing or alteration by untrusted entities.
Abstract
Generative AI is a transformative expertise now below analysis or energetic adoption by most corporations throughout all industries and sectors. As AI builders think about their choices for one of the best answer, some of the vital questions they should tackle is whether or not to make use of exterior proprietary fashions or depend on the open supply ecosystem. One path is to depend on a big proprietary black-box GaaS answer utilizing RAG, similar to GPT-4o or Gemini Extremely. The opposite path makes use of a extra adaptive and integrative strategy — small, chosen, and exchanged as wanted from a big open supply mannequin pool, primarily using firm info, custom-made and optimized primarily based on explicit wants, and executed inside the current infrastructure of the corporate. As talked about, there may very well be a mixture of selections inside these two base methods.
I imagine that as quite a few AI answer builders face this important dilemma, most will finally (after a studying interval) select to embed open supply GenAI fashions of their inner compute setting, knowledge, and enterprise setting. They’ll experience the unbelievable development of the open supply and broad ecosystem virtuous cycle of AI innovation, whereas sustaining management over their prices and future.
Let’s give AI the ultimate phrase in fixing the AI developer’s dilemma. In a staged AI debate, OpenAI’s GPT-4 argued with Microsoft’s open supply Orca 2 13B on the deserves of utilizing proprietary vs. open supply GenAI for future growth. Utilizing GPT-4 Turbo because the choose, open supply GenAI gained the controversy. The successful argument? Orca 2 referred to as for a “extra distributed, open, collaborative way forward for AI growth that leverages worldwide expertise and goals for collective developments. This mannequin guarantees to speed up innovation and democratize entry to AI, and guarantee moral and clear practices by way of neighborhood governance.”
Study Extra: GenAI Collection
Have Machines Simply Made an Evolutionary Leap to Converse in Human Language?
References
- Hey GPT-4o. (2024, Might 13). https://openai.com/index/hello-gpt-4o/
- Open platform for enterprise AI. (n.d.). Open Platform for Enterprise AI (OPEA). https://opea.dev/
- Gartner Ballot Finds 55% of Organizations are in Piloting or Manufacturing. (2023, October 3). Gartner. https://www.gartner.com/en/newsroom/press-releases/2023-10-03-gartner-poll-finds-55-percent-of-organizations-are-in-piloting-or-production-mode-with-generative-ai
- Singer, G. (2023, July 28). Survival of the fittest: Compact generative AI fashions are the long run for Value-Efficient AI at scale. Medium. https://towardsdatascience.com/survival-of-the-fittest-compact-generative-ai-models-are-the-future-for-cost-effective-ai-at-scale-6bbdc138f618
- Introducing LLaMA: A foundational, 65-billion-parameter language mannequin. (n.d.). https://ai.meta.com/weblog/large-language-model-llama-meta-ai/
- #392: OpenAI’s improved ChatGPT ought to delight each skilled and novice builders, & extra — ARK Make investments. (n.d.). Ark Make investments. https://ark-invest.com/newsletter_item/1-openais-improved-chatgpt-should-delight-both-expert-and-novice-developers
- Bilenko, M. (2024, Might 22). New fashions added to the Phi-3 household, obtainable on Microsoft Azure. Microsoft Azure Weblog. https://azure.microsoft.com/en-us/weblog/new-models-added-to-the-phi-3-family-available-on-microsoft-azure/
- Matthew Berman. (2024, June 2). Open-Supply Imaginative and prescient AI — Shocking Outcomes! (Phi3 Imaginative and prescient vs LLaMA 3 Imaginative and prescient vs GPT4o) [Video]. YouTube. https://www.youtube.com/watch?v=PZaNL6igONU
- Llama 3.2: Revolutionizing edge AI and imaginative and prescient with open, customizable fashions. (n.d.). https://ai.meta.com/weblog/llama-3-2-connect-2024-vision-edge-mobile-devices/
- Gemini — Google DeepMind. (n.d.). https://deepmind.google/applied sciences/gemini/#introduction
- Introducing the following technology of Claude Anthropic. (n.d.). https://www.anthropic.com/information/claude-3-family
- Thompson, A. D. (2024, March 4). The Memo — Particular version: Claude 3 Opus. The Memo by LifeArchitect.ai. https://lifearchitect.substack.com/p/the-memo-special-edition-claude-3
- Spataro, J. (2023, Might 16). Introducing Microsoft 365 Copilot — your copilot for work — The Official Microsoft Weblog. The Official Microsoft Weblog. https://blogs.microsoft.com/weblog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/
- Introducing Llama 3.1: Our most succesful fashions thus far. (n.d.). https://ai.meta.com/weblog/meta-llama-3-1/
- Mistral AI. (2024, March 4). Mistral Nemo. Mistral AI | Frontier AI in Your Palms. https://mistral.ai/information/mistral-nemo/
- Beatty, S. (2024, April 29). Tiny however mighty: The Phi-3 small language fashions with large potential. Microsoft Analysis. https://information.microsoft.com/supply/options/ai/the-phi-3-small-language-models-with-big-potential/
- Hughes, A. (2023, December 16). Phi-2: The stunning energy of small language fashions. Microsoft Analysis. https://www.microsoft.com/en-us/analysis/weblog/phi-2-the-surprising-power-of-small-language-models/
- Azure. (n.d.). GitHub — Azure/GPT-RAG. GitHub. https://github.com/Azure/GPT-RAG/
- Singer, G. (2023, November 16). Information Retrieval Takes Heart Stage — In direction of Information Science. Medium. https://towardsdatascience.com/knowledge-retrieval-takes-center-stage-183be733c6e8
- Introducing the open platform for enterprise AI. (n.d.). Intel. https://www.intel.com/content material/www/us/en/developer/articles/information/introducing-the-open-platform-for-enterprise-ai.html
- Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., & Mann, G. (2023, March 30). BloombergGPT: A big language mannequin for finance. arXiv.org. https://arxiv.org/abs/2303.17564
- Yang, H., Liu, X., & Wang, C. D. (2023, June 9). FINGPT: Open-Supply Monetary Massive Language Fashions. arXiv.org. https://arxiv.org/abs/2306.06031
- AI4Finance-Basis. (n.d.). FinGPT. GitHub. https://github.com/AI4Finance-Basis/FinGPT
- Starcoder2. (n.d.). GitHub. https://huggingface.co/docs/transformers/v4.39.0/en/model_doc/starcoder2
- SetFit: Environment friendly Few-Shot Studying With out Prompts. (n.d.). https://huggingface.co/weblog/setfit
- SetFitABSA: Few-Shot Side Primarily based Sentiment Evaluation Utilizing SetFit. (n.d.). https://huggingface.co/weblog/setfit-absa
- Intel/neural-chat-7b-v3–1. Hugging Face. (2023, October 12). https://huggingface.co/Intel/neural-chat-7b-v3-1
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021, June 17). LORA: Low-Rank adaptation of Massive Language Fashions. arXiv.org. https://arxiv.org/abs/2106.09685
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023, Might 23). QLORA: Environment friendly Finetuning of Quantized LLMS. arXiv.org. https://arxiv.org/abs/2305.14314
- Leviathan, Y., Kalman, M., & Matias, Y. (2022, November 30). Quick Inference from Transformers through Speculative Decoding. arXiv.org. https://arxiv.org/abs/2211.17192
- Bastian, M. (2023, July 3). GPT-4 has greater than a trillion parameters — Report. THE DECODER. https://the-decoder.com/gpt-4-has-a-trillion-parameters/
- Andriole, S. (2023, September 12). LLAMA, ChatGPT, Bard, Co-Pilot & all the remainder. How massive language fashions will turn into enormous cloud companies with large ecosystems. Forbes. https://www.forbes.com/websites/steveandriole/2023/07/26/llama-chatgpt-bard-co-pilot–all-the-rest–how-large-language-models-will-become-huge-cloud-services-with-massive-ecosystems/?sh=78764e1175b7
- Q8-Chat LLM: An environment friendly generative AI expertise on Intel® CPUs. (n.d.). Intel. https://www.intel.com/content material/www/us/en/developer/articles/case-study/q8-chat-efficient-generative-ai-experience-xeon.html#gs.36q4lk
- Ollama. (n.d.). Ollama. https://ollama.com/
- AI Accelerated Intel® Xeon® Scalable Processors Product Temporary. (n.d.). Intel. https://www.intel.com/content material/www/us/en/merchandise/docs/processors/xeon-accelerated/ai-accelerators-product-brief.html
- Intel® Gaudi® AI Accelerator merchandise. (n.d.). Intel. https://www.intel.com/content material/www/us/en/merchandise/particulars/processors/ai-accelerators/gaudi-overview.html
- Confidential Computing Options — Intel. (n.d.). Intel. https://www.intel.com/content material/www/us/en/safety/confidential-computing.html
- What’s a Trusted Execution Setting? (n.d.). Intel. https://www.intel.com/content material/www/us/en/content-details/788130/what-is-a-trusted-execution-environment.html
- Adeojo, J. (2023, December 3). GPT-4 Debates Open Orca-2–13B with Shocking Outcomes! Medium. https://pub.aimind.so/gpt-4-debates-open-orca-2-13b-with-surprising-results-b4ada53845ba
- Information Centric. (2023, November 30). Shocking Debate Showdown: GPT-4 Turbo vs. Orca-2–13B — Programmed with AutoGen! [Video]. YouTube. https://www.youtube.com/watch?v=JuwJLeVlB-w