
From beginner to champion: A student’s journey through the AWS AI League ASEAN finals



The AWS AI League, launched by Amazon Web Services (AWS), expanded its reach to the Association of Southeast Asian Nations (ASEAN) last year, welcoming student participants from Singapore, Indonesia, Malaysia, Thailand, Vietnam, and the Philippines. The goal was to introduce students of all backgrounds and experience levels to the exciting world of generative AI through a gamified, hands-on challenge centered on fine-tuning large language models (LLMs).

In this blog post, you’ll hear directly from the AWS AI League champion, Blix D. Foryasen, as he reflects on the challenges, breakthroughs, and key lessons discovered throughout the competition.

Behind the competition

The AWS AI League competition began with a tutorial session led by the AWS team and the Gen-C Generative AI Learning Community, featuring two powerful, user-friendly services: Amazon SageMaker JumpStart and PartyRock.

  • SageMaker JumpStart enabled participants to run the LLM fine-tuning process in a cloud-based environment, offering flexibility to adjust hyperparameters and optimize performance (a code sketch follows this list).
  • PartyRock, powered by Amazon Bedrock, provided an intuitive playground and interface to curate the dataset used to fine-tune a Llama 3.2 3B Instruct model. Amazon Bedrock offers a comprehensive selection of high-performing foundation models from leading AI companies, including Anthropic Claude, Meta Llama, Mistral, and more, all accessible through a single API.
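To give a concrete sense of the SageMaker JumpStart workflow, here is a minimal sketch using the SageMaker Python SDK. The model ID, S3 path, and hyperparameter values are illustrative assumptions, not the exact competition setup.

# Minimal sketch: launching a JumpStart fine-tuning job (values are illustrative assumptions)
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-3b-instruct",  # hypothetical JumpStart model ID
    environment={"accept_eula": "true"},  # Llama models require accepting the EULA
)
# Only two of the many available hyperparameters are shown here
estimator.set_hyperparameters(
    epoch="2",
    learning_rate="0.0003",
)
# Training data: an S3 prefix containing the JSONL dataset (and any template file the recipe expects)
estimator.fit({"training": "s3://my-bucket/ai-league/train/"})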

With the goal of outperforming a larger reference LLM in a quiz-based evaluation, participants engaged with three core domains of generative AI: foundation models, responsible AI, and prompt engineering. The preliminary round featured an open leaderboard ranking the best-performing fine-tuned models from across the region. Each submitted model was tested against a larger baseline LLM using an automated, quiz-style evaluation of generative AI-related questions. The evaluation, carried out by an undisclosed LLM judge, prioritized both accuracy and comprehensiveness. A model’s win rate improved every time it outperformed the baseline LLM. Beyond its technical nature, the challenge required strategic planning: participants had to maximize their limited training hours on SageMaker JumpStart while carefully managing a limited number of leaderboard submissions. Initially capped at 5 hours, the training limit was later expanded to 30 hours in response to community feedback. Submission count also influenced tiebreakers for finalist selection.

The top tuner from each country advanced to the Regional Grand Finale, held on May 29, 2025, in Singapore. There, finalists competed head-to-head, each presenting their fine-tuned model’s responses to a new set of questions. Final scores were determined by a weighted judging system:

  • 40% by an LLM-as-a-judge
  • 40% by expert judges
  • 20% by a live audience

A pragmatic approach to fine-tuning

Before diving into the technical details, a quick disclaimer: the approaches shared in the following sections are largely experimental and born from trial and error. They’re not necessarily the most optimal methods for fine-tuning, nor do they represent a definitive guide. Other finalists took different approaches because of different technical backgrounds. What ultimately helped me succeed wasn’t just technical precision, but collaboration, resourcefulness, and a willingness to explore how the competition might unfold based on insights from earlier iterations. I hope this account can serve as a baseline or inspiration for future participants navigating similar constraints. Even if you’re starting from scratch, as I did, there’s real value in being strategic, curious, and community-driven.

One of the biggest hurdles I faced was time, or the lack of it. Because of a late confirmation of my participation, I joined the competition 2 weeks after it had already begun. That left me with only 2 weeks to plan, train, and iterate. Given the tight timeline and limited compute hours on SageMaker JumpStart, I knew I had to make every training session count. Rather than attempting exhaustive experiments, I focused my efforts on curating a strong dataset and tweaking select hyperparameters. Along the way, I drew inspiration from academic papers and existing approaches to LLM fine-tuning, adjusting what I could within the constraints.

Crafting synthetic brilliance

As mentioned earlier, one of the key learning sessions at the start of the competition introduced participants to SageMaker JumpStart and PartyRock, tools that make fine-tuning and synthetic data generation both accessible and intuitive. In particular, PartyRock allowed us to clone and customize apps to control how synthetic datasets were generated. We could tweak parameters such as the prompt structure, creativity level (temperature), and token sampling strategy (top-p). PartyRock also gave us access to a wide range of foundation models. From the start, I opted to generate my datasets using Claude 3.5 Sonnet, aiming for broad and balanced coverage across all three core sub-domains of the competition. To minimize bias and enforce fair representation across topics, I curated multiple dataset versions, each ranging from 1,500 to 12,000 Q&A pairs, carefully maintaining balanced distributions across sub-domains. The following are a few example themes that I focused on:

  • Prompt engineering: Zero-shot prompting, chain-of-thought (CoT) prompting, evaluating prompt effectiveness
  • Foundation models: Transformer architectures, distinctions between pretraining and fine-tuning
  • Responsible AI: Dataset bias, representation fairness, and data protection in AI systems

To maintain data quality, I fine-tuned the dataset generator to emphasize factual accuracy, uniqueness, and applied knowledge. Each generation batch consisted of 10 Q&A pairs, with prompts specifically designed to encourage depth and clarity.

Question prompt:

You are a quiz master in an AI competition preparing a set of challenging quiz bee questions about [Topic to generate]. The goal of these questions is to determine the better LLM between a fine-tuned LLaMA 3.2 3B Instruct and larger LLMs. Generate [Number of data rows to generate] questions about [Topic to generate], covering: 
	* Basic Questions (1/3) → Direct Q&A without reasoning. Must require a clear explanation, example, or real-world application. Avoid one-word fact-based questions.
	* Hybrid Questions (1/3) → Require a short analytical breakdown (e.g., comparisons, trade-offs, weaknesses, implications). Prioritize scenario-based or real-world dilemma questions.
	* Chain-of-thought (CoT) Questions (1/3) → Require multi-step logical deductions. Focus on evaluating existing AI methods, identifying risks, and critiquing trade-offs. Avoid open-ended "Design/Propose/Create" questions. Instead, use "Compare, Evaluate, Critique, Assess, Analyze, What are the trade-offs of…" 

Ensure the questions about [Topic to generate]: 
	* Are specific, non-trivial, and informative.
	* Avoid overly simple questions (e.g., mere definitions or fact-based queries).
	* Encourage applied reasoning (i.e., linking theoretical concepts to real-world AI challenges).

Answer prompt:

You are an AI expert specializing in generative AI, foundation models, agentic AI, prompt engineering, and responsible AI. Your task is to generate well-structured, logically reasoned responses to a list of [Questions], ensuring that all responses follow a chain-of-thought (CoT) approach, regardless of complexity, and are formatted in valid JSONL. Here are the answering guidelines: 
	* Every response must be comprehensive, factually accurate, and well-reasoned.
	* Every response must use a step-by-step logical breakdown, even for seemingly direct questions.
For all questions, use structured reasoning:
	* For basic questions, use a concise yet structured explanation. Simple Q&As should still follow CoT reasoning, explaining why the answer is correct rather than just stating facts.
	* For hybrid and CoT questions, use chain-of-thought and analyze the problem logically before providing a concluding statement.
	* If applicable, use real-world examples or research references to enhance explanations.
	* If applicable, include trade-offs between different AI methods.
	* Draw logical connections between subtopics to reinforce deep understanding.

Answering prompt examples:


	* Basic question (direct Q&A without reasoning) → Use concise yet comprehensive, structured responses that provide a clear, well-explained, and well-structured definition and explanation without unnecessary verbosity.
	* Applications → Highlight key points step-by-step in several comprehensive sentences.
	* Complex CoT question (multi-step reasoning) → Use CoT naturally, solving each step explicitly, with in-depth reasoning 

For question generation, I set the temperature to 0.7, favoring creative and novel phrasing without drifting too far from factual grounding. For answer generation, I used a lower temperature of 0.2, targeting precision and correctness. In both cases, I applied top-p = 0.9, allowing the model to sample from a focused yet diverse range of likely tokens, encouraging nuanced outputs. One important strategic assumption I made throughout the competition was that the evaluator LLM would favor more structured, informative, and complete responses over overly creative or brief ones. To align with this, I included reasoning steps in my answers to make them longer and more comprehensive. Research has shown that LLM-based evaluators often score detailed, well-explained answers higher, and I leaned into that insight during dataset generation.
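To show how these sampling settings map onto an API call, here is a minimal sketch using the Amazon Bedrock Converse API via boto3; the model ID and prompt strings are placeholders, and in the competition these controls were set through PartyRock’s interface rather than code.

# Minimal sketch: the same sampling settings expressed as Bedrock Converse API calls
# (model ID and prompts are illustrative placeholders)
import boto3

bedrock = boto3.client("bedrock-runtime")

def generate(prompt, temperature):
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": temperature, "topP": 0.9, "maxTokens": 2048},
    )
    return response["output"]["message"]["content"][0]["text"]

question_prompt = "Generate 10 quiz questions about prompt engineering..."  # placeholder
answer_prompt = "Answer the following questions with step-by-step reasoning..."  # placeholder

questions = generate(question_prompt, temperature=0.7)  # creative, varied question phrasing
answers = generate(answer_prompt, temperature=0.2)      # precise, factual answers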

Refining the submissions

SageMaker JumpStart offers a wide array of hyperparameters to configure, which can feel overwhelming, especially when you’re racing against time and unsure of what to prioritize. Fortunately, the organizers emphasized focusing primarily on epochs and learning rate, so I honed in on those variables. Each training job with a single epoch took roughly 10–15 minutes, making time management critical. To avoid wasting valuable compute hours, I started with a baseline dataset of 1,500 rows to test combinations of epochs and learning rates. I explored the following (a sketch of the sweep follows this list):

  • Epochs: 1 to 4
  • Learning rates: 0.0001, 0.0002, 0.0003, and 0.0004
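A sketch of that sweep, assuming each combination is launched as its own JumpStart training job (reusing the estimator setup shown earlier; in practice only a subset of combinations fit within the compute budget):

# Sketch: sweeping epoch / learning-rate combinations as separate training jobs
from sagemaker.jumpstart.estimator import JumpStartEstimator

for epoch in [1, 2, 3, 4]:
    for lr in [0.0001, 0.0002, 0.0003, 0.0004]:
        estimator = JumpStartEstimator(
            model_id="meta-textgeneration-llama-3-2-3b-instruct",  # hypothetical model ID
            environment={"accept_eula": "true"},
        )
        estimator.set_hyperparameters(epoch=str(epoch), learning_rate=str(lr))
        # Each single-epoch run took roughly 10-15 minutes, so not all 16 jobs were actually run
        estimator.fit({"training": "s3://my-bucket/ai-league/baseline-1500/"}, wait=False)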

After several iterations, the combination of two epochs and a learning rate of 0.0003 yielded the best result, achieving a 53% win rate on my thirteenth leaderboard submission. Encouraged by this, I continued using this combination for several subsequent experiments, even as I expanded my dataset. Initially, this strategy seemed to work. With a dataset of roughly 3,500 rows, my model reached a 57% win rate by my sixteenth submission. However, as I further increased the dataset to 5,500, 6,700, 8,500, and eventually 12,000 rows, my win rate steadily declined to 53%, 51%, 45%, and 42% respectively. At that point, it was clear that simply increasing dataset size wasn’t enough; in fact, it might have been counterproductive without revisiting the hyperparameters. With only 5 training hours remaining and 54 submissions logged, I found myself stuck at 57%, while peers like the top tuner from the Philippines were already reaching a 71% win rate.

Lessons from the field

With limited time left, both for training and leaderboard submissions, I turned to cross-country collaboration for help. One of the most insightful conversations I had was with Michael Ismail Febrian, the top tuner from Indonesia and the highest scorer in the elimination round. He encouraged me to explore LoRA (low-rank adaptation) hyperparameters, specifically:

  • lora_r
  • lora_alpha
  • target_modules

Michael also suggested enriching my dataset by using API-generated responses from more capable teacher models, specifically for answering PartyRock-generated questions. Looking back at my existing fine-tuning pipeline, I realized a critical weakness: the generated answers were often too concise or shallow. Here’s an example of a typical Q&A pair from my earlier dataset:

Instruction: What factors influence model hallucination?

Response: Let's analyze this step-by-step:
	* Training factors:
		* Data quality and coverage
		* Training methodology
		* Model architecture
	* Deployment factors:
		* Prompt design
		* Temperature settings
		* Context window utilization
	* Mitigation strategies:
		* Retrieval augmentation
		* Fact-checking mechanisms
		* Confidence thresholds

Example: GPT-3 showing lower hallucination rates (15% vs 21%) when using retrieval-augmented generation compared to the base model.

While this structure is clean and organized, it lacked deeper explanation for each point, something models like ChatGPT and Gemini typically do well. I suspect this limitation came from token constraints when generating multiple responses in bulk. In my case, I generated 10 responses at a time in JSONL format under a single prompt, which might have led PartyRock to truncate outputs. Not wanting to spend on paid APIs, I discovered OpenRouter.ai, which offers limited access to large models, albeit rate-limited. With a cap of roughly 200 Q&A pairs per day per account, I got creative: I created multiple accounts to support my expanded dataset. My teacher model of choice was DeepSeek R1, a popular option known for its effectiveness in training smaller, specialized models. It was a bit of a gamble, but one that paid off in terms of output quality.
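The snippet below sketches what this teacher-model step might look like: answering PartyRock-generated questions with DeepSeek R1 through OpenRouter’s OpenAI-compatible endpoint and appending each pair to a JSONL file. The question list, file name, and record fields are assumptions for illustration.

# Sketch: richer answers from a teacher model (DeepSeek R1 via OpenRouter), saved as JSONL
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

questions = ["What factors influence model hallucination?"]  # placeholder question list

with open("train.jsonl", "a", encoding="utf-8") as f:
    for question in questions:
        completion = client.chat.completions.create(
            model="deepseek/deepseek-r1",
            messages=[
                {"role": "system", "content": "Answer with detailed step-by-step reasoning."},
                {"role": "user", "content": question},
            ],
            temperature=0.2,
        )
        answer = completion.choices[0].message.content
        f.write(json.dumps({"instruction": question, "response": answer}) + "\n")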

As for LoRA tuning, here’s what I learned (a code sketch follows this list):

  • lora_r and lora_alpha determine how much new information the model can absorb and how complex that information can be. A common rule of thumb is setting lora_alpha to 1x or 2x of lora_r.
  • target_modules defines which parts of the model are updated, typically the attention layers or the feed-forward network.
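For readers who want to see these knobs in code, here is a minimal sketch with the Hugging Face peft library; JumpStart exposes equivalent settings as training-job hyperparameters, so this is illustrative rather than what actually ran in the competition.

# Sketch: the same LoRA knobs expressed with the Hugging Face peft library (illustrative values)
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                       # lora_r: rank of the low-rank update matrices
    lora_alpha=128,             # commonly 1x or 2x of r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)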

I also consulted Kim, the top tuner from Vietnam, who flagged my 0.0003 learning rate as potentially too high. He, along with Michael, suggested a different strategy: increase the number of epochs and reduce the learning rate. This would allow the model to better capture complex relationships and subtle patterns, especially as dataset size grows. Our conversations underscored a hard-learned truth: data quality is more important than data quantity. There’s a point of diminishing returns when increasing dataset size without adjusting hyperparameters or validating quality, something I experienced directly. In hindsight, I realized I had underestimated how critical fine-grained hyperparameter tuning is, especially when scaling data. More data demands more precise tuning to match the growing complexity of what the model needs to learn.

Last-minute gambits

Armed with fresh insights from my collaborators and hard-won lessons from earlier iterations, I knew it was time to pivot my entire fine-tuning pipeline. The most significant change was in how I generated my dataset. Instead of using PartyRock to produce both questions and answers, I opted to generate only the questions in PartyRock, then feed those prompts into the DeepSeek-R1 API to generate high-quality responses. Each answer was saved in JSONL format and, crucially, included detailed reasoning. This shift significantly increased the depth and length of each answer, averaging around 900 tokens per response, compared to the much shorter outputs from PartyRock. Given that my earlier dataset of roughly 1,500 high-quality rows produced promising results, I stuck with that size for my final dataset. Rather than scale up in quantity, I doubled down on quality and complexity. For this final round, I made bold, blind tweaks to my hyperparameters (summarized in the sketch after this list):

  • Dropped the learning rate to 0.00008
  • Increased the LoRA parameters:
    • lora_r = 256
    • lora_alpha = 256
  • Expanded the LoRA target modules to cover both attention and feed-forward layers:

    q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
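Expressed as JumpStart hyperparameters, the final configuration looked roughly like the sketch below; the exact hyperparameter names depend on the fine-tuning recipe version, and the model ID and S3 path are placeholders.

# Sketch: the final run's settings as JumpStart hyperparameters (names and paths are placeholders)
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-3b-instruct",
    environment={"accept_eula": "true"},
)
estimator.set_hyperparameters(
    epoch="3",  # one of the two final runs used 3 epochs, the other 4
    learning_rate="0.00008",
    lora_r="256",
    lora_alpha="256",
    target_modules="q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj",
)
estimator.fit({"training": "s3://my-bucket/ai-league/final-1500/"})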

These changes were made with one assumption: longer, more complex answers require more capacity to absorb and generalize nuanced patterns. I hoped that these settings would enable the model to fully use the high-quality, reasoning-rich data from DeepSeek-R1. With only 5 hours of training time remaining, I had just enough for two full training runs, each using different epoch settings (3 and 4). It was a make-or-break moment. If the first run underperformed, I had one last chance to redeem it. Fortunately, my first test run achieved a 65% win rate, a huge improvement, but still behind the current leader from the Philippines and trailing Michael’s impressive 89%. Everything now hinged on my final training job. It had to run smoothly, avoid errors, and outperform everything I had tried before. And it did. That final submission achieved a 77% win rate, pushing me to the top of the leaderboard and securing my slot for the Grand Finale. After weeks of experimentation, sleepless nights, setbacks, and late-game adjustments, the journey, from a two-week-late entrant to national champion, was complete.

What I wish I had known sooner

I received’t fake that my success within the elimination spherical was purely technical—luck performed an enormous half. Nonetheless, the journey revealed a number of insights that might save future individuals helpful time, coaching hours, and submissions. Listed below are some key takeaways I want I had recognized from the beginning:

  • Quality is more important than quantity: More data doesn’t always mean better results. Whether you’re adding rows or increasing context length, you’re also increasing the complexity the model must learn from. Focus on crafting high-quality, well-structured examples rather than blindly scaling up.
  • Fast learner versus slow learner: If you’re avoiding deep dives into LoRA or other advanced tweaks, understanding the trade-off between learning rate and epochs is critical. A higher learning rate with fewer epochs might converge faster, but could miss the subtle patterns captured by a lower learning rate over more epochs. Choose carefully based on your data’s complexity.
  • Don’t neglect hyperparameters: One of my biggest missteps was treating hyperparameters as static, regardless of changes in dataset size or complexity. As your data evolves, your model settings should too. Hyperparameters should scale with your data.
  • Do your homework: Avoid excessive guesswork by reading relevant research papers, documentation, or blog posts. Late in the competition, I stumbled upon useful resources that I could have used to make better decisions earlier. A little reading can go a long way.
  • Track everything: When experimenting, it’s easy to forget what worked and what didn’t. Keep a log of your datasets, hyperparameter combinations, and performance results. This helps optimize your runs and aids in debugging.
  • Collaboration is a superpower: While it’s a competition, it’s also a chance to learn. Connecting with other participants, whether they’re ahead or behind, gave me invaluable insights. You might not always walk away with a trophy, but you’ll leave with knowledge, relationships, and real progress.

Grand Finale

The Grand Finale took place on the second day of the National AI Student Challenge, serving as the culmination of weeks of experimentation, strategy, and collaboration. Before the final showdown, all national champions had the opportunity to engage in the AI Student Developer Conference, where we shared insights, exchanged lessons, and built connections with fellow finalists from across the ASEAN region. During our conversations, I was struck by how remarkably similar many of our fine-tuning strategies were. Across the board, participants had used a combination of external APIs, dataset curation techniques, and cloud-based training systems like SageMaker JumpStart. It became clear that tool selection and creative problem-solving played just as big a role as raw technical knowledge. One particularly eye-opening insight came from a finalist who achieved an 85% win rate despite using a large dataset, something I had initially assumed might hurt performance. Their secret was training over a higher number of epochs while maintaining a lower learning rate of 0.0001. However, this came at the cost of longer training times and fewer leaderboard submissions, which highlights an important trade-off:

With enough training time, a carefully tuned model, even one trained on a large dataset, can outperform faster, leaner models.

This reinforced a powerful lesson: there’s no single correct way to fine-tune LLMs. What matters most is how well your strategy aligns with the time, tools, and constraints at hand.

Preparing for battle

In the lead-up to the Grand Finale, I stumbled upon a blog post by Ray Goh, the very first champion of the AWS AI League and one of the mentors behind the competition’s tutorial sessions. One detail caught my attention: the final question from his year was a variation of the infamous Strawberry Problem, a deceptively simple challenge that exposes how LLMs struggle with character-level reasoning.

How many letter Es are there in the words ‘DeepRacer League’?

At first glance, this seems trivial. But to an LLM, the task isn’t as straightforward. Early LLMs often tokenize words in chunks, meaning that DeepRacer might be split into Deep and Racer or even into subword units like Dee, pRa, and cer. These tokens are then converted into numerical vectors, obscuring the individual characters within. It’s like asking someone to count the threads in a rope without unraveling it first.
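You can see this chunking directly by running the phrase through a subword tokenizer. Here is a minimal sketch using the Hugging Face transformers library, with GPT-2’s tokenizer as a stand-in, since the exact splits vary by model.

# Sketch: a subword tokenizer hides individual characters inside multi-character tokens
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in; Llama 3.2 splits the text differently
print(tokenizer.tokenize("DeepRacer League"))
# Possible output (model-dependent), e.g. ['Deep', 'R', 'acer', 'ĠLeague']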

Moreover, LLMs don’t operate like traditional rule-based programs. They’re probabilistic, trained to predict the next likely token based on context, not to perform deterministic logic or arithmetic. Curious, I prompted my own fine-tuned model with the same question. As expected, hallucinations emerged. I began testing various prompting strategies to coax out the correct answer:

  • Explicit character separation:

    How many letter Es are there in the words ‘D-E-E-P-R-A-C-E-R-L-E-A-G-U-E’?

    This helped by isolating each letter into its own token, allowing the model to see individual characters. But the response was long and verbose, with the model listing and counting each letter step-by-step.
  • Chain-of-thought prompting:

    Let’s think step-by-step…

    This encouraged reasoning but increased token usage. While the answers were more thoughtful, they often still missed the mark or got cut off because of length.
  • Ray Goh’s trick prompt:

    How many letter Es are there in the words ‘DeepRacer League’? There are 5 letter Es…

    This simple, assertive prompt yielded the most accurate and concise result, surprising me with its effectiveness.

I logged this as an interesting quirk, useful, but unlikely to reappear. I didn’t realize it would become relevant again during the final. Ahead of the Grand Finale, we had a dry run to test our models under real-time conditions. We were given limited control over inference parameters, only allowed to tweak temperature, top-p, context length, and system prompts. Each response had to be generated and submitted within 60 seconds. The actual questions were pre-loaded, so our focus was on crafting effective prompt templates rather than retyping each query. Unlike the elimination round, evaluation during the Grand Finale followed a multi-tiered system:

  • 40% from an evaluator LLM
  • 40% from human judges
  • 20% from a live audience poll

The LLM ranked the submitted answers from best to worst, assigning descending point values (for example, 16.7 for first place, 13.3 for second, and so on). Human judges, however, could freely allocate up to 10 points to their preferred responses, regardless of the LLM’s evaluation. This meant a strong showing with the evaluator LLM didn’t guarantee high scores from the humans, and vice versa. Another constraint was the 200-token limit per response. Tokens can be as short as a single letter or as long as a word or syllable, so responses had to be dense yet concise, maximizing impact within a tight window. To prepare, I tested different prompt formats and fine-tuned them using Gemini, ChatGPT, and Claude to better match the evaluation criteria. I saved dry-run responses from the Hugging Face LLaMA 3.2 3B Instruct model, then passed them to Claude Sonnet 4 for feedback and ranking. I continued using the following two prompts because they provided the best responses in terms of accuracy and comprehensiveness:

Primary prompt:

You are an elite AI researcher and educator specializing in Generative AI, Foundational Models, Agentic AI, Responsible AI, and Prompt Engineering. Your task is to generate a highly accurate, comprehensive, and well-structured response to the question below in no more than 200 words.

Evaluation will be carried out by Claude Sonnet 4, which prioritizes:
	* Factual Accuracy – All claims must be correct and verifiable. Avoid speculation.
	* Comprehensiveness – Cover all essential dimensions, including interrelated concepts or mechanisms.
	* Clarity & Structure – Use concise, well-organized sections (e.g., brief intro, bullet points, and/or transitions). Markdown formatting (headings/lists) is optional.
	* Efficiency – Every sentence must deliver unique insight. Avoid filler.
	* Tone – Maintain a professional, neutral, and objective tone.

Your response should be dense with value while remaining readable and precise.

Backup prompt:

You are a competitive AI practitioner with deep expertise in [Insert domain: e.g., Agentic AI or Prompt Engineering], answering a technical question evaluated by Claude Sonnet 4 for accuracy and comprehensiveness. You must answer in exactly 200 words.

Format your answer as follows: 
	* Direct Answer (1–2 sentences) – Immediately state the core conclusion or definition.
	* Key Technical Points (3–4 bullet points) – Essential mechanisms, distinctions, or principles.
	* Practical Application (1–2 sentences) – Specific real-world use cases or design implications.
	* Critical Insight (1 sentence) – Mention a key challenge, trade-off, or future direction.

Additional requirements:

  • Use precise technical language and terminology.
  • Include specific tools, frameworks, or metrics if relevant.
  • Every sentence must contribute uniquely; no redundancy.
  • Maintain a formal tone and answer density without over-compression.

In terms of hyperparameters, I used:

  • Top-p = 0.9
  • Max tokens = 200
  • Temperature = 0.2, to prioritize accuracy over creativity

My strategy was simple: appeal to the AI judge. I believed that if my answer ranked well with the evaluator LLM, it would also impress the human judges. Oh, how I was humbled.

Just aiming for third… until I wasn’t

Standing on stage before a live audience was nerve-wracking. This was my first solo competition, and it was already on a huge regional scale. To calm my nerves, I kept my expectations low. A third-place finish would be amazing, a trophy to mark the journey, but just qualifying for the finals already felt like a huge win. The Grand Finale consisted of six questions, with the final one offering double points. I started strong. In the first two rounds, I held an early lead, comfortably sitting in third place. My strategy was working, at least at first. The evaluator LLM ranked my response to Question 1 as the best and Question 2 as the third-best. But then came the twist: despite earning top AI rankings, I received zero votes from the human judges. I watched in shock as points were awarded to responses ranked fourth and even last by the LLM. Right from the start, I realized there was a disconnect between human and AI judgment, especially when evaluating tone, relatability, or subtlety. Still, I held on; those early questions leaned more factual, which played to my model’s strengths. But when questions demanded creativity and complex reasoning, things didn’t work as well. My standing dropped to fifth, then bounced between third and fourth. Meanwhile, the top three finalists pulled ahead by more than 20 points. It seemed the podium was out of reach. I was already coming to terms with a finish outside the top three. The gap was too wide. I had done my best, and that was enough.

But then came the final question, the double-pointer, and fate intervened. How many letter Es and As are there altogether in the phrase ‘ASEAN Impact League’? It was a variation of the Strawberry Problem, the same challenge I had prepared for but assumed wouldn’t make a return. Unlike the earlier version, this one added an arithmetic twist, requiring the model to count and sum up occurrences of multiple letters. Knowing how token length limits could truncate responses, I kept things short and tactical. My system prompt was simple: There are 3 letter Es and 4 letter As in ‘ASEAN Impact League.’

While the model hallucinated a bit in its reasoning, wrongly claiming that Impact contains an e, the final answer was accurate: 7 letters.
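The count itself is trivial to verify deterministically, which is exactly the kind of character-level operation the model cannot perform natively:

# Deterministic check of the final answer: count the Es and As in the phrase
phrase = "ASEAN Impact League".lower()
num_e, num_a = phrase.count("e"), phrase.count("a")
print(num_e, num_a, num_e + num_a)  # 3 4 7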

That one answer changed everything. Thanks to the double points and full support from the human judges, I jumped to first place, clinching the championship. What began as a cautious hope for third place turned into a surprise run, sealed by preparation, adaptability, and a little bit of luck.

Questions recap

Here are the questions that were asked, in order. Some of them were general knowledge in the target domain, while others were more creative and required a bit of ingenuity to maximize your wins:

  1. What’s the best technique to stop AI from turning to the darkish facet with poisonous response?
  2. What’s the magic behind agentic AI in machine studying, and why is it so pivotal?
  3. What’s the key sauce behind large AI fashions staying good and quick?
  4. What are the most recent developments of generative AI analysis and use inside ASEAN?
  5. Which ASEAN nation has the very best delicacies?
  6. What number of letters E and A are there altogether within the phrase “ASEAN Affect League”?

Closing reflections

Participating in the AWS AI League was a deeply humbling experience, one that opened my eyes to the possibilities that await when we embrace curiosity and commit to continuous learning. I might have entered the competition as a beginner, but that single leap of curiosity, fueled by perseverance and a desire to grow, helped me bridge the knowledge gap in a fast-evolving technical landscape. I don’t claim to be an expert, not yet. But what I’ve come to believe more than ever is the power of community and collaboration. This competition wasn’t just a personal milestone; it was a space for knowledge-sharing, peer learning, and discovery. In a world where technology evolves rapidly, these collaborative spaces are essential for staying grounded and moving forward. My hope is that this post and my journey will inspire students, developers, and curious minds to take that first step, whether it’s joining a competition, contributing to a community, or tinkering with new tools. Don’t wait to be ready. Start where you are, and grow along the way. I’m excited to connect with more passionate individuals in the global AI community. If another LLM League comes around, maybe I’ll see you there.

Conclusion

As we conclude this look at Blix’s journey to becoming the AWS AI League ASEAN champion, we hope his story inspires you to explore the exciting possibilities at the intersection of AI and innovation. Discover the AWS services that powered this competition: Amazon Bedrock, Amazon SageMaker JumpStart, and PartyRock, and visit the official AWS AI League page to join the next generation of AI innovators.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the authors

Noor Khan is a Solutions Architect at AWS supporting Singapore’s public sector education and research landscape. She works closely with academic and research institutions, leading technical engagements and designing secure, scalable architectures. As part of the core AWS AI League team, she architected and built the backend for the platform, enabling customers to explore real-world AI use cases through gamified learning. Her passions include AI/ML, generative AI, web development, and empowering women in tech!

Vincent Oh is the Principal Solutions Architect in AWS for Data & AI. He works with public sector customers across ASEAN, owning technical engagements and helping them design scalable cloud solutions. He created the AI League in the midst of helping customers harness the power of AI in their use cases through gamified learning. He also serves as an Adjunct Professor at Singapore Management University (SMU), teaching computer science modules under the School of Computing & Information Systems (SCIS). Prior to joining Amazon, he worked as Senior Principal Digital Architect at Accenture and Cloud Engineering Practice Lead at UST.

Blix Foryasen is a Computer Science student specializing in Machine Learning at National University – Manila. He is passionate about data science, AI for social good, and civic technology, with a strong focus on solving real-world problems through competitions, research, and community-driven innovation. Blix is also deeply engaged with emerging technological trends, particularly in AI and its evolving applications across industries, especially in finance, healthcare, and education.
