

UC Berkeley Agentic Artificial Intelligence (AI) - with Professor Dawn Song - Game AI and RL (Poker, StarCraft, + Racecars): Noam Brown, Oriol Vinyals, Peter Stone

Date Published

01/03/2026

Reading Time

7 min read

// For my Substack readers: this is a technical post (reposted from LinkedIn); read on if interested (lots of technical jargon here).
Much of the work in previous UC Berkeley Agentic AI/LLM MOOCs, taught by Computer Science faculty and by Professor Dawn Song, Co-Director of the UC Berkeley Center for Responsible, Decentralized Intelligence (outside of her course on AI Safety), was broad, covering different areas I found compelling, though not always my cup of tea given how difficult some of it was (LLMs for mathematics and learning the Lean language last term). My favorite lectures this semester were about one of my favorite subjects: Game AI, reinforcement learning, and their evolution over the years. I had been putting off a comprehensive post on Game AI since Nvidia GTC and GDC (my two favorite conferences each year, next to Meta Connect and whatever other exciting papers come out of NeurIPS). FYI, my goal in writing my first Medium post recapping GTC 2017 was to have a link I could post on LinkedIn, publish an article, and earn my way to attending NeurIPS (which I still haven't been to, lol). But because of that post, I was offered my first book--the back-story of O'Reilly Media's Creating Augmented and Virtual Realities--and a different publisher asked me to write a book on how to create Pokemon Go from scratch across six different AR/VR head-mounted displays (HMDs).
More specifically, as a poker player myself, I was so excited to hear guest lectures delivered by three celebrity-status heavyweights among AI research scientists and professors:
  1. Noam Brown (OpenAI), famous for creating the AIs that beat professional human poker players, namely Libratus and Pluribus; formerly at Meta AI Research (FAIR), now at OpenAI. // Lol, I have long followed his work on poker and been so inspired by it--total fangirl here, still wanting to create "PasoyDosAI" (a Filipino poker AI, for kicks).
  2. Oriol Vinyals (Google DeepMind, VP of Research) - hearing all about AlphaStar was eye-opening. // As a StarCraft player back in the day, I was super excited to hear someone talk about all the Alphas (AlphaGo, AlphaStar, AlphaFold). Years ago I pitched (and turned down a job at) a confidential game company--no, it wasn't Blizzard, sadly, and I didn't want to move to SoCal--to create an angel investing arm AND a team doing game AI analysis of their games (citing AlphaStar as my inspiration). They eventually did create an angel investing position (for a woman, sadly non-technical), and they are probably investing far more in AI now, given their recent production in film and breakthroughs in the gaming industry with AI. It was so cool to have him come and speak!
  3. Peter Stone (Chief Scientist of Sony AI, Professor at the University of Texas at Austin) - widely known for his work on Gran Turismo, with AI surpassing the performance of human car racers. As the sister and cousin of many hardcore car enthusiasts (my brother and many of my cousins are what we call "aZn" racers who modify their cars), I was surprised to hear from him and absolutely loved this guest lecture.
“Autonomous Agents: Embodiment, Interaction, and Learning”
One of the most exciting lectures came from University of Texas at Austin Professor Peter Stone (known for his work at Sony AI) and the work behind "Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning."
Having recently visited the Afeela store (Sony's new car store) in Valley Fair mall in Santa Clara/San Jose just before seeing this lecture (see photo below), I was surprisingly excited to hear the in-depth back-story of another Reinforcement Learning (RL) use case: an AI agent beating professional drivers (racers).
First, he differentiated between what we know as 'robots' vs. 'agents':
“all robots are agents, not all agents are robots”
He makes the distinction between AI agents and robots, with the physical environment as the differentiator.
Photo: me taking a quick tour of the new Afeela car at the new store in Valley Fair mall, Santa Clara/San Jose (Silicon Valley/South Bay)
As someone who has been heavily invested in Augmented, Virtual, Mixed, and eXtended Reality (AR/VR/MR/XR, a.k.a. spatial computing) for quite some time, it was refreshing to hear about this.
Professor Stone showed various methods to train AI agents, including human-in-the-loop techniques similar to the one mentioned in a previous lecture in the Fall 2024 MOOC by Jim Fan (Nvidia, Director of AI & Distinguished Scientist), which showed Nvidia's use of teleoperation with Apple Vision Pro headsets last year for various tasks. He gave more detail on the spectrum of approaches: fine-tuning planner parameters with human input, teleoperation (where a person remotely controls the robot), and actual RL.
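To make that spectrum concrete, here is a deliberately tiny caricature of the human-in-the-loop idea. Everything in it (the one-dimensional "environment," both policies, the 20% intervention rate) is invented for illustration, not from the lecture; the only point is that human overrides get logged as demonstrations for later imitation learning or fine-tuning:

```python
import random

random.seed(0)  # deterministic toy run

def agent_policy(state):
    # Untrained stand-in policy: acts at random.
    return random.choice(["left", "right"])

def human_override(state):
    # Teleoperation stand-in: the "human" always picks the correct move.
    return "right" if state < 10 else "left"

demonstrations = []  # (state, action) pairs logged for imitation learning
state = 0
for _ in range(100):
    action = agent_policy(state)
    if random.random() < 0.2:  # human intervenes ~20% of the time
        action = human_override(state)
        demonstrations.append((state, action))
    state += 1 if action == "right" else -1

print(f"{len(demonstrations)} human corrections logged")
```

In a real pipeline the logged demonstrations would feed a supervised fine-tuning step, sitting between pure teleoperation and pure RL on the spectrum above.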
As more of an aside, Professor Stone mentioned that his other research focuses on the concept of "ad hoc teamwork" (referencing a paper from 2010). This seemed a stark contrast to Professor Dawn Song's lecture, which focused on preventing attacks by adversarial agents at the nexus/intersection of AI and cybersecurity; ad hoc teamwork instead focuses on effective coordination and collaboration among team players. This is a very interesting portion of the field that has yet to be fully explored in terms of the new theories, experiments, and policies we can develop for rewarding positive cooperation in multi-player games and multi-agent systems.

Noam Brown and Multi-Agent AI

I was so thrilled to finally hear a guest lecture from Noam Brown--a must-watch--as I had been looking for more of his insight since his TED talk and an earlier lecture at the University of Washington (on YouTube), where much of his work at OpenAI focused on general AI models rather than the juicier details of beating professional poker players and surpassing existing benchmarks. Some have told me this domain has been largely conquered, though I beg to differ and think there are still other games to tackle, such as Chinese/Filipino/Asian poker variants--the same way, years ago, I heard AlphaStar and DOTA criticized by some as singular in scope and not fully autonomous when it came to AI agents surpassing human-level performance in the game arena. We have come a long way since then!
In this lecture, he spoke broadly about Game AI, with a more extended and detailed explanation of Rock Paper Scissors (the paper I read ages ago).
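For readers who haven't seen the Rock Paper Scissors example worked out: the standard computational approach is regret matching, where two self-playing agents' time-averaged strategies converge to the Nash equilibrium of (1/3, 1/3, 1/3). This is my own minimal toy implementation, not code from the lecture:

```python
import numpy as np

# Row player's payoff for Rock, Paper, Scissors (zero-sum, symmetric).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def regret_matching(iterations=20000, seed=0):
    rng = np.random.default_rng(seed)
    regrets = np.zeros((2, 3))       # cumulative regrets per player
    strategy_sum = np.zeros((2, 3))  # running sum of mixed strategies
    for _ in range(iterations):
        # Current strategy: positive regrets, normalized; uniform otherwise.
        strats = np.maximum(regrets, 0.0)
        for p in range(2):
            total = strats[p].sum()
            strats[p] = strats[p] / total if total > 0 else np.full(3, 1 / 3)
        strategy_sum += strats
        a0 = rng.choice(3, p=strats[0])
        a1 = rng.choice(3, p=strats[1])
        # Regret: what each alternative action would have earned vs. actual.
        regrets[0] += PAYOFF[:, a1] - PAYOFF[a0, a1]
        regrets[1] += PAYOFF[:, a0] - PAYOFF[a1, a0]
    return strategy_sum / iterations  # time-averaged strategies

avg = regret_matching()
print(avg.round(2))  # both rows approach the equilibrium (1/3, 1/3, 1/3)
```

The per-iteration strategies themselves never settle down; it is only the time average that converges, which is the subtle point the extended RPS discussion turns on.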
My favorite part of this lecture was his point about how essential human data is to AI (even though, personally, I have seen so much hype about synthetic data). Here he explains population best response, which requires data on the population of players--that is, human data. He says you need to ask what that population is: other copies of yourself, or is the population humans?
“Depending on what the population is, you may need to have data on that population, if what you care about is performance with humans.”
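To make "population best response" concrete: given data on how a population actually plays, the best response falls straight out of the empirical action frequencies. A minimal sketch, where the payoff matrix is standard Rock Paper Scissors and the population frequencies are invented for illustration (humans are often said to over-throw Rock):

```python
import numpy as np

# Row player's payoff matrix for Rock Paper Scissors (order: R, P, S).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

# Hypothetical logged frequencies of a human population's throws
# (made-up numbers, not data from the lecture).
population = np.array([0.45, 0.30, 0.25])  # P(Rock), P(Paper), P(Scissors)

expected_value = PAYOFF @ population         # EV of each pure action
best_response = int(expected_value.argmax())  # exploit the population
print(["Rock", "Paper", "Scissors"][best_response])  # prints "Paper"
```

Against copies of yourself you would instead play the equilibrium; against this human-like population, the data says to lean on Paper. That is exactly why the quote above insists on having data on the population you care about.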
He then laid out a three-step process for finding the optimal solution to two-player non-zero-sum games.
He goes on to detail his work on Hanabi, Diplomacy, and multi-agent systems in general.

Oriol Vinyals (Google DeepMind, VP of Research) - Multi-Agent Systems in the Era of LLMs

Vinyals is known for groundbreaking reinforcement learning work on everything prefixed "Alpha," and it was a joy to hear the insider perspective on the evolution from AlphaGo (game AI agents beating professional Go players) and AlphaStar (StarCraft) through Gemini.
He walked through the history of DeepMind's contributions to the field.
One of my all-time favorite games, as a Blizzard fangirl, was StarCraft!
Vinyals went into deep detail about how these architectures have evolved over time; things were a lot simpler back then.
Similar to Noam Brown's comments about population play, Vinyals notes what is distinct about StarCraft, where his team also drew on Rock Paper Scissors-style dynamics, imitation learning, and other methods.
What's different about StarCraft is that it is a real-time strategy game, and like poker (where you don't know every combination, since players hide the cards they are holding), it is considered an "imperfect information" game. While I also like chess, it is a perfect-information game, and I find imperfect-information games more challenging, complex, and interesting in general.
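The "imperfect information" idea can be made precise with information sets: decision points a player cannot tell apart because of hidden information. In Kuhn poker, the classic three-card toy poker game, each decision point is defined by a player's own card and the betting history, and it groups together the opponent's two possible hidden cards. A quick sketch counting them (my own illustration, not from the lectures):

```python
from itertools import permutations

# Kuhn poker: three cards (J, Q, K), one dealt to each player.
# Non-terminal betting histories and which player acts there:
# player 0 acts at "" and after "cb"; player 1 acts after "c" and "b"
# (c = check/call, b = bet).
DECISION_HISTORIES = {"": 0, "c": 1, "b": 1, "cb": 0}

# Information set = (acting player, own card, betting history); it collects
# all the full hidden states the acting player cannot distinguish.
info_sets = {}
for cards in permutations("JQK", 2):  # (player 0's card, player 1's card)
    for history, player in DECISION_HISTORIES.items():
        key = (player, cards[player], history)
        info_sets.setdefault(key, []).append(cards)

print(len(info_sets))                        # 12 information sets
print({len(v) for v in info_sets.values()})  # each hides 2 opponent cards
```

In a perfect-information game like chess, every information set would contain exactly one state; the moment some sets contain more than one, strategies must be defined over information sets rather than states, which is what makes poker and StarCraft harder.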
What’s different now in the age of LLMs
Vinyals talks about the differences with LLMs: the input, interaction, and ingested data differ from how RL was adopted before; the earlier architectures were simpler; and there are new complexities to take into account now, along with areas still to be explored.

Overall, the course was comprehensive, extending upon the previous two I took with Professor Dawn Song on advanced reasoning in Fall and Spring of 2024.