LLM at the International Mathematical Olympiads, GPT4-V(ision) Capabilities, and How Gamblers End Gambling

PandaScore Research Insights #14

Maxime De Bois

Jordan PELTIER

, and

Meri Matsakyan

Apr 22, 2024

LLM at the International Mathematical Olympiads

What it is about: In this paper published in Nature, researchers from Google DeepMind address the complex domain of geometry problem-solving, leveraging recent breakthroughs in Large Language Models. More precisely, they have combined a symbolic engine and a transformer-based model to solve real Euclidean geometry problems posed at the yearly International Mathematical Olympiads (IMO).

How it works: To solve such problems, the researchers utilized:

A symbolic deduction engine, composed of
- a deductive database (i.e., an engine capable of querying deductive results from geometrical premises and basic rules)
- an algebraic rules engine that can perform basic calculations involving angles, ratios, or distances.
A transformer-based model (~150M neurons) trained to generate “proofs” when prompted a set of geometrical premises and a theorem to prove.

Additionally, one of the contributions of the paper is the creation of a dataset comprising over 100 million synthetic sets of premises, theorems, and proofs using their symbolic engine. This has enabled the training of a language model to generate proofs, especially those requiring "additional constructs" like the point D in the basic example below.

AlphaGeometry solving a basic Euclidean geometry problem using symbolic deduction and a language model.

Results: AlphaGeometry achieved state-of-the-art performance on IMO-AG-30, a dataset composed of the 30 latest Olympiad-level problems, by solving 25 of them, nearly matching the performance of a gold medalist. The previous state-of-the-art algorithm, which did not use deep learning, managed to solve only 10 of the same dataset.

AlphaGeometry's relative performance at the International Mathematical Olympiad.

Why it matters: Leveraging both recent deep learning techniques and existing symbolic AI is likely key to achieving enhanced performance across various domains. This synergy promises significant advancements in problem-solving capabilities, as evidenced by the remarkable achievements in geometric reasoning demonstrated in the study.

Our takeaways: Such synergies could also benefit PandaScore. For instance, when attempting to build a transformer-based model to simulate plausible League of Legends (LoL) games (imagine that a token represents a state of a LoL game), a symbolic engine with embedded LoL rules could help the model discard impossible game states.

GPT4-V(ision)

What it is about: In this paper, from October 2023, researchers at Microsoft explore the capabilities of the GPT4-V, a multi-modal Large Language Model (LLM) that can process images. This paper does not delve into the technical specifications of the model, as they are not publicly available.

How it works: The paper presents a wide array of hand-crafted, meticulously designed examples selected to showcase the model's capabilities. It specifically focuses on various ways of prompting the model with different input modes, its extensive world knowledge, and understanding. The researchers confirmed that the model was not merely memorizing the training dataset by using images not included in the training set and by addressing facts that occurred after the dataset's creation (April 2023).

The image below shows a localized object description generated from the original image by GPT4-V after being prompted with the following instructions:

Please follow the instructions:
1. Tell me the size of the input image;
2. Localize each person in the image using bounding box;
3. Recognize each person;
4. Generate detailed caption for each bounding box.

What they found out:

All prompting techniques effective with GPT4, such as instruction following, chain-of-thought, and few-shot prompting, also work well with GPT4-V.
GPT4-V can process various types of inputs, including text, images, text within images, or visual pointers. Notably, visual pointers significantly enhance usability by allowing users to directly highlight the object of interest in an image (e.g., by drawing a circle around it).
GPT4-V demonstrates human-like capabilities across multiple domains, including open-world visual understanding, general knowledge, commonsense, scene text understanding, document reasoning, temporal reasoning, abstract reasoning, coding, and emotion understanding (which surprised us the most!).

Why it matters: As humans, we interact with multiple modalities in our daily lives (text, image, sound, etc.). For AI to better assist humans, it should also support these modalities. GPT4-V represents a step towards this future, although much work remains, including exploring other modalities (e.g., sound) and new applications.

Our takeaways: At PandaScore, we extract information from esports game video-stream HUDs, which is then utilized in downstream tasks (either our clients’ or our own odds models). Multi-modal models enable the extraction of much more detailed information from the game, such as live actions (e.g., what the players are doing, their intentions), which could significantly enhance the quality of the derived products.

How Gamblers End Gambling

What it is about: In 2009, researchers published a paper investigating the behavioral tendencies of internet gamblers who self-identified as addicted and chose to voluntarily close their accounts.

How it works: By analyzing data from internet gamblers who self-identified as having gambling issues, the authors studied the actual gambling habits that prompted them to voluntarily deactivate their online gambling accounts.

In this study, 226 gamblers who closed their accounts due to gambling problems were selected from a cohort of 47,603 internet gamblers (referred to as the case group). Conversely, 226 matching bettors (referred to as the control group) were selected from the group of gamblers who did not close their accounts. Daily aggregates of betting data were collected over an 18-month study period.

Results: The authors discovered that self-identified problem gamblers exhibit a tendency to seek engagement while being cautious about risks. This result contradicts the belief that such gamblers actively pursue high-risk opportunities when trying to recover from losses.

What they found out: The findings revealed that as they faced escalating losses before closing their accounts, these gamblers seemed to attempt to recover their losses by increasing their bet amounts on events with higher probabilities of winning. Additionally, the authors noticed a decline in their gambling activity, indicated by a reduction in the number of bets placed during the study period.

[Above] Average log-scaled odds per bet counting backward prior to account closing; [Below] Regression coefficients of cumulative time effect on log-scaled odds per bet in live action gambling.

Why it matters: The authors’ research on active internet gamblers adds to the existing literature on risk preferences. Surprisingly, they discovered indications of risk-averse gambling behavior among a subset of gamblers who self-identified as having gambling issues. The results of this study reveal the downtrends in average odds per bet and negative regression coefficients for the cumulative time effect on odds.

Our takeaways: As an esports odds provider, one of our main goals is to create a transparent and responsible betting environment. Conducting research on gambling addiction is crucial for building a healthier industry.

A guest post by

Meri Matsakyan

Data Analyst at PandaScore