ChatGPT
What it is about: ChatGPT is a conversational agent released by OpenAI in 2022, built on top of the GPT-3 large language model (LLM).
How it works: To train this conversational agent, two things are needed:
A very large language model trained on a large corpus of text to predict the most probable next word.
Fine-tuning the model to converse in a human-like way.
For 1., OpenAI used a Transformer-based neural network trained on the Common Crawl dataset. First, they used unsupervised learning to predict the next token of a sentence. Then, they fine-tuned the model to handle more specific tasks such as Q&A, classification, and comparison. This is GPT-3.
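To make this first step concrete, here is a minimal sketch of the next-token prediction objective: a generic language-modeling loss with dummy tensors, not OpenAI's actual training code.

```python
import torch
import torch.nn.functional as F

# Toy next-token objective: each position t is trained to predict token t+1.
vocab_size, seq_len = 100, 8
logits = torch.randn(seq_len, vocab_size)          # model outputs, one row per position
tokens = torch.randint(0, vocab_size, (seq_len,))  # the training sequence
loss = F.cross_entropy(logits[:-1], tokens[1:])    # position t predicts token t+1
```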
For 2., they used an improved Reinforcement Learning from Human Feedback (RLHF) algorithm to fine-tune the LLM for conversational abilities. One key element is to prevent the reinforcement learning policy from deviating too much from the supervised behavior learned from the labellers in step 1. This keeps ChatGPT closely aligned with the behavior intended by its authors.
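A rough illustration of that constraint, assuming a simple REINFORCE-style objective with a KL penalty toward the supervised reference policy; this is a sketch, not OpenAI's actual PPO implementation, and all tensors below are dummies.

```python
import torch

def rlhf_loss(logp_policy, logp_ref, rewards, kl_coef=0.1):
    """Toy RLHF-style objective: push up the log-probabilities of responses the
    reward model liked, while penalizing divergence from the supervised
    fine-tuned reference policy so the agent does not drift too far from it."""
    kl = (logp_policy - logp_ref).mean()           # crude KL estimate vs. the reference
    pg = -(rewards.detach() * logp_policy).mean()  # REINFORCE-style policy gradient term
    return pg + kl_coef * kl

# Usage with dummy tensors (stand-ins for per-token log-probs and rewards).
logp_policy = torch.randn(16, requires_grad=True)
logp_ref = torch.randn(16)
rewards = torch.randn(16)
loss = rlhf_loss(logp_policy, logp_ref, rewards)
loss.backward()
```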
Results: Even though ChatGPT performs well on standard Natural Language Processing academic benchmarks, its capabilities go beyond those of regular LLMs. It can pass many hard exams that students must pass to earn their diplomas (medicine and law school, MBA, coding tests, etc.), which is an unprecedented achievement at this level.
What they found out: One difficulty with this type of conversational agent is mitigating hallucinations, where the model comes up with completely made-up and wrong assertions. However, making the model more cautious in order to reduce hallucinations also drastically reduces its coverage of questions it would otherwise have answered correctly.
Why it matters: We may be at a new turning point in terms of what today's and tomorrow's Artificial Intelligence (AI) capabilities will unlock. Technology has always been the main way of improving our working efficiency, so we may ask ourselves whether ChatGPT-like AIs will profoundly disrupt our jobs. There is a world in which most of today's essential skills become obsolete, so closely following the next steps of these large conversational LLMs is of primary importance.
Our takeaways: Even though tomorrow's society may not be defined by the supra-human performance of GPT-4, 5, or 6, we have access to ChatGPT today, and it already has enough skills to help us code better, write better, and come up with new ideas. For instance, this very article was cleaned up by ChatGPT after being given the following prompt: “Can you cleanup the following text? No content should be removed.”
The Carbon Footprint of Machine Learning Training
What it is about: The carbon footprint of Information and Communication Technology (ICT) is growing, and as Machine Learning (ML) practitioners, it is our responsibility to limit the carbon footprint of our models and training. In a 2022 article, Google researchers compared the performance of several models and proposed best practices to reduce the carbon emissions of ML training by up to 1000x.
Results: Comparison of GPT-3 and GLaM:
GPT-3 is an autoregressive language model with 175B parameters (for comparison, earlier Transformer models used ≤0.2B). It was trained on 10,000 V100 GPUs in a Microsoft cloud data center.
GLaM is a newer language model with 7x more parameters than GPT-3. It is a mixture-of-experts (MoE) model that activates experts selectively based on the input, so that no more than 95B parameters (8%) are active per input token.
GPT-3 activates all 175B parameters on every token. GLaM can exceed GPT-3 in quality and efficiency thanks to its larger parameter count and its sparsity, and it can be trained with a 14x smaller CO2e footprint.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150ee594-58be-42c3-9c32-0e8981a3333d_1192x758.png)
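To illustrate the sparsity idea, here is a minimal mixture-of-experts layer with top-k routing. It is a generic sketch of the technique, not GLaM's actual architecture, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: only the top-k experts run for each token,
    so most parameters stay inactive on any given input."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                                 # x: (n_tokens, d_model)
        weights, idx = self.gate(x).topk(self.k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: 10 tokens pass through the layer, each touching only 2 of the 8 experts.
layer = TinyMoELayer()
y = layer(torch.randn(10, 64))
```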
What they found out: The paper proposes four best practices:
Model: selecting efficient ML model architectures while advancing ML quality, such as sparse models versus dense models, can reduce computation by factors of ~5–10.
Machine: using processors optimized for ML training such as TPUs or recent GPUs (e.g., V100 or A100), versus general-purpose processors, can improve performance per Watt by factors of 2–5.
Mechanization: computing in the Cloud rather than on premises improves data center energy efficiency, reducing energy costs by a factor of 1.4–2.
Map: cloud computing lets ML practitioners pick the location with the cleanest energy, further reducing the gross carbon footprint by factors of 5–10.
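As a rough way to reason about these factors, here is a back-of-the-envelope CO2e estimate (accelerator energy × data-center PUE × grid carbon intensity). The function name and default values are illustrative, not taken from the paper.

```python
def training_co2e_kg(accelerator_kw, hours, n_accelerators, pue=1.1,
                     grid_kgco2e_per_kwh=0.4):
    """Rough CO2e estimate for a training run: accelerator energy, scaled by the
    data center's PUE ("Mechanization") and the local grid's carbon intensity ("Map").
    "Model" and "Machine" show up as fewer hours and fewer/more efficient accelerators."""
    energy_kwh = accelerator_kw * hours * n_accelerators
    return energy_kwh * pue * grid_kgco2e_per_kwh

# Example: 64 accelerators drawing 300 W each for 10 days, in an efficient data
# center on a relatively clean grid (all numbers are made up for illustration).
print(training_co2e_kg(accelerator_kw=0.3, hours=240, n_accelerators=64))
```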
Our takeaways: Climate change is one of the most important problems of our century, and as ML practitioners, we have a role to play in limiting our energy consumption. Even if our usage is far from that of Google or Microsoft, it’s still essential to keep in mind the four simple good practices proposed by the authors and do our best to reduce our carbon footprint.
In addition to what the authors recommend, here are some simple guidelines to help you develop your next model:
In most cases, the architecture you need already exists, and using an existing model architecture will save effort.
Several versions of a model architecture are often available, so use the most efficient one (often the most recent version).
If you need to build a custom architecture, you can search for a more efficient one using Neural Architecture Search (NAS) or other architecture discovery techniques.
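As an example of the first two guidelines, reusing an existing, efficiency-oriented architecture can be a one-liner. This sketch assumes a recent torchvision; any model zoo works the same way.

```python
import torch
from torchvision import models

# Reuse an existing, efficiency-oriented architecture with pretrained weights
# instead of designing and training one from scratch.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
model.eval()

# Sanity check on a dummy batch (1 image, 3 channels, 224x224).
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```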
Minerva: A Metrics Platform at Airbnb
What it is about: This series of articles (parts I, II, and III), released in 2021, is about Minerva, a metrics platform built at Airbnb. The purpose of Minerva is to serve internal users' metric needs end to end. In this series, the authors discuss why they built Minerva, how they did it, and what they learned.
How it works: Airbnb started by exposing facts and dimensions. Fact tables contain measurable values (e.g., number of bookings, average length of stay), while dimension tables contain contextual information used to filter those facts (e.g., date, location). As the number of use cases increased and the tables became too complex, they needed to standardize the whole workflow. With Minerva, metrics are defined only once and can be used by different stakeholders. From metric definition to serving, the different life-cycle steps can be managed: backfilling, monitoring, DAG orchestration, and more.
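A toy illustration of the fact/dimension split and of a metric being defined once and reused; this is hypothetical pandas code, not Minerva's actual configuration format.

```python
import pandas as pd

# Fact table: measurable events.
bookings = pd.DataFrame({"listing_id": [1, 2, 1, 3], "nights": [2, 5, 3, 1]})
# Dimension table: contextual attributes used to slice and filter the facts.
listings = pd.DataFrame({"listing_id": [1, 2, 3], "country": ["FR", "US", "FR"]})

# The metric is defined once (column + aggregation)...
avg_length_of_stay = ("nights", "mean")

# ...and every consumer derives it the same way, with whatever dimension it needs.
col, agg = avg_length_of_stay
by_country = bookings.merge(listings, on="listing_id").groupby("country")[col].agg(agg)
print(by_country)
```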
Minerva has been designed to be:
Standardized: Single clear and consistent definitions for all data.
Declarative: Users define the "what" not the "how" of data processing.
Scalable: Minerva can handle large amounts of data and traffic.
Consistent: Data is always up to date and consistent, with automated backfilling after a change in a metric's definition, for instance.
Highly available: New data is rolled out without downtime or interruption.
Well tested: Changes are extensively tested before being deployed.
Why it matters: Providing all users with the same metrics is key. Decisions are based on data, so stakeholders need to share the same references and speak about the same metrics. Minerva is not the only metrics platform available; other companies are building similar ones. Metrics platforms can give analysts more freedom, letting them focus on business topics rather than on the technical details of building a metric.
Our takeaways: It was interesting to see an approach based on metrics instead of exposing raw data. That being said, this seems to fit the data needs of bigger teams (i.e., bigger than the 40-ish people we are at PandaScore). Also, even after reading these articles, it is not clear how to build such a complex platform, nor how complex it would be to use.