This page is part of a multi-part series on Model-Agnostic Meta-Learning. If you are already familiar with the topic, use the menu on the right to jump straight to the part that interests you. Otherwise, we suggest you start at the beginning.
In this section we explain why MAML is "model-agnostic" and, along the way, gain a broader overview of the meta-learning field. Metric-based and model-based approaches impose constraints on either the sampling (e.g. episodic training) or the architecture of the model. MAML, on the other hand, requires only one very general assumption: the model needs to be optimizable by a gradient-based optimizer. Hence, it has been introduced as "model-agnostic". But note that the model is still not completely free of assumptions. It is important to view the method in the context of the field to understand what really sets it apart in terms of design, assumptions, and approach, which is what we will consider in the rest of this page.
The core idea of metric-based approaches is to compare two samples in a
latent (metric) space: In this space, samples of the same class are supposed
to be close to each other, while two samples from different classes are
supposed to have a large distance (the notion of a distance is what makes
the latent space a metric space).
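To make this concrete, here is a minimal sketch in the spirit of Prototypical Networks (not taken from any particular implementation) of how a metric-based classifier might assign a query sample to the nearest class prototype in a learned embedding space. The embedding network `embed`, the input size, and the choice of Euclidean distance are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical embedding network mapping flattened 28x28 inputs into the latent (metric) space.
embed = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 32))

def classify(query, support_x, support_y, n_classes):
    """Assign each query to the class whose support samples are closest on average."""
    z_query = embed(query)                        # (n_queries, 32)
    z_support = embed(support_x)                  # (n_support, 32)
    # One prototype per class: the mean embedding of that class's support samples.
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_classes)
    ])                                            # (n_classes, 32)
    # The Euclidean distance in the latent space decides the class.
    distances = torch.cdist(z_query, prototypes)  # (n_queries, n_classes)
    return distances.argmin(dim=1)                # index of the closest prototype
```

Note how all the learning capacity sits in the embedding, and why such methods rely on episodic sampling: at test time classification is still a nearest-prototype lookup, so training must mimic exactly that setup.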
Model-based approaches are neural architectures that are deliberately designed
for fast adaptation to new tasks without a tendency to overfit.
Memory-Augmented Neural Networks and MetaNet are two examples. Both employ
an external memory while still maintaining the ability to be trained
end-to-end.
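As a rough illustration of the external-memory idea (a sketch, not the actual MANN or MetaNet architecture), a differentiable read from a key-value memory can be written as an attention-weighted sum over memory slots, which is what keeps the whole model trainable end-to-end. All tensor names and shapes below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def memory_read(query, memory_keys, memory_values):
    """Differentiable read from an external key-value memory."""
    # Similarity of the query (key_dim,) to every stored key (n_slots, key_dim).
    scores = memory_keys @ query          # (n_slots,)
    weights = F.softmax(scores, dim=0)    # soft addressing over memory slots
    # The result is a weighted mixture of all stored values (n_slots, value_dim),
    # so gradients flow through the read and the model stays end-to-end trainable.
    return weights @ memory_values        # (value_dim,)
```

The architectural constraint is plain here: the memory module and its addressing scheme are baked into the model itself.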
MAML goes a different route: The neural network is designed just as a usual
(many-shot) model would be. All the magic happens during the
optimization, which is what makes it "optimization-based".
As a consequence, unlike metric-based and model-based approaches, MAML lets
you choose the model architecture freely.
This has the great benefit of being applicable not only to
conventional supervised learning classification tasks but also
to reinforcement learning.
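To preview what "all the magic happens during the optimization" means, below is a heavily simplified sketch of one meta-training step with a single inner gradient step on a single task; the model is an ordinary network, and only the two nested optimization loops are MAML-specific. The network shape, learning rates, and names such as `task_support` and `task_query` are placeholders, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# An ordinary network; inputs are assumed to be flattened to shape (batch, 784).
model = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 5))
meta_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

def maml_step(task_support, task_query):
    """One meta-training step on a single task (one inner gradient step)."""
    x_s, y_s = task_support  # few labeled examples used for adaptation
    x_q, y_q = task_query    # held-out examples used for the meta-update

    # Inner loop: adapt a copy of the parameters with one gradient step on the support set.
    support_loss = F.cross_entropy(model(x_s), y_s)
    grads = torch.autograd.grad(support_loss, list(model.parameters()), create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Evaluate the adapted parameters on the query set (manual functional forward pass).
    h = F.relu(F.linear(x_q, adapted[0], adapted[1]))
    query_loss = F.cross_entropy(F.linear(h, adapted[2], adapted[3]), y_q)

    # Outer loop: update the original parameters so that one inner step works well.
    meta_optimizer.zero_grad()
    query_loss.backward()
    meta_optimizer.step()
```

Note that nothing in this sketch depends on the particular architecture: as long as the forward pass with the adapted parameters is differentiable, any model works, which is exactly the "model-agnostic" claim.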
In the following figure, you can find a selection of meta-learning methods
that tackle
few-shot
learning, their performance on Omniglot, as well as your own accuracy score from the starting page.
In the next section we will take a close look at MAML and study the math behind the method. Furthermore, you will get the chance to explore a simple few-shot learning problem and find out firsthand why a meta-learning approach is needed there.