An Interactive Introduction to Model-Agnostic Meta-Learning

Exploring the world of model-agnostic meta-learning and its variants.

This page is part of a multi-part series on Model-Agnostic Meta-Learning. If you are already familiar with the topic, use the menu on the right side to jump straight to the part that interests you. Otherwise, we suggest you start at the beginning.

Comparison of MAML and its variants

To compare the above methods visually, we return to the non-linearized-line-fitting problem from before (see the figure). This time, however, we plot a single update direction of MAML, FOMAML, Reptile, and iMAML on the combined loss space of the two tasks, so that you can verify whether the methods point in reasonable directions (i.e., towards the local optimum). The combined loss space is defined via the meta-loss of the two tasks, i.e., \[\mathcal{L}(\theta) := \sum_{i \in \{0, 1\}} \mathcal{L}_{\tau_i, \text{test}}(\phi_i),\] where \(\phi_i\) denotes the parameters obtained by adapting \(\theta\) to task \(\tau_i\). A word of warning: the update directions are computed on actual data, with the actual algorithms running in your browser on TensorFlow.js. If you experience delays in the vector update when moving \(\theta\), you can disable some of the computations via the panel under the figure.
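To make the notion of "one update direction per method" concrete without the interactive figure, here is a minimal sketch on a toy problem. It does not use the article's data or its TensorFlow.js code: the two quadratic task losses, the step size `ALPHA`, the regularization strength `LAMBDA`, and all function names are assumptions chosen so that every quantity has a closed form. In particular, the train and test loss of a task coincide, each method takes a single inner step (or, for iMAML, solves the regularized inner problem exactly), and learning-rate scaling of the outer update is ignored.

```python
# Toy setup (an assumption, not the article's data): task i has the loss
#   L_i(p) = 0.5 * sum_k a_i[k] * (p[k] - c_i[k])**2
# with diagonal curvature a_i and optimum c_i; train loss == test loss.
ALPHA = 0.1   # inner-loop step size (hypothetical value)
LAMBDA = 2.0  # iMAML proximal regularization strength (hypothetical value)

TASKS = [
    {"a": [1.0, 4.0], "c": [1.0, 0.0]},
    {"a": [4.0, 1.0], "c": [0.0, 1.0]},
]


def task_loss(task, p):
    return 0.5 * sum(a * (x - c) ** 2 for a, x, c in zip(task["a"], p, task["c"]))


def task_grad(task, p):
    return [a * (x - c) for a, x, c in zip(task["a"], p, task["c"])]


def adapt(task, theta):
    """One inner gradient step: phi = theta - ALPHA * grad L(theta)."""
    return [x - ALPHA * g for x, g in zip(theta, task_grad(task, theta))]


def meta_loss(theta):
    """Combined loss: sum over tasks of the loss at the adapted parameters."""
    return sum(task_loss(task, adapt(task, theta)) for task in TASKS)


def update_directions(theta):
    """Return one (unscaled) descent direction per method at theta."""
    n = len(theta)
    out = {name: [0.0] * n for name in ("MAML", "FOMAML", "Reptile", "iMAML")}
    for task in TASKS:
        phi = adapt(task, theta)
        g_phi = task_grad(task, phi)
        # iMAML inner problem: argmin_p L(p) + LAMBDA/2 * ||p - theta||^2,
        # solved exactly here because the toy loss is a diagonal quadratic.
        phi_star = [
            (a * c + LAMBDA * x) / (a + LAMBDA)
            for a, c, x in zip(task["a"], task["c"], theta)
        ]
        g_star = task_grad(task, phi_star)
        for k in range(n):
            a_k = task["a"][k]
            # MAML: chain rule through the inner step, d phi/d theta = 1 - ALPHA*a.
            out["MAML"][k] -= (1.0 - ALPHA * a_k) * g_phi[k]
            # FOMAML: same gradient, but d phi/d theta is treated as identity.
            out["FOMAML"][k] -= g_phi[k]
            # Reptile: move theta towards the adapted parameters.
            out["Reptile"][k] += phi[k] - theta[k]
            # iMAML: implicit Jacobian (1 + a/LAMBDA)^-1 applied to the gradient.
            out["iMAML"][k] -= g_star[k] / (1.0 + a_k / LAMBDA)
    return out


if __name__ == "__main__":
    for name, direction in update_directions([2.0, 2.0]).items():
        print(name, direction)
```

In this symmetric toy problem all four directions point along the same diagonal and differ only in length. On the article's actual tasks the directions generally differ in angle as well, which is what the figure lets you inspect.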

Empirical comparison

By now, you have a theoretical understanding of the four methods we presented and might have looked into how the methods produce different updates on the meta-parameter \( \theta \). To complete the comparison, we want to give you a short overview of the empirical results of these methods on two common few-shot benchmarks: Omniglot and Mini-ImageNet.

Method	Omniglot, 5-way 1-shot	Omniglot, 20-way 1-shot	Mini-ImageNet, 5-way 1-shot
MAML	98.7 ± 0.4%	95.8 ± 0.3%	48.70 ± 1.84%
FOMAML	98.3 ± 0.5%	89.4 ± 0.5%	48.07 ± 1.75%
Reptile	97.68 ± 0.04%	89.43 ± 0.14%	49.97 ± 0.32%
iMAML	99.16 ± 0.35%	94.46 ± 0.42%	48.96 ± 1.84%

All numbers were taken from Rajeswaran et al. (2019), who collected the results from the respective papers.

There is no clear winner: no single method leads on all three benchmarks. Each method has its place, and only time will tell which methods will prevail.
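One rough way to read the table is to ask which method has the highest mean accuracy on each benchmark, and whether the reported intervals overlap. The sketch below does this with the numbers copied from the table. It assumes the ± values can be treated as symmetric interval half-widths, which the text here does not state, and the dictionary keys and function names are illustrative only. Interval overlap is a heuristic, not a significance test.

```python
from itertools import combinations

# (mean accuracy, ± half-width) in percent, copied from the table above.
RESULTS = {
    "MAML": {
        "omniglot_5w1s": (98.7, 0.4),
        "omniglot_20w1s": (95.8, 0.3),
        "mini_imagenet_5w1s": (48.70, 1.84),
    },
    "FOMAML": {
        "omniglot_5w1s": (98.3, 0.5),
        "omniglot_20w1s": (89.4, 0.5),
        "mini_imagenet_5w1s": (48.07, 1.75),
    },
    "Reptile": {
        "omniglot_5w1s": (97.68, 0.04),
        "omniglot_20w1s": (89.43, 0.14),
        "mini_imagenet_5w1s": (49.97, 0.32),
    },
    "iMAML": {
        "omniglot_5w1s": (99.16, 0.35),
        "omniglot_20w1s": (94.46, 0.42),
        "mini_imagenet_5w1s": (48.96, 1.84),
    },
}

BENCHMARKS = ["omniglot_5w1s", "omniglot_20w1s", "mini_imagenet_5w1s"]


def best_by_mean(benchmark):
    """Method with the highest mean accuracy on one benchmark."""
    return max(RESULTS, key=lambda method: RESULTS[method][benchmark][0])


def intervals_overlap(a, b):
    """True if [mean - pm, mean + pm] of a and b intersect."""
    (mean_a, pm_a), (mean_b, pm_b) = a, b
    return mean_a - pm_a <= mean_b + pm_b and mean_b - pm_b <= mean_a + pm_a


def all_pairs_overlap(benchmark):
    """True if every pair of methods has overlapping intervals."""
    return all(
        intervals_overlap(RESULTS[m1][benchmark], RESULTS[m2][benchmark])
        for m1, m2 in combinations(RESULTS, 2)
    )


if __name__ == "__main__":
    for benchmark in BENCHMARKS:
        print(benchmark, best_by_mean(benchmark), all_pairs_overlap(benchmark))
```

Under these assumptions, a different method has the highest mean on each of the three benchmarks, and on Mini-ImageNet all four intervals overlap pairwise.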