Zero-shot Learning

Andrea Stevens Karnyoto
3 min read · Nov 22, 2021


Zero-shot Learning by Andrea Stevens Karnyoto

Zero-shot learning is the ability of a machine learning model to recognize classes it was never trained on. A zero-shot learning approach learns an intermediate layer of semantic attributes during training, then applies it at inference time to predict new classes from their attribute descriptions. In other words, the model can classify unseen classes: classes whose examples never appeared in the training data.

Zero-shot learning is useful in many situations. Many species, types of goods, products, and activities have no labels but do have a specific description (a picture, text, or other information). Zero-shot learning is inherently a two-stage process: training and inference. At the training stage, knowledge about attributes is captured; at the inference stage, that knowledge is used to categorize examples among a new set of classes. This paradigm is inspired by how humans can identify a new object simply by reading its description, exploiting the similarities between the description and previously learned concepts. For instance, if we have trained a model to classify several types of birds, cats, and snakes in an animal dataset, then a new, untrained kind of bird should land closer to the birds than to the cats or snakes.

Example of Zero-shot Learning by Andrea Stevens Karnyoto

For example, as seen in the picture above, if we feed a description of an eagle (untrained) to the trained model, the model should place it close to the group of birds. Likewise, if we feed it a description of a mamba (untrained), it should be classified into the group of snakes.
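The attribute-matching idea above can be sketched in a few lines of code. This is a toy illustration, not a real implementation: the attribute vectors and the `classify` helper below are made-up assumptions, standing in for the semantic attribute layer a real model would learn.

```python
# A minimal sketch of attribute-based zero-shot classification.
# The attribute vectors are illustrative assumptions, not real data:
# each class is described by hand-crafted attributes
# [has_wings, has_fur, has_scales, lays_eggs].
import numpy as np

class_attributes = {
    "bird":  np.array([1.0, 0.0, 0.0, 1.0]),
    "cat":   np.array([0.0, 1.0, 0.0, 0.0]),
    "snake": np.array([0.0, 0.0, 1.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(description_attributes):
    # Pick the seen class whose attribute vector is most similar
    # to the attributes extracted from the new description.
    scores = {name: cosine(description_attributes, attrs)
              for name, attrs in class_attributes.items()}
    return max(scores, key=scores.get)

# An "eagle" was never in the training set, but its description
# (wings, no fur, no scales, lays eggs) maps closest to "bird".
eagle = np.array([1.0, 0.0, 0.0, 1.0])
print(classify(eagle))
```

A real system would replace the hand-crafted attribute vectors with embeddings learned from data, but the nearest-class comparison at inference time works the same way.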

We can apply zero-shot learning to Natural Language Processing, especially to the text classification task. We can predict class probabilities by leveraging a pretrained transfer-learning model such as Bidirectional and Auto-Regressive Transformers (BART). Transfer learning makes it possible to perform the task without a large labeled dataset. The Hugging Face hub hosts many models that can be used for zero-shot learning, and the library also provides a very easy-to-use zero-shot pipeline.
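Under the hood, NLI-based zero-shot pipelines turn each candidate label into a hypothesis sentence (for example, "This example is about travel.") and ask the model whether the input text entails it. The sketch below shows only that control flow; the `toy_entailment_score` function and its keyword table are hypothetical stand-ins for a real NLI model's entailment probability.

```python
# A sketch of how NLI-based zero-shot classification works: each
# candidate label becomes a hypothesis, and the text/hypothesis pair is
# scored for entailment. The scoring function here is a made-up toy
# (keyword counting), standing in for a transformer NLI model.
def toy_entailment_score(premise, label):
    # Hypothetical stand-in for P(entailment) from an NLI model.
    keywords = {"travel": ["world", "went"], "cooking": ["food"]}
    return sum(w in premise.lower() for w in keywords.get(label, []))

def zero_shot_classify(text, labels, template="This example is about {}."):
    # Score every label's hypothesis against the input text,
    # then return the best label and the full score table.
    scores = {label: toy_entailment_score(text, label) for label in labels}
    best = max(scores, key=scores.get)
    return best, scores

text = "I went around the world for one purpose, which is to test various food."
print(zero_shot_classify(text, ["travel", "cooking", "dancing"]))
```

This is also why zero-shot pipelines accept any label strings at inference time: the labels are just text folded into the hypothesis template, not classes the model was trained on.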

In this article, I will use a model released by Facebook AI Research (FAIR), facebook/bart-large-mnli, to do zero-shot text classification.

from transformers import pipeline
import pandas as pd

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# The first run downloads the model weights (about 1.6 GB):
# Downloading: 100%|██████████| 908/908 [00:00<00:00, 183kB/s]
# Downloading: 100%|██████████| 1.63G/1.63G [10:15<00:00, 2.65MB/s]

sequence_to_classify = "I went around the world for one purpose, which is to test various food."
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
result = classifier(sequence_to_classify, candidate_labels)
pd.DataFrame(result)

output:

Result of single class Zero-shot by Andrea Stevens Karnyoto

However, sometimes one sample can belong to more than one class (multilabel). Hugging Face provides a parameter called multi_class for this (renamed multi_label in newer versions of transformers). The following example uses this parameter:

result = classifier(sequence_to_classify, candidate_labels, multi_class=True)

pd.DataFrame(result)

Result of multi class Zero-shot by Andrea Stevens Karnyoto


Written by Andrea Stevens Karnyoto

Natural Language Processing Researcher
