Introduction

Preface

In the field of computer vision, many techniques are available to perform Image Classification. New algorithms are released every year, and they keep outperforming the previous state of the art.

This notebook reviews the foundational literature on Neural Networks and seeks to shed light on some of the recent advancements in the area. Particular focus is given to networks that require little data.

This notebook is structured as follows. Firstly, the Image Classification problem and its key challenges are defined. Next, the architecture, history and advantages of Neural Networks are presented, followed by an overview of artificial neurons and some popular activation functions. The network training process is explained alongside.

The second part discusses Convolutional Neural Networks (CNNs), one of the most popular implementations of the deep network model. The earliest CNNs are described, their architecture is reviewed, and the layers that make up these networks are detailed, with particular focus on the convolutional and pooling layers.

The last section presents one-shot Image Classification, along with a practical example. Siamese neural networks are then outlined; once trained, these networks do not need to be retrained to recognise new object classes, thus achieving one-shot learning. Finally, other one-shot learning approaches are discussed, including K-Nearest Neighbour and Matching Networks.

Background

The first digitally operated and programmable robot, Unimate, was invented in 1954 by George Devol. Since then, robots have increasingly been employed in a number of sectors, including industry, healthcare, transport, science and even hospitality.

Recent developments in Artificial Intelligence have paved the way for a new era of robotics: robots are taking on more complex and less structured tasks, including collaborating with people to complete jobs. To see and interact in a collaborative environment, robots can be equipped with artificial neural networks. However, in practical settings this can quickly lead to complications. To illustrate this point, consider the following domestic scenario: a kitchen robot is designed to help with cooking by passing objects to a human user, and is trained to recognise spatulas, whisks and pepper mills. When a never-before-seen object (e.g. a ladle) is introduced into the kitchen, the robot does not recognise it, and thus cannot pass it to the human when asked.

This problem would be easily overcome if the robot could learn to recognise the ladle on the spot.

Such a situation does not have a simple solution. Although humans can naturally learn new objects from a single example, an artificial neural network typically needs to be retrained to recognise a new object class. This is the problem that one-shot learning tries to solve: it aims to learn new objects incrementally by seeing only one or a few examples of them.

Returning to the domestic scenario, one-shot learning would let the robot simply take a picture of the ladle, associate it with a class communicated verbally by the user, and store it to complete future object-passing tasks.
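The store-and-recognise idea in this scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system designed in this project: the class name `OneShotMemory` is hypothetical, and the hand-written 3-dimensional vectors stand in for embeddings that would, in practice, be produced by a trained feature extractor. A new class is learned by storing a single labelled embedding, and a query is labelled by its most similar stored example (cosine similarity).

```python
import numpy as np


class OneShotMemory:
    """Hypothetical store of labelled embeddings, queried by nearest neighbour."""

    def __init__(self):
        self.labels = []
        self.embeddings = []

    def add(self, label, embedding):
        # Learning a new class is a single append: no network weights are updated.
        vec = np.asarray(embedding, dtype=float)
        self.labels.append(label)
        self.embeddings.append(vec / np.linalg.norm(vec))

    def predict(self, embedding):
        # Return the label of the stored example with the highest cosine similarity.
        if not self.labels:
            raise ValueError("memory is empty")
        query = np.asarray(embedding, dtype=float)
        query = query / np.linalg.norm(query)
        similarities = np.stack(self.embeddings) @ query
        return self.labels[int(np.argmax(similarities))]


# Assumed toy embeddings; a real system would compute them from images.
memory = OneShotMemory()
memory.add("spatula", [1.0, 0.0, 0.0])
memory.add("whisk", [0.0, 1.0, 0.0])
memory.add("ladle", [0.0, 0.0, 1.0])  # the single example supplied by the user

prediction = memory.predict([0.1, 0.2, 0.9])
print(prediction)
```

In this sketch, the quality of the prediction depends entirely on the embeddings: the memory only compares vectors, so objects of the same class must map to nearby points for the lookup to be reliable.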

Problem statement

One of the biggest ongoing challenges of Neural Networks is training models that generalise well from little data. Currently, deep networks require a substantial amount of training data to achieve top results. Such data is often expensive, labour-intensive and time-consuming to obtain.

One-shot learning aims to solve the problems linked to expensive training by learning new objects from only one or a few examples. Moreover, it allows new data to be assimilated incrementally, without retraining the model parameters.

Aim and Objectives

The aim of this project is to tackle the one-shot learning challenge by building a one-shot recognition system for 3D objects.

The project objectives are listed below:

  1. Analyse the challenges of image recognition.

  2. Review the existing literature on standard and convolutional neural networks.

  3. Examine the most acclaimed approaches to tackling image recognition with one-shot learning algorithms.

  4. Design a system to tackle the one-shot image verification and recognition tasks.

  5. Test the performance of the designed system on the selected dataset as well as in real settings on a NAO robot.

A research gap is identified in applying one-shot learning to the recognition of 3D objects. From this, three research questions and one sub-question are derived:

  1. What results does one-shot learning achieve in recognising 3D objects unseen at training time using only one or a few examples?

    • What results are achieved in the image verification task in the same settings?

  2. What are the implications of using models pretrained on different domain data for the one or few-shot task?

  3. How do the results translate when one or few-shot recognition is carried out in real settings?