    • Transfer learning for image classification

      In this example we want to train a neural network to distinguish between pictures of cats and pictures of dogs. This task is called image classification and typically requires large neural networks and a lot of training data. Training such large models requires high-performance computers and cannot be done on a personal computer. Luckily, the training process can be simplified by a technique called transfer learning. In transfer learning, the knowledge of a neural network that was trained on a similar problem is "transferred" to the actual problem. A similar problem means that the network was trained with a similar objective and on data that shares certain features with the desired dataset.

      For this problem we will use the pre-trained MobileNet v2 model that has been trained for general image classification on the ImageNet database. The front part of the MobileNet network extracts features from the images and the last two layers are used to deduce the classification based on the given features. For the task of classifying cats and dogs, we will reuse the front part of the network and only retrain the classification part based on our classes.

      Be aware that even though transfer learning reduces the complexity of the training process, this example requires substantial computing power and at least 12 GB of memory.

      As a first step, let us import all libraries that we need for machine learning, unzipping and image decoding.

    • Download and unzip the training images

      To train the pre-trained model for the classification of cats and dogs, we need pictures of cats and dogs as training data. So in the following step we will download a zipped directory with images and store its data as an ArrayBuffer.
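A sketch of the download step; the dataset URL below is a placeholder, not the address used by the notebook:

```javascript
// Placeholder URL; the actual dataset location is not given in the text.
const DATA_URL = "https://example.com/cats_and_dogs.zip";

// Fetch the archive and keep the raw zipped bytes as an ArrayBuffer.
async function downloadZip(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  return response.arrayBuffer();
}
```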

    • We use the fflate library to decompress the zipped data into a Map that contains the local paths of the images and the corresponding data.

    • For the training process we need to attach labels to the training data. Hence we need to separate the decompressed data into arrays of cat images, dog images and validation images.
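A sketch of this split; the directory names ("cats/", "dogs/", "validation/") are assumptions about the archive layout:

```javascript
// Partition the unzipped entries by their directory prefix (assumed layout).
function splitByClass(fileMap) {
  const catImages = [];
  const dogImages = [];
  const validationImages = [];
  for (const [path, data] of fileMap) {
    if (path.startsWith("cats/")) catImages.push(data);
    else if (path.startsWith("dogs/")) dogImages.push(data);
    else if (path.startsWith("validation/")) validationImages.push(data);
  }
  return { catImages, dogImages, validationImages };
}
```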

    • Decode JPEG images and convert them to tensors

      In the next step we will decode the images into bitmap representations and store them inside of tensors. The tensors have 3 dimensions: 2 for the height and width of the image and 1 for the 3 RGB color channels. To simplify the classification we resize the images to 224 by 224 pixels.

      In the end we return an object that contains the resized tensor as its xs property and its label as its ys property. The labels are tensors with length equal to the number of classes (in our case 2). The first entry contains the probability of the input being a cat and the second entry the probability of the input being a dog. For the cat images the labels are [1,0]. Afterwards, we repeat this step for the dog and validation images.

      During this process a lot of memory has to be allocated. It is therefore important to perform these tasks in one function call, so that the intermediate results are discarded. Additionally, we will wrap these steps inside of a tf.tidy() call to make sure that unneeded tensors are cleared. Be aware that executing the following cells can take a while and can temporarily freeze the browser tab.

    • To ensure an unbiased training process we need to shuffle our training data. Otherwise the model will be trained with cat images first and then with dog images. To do this conveniently we define a dataset from our arrays of images. We can then easily shuffle and batch the data accordingly.

    • Create model for transfer learning

      In the next step we set up the neural network for transfer learning. As discussed before, the first part of the network is responsible for the feature extraction of the images while the last two layers perform the classification. For the first part we download the "headless" version of the MobileNet model, meaning the version without the classification layers.

    • Since we only want to train the classification layers we set the layers of the first part to not be trainable.

    • Next, we have to define the two classification layers ourselves. The average pooling layer downscales the output of the first part into the feature vector that contains the information about the features present in the current image. The second layer outputs the actual classification. Since we have 2 classes, its units parameter is equal to 2.

    • With our layers in place we can create the model.

    • Let's print out a summary of the model to the console.

    • Finally, we have to specify which loss function we want to minimize and what optimization algorithm we want to use.

    • Train model

      Now that the model structure is defined and the data is set up, it's time to train the model. Running it for only 10 epochs will still give a high enough accuracy.

    • Make prediction

      Let us test if the transfer learning worked by classifying a picture from a remote URL. For that we load the picture into an image element. If you have the URL of a different picture, you can try that instead.
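Loading a remote picture into an image element can be wrapped in a promise (browser-only sketch; the crossOrigin flag is needed so the pixels can later be read into a tensor):

```javascript
// Resolve with a loaded <img> element for the given URL (browser only).
function loadImage(url) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.crossOrigin = "anonymous"; // allow reading pixels from another origin
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = url;
  });
}
```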

    • We will then convert the image element into a tensor and pass it to the prediction function of our model. Let's see what the prediction is.