How to Deploy a Machine Learning Algorithm

Everything but a guide.


I'm unreasonably excited just typing this, because this week I've had countless conversations with myself about it. But now I get to have a more structured conversation with you. Welcome.

There's a lot to compress in this post, because it's been a week of learning. And I don't mean just new skills, I also mean learning the very nature of how I learn.

On Learning

I've been thinking a lot about this quote (people who have done Practical Deep Learning for Coders by Jeremy Howard will be familiar with this idea):

"You don't teach someone baseball by handing them a textbook on the physics of parabolic motion."

That idea stuck with me because I'm genuinely torn between a deep, low-level understanding and chasing the thrill of high-level deployment. So I sit down for hours, grinding through linear algebra, working out backpropagation by hand. It's intense and exhausting. But it's also weirdly satisfying; there's a kind of beauty in visualizing the math and seeing how it all fits together.

And yet... it always feels a little distant from the real world.

Then I actually build something. A high-level deployment, maybe an image classifier using Gradio and Hugging Face. And at first, it feels almost too easy. A few lines of code, a slick interface, and it works — eventually. A bug hits every now and again and you descend into temporary madness. But strangely, that's when it starts to feel real. Because once it does work, and you see it in action, and you start thinking of all the clever ways you can use it, it starts to feel exciting.

A mathematician named Paul Lockhart put it even more scathingly in his essay A Mathematician's Lament. He imagines a world where kids aren't allowed to paint until they've memorized pigment formulas, and music is off-limits until they've done years of harmony analysis. And that's basically how we treat math and science. Theory now, joy later (if ever).

That vision came flooding back this week as I zigzagged between two very different approaches to understanding neural networks: the low-level math and the high-level tooling.

So let's learn how a neural network is able to tell a dog from a cat, in two parts.

PART 1: Low Level

Convolution

Put simply, convolution is a mathematical operation that combines two functions to produce a third. For a more intuitive understanding, I recommend the video by 3Blue1Brown (this video). In the context of images, one function is the input image and the other is a small kernel (or filter). You slide the kernel across the image and compute the dot product at every location. The result is a new image (called a feature map) that highlights certain features of the original image.

Think of it like this: if the input image is a massive haystack, convolution helps you design tiny, reusable magnets (kernels) to efficiently scan for needles (features like edges or curves).

Given an image I and a kernel K, convolution is defined (in 2D) as:

S(i, j) = ∑ᵤ ∑ᵥ K(u, v) · I(i + u − 1, j + v − 1)

Where:

• S(i, j) is the value of the output feature map at position (i, j)
• K(u, v) is the kernel weight at row u, column v
• I is the input image
• u and v run over the rows and columns of the kernel

You're essentially computing a weighted sum of pixels in the local neighborhood around (i, j).
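
To make that concrete, here's a minimal sketch in plain NumPy (the function naive_convolve2d is my own illustration, not a library call; real frameworks do this far more efficiently, and deep learning libraries actually compute the unflipped "cross-correlation" version shown here):

import numpy as np

def naive_convolve2d(image, kernel):
    # Slide the kernel over the image and take a dot product at every valid location.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local neighborhood around (i, j)
            out[i, j] = np.sum(patch * kernel)  # weighted sum of that neighborhood
    return out

# Toy example: a vertical-edge kernel applied to an 8x8 image (dark on the left, bright on the right)
image = np.zeros((8, 8))
image[:, 4:] = 1.0
edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])
feature_map = naive_convolve2d(image, edge_kernel)
print(feature_map)   # non-zero only near the vertical edge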

Famous Kernels in Computer Vision (Before Deep Learning Took Over)

Before CNNs were learning kernels automatically, people used handcrafted kernels for basic image processing. Some notable examples:

Sobel (edge detection):
    [-1  0  1]
    [-2  0  2]
    [-1  0  1]

Gaussian (blurring/smoothing):
    [1  2  1]
    [2  4  2]
    [1  2  1]   (divide by 16 to normalize)

Laplacian (second derivative; detects edges and changes in intensity):
    [ 0 -1  0]
    [-1  4 -1]
    [ 0 -1  0]

Emboss (gives a 3D shadow effect):
    [-2 -1  0]
    [-1  1  1]
    [ 0  1  2]

CNNs essentially learn their own custom kernels through training, ones that aren't just for edges or textures, but specialized for things like "cat ear with 3-pixel tilt" or "whisker-like curve."

In a typical deep learning model, multiple such kernels are used at each layer. Each kernel detects a different pattern. Instead of being handcrafted, the model learns the values in each kernel during training.

So with, say, 32 kernels scanning a 64×64 image, you end up with 32 filtered versions of the original image, each one highlighting a different type of feature. These 32 filtered images are called feature maps.
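
For a feel of the shapes involved, here's a tiny PyTorch sketch (fastai, used later in this post, is built on PyTorch; the layer below is a throwaway example, not the post's actual model):

import torch
import torch.nn as nn

# 3 input channels (RGB), 32 learned 3x3 kernels; padding=1 keeps the spatial size at 64x64
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

x = torch.randn(1, 3, 64, 64)   # one fake 64x64 RGB image
feature_maps = conv(x)
print(feature_maps.shape)       # torch.Size([1, 32, 64, 64]) -> 32 feature maps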

This feature detection process continues layer by layer, with deeper layers detecting more abstract features (first layer: edges; second layer: corners and textures; later layers: eyes, paws, whiskers, etc.).

ReLU Function

The concept of activation functions has been discussed a lot throughout this series, especially in the Introduction to Deep Learning blog post.

After a convolution operation (i.e. filtering the image for features), you're left with a bunch of numerical values: some positive, some negative. But here's the thing:

Without activation functions, your neural network is just stacking a bunch of linear equations. And guess what? A stack of linear equations is still just... one big linear equation.

That's where activation functions come in, and ReLU is the reigning champion (especially for the layers in between).

What is ReLU?

ReLU stands for Rectified Linear Unit, and it's defined by a simple rule:

f(x) = max(0, x)

In other words: if the value is positive, it passes through unchanged; if it's negative, it becomes zero.

That's it. Just hard pruning.
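
In code, that's one line of NumPy (just an illustration, nothing framework-specific):

import numpy as np

def relu(x):
    return np.maximum(0, x)   # element-wise: negatives become 0, positives pass through unchanged

print(relu(np.array([-3.0, -0.5, 0.0, 2.0, 7.0])))   # [0. 0. 0. 2. 7.]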

Why Use It?

It's what makes the stack non-linear: without ReLU (or something like it), all those convolution layers would collapse into one big linear transformation. It's also dirt cheap to compute and plays nicely with gradient-based training.

Pooling

After filtering with convolutions and applying ReLU to keep only the strong, positive signals, you're left with feature maps which are rich but still quite large. With dozens of kernels per layer, you're now carrying around far more numbers than the original image. But only the strong, specific activations actually matter for detecting features, so you can just "pool" those and throw away the rest.

Pooling achieves that by shrinking things down intelligently.

What Is Pooling?

Pooling is a downsampling technique. It reduces the size of the feature maps while retaining the most important information. The goal is to keep the strongest signals while shrinking the amount of data the next layers have to process.

How Does It Work?

The most common type is Max Pooling.

Max Pooling:
You slide a small window (e.g. 2×2 or 3×3) over the feature map and pick the maximum value inside that window.

If your feature map looks like this:

[1 3 2 4
 5 6 1 2
 7 8 9 4
 3 2 1 0]

...and you apply a 2×2 max pooling with stride 2 (i.e. no overlap), you'll get:

[6 4
 8 9]

This kind of pooling only notices the loudest parts of the image.

Equation:
Let's define it more generally. For an n×n window W taken from the feature map, max pooling is:

MaxPool(W) = max{ x : x ∈ W }

There's also Average Pooling, which takes the mean instead of the max, but max pooling is generally preferred for classification tasks — it keeps the strong, defining features.
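
Here's a quick NumPy sketch of 2×2 max pooling with stride 2 (my own helper for illustration; in practice you'd use the pooling layer your framework provides):

import numpy as np

def max_pool_2x2(feature_map):
    # Group the map into non-overlapping 2x2 blocks, then keep the max of each block.
    h, w = feature_map.shape   # assumes even height and width
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 8, 9, 4],
               [3, 2, 1, 0]])
print(max_pool_2x2(fm))
# [[6 4]
#  [8 9]]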

Why Pool?

It cuts down the amount of computation in later layers, and it gives the network a little tolerance to small shifts: if a feature moves a pixel or two, the max in its window usually stays the same.

Flattening

At this point, we've cleaned the image, filtered it, activated it, and squeezed out the most important spatial features with pooling.

Flattening is the step where we say:
"Alright, enough image stuff. Let's go full neural net now."

What Is Flattening?

Flattening takes the multi-dimensional output of the previous layer (typically 2D or 3D arrays of features) and unrolls it into a 1D vector. It's like taking every little number from each feature map and laying them all in a straight line.

So a 3D array like:
Shape: (32, 32, 16)
...becomes a single array:
Shape: (16384,), since 32 × 32 × 16 = 16,384

This vector is then passed into the Fully Connected Layers, i.e. traditional neural network territory. Here's where the model starts making actual decisions based on all the condensed image data, like "cat" or "dog", all from just a (really) long string of numbers!
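
In NumPy terms, flattening is a single reshape (sizes here match the example above):

import numpy as np

feature_maps = np.random.rand(32, 32, 16)   # stand-in for the output of the last pooling layer
flat = feature_maps.reshape(-1)             # unroll everything into one 1D vector
print(flat.shape)                           # (16384,)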

For the Rest....

At this point, the preprocessing is done. All the heavy image manipulation and filtering is out of the way. Now it's just numbers, weights, biases, and matrix multiplication from here on out.

As for the rest of the process, including the math behind how those weights are updated, what backpropagation is, and how a neural net actually learns: I covered that in my earlier post:
👉 Neural Networks: An Introduction to Deep Learning
Feel free to check that out for the equations and brain gymnastics.

PART 2: Practical Application

The Model: Training Before Deployment

Before you can deploy a model, you have to train it. If Part 1 was the theory, this is the doing. And trust me, clicking "run" after hours of debugging feels incredible.

Using the fastai library (which is built on top of PyTorch), you can get a working image classifier in surprisingly few lines of code.

Step 1: Point to Your Data

from pathlib import Path

path = Path(r"C:\Users\Fanny\OneDrive - Fanny Fushayi\Computer Science\Building_AI\Computer_Vision\Cat_v_Dog")

You should organize your images into subfolders (e.g., Cat/ and Dog/), because fastai uses folder names to infer class labels.
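
For reference, the folder layout I'm describing looks something like this (names are just an example):

Cat_v_Dog/
├── Cat/
│   ├── cat_001.jpg
│   └── ...
└── Dog/
    ├── dog_001.jpg
    └── ...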

Step 2: Load and Prepare the Data

from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    path,
    train='.',              # Training images live directly under `path`, in class subfolders
    valid_pct=0.2,          # 20% validation split
    item_tfms=Resize(224),  # Resize all images to 224x224
    batch_tfms=aug_transforms(), # Data augmentation
    bs=6                    # Batch size
)

What is Data Augmentation?

Think of data augmentation as training your model with optical illusions. It helps the model generalize better by transforming your input images: rotating, flipping, zooming, lighting changes, etc. This simulates real-world variety and helps avoid overfitting.
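
If you want to see (or tweak) what aug_transforms is doing, it takes explicit knobs; the values below are illustrative, not a recommendation:

from fastai.vision.all import aug_transforms

# Random transforms applied to each training batch
tfms = aug_transforms(
    do_flip=True,       # random horizontal flips
    max_rotate=10.0,    # rotate by up to ~10 degrees
    max_zoom=1.1,       # zoom in by up to 10%
    max_lighting=0.2,   # brightness/contrast jitter
)
# Pass this as batch_tfms=tfms in ImageDataLoaders.from_folder, exactly as in Step 2.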

Step 3: Show a Batch

dls.show_batch(max_n=9)

This lets you visually confirm that your images are being loaded, labeled, and transformed correctly. If you see upside-down cats or inverted dogs, don't panic. That's the augmentation at work.

Step 4: Train the Model

learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(2)

What Is Fine-Tuning?

resnet34 is a pretrained model: it's already learned to recognize patterns from millions of images (thanks, ImageNet). Fine-tuning means we keep its earlier layers (those that detect general features like edges, textures, and shapes) and only train the final layers on our specific task: classifying cats and dogs.

It's like hiring an experienced detective and only teaching them the details of this new case. (** now you know why companies require experience on job posts)
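
Under the hood, learn.fine_tune(2) amounts to roughly the following (a simplified sketch of fastai's behaviour, not its exact source):

learn.freeze()            # freeze the pretrained body; only the new head gets trained
learn.fit_one_cycle(1)    # one epoch on the head alone
learn.unfreeze()          # now let the whole network update
learn.fit_one_cycle(2)    # two more epochs (fastai uses smaller learning rates for the earlier layers)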

Step 5: Save Your Model

learn.export("model.pkl")

This saves your trained model as a serialized .pkl file that can be loaded later for inference (keep the filename consistent with what your app expects; here it's model.pkl).
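
Before deploying, it's worth a quick local sanity check that the exported file loads and predicts (the image path below is just a placeholder for any photo you have on hand):

from fastai.vision.all import load_learner, PILImage

learn = load_learner("model.pkl")              # reload the exported model
img = PILImage.create("some_test_image.jpg")   # placeholder: any cat or dog photo
pred, idx, probs = learn.predict(img)
print(pred, probs[idx])                        # predicted class and its probability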

You're done with training. Let's go deploy this thing.

Deployment: Taking It Online

Now, this part gets messy. Not because it's hard but because you'll encounter issues that don't feel "ML-related." Like GitHub version control, Python environments, GPU access, OS differences; basically, the adult stuff.

Quick Setup Summary for Hugging Face + GitHub Deployment:

  1. Create a Hugging Face account
  2. Create a new Space (select Gradio as the SDK)
  3. Clone the space repo using Git:
    git clone https://huggingface.co/spaces/your-username/your-space-name
  4. Add your files:
    • model.pkl
    • app.py
    • requirements.txt (this lists packages to install, like fastai, gradio, etc.; a minimal example is sketched after this list)
  5. Git basics:
    git add .
    git commit -m "Initial commit"
    git push

Done right, your app launches automatically.
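
For reference, a minimal requirements.txt for this app could look like the following (versions unpinned here; pin them if you want reproducible builds):

fastai
gradio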

Challenges I Faced

The Final app.py Code

import gradio as gr
from fastai.vision.all import *

# Load the model
learn = load_learner('model.pkl')

# Class labels, pulled from the learner itself so the order always matches its output
# (hardcoding something like ('Dog', 'Cat') risks swapping the labels)
categories = tuple(learn.dls.vocab)

def classify_image(img):
    pred, idx, probs = learn.predict(img)
    return dict(zip(categories, map(float, probs)))

# Create interface
demo = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=2),
    title="🐱🐶 Cat vs Dog Classifier",
    description="Upload an image to classify whether it's a cat or dog!"
)

if __name__ == "__main__":
    demo.launch()

Final Thoughts

The practical side of ML felt easy, until it wasn't. Most of my bugs weren't in matrix multiplication; they were in getting files into the right folders or figuring out why Hugging Face wouldn't boot.

But once it worked? I couldn't stop playing with it. I tried random photos, my camera feed, comic cats, even weird color-inverted images. It was fun. It made the whole "low-level theory" suddenly click. Now I knew what all that math was for.

And that's why this wasn't just a guide — it was a learning arc.
Everything but a guide.

Obviously, this is surface level, and if you were to go down the rabbit hole, I'm sure you'd need both at some point: the theory and the tooling. There's a lot of nuanced discussion that can come out of all this. Another one of the projects I did was a family facial recognition system, because someone said we all look alike... Not very true, apparently, because it took just one epoch and only 10 images per person to hit 90% accuracy. Go figure. (NB: this is outside the technical section, and I'm not claiming it was easy for the network. It probably managed because augmentation effectively gives it more than 10 samples per person, and because the pretrained model could already recognise most things anyway.)

The point is: image classification opens the door to a ton of fun and weird ideas. You can experiment fast, build cool things, and learn a lot along the way (often by breaking stuff).

Oh, also: if you want to go the extra mile, you can use the Hugging Face API to get more control over the UI, and even embed your model directly into your own website, making a super duper awesome webpage with your model (or other people's) however you wish. Something like this: Et voilà.

import requests

# NOTE: the model name and token below are placeholders.
# The Inference API for image models takes the raw image bytes as the request body
# (not a multipart form upload).
API_URL = "https://api-inference.huggingface.co/models/your-username/your-model-name"
headers = {"Authorization": "Bearer YOUR_HUGGINGFACE_TOKEN"}

with open("my_cat_image.jpg", "rb") as f:
    response = requests.post(API_URL, headers=headers, data=f.read())

print(response.json())

With a bit of front-end work, you could turn this into a slick embedded app. No need to use Gradio's hosted UI; just wire it into your own HTML/CSS/JS setup and roll your own thing. Super useful if you want to integrate it into a portfolio or blog.

Anywaysssss... however far you decide to go, it's a playground. Just don't forget to git commit before you break something again.