Week 22 · Space GIS Architect · ~7 min · 703 words

ML for satellite imagery: CNNs and U-Net segmentation

Deep learning has rewritten remote sensing. CNNs (object detection) and U-Nets (semantic segmentation) are now standard. This week you train one on real GOES data.

Could a machine learning model spot bleaching coral from satellite imagery before a human diver does?

Yes — and several already do. This week you'll learn U-Net, the same architecture researchers use for reef-bleaching detection, lava-flow mapping, and (yes) rocket-plume segmentation.

Learning objectives

Train a CNN for object detection in raster imagery
Build a U-Net for semantic segmentation of clouds / plumes / fires
Generate training data via thresholding + manual labels
Evaluate with IoU and confusion matrices

Kāneʻohe Bay's reefs have been imaged thousands of times. U-Net learns from those images to flag bleaching automatically.

Try it: tune a confidence threshold

Every classifier has a knob: how confident must it be to flag a detection? Higher threshold → fewer false positives, more missed detections. Try it.
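
To make the trade-off concrete, here is a minimal sketch that sweeps a confidence threshold over a handful of scored detections. The scores and labels are invented for illustration, and `precision_recall_at` is a hypothetical helper, not part of the course notebook.

```python
import numpy as np

def precision_recall_at(scores, labels, threshold):
    """Flag every score >= threshold and compare against binary labels."""
    flagged = scores >= threshold
    tp = int(np.sum(flagged & (labels == 1)))   # flagged and real
    fp = int(np.sum(flagged & (labels == 0)))   # flagged but not real
    fn = int(np.sum(~flagged & (labels == 1)))  # real but missed
    # Convention chosen here: report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall, fp, fn

# Made-up confidence scores and ground truth (1 = real detection).
scores = np.array([0.95, 0.90, 0.80, 0.70, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])

for t in (0.30, 0.60, 0.85):
    p, r, fp, fn = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}  FP={fp}  FN={fn}")
```

On this toy data, raising the threshold from 0.30 to 0.85 removes all three false positives but misses three of the five real detections.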


Primer

Deep learning has rewritten the playbook for satellite imagery analysis over the past decade. Convolutional neural networks now do object detection, semantic segmentation, super-resolution, and change detection at production scale across every major Earth-observation platform. This week is the practical primer: when to use deep learning vs threshold rules, the U-Net architecture, and how to train one on real GOES data.

When deep learning beats thresholding

Threshold rules (Week 14's Band 7 > 320 K) work when the discriminator is a single scalar feature. They break down when:

The discriminator is spatial-contextual (a plume looks different from a wildfire in shape and spatial neighborhood, not just brightness).
You need probabilistic output (confidence scores) for downstream cost-of-error decisions.
You have many labeled examples and want a single classifier that captures complex patterns.

Threshold rules are great for fast, explainable, debuggable baseline detection. Deep learning shines for the next layer: scoring, classification, and segmentation refinement.

The U-Net architecture

U-Net (Ronneberger et al. 2015) is the workhorse for image segmentation in remote sensing. It's an encoder-decoder with skip connections:

Encoder — successive 3x3 convolutions + 2x2 max-pool, halving the spatial dimensions and doubling the channel count at each level. By the bottleneck, the feature map is small but channel-rich.
Decoder — successive 2x2 transpose convolutions + 3x3 convolutions, doubling the spatial dimensions and halving channels. Reconstructs the original resolution.
Skip connections — at each decoder level, concatenate the corresponding encoder feature map. This preserves fine-grained spatial detail that would otherwise be lost in the bottleneck.

The output is a same-size map of per-pixel class probabilities. For plume segmentation, the classes are {background, plume}; for multi-class fire/plume/cloud, expand accordingly.
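
The halving and doubling described above can be traced with plain arithmetic. The sketch below assumes four pooling levels and 32 base features; `trace_unet_shapes` and `padded_size` are hypothetical helper names. It also shows a practical wrinkle: a 200-pixel window does not divide evenly by 2^4, so the decoder's upsampled map (12 → 24) would not match the 25-pixel skip connection unless the input is padded or the skip is cropped.

```python
def trace_unet_shapes(size, n_features=32, depth=4):
    """Return (spatial size, channels) at each encoder level plus the bottleneck."""
    shapes = []
    channels = n_features
    for _ in range(depth):
        shapes.append((size, channels))
        size //= 2       # 2x2 max-pool halves each spatial dimension (floor)
        channels *= 2    # the next level doubles the channel count
    shapes.append((size, channels))  # bottleneck
    return shapes

def padded_size(size, depth=4):
    """Smallest size >= `size` that is divisible by 2**depth."""
    factor = 2 ** depth
    return -(-size // factor) * factor  # ceiling division, then scale back up

print(trace_unet_shapes(200))               # 200 -> 100 -> 50 -> 25 -> 12
print(padded_size(200))                     # 208
print(trace_unet_shapes(padded_size(200)))  # 208 -> 104 -> 52 -> 26 -> 13
```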

import torch.nn as nn

class UNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, n_features=32):
        super().__init__()
        # ... 4 down blocks + bottleneck + 4 up blocks ...
        # Each block: Conv3x3 → BatchNorm → ReLU → Conv3x3 → BatchNorm → ReLU
        # Down: Conv block + MaxPool2x2
        # Up: ConvTranspose2x2 + concat with skip + Conv block
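
One way the skeleton above could be filled in is sketched below. This is an illustrative reading of the comments, not the lab's reference implementation. The class name `SmallUNet`, the `depth` argument, the zero-padding of the input to a multiple of 2^depth, and the crop back to the input size are all assumptions added here. The network returns logits; apply `torch.sigmoid` to get per-pixel probabilities for the two-class case.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # Conv3x3 -> BatchNorm -> ReLU, twice; padding=1 keeps the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, n_features=32, depth=4):
        super().__init__()
        # Channel counts per level, e.g. 32, 64, 128, 256, 512 (last = bottleneck).
        chans = [n_features * 2 ** i for i in range(depth + 1)]
        self.downs = nn.ModuleList()
        prev = in_ch
        for c in chans[:-1]:
            self.downs.append(conv_block(prev, c))
            prev = c
        self.bottleneck = conv_block(chans[-2], chans[-1])
        self.ups = nn.ModuleList()
        self.up_convs = nn.ModuleList()
        for c in reversed(chans[:-1]):
            self.ups.append(nn.ConvTranspose2d(2 * c, c, 2, stride=2))
            self.up_convs.append(conv_block(2 * c, c))  # input = skip + upsampled
        self.head = nn.Conv2d(n_features, out_ch, 1)    # 1x1 conv to class logits

    def forward(self, x):
        h, w = x.shape[-2:]
        factor = 2 ** len(self.downs)
        # Assumption: zero-pad right/bottom so both sides divide by 2**depth.
        x = F.pad(x, (0, (-w) % factor, 0, (-h) % factor))
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)           # saved for the matching decoder level
            x = F.max_pool2d(x, 2)
        x = self.bottleneck(x)
        for up, conv, skip in zip(self.ups, self.up_convs, reversed(skips)):
            x = up(x)
            x = conv(torch.cat([skip, x], dim=1))
        return self.head(x)[..., :h, :w]  # crop the padding back off

model = SmallUNet().eval()
with torch.no_grad():
    probs = torch.sigmoid(model(torch.randn(1, 1, 200, 200)))
print(tuple(probs.shape))
```

Padding with zeros is the simplest choice; for brightness-temperature inputs, a reflect pad or normalizing the input first may behave better at the window edges.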

Weak supervision

The training-data problem: who hand-labels rocket plumes in tens of thousands of GOES frames? Nobody. The trick is weak supervision — generate the training labels programmatically.

For plumes: run Week 14's threshold detector + Week 20's morphology cleanup over a year of GOES frames around known launches. Cross-check against the published launch schedule. Use those pixel masks as training labels. The labels are noisy (some false positives, some false negatives), but with enough volume the U-Net learns to denoise — it picks up on spatial context the threshold rule can't see.
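
A minimal sketch of the label-generation step, using NumPy only. It assumes the morphology cleanup is a 3×3 binary opening (erosion followed by dilation) and reuses the 320 K threshold from the Week 14 rule. The helper names and the synthetic frame are illustrative, and the cross-check against the launch schedule is left out.

```python
import numpy as np

def _neighbourhood_stack(mask):
    """Stack the 3x3 neighbourhood of every pixel (False-padded at the edges)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(mask):
    # A pixel survives only if its whole 3x3 neighbourhood is set.
    return _neighbourhood_stack(mask).all(axis=0)

def dilate(mask):
    # A pixel is set if anything in its 3x3 neighbourhood is set.
    return _neighbourhood_stack(mask).any(axis=0)

def weak_label(bt_kelvin, threshold_k=320.0):
    """Threshold a brightness-temperature frame, then apply a 3x3 opening."""
    hot = bt_kelvin > threshold_k
    return dilate(erode(hot))

# Synthetic 8x8 frame in kelvin (not real GOES data).
frame = np.full((8, 8), 290.0)
frame[2:5, 2:5] = 335.0   # 3x3 hot blob: survives the opening
frame[6, 6] = 340.0       # isolated hot pixel: removed as noise
mask = weak_label(frame)
print(mask.astype(int))
```

The opening drops detections smaller than the 3×3 structuring element, which trades a few missed small plumes for fewer single-pixel false positives in the training labels.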

Evaluation: IoU and confusion matrices

For segmentation, accuracy is misleading: a network that predicts "no plume" everywhere can score 99.99% accuracy because almost every pixel really is background. Use:

IoU (Intersection over Union): area of overlap divided by area of union, which at the pixel level is TP / (TP + FP + FN). 1.0 is perfect, 0.0 is no overlap. Compute per-class IoU and report the mean (mIoU).
Confusion matrix: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) at the pixel level. Derive precision (TP / (TP + FP)) and recall (TP / (TP + FN)). For a typical production detector, the gate is that the false positive rate must stay below a few percent, because alarm fatigue kills the user experience long before missed detections do.
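
The metrics above can be computed directly from boolean masks. The sketch below uses a synthetic 100×100 scene to show why accuracy hides failures that IoU exposes; the function names are illustrative, and the empty-union convention (returning 1.0) is an assumption.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-level (TP, FP, FN, TN) for two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn

def iou(pred, truth):
    """Intersection over union for the positive class."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0  # both masks empty: treat as perfect

def mean_iou(pred, truth):
    """Mean of plume IoU and background IoU (two-class mIoU)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return 0.5 * (iou(pred, truth) + iou(~pred, ~truth))

truth = np.zeros((100, 100), dtype=bool)
truth[10:14, 10:14] = True        # 16 plume pixels out of 10,000

empty = np.zeros_like(truth)      # "no plume" everywhere
shifted = np.zeros_like(truth)
shifted[11:15, 11:15] = True      # right size, one pixel off diagonally

for name, pred in (("empty", empty), ("shifted", shifted)):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    accuracy = (tp + tn) / pred.size
    print(f"{name}: accuracy={accuracy:.4f}  IoU={iou(pred, truth):.3f}  "
          f"mIoU={mean_iou(pred, truth):.3f}")
```

Both predictions score above 99.8% accuracy on this scene, while plume IoU is 0.0 for the empty mask and about 0.39 for the shifted one.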

Small models, not big

For thermal plume segmentation in 200×200 pixel windows, a 32-feature U-Net is more than enough. With four down blocks that comes to roughly 8M parameters, and dropping a level or halving the base width brings it under 2M. Don't reach for big pretrained models: they need huge training sets, they're slow to deploy, and the feature distribution of satellite imagery is far enough from ImageNet that pretrained weights help less than you'd expect.
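
Model size depends strongly on depth and base width, so it is worth checking before committing to a configuration. The sketch below counts convolution weights only (no biases or BatchNorm parameters) for a U-Net laid out as in the skeleton earlier: two 3×3 convolutions per block, 2×2 transpose convolutions in the decoder, and a final 1×1 convolution, which is an assumption added here. `unet_conv_params` is a hypothetical helper.

```python
def unet_conv_params(in_ch=1, out_ch=1, n_features=32, depth=4):
    """Count convolution weights only (no biases, no BatchNorm parameters)."""
    chans = [n_features * 2 ** i for i in range(depth + 1)]
    total = 0
    prev = in_ch
    for c in chans:                             # encoder blocks + bottleneck
        total += 9 * prev * c + 9 * c * c       # two 3x3 convolutions
        prev = c
    for c in reversed(chans[:-1]):              # decoder blocks
        total += 4 * (2 * c) * c                # 2x2 transpose convolution
        total += 9 * (2 * c) * c + 9 * c * c    # conv block after the concat
    total += n_features * out_ch                # final 1x1 convolution
    return total

for features, depth in ((32, 4), (16, 4), (32, 3)):
    n = unet_conv_params(n_features=features, depth=depth)
    print(f"n_features={features}, depth={depth}: {n / 1e6:.2f}M weights")
```

Under these assumptions, four levels at 32 features come to about 7.8M weights, while either 16 features or three levels gives about 1.9M.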

The lab

You'll generate weakly supervised training data from threshold detections plus morphology cleanup over a year of GOES Band 7 frames, train a small U-Net in PyTorch, and evaluate on held-out scenes with IoU and confusion matrices. The data recipe is generic: it applies equally to plume detection, wildfire mapping, gas-flare inventories, and volcanic-hotspot classification. The same architecture (encoder-decoder with skip connections, small parameter count, weak-supervision pretraining) underpins many operational hotspot-classification layers in industry. What each operator adds on top (feature stacking, gating, fusion) is the secret sauce that doesn't ship in an open curriculum.

Connecting to Hawaiʻi: ML for reef health

Researchers at the Hawaiʻi Institute of Marine Biology, at NOAA Coral Reef Watch, and at the University of Hawaiʻi have published work using U-Net and similar CNNs to detect coral bleaching from satellite imagery, automating what used to require human diver surveys. The same architecture (encoder-decoder with skip connections) you'll learn this week powers those reef-monitoring systems. Weak supervision (training on imperfect labels) was a key technique because hand-labeling bleaching is expensive.

Coral Reef Watch's daily bleaching alerts use thermal-IR + ML. Every alert that goes out saves diver-survey time and lets reef managers respond faster.

Hands-on lab: U-Net for plume segmentation

Generate training data from threshold-detected plumes in GOES Band 7. Train a small U-Net to segment plume pixels. Evaluate on held-out launches with IoU and confusion matrices.


Quiz — click an answer to check it

No grade, no shame. Tap any option; you'll see if it's right plus the answer if not. The point is to notice what you already know and what's still settling.

Q1. U-Net architecture is:

Encoder-decoder with skip connections, ideal for segmentation
Just a CNN
Recurrent
Transformer-only

Q2. IoU (intersection over union) measures:

Overlap between predicted and ground-truth mask
Loss only
Reprojection error
Compression ratio

Q3. Generating training data via thresholding is called:

Weak supervision (programmatic labels)
Manual labeling
Synthetic data
Augmentation

Q4. CNNs work well on images because:

Translation equivariance (shared weights) and locality
They're newest
Only choice
Marketing

Q5. Why use a small U-Net (not a giant model)?

Faster inference, less overfitting on small training sets, deployable to edge
Always smaller is worse
Required by law
No reason

Reflection

Take five minutes with this. Write your answer somewhere. Carry it into next week.

ML automates pattern detection that used to require people. Sometimes that's freeing (humans stop doing repetitive work). Sometimes that's deskilling (humans lose embodied knowledge). Where is the line?

