Week 27 · Space GIS Architect · ~7 min · 711 words

Production pipelines: S3 → Lambda → EventBridge → DDB

Production geospatial isn't a notebook; it's a pipeline. This week covers the AWS architecture LaunchDetect runs in production: S3 ingest, Lambda compute, EventBridge routing, DynamoDB state.

When something important happens — a launch, an eruption, a hurricane — how does the alert get from the satellite to the people who need it?

Through a pipeline of cloud services. This week you'll wire one up: S3 ingests data, Lambda processes it, DynamoDB stores it, EventBridge routes the alert. Same architecture LaunchDetect uses; same architecture you could use to build community-grade alerts.

Learning objectives

Pacific Disaster Center on Maui serves geospatial alerts to 90+ countries. The architecture they use is the architecture this week teaches.

Primer

Production geospatial isn't a notebook. It's a pipeline: data lands somewhere, code runs, results land somewhere else, the pipeline is monitored, alerts fire when something breaks. This week is the AWS architecture that LaunchDetect actually runs in production — minus a few enterprise-specific layers.

The core stack: S3 + Lambda + EventBridge + DynamoDB

Four services, each doing one thing well:

  • S3 — object storage. Every GOES NetCDF, every detection JSON, every static page lives in S3. Cheap (~$0.023/GB/month), durable (11 9s), event-emitting.
  • Lambda — serverless compute. Function as a service. You pay for invocations and execution duration; no servers to manage. Cold start is the main pitfall.
  • EventBridge — event bus. Routes events between AWS services and your own consumers. Replaces ad-hoc SNS/SQS/CloudWatch Events combinations.
  • DynamoDB — NoSQL key-value/document store. Single-digit-millisecond reads. Pay for storage + read/write units.
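
To make the pay-per-use model concrete, here is a rough monthly cost sketch for the S3 and Lambda pieces only. The S3 price is the figure quoted above; the two Lambda prices are assumed list prices for x86 functions and vary by region, so check current pricing before relying on them. The function and interface names are hypothetical.

```typescript
// Rough monthly cost sketch for S3 storage plus Lambda compute.
// Ignores free tier, request/transfer charges, DynamoDB and EventBridge.
interface MonthlyCostInputs {
  storedGb: number;            // average GB held in S3 over the month
  invocations: number;         // Lambda invocations per month
  avgDurationSeconds: number;  // mean billed duration per invocation
  memoryGb: number;            // configured Lambda memory, in GB
}

const S3_USD_PER_GB_MONTH = 0.023;              // figure quoted in the S3 bullet
const LAMBDA_USD_PER_MILLION_REQUESTS = 0.2;    // assumption: check current pricing
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;  // assumption: check current pricing

function estimateMonthlyCostUsd(
  inputs: MonthlyCostInputs
): { s3: number; lambda: number; total: number } {
  const s3 = inputs.storedGb * S3_USD_PER_GB_MONTH;
  const requestCost = (inputs.invocations / 1e6) * LAMBDA_USD_PER_MILLION_REQUESTS;
  // Lambda compute is billed in GB-seconds: duration multiplied by configured memory.
  const computeCost =
    inputs.invocations * inputs.avgDurationSeconds * inputs.memoryGb * LAMBDA_USD_PER_GB_SECOND;
  const lambda = requestCost + computeCost;
  return { s3, lambda, total: s3 + lambda };
}
```

Under these assumed prices, duration multiplied by memory usually dominates the Lambda bill, which is why trimming the scorer's runtime matters more than trimming its invocation count.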

The detection pipeline

A representative operational flow looks like:

  1. The provider (e.g. NOAA) writes a new GOES Band 7 mesoscale NetCDF to a public S3 bucket such as s3://noaa-goes18/....
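  2. The bucket emits an object-created event, and an EventBridge rule routes it to a downstream scorer Lambda. (For a bucket you don't own, such as NOAA's, you would typically subscribe to the provider's new-object notifications, for example an SNS topic, rather than configure the bucket yourself.)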
  3. The scorer fetches the NetCDF (range-requested for just the geographic window of interest) and runs the detection chain you've built across weeks 13–15: radiance → brightness temperature, threshold, parallax correction, write candidates to DynamoDB. The classification layer that decides which candidates clear the bar to become alerts is intentionally treated as a black box here — that's where each operator differentiates and isn't part of a generic teaching example.
  4. A DynamoDB stream triggers a publisher Lambda that writes the cleared-detection JSON to a public-facing S3 location and emits a domain event to EventBridge.
  5. Subscribers (web dashboard, push-notification service, blog generator) receive the event and update their own state.
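
Steps 4 and 5 hinge on the shape of the domain event. The sketch below builds an entry in the shape EventBridge's PutEvents call expects (Source, DetailType, Detail as a JSON string, EventBusName) and shows a deliberately simplified version of how a subscriber's rule pattern selects it. The source name, detail type, and detection fields are hypothetical, and real EventBridge patterns support far more than this matcher does.

```typescript
// Entry shape for EventBridge PutEvents (only the fields used here).
interface PutEventsEntrySketch {
  Source: string;
  DetailType: string;
  Detail: string;        // EventBridge expects the detail as a JSON string
  EventBusName: string;
}

// Hypothetical payload for a cleared detection.
interface ClearedDetectionSketch {
  detectionId: string;
  observedAt: string;    // ISO 8601 UTC timestamp
  lat: number;
  lon: number;
  s3Key: string;         // where the publisher wrote the public JSON
}

function buildDetectionEntry(
  detection: ClearedDetectionSketch,
  busName: string
): PutEventsEntrySketch {
  return {
    Source: 'example.detections',    // hypothetical source name
    DetailType: 'DetectionCleared',  // hypothetical detail type
    Detail: JSON.stringify(detection),
    EventBusName: busName,
  };
}

// Simplified rule pattern: each present field lists the accepted values.
interface SimplePatternSketch {
  source?: string[];
  detailType?: string[];
}

function matchesPattern(entry: PutEventsEntrySketch, pattern: SimplePatternSketch): boolean {
  const sourceOk = pattern.source === undefined || pattern.source.indexOf(entry.Source) !== -1;
  const typeOk =
    pattern.detailType === undefined || pattern.detailType.indexOf(entry.DetailType) !== -1;
  return sourceOk && typeOk;
}
```

Each subscriber owns its own pattern, so adding a new consumer (say, a blog generator) means adding a rule, not changing the publisher.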

Total latency from the source file landing to a notification on a user's phone: typically on the order of a minute, depending on which step you optimize.

DynamoDB partition key design

DynamoDB's #1 footgun is hot partitions. Every item has a partition key (PK) and optionally a sort key (SK). DynamoDB hashes the PK and routes the item to a physical partition. If 90% of your writes go to a single PK, you bottleneck on that one partition's throughput limit (3,000 read capacity units and 1,000 write capacity units per second).

Good PK choices spread writes evenly across partitions. For launch detections, a natural PK is DETECTION#{ulid} — ULIDs are time-ordered but have enough entropy that they distribute evenly. Bad PK: DATE#{yyyy-mm-dd} — all today's writes go to one partition.
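
The difference is easy to see in a small simulation. This is a sketch only: DynamoDB's internal hash function and partition count are not exposed, so a 32-bit FNV-1a hash and eight partitions stand in for them, and a simple counter stands in for a ULID.

```typescript
// 32-bit FNV-1a hash, used here as a stand-in for DynamoDB's internal hash.
function fnv1aHash(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts; the next ^= truncates to 32 bits.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0;
}

// Count how many writes land on each simulated partition.
function writesPerPartition(partitionKeys: string[], partitionCount: number): number[] {
  const counts: number[] = [];
  for (let p = 0; p < partitionCount; p++) counts.push(0);
  partitionKeys.forEach((key) => {
    const slot = fnv1aHash(key) % partitionCount;
    counts[slot] = (counts[slot] || 0) + 1;
  });
  return counts;
}

function sampleKeys(total: number, makeKey: (i: number) => string): string[] {
  const keys: string[] = [];
  for (let i = 0; i < total; i++) keys.push(makeKey(i));
  return keys;
}

// Bad: every write today shares one key, so one partition takes all 1,000 writes.
const hotCounts = writesPerPartition(sampleKeys(1000, () => 'DATE#2025-07-04'), 8);

// Better: a unique ID per detection (counter used in place of a ULID) spreads the writes.
const spreadCounts = writesPerPartition(sampleKeys(1000, (i) => 'DETECTION#' + i), 8);
```

Comparing hotCounts with spreadCounts shows one partition absorbing everything in the first case and all eight sharing the load in the second.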

Lambda cold starts

When Lambda receives a request and has no warm execution environment available, it cold-starts: it provisions a sandbox, downloads the function code, initializes the runtime, runs your initialization code, and only then invokes the handler. Cold starts range from roughly 200 ms (lightweight Python 3.13) to 3+ seconds (heavy Java or a large dependency tree).

For latency-sensitive request paths (API endpoints), cold start matters and you mitigate with: provisioned concurrency, smaller deployment packages, lighter runtimes, lazy imports. For event-driven batch (which is most space-GIS pipelines), cold start is fine — a launch detection that takes 90 seconds doesn't care about 500 ms cold start.
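
Lazy initialization is the cheapest of those mitigations to sketch. The idea: keep expensive setup out of the cold-start path until a request needs it, then cache the result in module scope so warm invocations reuse it. The client here is a hypothetical stand-in for something heavy, such as an SDK client or a large library import.

```typescript
// Stand-in for an expensive dependency (hypothetical).
interface HeavyClientSketch {
  describe(): string;
}

let initCount = 0;                               // how many times setup actually ran
let cachedClient: HeavyClientSketch | undefined; // survives across warm invocations

function getClient(): HeavyClientSketch {
  if (cachedClient === undefined) {
    initCount += 1; // the expensive work would happen here, once per execution environment
    const id = initCount;
    cachedClient = { describe: () => 'client #' + id };
  }
  return cachedClient;
}

// Handler sketch: the first call pays for setup, later calls reuse the cached client.
function scorerHandlerSketch(objectKey: string): string {
  const client = getClient();
  return client.describe() + ' processed ' + objectKey;
}
```

The trade-off is that the first request on a fresh environment pays the setup cost inside the handler instead of during initialization, which is usually acceptable for event-driven work.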

AWS CDK

AWS CDK (Cloud Development Kit) is infrastructure-as-code in real programming languages — TypeScript, Python, Java, Go. You write classes that instantiate AWS resources; CDK synthesizes them to CloudFormation templates; CloudFormation deploys them.

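import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { Bucket, EventType } from 'aws-cdk-lib/aws-s3';
import { Function, Runtime, Code } from 'aws-cdk-lib/aws-lambda';
import { S3EventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Table, AttributeType } from 'aws-cdk-lib/aws-dynamodb';

export class DetectionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Ingest bucket: new objects landing here trigger the scorer.
    const bucket = new Bucket(this, 'IngestBucket');

    // Detection records, addressed by partition key (pk) and sort key (sk).
    const table = new Table(this, 'Detections', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey:      { name: 'sk', type: AttributeType.STRING },
    });

    // Scorer Lambda: code lives in lambda/scorer/handler.py, entry point `handler`.
    const scorer = new Function(this, 'Scorer', {
      runtime: Runtime.PYTHON_3_13,
      handler: 'handler.handler',
      code:    Code.fromAsset('lambda/scorer'),
      // The default timeout is 3 seconds, too short for NetCDF work.
      // 120 s and 1024 MB are assumed starting points; tune them for your data.
      timeout:    cdk.Duration.seconds(120),
      memorySize: 1024,
      // Tell the handler which table to write to (variable name is an assumption).
      environment: { TABLE_NAME: table.tableName },
    });

    // Invoke the scorer on every object-created event in the bucket.
    scorer.addEventSource(new S3EventSource(bucket, {
      events: [EventType.OBJECT_CREATED],
    }));

    // Least-privilege IAM: read ingested objects, write detection records.
    bucket.grantRead(scorer);
    table.grantWriteData(scorer);
  }
}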

The lab

You'll build a mini detection pipeline: a Lambda, triggered by an S3 object-created event (for example, a PutObject upload), that reads a small GOES NetCDF, threshold-detects hotspots, and writes detection records to a DynamoDB table. Deploy with AWS CDK. This is the same overall S3 → Lambda → DDB shape that many serverless geospatial scorers use; the differentiating logic (which candidates clear the bar to become real alerts) stays with you and your operational context.
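
The thresholding step can be sketched independently of AWS. The conversion below uses the inverse-Planck form that GOES ABI L1b files support through their planck_fk1, planck_fk2, planck_bc1 and planck_bc2 variables; read the real coefficients from the file you are processing. The 350 K threshold in the usage note and all type and function names are illustrative assumptions, and the lab's Lambda itself runs Python, so treat this as a statement of the logic rather than the handler.

```typescript
// Per-band Planck coefficients, as stored in the ABI L1b NetCDF file.
interface PlanckCoefficients {
  fk1: number;
  fk2: number;
  bc1: number;
  bc2: number;
}

// Radiance to brightness temperature in kelvin: T = (fk2 / ln(fk1 / L + 1) - bc1) / bc2.
function radianceToBrightnessTemp(radiance: number, c: PlanckCoefficients): number {
  return (c.fk2 / Math.log(c.fk1 / radiance + 1) - c.bc1) / c.bc2;
}

interface HotspotSketch {
  row: number;
  col: number;
  brightnessTempK: number;
}

// Return every pixel whose brightness temperature exceeds the threshold.
function detectHotspots(
  radiances: number[][],
  c: PlanckCoefficients,
  thresholdK: number
): HotspotSketch[] {
  const hits: HotspotSketch[] = [];
  radiances.forEach((line, row) => {
    line.forEach((radiance, col) => {
      if (!(radiance > 0)) return; // skip fill values and non-physical radiances
      const brightnessTempK = radianceToBrightnessTemp(radiance, c);
      if (brightnessTempK > thresholdK) hits.push({ row, col, brightnessTempK });
    });
  });
  return hits;
}
```

Calling detectHotspots(grid, coefficients, 350) returns only the pixels hotter than 350 K; a fixed threshold is the simplest possible detector and will need tuning against real scenes.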

Connecting to Hawaiʻi: Community alerting infrastructure

The Hawaiʻi Emergency Management Agency (HI-EMA) alert system runs on cloud-native infrastructure not unlike what you'll learn this week. Pacific Disaster Center, based on Maui, runs the DisasterAWARE platform serving 90+ countries on similar AWS architecture. Knowing how this works means you can build the same kind of system for your own community: a flood-alert pipeline for a single watershed, a beach-closure alert for a stretch of coast, a coral-bleaching alert for a specific reef.

Pacific Disaster Center has internships. So does HI-EMA. Knowing the cloud-native geospatial stack opens those doors.

Hands-on lab: Mini detection pipeline

Build a Lambda triggered by an S3 object-created event (for example, a PutObject upload). The Lambda reads a small GOES NetCDF, threshold-detects hotspots, and writes records to a DynamoDB table. Deploy with AWS CDK.
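
Two small pieces of the handler trip people up: pulling the bucket and key out of the S3 notification, and shaping the record to match the table's pk/sk schema. The sketch below covers both. Object keys in S3 notifications arrive URL-encoded, with spaces as "+", so they must be decoded before use. The item layout (a DETECTION# partition key and a pixel-based sort key) and all names are assumptions for illustration, and your Python handler would mirror the same logic.

```typescript
// Minimal view of an S3 event notification (only the fields used here).
interface S3NotificationSketch {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

interface S3ObjectRef {
  bucket: string;
  key: string;
}

// Extract bucket/key pairs, decoding the URL-encoded object key.
function objectsFromS3Notification(notification: S3NotificationSketch): S3ObjectRef[] {
  return notification.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }));
}

// Hypothetical item layout matching the table's string pk/sk attributes.
interface DetectionItemSketch {
  pk: string;
  sk: string;
  sourceBucket: string;
  sourceKey: string;
  brightnessTempK: number;
}

function buildDetectionItem(
  detectionId: string,
  source: S3ObjectRef,
  pixel: { row: number; col: number },
  brightnessTempK: number
): DetectionItemSketch {
  return {
    pk: 'DETECTION#' + detectionId,                 // high-cardinality partition key
    sk: 'PIXEL#' + pixel.row + '#' + pixel.col,     // assumed sort key layout
    sourceBucket: source.bucket,
    sourceKey: source.key,
    brightnessTempK,
  };
}
```

Keeping these as pure functions means you can unit-test them locally with a saved sample event before deploying anything.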

Quiz — click an answer to check it

No grade, no shame. Tap any option; you'll see if it's right plus the answer if not. The point is to notice what you already know and what's still settling.

Q1. S3 → Lambda trigger is configured via:
  1. S3 event notification to Lambda function ARN
  2. Polling
  3. SNS only
  4. EventBridge only
Q2. EventBridge is best for:
  1. Decoupled event routing, scheduled rules, cross-service orchestration
  2. Database
  3. Just cron
  4. File storage
Q3. DynamoDB partition key choice impacts:
  1. Distribution and hot-partition behavior
  2. Cost only
  3. Nothing
  4. Display order
Q4. Lambda cold start matters for:
  1. Latency-sensitive endpoints; less for event-driven batch
  2. Always
  3. Never
  4. Only TypeScript
Q5. AWS CDK is:
  1. Infrastructure-as-code in TypeScript / Python / Java / Go
  2. Just a CLI
  3. A database
  4. A managed service

Reflection

Take five minutes with this. Write your answer somewhere. Carry it into next week.

Cloud infrastructure makes powerful systems cheap to build. It also concentrates control in three big cloud providers. What does that tradeoff mean for community-scale tools?