Production pipelines: S3 → Lambda → EventBridge → DDB
Production geospatial isn't a notebook — it's a pipeline. This week is the real AWS architecture LaunchDetect runs in production: S3 ingest, Lambda compute, EventBridge schedule, DynamoDB state.
When something important happens — a launch, an eruption, a hurricane — how does the alert get from the satellite to the people who need it?
Through a pipeline of cloud services. This week you'll wire one up: S3 ingests data, Lambda processes it, DynamoDB stores it, EventBridge routes the alert. Same architecture LaunchDetect uses; same architecture you could use to build community-grade alerts.
Learning objectives
- Wire S3 PutObject events to Lambda triggers
- Use EventBridge for scheduled and event-driven orchestration
- Persist detection records to DynamoDB
- Reason about cost and latency in serverless geospatial pipelines
Primer
Production geospatial isn't a notebook. It's a pipeline: data lands somewhere, code runs, results land somewhere else, the pipeline is monitored, alerts fire when something breaks. This week is the AWS architecture that LaunchDetect actually runs in production — minus a few enterprise-specific layers.
The core stack: S3 + Lambda + EventBridge + DynamoDB
Four services, each doing one thing well:
- S3: object storage. Every GOES NetCDF, every detection JSON, every static page lives in S3. Cheap (~$0.023/GB/month for the Standard tier), durable (designed for 11 nines of durability), event-emitting.
- Lambda: serverless compute, i.e. functions as a service. You pay for invocations and execution duration; there are no servers to manage. Cold start is the main pitfall.
- EventBridge: event bus. Routes events between AWS services and your own consumers. Replaces ad-hoc SNS/SQS/CloudWatch Events combinations.
- DynamoDB: NoSQL key-value/document store. Single-digit-millisecond reads. You pay for storage plus read/write capacity.
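To get a feel for the cost side, a back-of-the-envelope estimate is usually enough. The sketch below is only that: the S3 price is the figure quoted above, the Lambda prices are assumed us-east-1 list prices (check the current pricing pages), the free tier and DynamoDB are ignored, and the workload numbers are hypothetical.

```typescript
// Rough monthly cost estimate for S3 storage plus Lambda compute.
// Prices are assumptions, not quotes; the workload is hypothetical.
interface Workload {
  storedGb: number;       // data kept in S3
  invocations: number;    // Lambda invocations per month
  avgDurationSec: number; // average execution time per invocation
  memoryGb: number;       // configured Lambda memory, in GB
}

const S3_USD_PER_GB_MONTH = 0.023;              // figure quoted in the text
const LAMBDA_USD_PER_MILLION_REQUESTS = 0.2;    // assumed list price
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;  // assumed list price (x86)

function estimateMonthlyUsd(w: Workload): {
  s3: number;
  requests: number;
  compute: number;
  total: number;
} {
  const s3 = w.storedGb * S3_USD_PER_GB_MONTH;
  const requests =
    (w.invocations / 1_000_000) * LAMBDA_USD_PER_MILLION_REQUESTS;
  const compute =
    w.invocations * w.avgDurationSec * w.memoryGb * LAMBDA_USD_PER_GB_SECOND;
  return { s3, requests, compute, total: s3 + requests + compute };
}

// Hypothetical workload: one file per minute for 30 days,
// each scored in 2 s by a 1 GB function, with 100 GB retained in S3.
const monthlyEstimate = estimateMonthlyUsd({
  storedGb: 100,
  invocations: 60 * 24 * 30,
  avgDurationSec: 2,
  memoryGb: 1,
});
console.log(monthlyEstimate);
```

Under these assumptions the request charge is negligible and the bill is split between storage and compute duration, which is why trimming execution time and retention tends to matter more than trimming invocation counts.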
The detection pipeline
A representative operational flow looks like:
- The provider (e.g. NOAA) writes a new GOES Band 7 mesoscale NetCDF to a public S3 bucket such as s3://noaa-goes18/....
- The bucket emits an S3 event; an EventBridge rule fans it out to a downstream scorer Lambda.
- The scorer fetches the NetCDF (range-requested for just the geographic window of interest) and runs the detection chain you've built across weeks 13–15: radiance → brightness temperature, threshold, parallax correction, write candidates to DynamoDB. The classification layer that decides which candidates clear the bar to become alerts is intentionally treated as a black box here — that's where each operator differentiates and isn't part of a generic teaching example.
- A DynamoDB stream triggers a publisher Lambda that writes the cleared-detection JSON to a public-facing S3 location and emits a domain event to EventBridge.
- Subscribers (web dashboard, push-notification service, blog generator) receive the event and update their own state.
Total latency from the source file landing to a notification on a user's phone: typically on the order of a minute, depending on which step you optimize.
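The publisher and subscriber steps in the flow above can be sketched as plain data shapes. The entry below uses the field names of an EventBridge PutEvents request entry (Source, DetailType, Detail, EventBusName); the source name, detail type, bus name and detail fields are all hypothetical. The `matchesRule` helper is a deliberately simplified stand-in for rule matching (exact values on `source` and `detail-type` only), not the full EventBridge pattern language.

```typescript
// A cleared detection as the publisher might describe it (fields hypothetical).
interface ClearedDetection {
  id: string;
  detectedAt: string; // ISO 8601 timestamp
  lat: number;
  lon: number;
  brightnessTempK: number;
}

// Shape of one entry in an EventBridge PutEvents request.
interface PutEventsEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName: string;
}

function toDomainEvent(d: ClearedDetection): PutEventsEntry {
  return {
    Source: "launchdetect.detections", // hypothetical source name
    DetailType: "DetectionCleared",    // hypothetical detail type
    Detail: JSON.stringify(d),
    EventBusName: "detections-bus",    // hypothetical bus name
  };
}

// Simplified rule pattern: each listed field must equal one allowed value.
type RulePattern = { source?: string[]; "detail-type"?: string[] };

function matchesRule(pattern: RulePattern, entry: PutEventsEntry): boolean {
  const sourceOk =
    !pattern.source || pattern.source.indexOf(entry.Source) !== -1;
  const allowedTypes = pattern["detail-type"];
  const typeOk = !allowedTypes || allowedTypes.indexOf(entry.DetailType) !== -1;
  return sourceOk && typeOk;
}

const sampleEntry = toDomainEvent({
  id: "abc",
  detectedAt: "2025-01-15T12:00:00Z",
  lat: 28.5,
  lon: -80.6,
  brightnessTempK: 345.2,
});
console.log(matchesRule({ source: ["launchdetect.detections"] }, sampleEntry));
```

Each subscriber (dashboard, push service, blog generator) would own its own rule, so adding a new consumer does not require touching the publisher.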
DynamoDB partition key design
DynamoDB's #1 footgun is hot partitions. Every item has a partition key (PK) and optionally a sort key (SK). DynamoDB hashes the PK and routes the item to a physical partition. If 90% of your writes go to a single PK, you bottleneck on that one partition's throughput limit (roughly 3,000 read capacity units and 1,000 write capacity units per second).
Good PK choices spread writes evenly across partitions. For launch detections, a natural PK is DETECTION#{ulid} — ULIDs are time-ordered but have enough entropy that they distribute evenly. Bad PK: DATE#{yyyy-mm-dd} — all today's writes go to one partition.
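A small simulation makes the difference concrete. This is a sketch under stated assumptions: `ulidLike` produces a ULID-shaped ID (10 timestamp characters plus 16 random characters in Crockford base32) using `Math.random` for brevity, and FNV-1a stands in for DynamoDB's internal hash function, which is not public. The partition count of 8 is arbitrary.

```typescript
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// ULID-shaped ID: time-ordered prefix plus a random suffix.
// Math.random is fine for a demo; a real generator should use a CSPRNG.
function ulidLike(timeMs: number): string {
  let t = timeMs;
  let timePart = "";
  for (let i = 0; i < 10; i++) {
    timePart = CROCKFORD.charAt(t % 32) + timePart;
    t = Math.floor(t / 32);
  }
  let randomPart = "";
  for (let i = 0; i < 16; i++) {
    randomPart += CROCKFORD.charAt(Math.floor(Math.random() * 32));
  }
  return timePart + randomPart;
}

// FNV-1a, used here only as a stand-in for DynamoDB's partition hash.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// How many distinct simulated partitions does a set of keys land on?
function partitionsHit(keys: string[], partitionCount: number): number {
  const seen = new Set<number>();
  for (const key of keys) {
    seen.add(fnv1a(key) % partitionCount);
  }
  return seen.size;
}

const startMs = Date.UTC(2025, 0, 15, 12, 0, 0);
const ulidKeys: string[] = [];
const dateKeys: string[] = [];
for (let i = 0; i < 1000; i++) {
  ulidKeys.push(`DETECTION#${ulidLike(startMs + i)}`);
  dateKeys.push("DATE#2025-01-15");
}
console.log("ULID keys hit", partitionsHit(ulidKeys, 8), "partitions");
console.log("Date keys hit", partitionsHit(dateKeys, 8), "partition(s)");
```

Every write under the date key lands on the same simulated partition, while the ULID-style keys spread out. The trade-off is that a high-entropy PK cannot be queried by day on its own; that access pattern would need a secondary index or a different key layout.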
Lambda cold starts
When Lambda receives a request and has no warm execution environment available, it cold-starts: it provisions a sandbox, downloads the function code, initializes the runtime, runs your initialization code, and only then invokes the handler. Cold starts range from about 200 ms (lightweight Python 3.13) to 3+ seconds (heavy Java or a large dependency tree).
For latency-sensitive request paths (API endpoints), cold start matters and you mitigate with: provisioned concurrency, smaller deployment packages, lighter runtimes, lazy imports. For event-driven batch (which is most space-GIS pipelines), cold start is fine — a launch detection that takes 90 seconds doesn't care about 500 ms cold start.
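One of the mitigations above, lazy initialization, can be sketched in a few lines: keep expensive objects at module scope so warm invocations reuse them, and build them only on first use. The `buildScorer` setup and the 330 K threshold are hypothetical placeholders, and the example is in TypeScript to match the CDK code in this primer; the same pattern applies to a Python handler.

```typescript
interface HotspotScorer {
  score(values: number[]): number;
}

let initCount = 0; // counts how often the expensive setup actually runs
let cachedScorer: HotspotScorer | undefined;

// Hypothetical expensive setup: loading lookup tables, creating clients, etc.
function buildScorer(): HotspotScorer {
  initCount += 1;
  return {
    // Count values above a hypothetical 330 K threshold.
    score: (values) => values.filter((v) => v > 330).length,
  };
}

// Build on first use, then reuse for as long as the environment stays warm.
function getScorer(): HotspotScorer {
  if (cachedScorer === undefined) {
    cachedScorer = buildScorer();
  }
  return cachedScorer;
}

function handler(input: { values: number[] }): number {
  return getScorer().score(input.values);
}

console.log(handler({ values: [300, 340, 350] }));
console.log(handler({ values: [310, 320] }));
console.log("setup ran", initCount, "time(s)");
```

The first invocation pays for the setup; later invocations in the same environment skip it. Work that every invocation needs can equally be done eagerly at module load, since that cost is paid once per cold start either way.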
AWS CDK
AWS CDK (Cloud Development Kit) is infrastructure-as-code in real programming languages — TypeScript, Python, Java, Go. You write classes that instantiate AWS resources; CDK synthesizes them to CloudFormation templates; CloudFormation deploys them.
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { Bucket, EventType } from 'aws-cdk-lib/aws-s3';
import { Function, Runtime, Code } from 'aws-cdk-lib/aws-lambda';
import { S3EventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Table, AttributeType } from 'aws-cdk-lib/aws-dynamodb';

export class DetectionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Ingest bucket: new objects here trigger the scorer.
    const bucket = new Bucket(this, 'IngestBucket');

    // Detection records, keyed by partition key + sort key.
    const table = new Table(this, 'Detections', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      sortKey: { name: 'sk', type: AttributeType.STRING },
    });

    // Scorer Lambda; its code is loaded from ./lambda/scorer.
    const scorer = new Function(this, 'Scorer', {
      runtime: Runtime.PYTHON_3_13,
      handler: 'handler.handler',
      code: Code.fromAsset('lambda/scorer'),
      // TABLE_NAME is an assumed variable name for the handler to read.
      environment: { TABLE_NAME: table.tableName },
    });

    // Invoke the scorer on every object-created event in the bucket.
    scorer.addEventSource(new S3EventSource(bucket, {
      events: [EventType.OBJECT_CREATED],
    }));

    // The scorer reads the uploaded object and writes detection records.
    bucket.grantRead(scorer);
    table.grantWriteData(scorer);
  }
}
The lab
You'll build a mini detection pipeline: a Lambda triggered by S3 PutObject, that reads a small GOES NetCDF, threshold-detects hotspots, and writes detection records to a DynamoDB table. Deploy with AWS CDK. This is the same overall S3 → Lambda → DDB shape that powers most modern serverless geospatial scorers — the differentiating logic (which candidates clear the bar to become real alerts) stays with you and your operational context.
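The detection step of the lab can be sketched as two small functions: convert radiance to brightness temperature, then keep pixels above a threshold as DynamoDB-ready records. This is a sketch, not the production scorer. The conversion is the inverse Planck form used for GOES ABI emissive bands; the coefficient values below are illustrative Band 7 numbers and should be read from the NetCDF file's own `planck_fk1`, `planck_fk2`, `planck_bc1` and `planck_bc2` variables. The 340 K threshold and the key layout (a deterministic ID built from scan time and pixel position instead of a ULID) are hypothetical; the `pk` and `sk` attribute names match the CDK table in the primer.

```typescript
interface PlanckCoeffs {
  fk1: number;
  fk2: number;
  bc1: number;
  bc2: number;
}

// Inverse Planck function: radiance -> brightness temperature in kelvin.
function brightnessTempK(radiance: number, c: PlanckCoeffs): number {
  return (c.fk2 / Math.log(c.fk1 / radiance + 1) - c.bc1) / c.bc2;
}

interface DetectionItem {
  pk: string;
  sk: string;
  row: number;
  col: number;
  btK: number;
}

// Threshold a radiance grid and emit one record per hot pixel.
function detectHotspots(
  radiance: number[][],
  c: PlanckCoeffs,
  thresholdK: number,
  scanTime: string,
): DetectionItem[] {
  const items: DetectionItem[] = [];
  radiance.forEach((rowValues, row) => {
    rowValues.forEach((value, col) => {
      const btK = brightnessTempK(value, c);
      if (btK >= thresholdK) {
        // Hypothetical key layout; deterministic so retries overwrite.
        const pk = `DETECTION#${scanTime}#${row}-${col}`;
        items.push({ pk, sk: scanTime, row, col, btK });
      }
    });
  });
  return items;
}

// Illustrative Band 7 coefficients (assumed; read the real ones from the file).
const band7: PlanckCoeffs = {
  fk1: 2.02263e5,
  fk2: 3.69819e3,
  bc1: 0.43361,
  bc2: 0.99939,
};

const hotspots = detectHotspots(
  [[0.5, 0.9], [1.2, 25.0]], // made-up radiances
  band7,
  340,
  "2025-01-15T12:00:00Z",
);
console.log(hotspots);
```

Because the key is derived from the scan time and pixel, reprocessing the same file overwrites the same items instead of creating duplicates, which is convenient when a Lambda invocation is retried.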
Connecting to Hawaiʻi: Community alerting infrastructure
The State of Hawaiʻi Emergency Management Agency's HI-EMA Alert system runs on cloud-native infrastructure not unlike what you'll learn this week. Pacific Disaster Center, based on Maui, runs the DisasterAWARE platform serving 90+ countries on similar AWS architecture. Knowing how this works means you can build the same kind of system for your own community: a flood-alert pipeline for a single watershed, a beach-closure alert for a stretch of coast, a coral-bleaching alert for a specific reef.
Hands-on lab: Mini detection pipeline
Build a Lambda triggered by S3 PutObject. The Lambda reads a small GOES NetCDF, threshold-detects hotspots, writes records to a DynamoDB table. Deploy with AWS CDK.
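The first thing the triggered Lambda has to do is work out which object arrived. A minimal sketch of that step follows, written in TypeScript to match the CDK example (the lab's handler runs on the Python runtime, where the same logic applies). The interfaces cover only the notification fields used here, and the sample bucket and key names are made up. One detail worth knowing: object keys in S3 notifications are URL-encoded, with spaces sent as `+`.

```typescript
// Only the parts of the S3 notification payload this sketch needs.
interface S3Record {
  eventName: string; // e.g. "ObjectCreated:Put"
  s3: {
    bucket: { name: string };
    object: { key: string; size?: number };
  };
}

interface S3Notification {
  Records: S3Record[];
}

interface ObjectRef {
  bucket: string;
  key: string;
}

// Extract bucket/key pairs for newly created objects, decoding the key.
function createdObjects(notification: S3Notification): ObjectRef[] {
  return notification.Records
    .filter((r) => r.eventName.indexOf("ObjectCreated:") === 0)
    .map((r) => ({
      bucket: r.s3.bucket.name,
      key: decodeURIComponent(r.s3.object.key.replace(/\+/g, " ")),
    }));
}

// Made-up sample payload with one created and one removed object.
const sampleNotification: S3Notification = {
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "example-ingest-bucket" },
        object: { key: "incoming/goes+band7%3Atest.nc", size: 1024 },
      },
    },
    {
      eventName: "ObjectRemoved:Delete",
      s3: {
        bucket: { name: "example-ingest-bucket" },
        object: { key: "incoming/old.nc" },
      },
    },
  ],
};

console.log(createdObjects(sampleNotification));
```

A single invocation can carry more than one record, so the handler should loop over the result instead of assuming exactly one object.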
Quiz
No grade, no shame. The point is to notice what you already know and what's still settling.
- S3 event notification to Lambda function ARN
- Polling
- SNS only
- EventBridge only
- Decoupled event routing, scheduled rules, cross-service orchestration
- Database
- Just cron
- File storage
- Distribution and hot-partition behavior
- Cost only
- Nothing
- Display order
- Latency-sensitive endpoints; less for event-driven batch
- Always
- Never
- Only TypeScript
- Infrastructure-as-code in TypeScript / Python / Java / Go
- Just a CLI
- A database
- A managed service
Reflection
Take five minutes with this. Write your answer somewhere. Carry it into next week.