Post-Incident Mortem

At Least It Works
52,651 SMS

One topic change. One wildcard match. One extremely enthusiastic alert pipeline.

April 3, 2026 • BHS Manufacturing • Plant 2


Everything Was Fine
Which, in hindsight, was suspicious

The oven published to its own isolated topic. No wildcard rules matched it. No accidental text-message performance art was taking place.

ESP32-C6 (P2-Oven)
  ↓ MQTT: bhs/events/P2/DryContact/Oven
  ↓ BhsP2OvenRule (exact match)
  ↓ DynamoDB: ButtonEvents

Meanwhile, the MachineShop wildcard rule was sitting quietly in the corner, minding its own business.

Other stations
  ↓ BhsP2MachineShopRule: bhs/events/P2/MachineShop/+
  ↓ Lambda → SMS

One Line Changed
And the oven found a new social circle

platformio.ini — one small edit, one large personality shift
- -D DEVICE_CLASS="DryContact"
+ -D DEVICE_CLASS="MachineShop"

This changed the MQTT topic from an isolated oven path
bhs/events/P2/DryContact/Oven
to a MachineShop path the wildcard rule could now match
bhs/events/P2/MachineShop/Oven

Technically valid. Operationally a little too exciting.
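
For anyone who wants to see the collision rather than take it on faith, here is a minimal sketch of MQTT topic-filter matching. The function name topic_matches is ours and purely illustrative; it follows the standard MQTT wildcard semantics ('+' for one level, '#' for the rest) and is not the AWS IoT rules engine itself.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic.

    '+' matches exactly one level; '#' matches all remaining levels
    and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only in the final position.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


MACHINE_SHOP_FILTER = "bhs/events/P2/MachineShop/+"

# Before the edit: the oven topic sits outside the wildcard's namespace.
print(topic_matches(MACHINE_SHOP_FILTER, "bhs/events/P2/DryContact/Oven"))   # False
# After the edit: same device, new topic, now inside the wildcard's reach.
print(topic_matches(MACHINE_SHOP_FILTER, "bhs/events/P2/MachineShop/Oven"))  # True
```

The '+' only cares that there is exactly one level after MachineShop. It has no opinion on whether that level is an oven.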


The Wildcard Took It Personally

Once the oven started publishing under MachineShop/Oven, the MachineShop wildcard rule began catching every event.

ESP32-C6 (P2-Oven)
  ↓ MQTT: bhs/events/P2/MachineShop/Oven
  ↓ BhsP2MachineShopRule (wildcard + matches!)
  ↓ Lambda (NO THROTTLE)
  ↓ SMS × ∞ (every event)

So now every oven event was doing two jobs: behaving normally and texting people with far too much confidence.
~115 SMS per minute. No cooldown. No cap. No one saying “maybe stop.”


6 Hours. Full Commitment.

+1-314-346-9027: 50,953 MachineShop Texts
+1-314-224-6046: 1,692 Button Handler Texts
~$618 Unexpected Enthusiasm Budget

How a Normal Day Became a Presentation

👨‍💻

“I stopped the texts.
Why is Twilio still charging us?”

At 05:03, Ting disabled the 5 oven rules. His own SMS stopped. Problem solved — or so he thought. Hours later, Twilio recharge emails kept arriving. That's when he realized his coworker was also getting flooded — from a completely different Lambda, triggered by the MachineShop wildcard rule. In other words: the infrastructure believed in overcommunication.

Two phones. Two Lambdas. Two different flavors of chaos.
04:50 Flood starts on both phones simultaneously
05:03 Ting disables oven rules — his SMS stops
05:04–11:03 Coworker still getting flooded via MachineShop Lambda
~10:00 Twilio recharge emails arrive — credits burning fast
11:03 Ting disables ALL 9 remaining IoT rules — flood stops

Three Gaps, One Flood

1
No wildcard collision check
Terraform and CloudFormation rule sets operated independently. Changing DEVICE_CLASS moved the device into a wildcard namespace invisible to Terraform.
2
No throttle on SMS Lambda
The MachineShop Lambda sent an SMS on every invocation with zero rate limiting. The button handler's throttle was fail-open: when the DynamoDB lookup errored, the throttle was silently bypassed and the SMS went out anyway.
3
No monitoring or alarms
Zero CloudWatch alarms on any SMS Lambda. Zero Twilio spend alerts. The flood ran 6 hours before manual detection.
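
The difference between fail-open and fail-closed in gap 2 comes down to one branch. Below is a minimal sketch, not the production handler: may_send_sms and read_last_sent are hypothetical names, and read_last_sent stands in for whatever DynamoDB read the real Lambda performs. The 5-minute window is the one quoted elsewhere in this report.

```python
import time
from typing import Callable, Optional

THROTTLE_WINDOW_SECONDS = 5 * 60  # "max 1 SMS / 5 min"


def may_send_sms(
    read_last_sent: Callable[[], Optional[float]],
    now: Optional[float] = None,
) -> bool:
    """Decide whether an SMS may be sent. Fail-closed: errors block the send.

    read_last_sent is a hypothetical stand-in for the throttle-state lookup;
    it returns the epoch timestamp of the last SMS, or None if none was sent.
    """
    now = time.time() if now is None else now
    try:
        last_sent = read_last_sent()
    except Exception:
        # A fail-open throttle would return True here, which lets storage
        # errors bypass the limit. Fail-closed refuses to send instead.
        return False
    if last_sent is None:
        return True
    return (now - last_sent) >= THROTTLE_WINDOW_SECONDS
```

A missed text during a DynamoDB outage is an inconvenience. A flood during one is an invoice. A real handler would also want the read and the write of the timestamp to be atomic, which this sketch does not attempt.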

Before & After
How We Stopped Letting the Oven Freelance

Before (too trusting)

ESP32 → MachineShop/Oven
  ↓ matched by wildcard MachineShop/+
  ↓ Lambda (no throttle)
  ↓ Twilio SMS × ∞
  ↓ no alarm
  = one very committed incident

After (less dramatic)

ESP32 → Oven/Oven (isolated)
  ↓ exact match only
  ↓ Lambda (fail-closed throttle)
  ↓ concurrency = 1
  ↓ CloudWatch alarm < 5 min
  = max 1 SMS every 5 min

Defense-in-Depth
Because one fix is a promise. Five fixes are a strategy.

1 Topic Isolation • the oven stays in its lane
2 Fail-Closed Throttle • max 1 SMS / 5 min
3 Concurrency Cap • max 1 concurrent exec
4 CloudWatch Alarms • billing is not first responder
5 Battery Alert Cap • max 3 per cycle

Worst-case blast radius reduction: 50,000× smaller
Max 1 SMS before throttle blocks further sends. Alarm fires within 5 minutes.

The Pipeline, Now With Guardrails

ESP32-C6 (P2-Oven)
  ↓ AWS IoT Core: bhs/events/P2/Oven/Oven (isolated • no wildcard) 🛡
  ↓ IoT Rule: exact match only, WHERE message_type = 'button_press'
  ↓ Lambda: oven-button-handler (concurrency = 1 • fail-closed throttle) 🛡
  ↓ DynamoDB: OvenStatus-prod (5-min throttle window) 🛡
  ↓ 📱 SMS: max 1 / 5 min

CloudWatch Alarms (metrics): >10 inv/5min → ALARM • >3 errors/5min → ALARM • monitors ALL 3 plants
  ↓ SNS → SMS: instant alert to on-call +1-314-224-6046 🛡

Kill Switch: M5Stack Button → concurrency = 0 • all 3 plants • emergency override
Battery Alert: max 3 per critical cycle • 30-min post-recovery cooldown

Every arrow is a gate. Every node has a cap. Every alarm fires before billing does.


The Physical Kill Switch
Because sometimes you need a button, not a console

KILL
M5Stack sends bhs/control/throttle
IoT Rule → Throttle Controller Lambda
Sets concurrency = 0 on all targets
P1 Welding • CloudFormation • concurrency 1
P2 Oven • Terraform • concurrency 1
Machine Shop • SAM • concurrency 1
One button. Three plants. Zero SMS until you say so.
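
What the Throttle Controller Lambda does can be sketched in a few lines. This is an illustration under assumptions, not the deployed code: set_concurrency, handler, and TARGET_FUNCTIONS are hypothetical names, and the target list simply borrows the three Lambda names from the incident table. The boto3 call put_function_concurrency is real; setting ReservedConcurrentExecutions to 0 stops a function from being invoked.

```python
from typing import Any, Iterable, List

# Assumption: the actual target list is not given in this report.
# These are the Lambda names from the incident table, used as placeholders.
TARGET_FUNCTIONS = [
    "bhs-oven-button-handler",
    "bhs-oven-battery-alert",
    "bhs-machineShop-iiot-button-handler",
]


def set_concurrency(lambda_client: Any, function_names: Iterable[str], limit: int) -> List[str]:
    """Set reserved concurrency on each function; return the names updated."""
    updated = []
    for name in function_names:
        lambda_client.put_function_concurrency(
            FunctionName=name,
            ReservedConcurrentExecutions=limit,
        )
        updated.append(name)
    return updated


def handler(event, context):
    """Hypothetical entry point for the message on bhs/control/throttle."""
    import boto3  # imported lazily so set_concurrency can be tested without AWS

    client = boto3.client("lambda")
    return {"throttled": set_concurrency(client, TARGET_FUNCTIONS, 0)}
```

Restoring service is the same call with limit set back to 1, which is presumably what "until you say so" looks like in practice.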

CloudWatch — The Early Warning System
Billing is no longer the first responder

CloudWatch Alarm Hub
Oven Button • >10 inv / 5 min • OK
Battery Alert • >6 inv / 5 min • OK
MachineShop • >20 inv / 5 min • OK
Error Rate • >3 errors / 5 min • OK
SNS → SMS Alert • instant notification to on-call
Before: 6 hours to detect • Now: < 5 minutes
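
One of these alarms might be defined roughly as follows. This sketch only builds the parameters for CloudWatch's PutMetricAlarm; the helper name invocation_alarm_params, the alarm naming scheme, and the SNS topic ARN are all assumptions for illustration. The thresholds are the ones listed above.

```python
from typing import Any, Dict


def invocation_alarm_params(function_name: str, threshold: int, sns_topic_arn: str) -> Dict[str, Any]:
    """Build PutMetricAlarm parameters: alarm when a Lambda is invoked
    more than `threshold` times within one 5-minute period."""
    return {
        "AlarmName": f"{function_name}-invocations-high",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # seconds, i.e. the 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # a quiet function is not an incident
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder ARN, not a real topic.
params = invocation_alarm_params(
    "bhs-oven-button-handler",
    threshold=10,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:on-call-alerts",
)
# To apply: boto3.client("cloudwatch").put_metric_alarm(**params)
```

With a single 5-minute evaluation period, a flood at the incident's rate would cross the threshold in the first window, which is where "< 5 minutes" comes from.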

The System Was Fast. Our Guardrails Were Not.

Lambda                               SMS Sent  Recipient        Throttle
bhs-machineShop-iiot-button-handler  50,953    +1-314-346-9027  None
bhs-oven-button-handler              1,692     +1-314-224-6046  Fail-open
bhs-oven-battery-alert               6         +1-314-224-6046  Max 3/cycle
$622.12
Actual Twilio bill • $437 SMS + $185 carrier fees • vs. ~$4/mo normal
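
A quick sanity check on the numbers above, using only figures from this report. The per-message cost is a blended average we derive here (total bill divided by total messages), not a line item from the Twilio invoice.

```python
# SMS counts per Lambda, from the table above.
sms_by_lambda = {
    "bhs-machineShop-iiot-button-handler": 50_953,
    "bhs-oven-button-handler": 1_692,
    "bhs-oven-battery-alert": 6,
}

total_sms = sum(sms_by_lambda.values())
bill_usd = 622.12  # actual Twilio bill, SMS plus carrier fees

# Blended average cost per message (derived, not from the invoice).
cost_per_sms = bill_usd / total_sms

print(total_sms)               # 52651
print(round(cost_per_sms, 4))  # 0.0118
```

Roughly a cent per text, which sounds harmless until it is multiplied by a wildcard.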

Follow-Up Actions
So this becomes a lesson, not a recurring subscription

01 Set up Twilio monthly spend alert at $5 threshold
02 Add WHERE filters to BhsP2MachineShopRule to limit blast radius
03 Decouple DEVICE_CLASS from MQTT topic path in firmware
04 Maintain shared IoT rule registry across CloudFormation + Terraform
05 Pre-deploy CI check: audit all wildcard topic rules before terraform apply
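
Items 04 and 05 could look something like the sketch below: one registry of rules from both toolchains, and a check that fails when a device's topic is caught by a wildcard rule it does not own. Everything here is hypothetical scaffolding (the registry shape, the source labels, the function names); only the rule names and topics come from this report.

```python
from typing import Dict, List


def _matches(topic_filter: str, topic: str) -> bool:
    """Standard MQTT filter matching: '+' is one level, '#' is the rest."""
    f, t = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f):
        if level == "#":
            return True
        if i >= len(t) or level not in ("+", t[i]):
            return False
    return len(f) == len(t)


def wildcard_collisions(topic: str, rules: List[Dict[str, str]], owner_rule: str) -> List[str]:
    """Names of wildcard rules, other than the owner, that match `topic`."""
    return [
        r["name"]
        for r in rules
        if r["name"] != owner_rule
        and ("+" in r["filter"] or "#" in r["filter"])
        and _matches(r["filter"], topic)
    ]


# Hypothetical shared registry; the "source" labels are illustrative.
REGISTRY = [
    {"name": "BhsP2OvenRule", "filter": "bhs/events/P2/Oven/Oven", "source": "terraform"},
    {"name": "BhsP2MachineShopRule", "filter": "bhs/events/P2/MachineShop/+", "source": "cloudformation"},
]

# The topic from the incident would have been flagged before deploy.
print(wildcard_collisions("bhs/events/P2/MachineShop/Oven", REGISTRY, "BhsP2OvenRule"))
# The isolated topic passes cleanly.
print(wildcard_collisions("bhs/events/P2/Oven/Oven", REGISTRY, "BhsP2OvenRule"))
```

In CI, a non-empty result would fail the build before terraform apply runs, which is considerably cheaper than finding out by text message.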
End of Report

So yes — it works.
Now it also behaves.

The button worked. The SMS pipeline worked. The wildcard worked. We have now politely asked them not to collaborate like this again.

Prepared by Ting Xu • April 5, 2026
