IoT Engineering

ESP32 → MQTT → AWS IoT Core: The Production-Grade Architecture Guide

Most IoT tutorials get you to "blink an LED." This guide shows you how to architect an ESP32-based IoT system that handles 10,000+ devices in production with TLS security, OTA updates, and sub-100ms latency.

January 15, 2024
12 min read
ESP32 · MQTT · AWS IoT Core · Production

Most IoT tutorials teach you to blink an LED or send a single sensor reading to a free MQTT broker. That's fine for learning. But when you're deploying 100+ devices for a real client, you need a different mindset entirely.

In this guide, I'll walk through the exact architecture we use at CodeCaracal for production IoT systems — the kind that runs with 99.9% uptime SLAs.

The Production Stack

Our proven stack for end-to-end IoT:

ESP32 Firmware (C/C++)
    ↓ TLS 1.3 MQTT
AWS IoT Core (MQTT Broker + Rules Engine)
    ↓
Node.js Backend (WebSocket + REST)
    ↓
InfluxDB (Time-series storage)
    ↓
React/Next.js Dashboard (Real-time)
    ↓
Flutter App (Mobile)

Step 1: ESP32 Firmware — Security from Day One

Never ship firmware without TLS. Period.

#include <WiFiClientSecure.h>
#include <PubSubClient.h>

const char* AWS_IOT_ENDPOINT = "your-endpoint.iot.us-east-1.amazonaws.com";
const int AWS_IOT_PORT = 8883;

// Certificate store: embedded at compile time
extern const char AWS_ROOT_CA[] asm("_binary_AmazonRootCA1_pem_start");
extern const char DEVICE_CERT[] asm("_binary_certificate_pem_crt_start");
extern const char DEVICE_KEY[] asm("_binary_private_pem_key_start");

// Inbound message callback, defined elsewhere in the firmware
void messageHandler(char* topic, byte* payload, unsigned int length);

WiFiClientSecure tlsClient;
PubSubClient mqttClient(tlsClient);

void setupMQTT() {
  // Mutual TLS: verify the broker against the root CA, authenticate with the device cert
  tlsClient.setCACert(AWS_ROOT_CA);
  tlsClient.setCertificate(DEVICE_CERT);
  tlsClient.setPrivateKey(DEVICE_KEY);

  mqttClient.setServer(AWS_IOT_ENDPOINT, AWS_IOT_PORT);
  mqttClient.setCallback(messageHandler);
  mqttClient.setBufferSize(1024);  // PubSubClient's default buffer is too small for larger payloads
}

Key firmware design principles

  1. Non-blocking publish loop: never block loop() waiting for MQTT
  2. Reconnect backoff: exponential backoff prevents a thundering herd on the broker
  3. Message queuing: buffer outbound messages when disconnected
  4. Watchdog timer: hardware watchdog resets the device if firmware hangs
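Principles 2 and 3 can be sketched in portable C++ as below. This is a minimal illustration, not our firmware: `reconnectDelayMs`, `OutboundQueue`, the 1 s base delay and the 60 s cap are all hypothetical names and values chosen for the example. A real ESP32 build would probably replace `std::deque` with a fixed-size ring buffer to avoid heap churn.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <random>
#include <string>
#include <utility>

// Delay before reconnect attempt `attempt` (0-based). The ceiling doubles per
// attempt (baseMs * 2^attempt) up to maxMs; the result is drawn uniformly from
// [ceiling/2, ceiling] so devices that lost the broker together do not retry together.
template <typename Rng>
uint32_t reconnectDelayMs(unsigned attempt, Rng& rng,
                          uint32_t baseMs = 1000, uint32_t maxMs = 60000) {
  uint64_t cap = baseMs;
  for (unsigned i = 0; i < attempt && cap < maxMs; ++i) cap *= 2;
  if (cap > maxMs) cap = maxMs;
  const uint32_t half = static_cast<uint32_t>(cap / 2);
  std::uniform_int_distribution<uint32_t> jitter(0, half);
  return half + jitter(rng);
}

// Bounded FIFO for outbound messages while the broker is unreachable.
// When full, the oldest message is dropped so memory use stays bounded.
class OutboundQueue {
 public:
  explicit OutboundQueue(std::size_t capacity) : capacity_(capacity) {}

  void push(std::string payload) {
    if (capacity_ == 0) return;                          // nothing can be stored
    if (queue_.size() == capacity_) queue_.pop_front();  // drop oldest
    queue_.push_back(std::move(payload));
  }

  // Moves the oldest message into `out`; returns false when the queue is empty.
  bool pop(std::string& out) {
    if (queue_.empty()) return false;
    out = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::size_t capacity_;
  std::deque<std::string> queue_;
};
```

On reconnect, the firmware would drain the queue with `pop()` and publish each payload before resuming live telemetry. Dropping the oldest entry is one policy among several; for command acknowledgements you may prefer to drop the newest instead.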

    Step 2: AWS IoT Core Configuration

    AWS IoT Core gives you a fully managed MQTT broker with fleet management, rules engine, and device shadows.

    Device Shadow for State Sync

    Device shadows solve a critical problem: what happens when the device is offline when you send a command?

    {
      "state": {
        "desired": {
          "relay1": true,
          "brightness": 80
        },
        "reported": {
          "relay1": false,
          "brightness": 0,
          "temperature": 24.5,
          "firmware": "v2.3.1"
        }
      }
    }
    

    When the device comes online, it reads the delta and applies the desired state. Clean, reliable, offline-safe.
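AWS IoT computes the delta on the service side and publishes it on `$aws/things/<thingName>/shadow/update/delta`. The comparison itself is simple, and the sketch below mirrors it locally so you can see which keys a device would be asked to change. `ShadowState` and `computeDelta` are hypothetical names, and values are flattened to strings purely to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <string>

using ShadowState = std::map<std::string, std::string>;

// Returns the desired keys whose value differs from (or is missing in) the
// reported state, i.e. the settings the device still has to apply.
ShadowState computeDelta(const ShadowState& desired, const ShadowState& reported) {
  ShadowState delta;
  for (const auto& entry : desired) {
    const auto it = reported.find(entry.first);
    if (it == reported.end() || it->second != entry.second) {
      delta[entry.first] = entry.second;
    }
  }
  return delta;
}
```

With the shadow document above, the delta is `relay1 = true` and `brightness = 80`; `temperature` and `firmware` are reported-only and never appear in it. Once the device applies the change and publishes its new reported state, the delta becomes empty.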

    IoT Rules Engine

    Route telemetry to multiple targets simultaneously:

    SELECT *, topic(2) as deviceId, timestamp() as ts
    FROM 'devices/+/telemetry'
    WHERE temperature > 0 AND humidity >= 0 AND humidity <= 100
    

    Route to: Kinesis (stream), DynamoDB (device state), SNS (alerts), Lambda (processing).
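One detail that is easy to get wrong: `topic(n)` in AWS IoT SQL counts topic segments from 1, so in `devices/<id>/telemetry` the device ID is the second segment. The helper below reproduces that indexing in C++ so the rule's behavior can be checked offline; `topicSegment` and the device ID `esp32-001` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <string>

// Returns the n-th '/'-separated segment of an MQTT topic, counting from 1
// (the same convention as topic(n) in AWS IoT SQL).
// Returns an empty string when n is out of range.
std::string topicSegment(const std::string& topic, int n) {
  if (n < 1) return "";
  std::size_t start = 0;
  for (int i = 1; i < n; ++i) {
    const std::size_t slash = topic.find('/', start);
    if (slash == std::string::npos) return "";
    start = slash + 1;
  }
  const std::size_t end = topic.find('/', start);
  if (end == std::string::npos) return topic.substr(start);
  return topic.substr(start, end - start);
}
```

For `devices/esp32-001/telemetry`, segment 1 is `devices`, segment 2 is `esp32-001`, and segment 3 is `telemetry`.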

    Step 3: OTA Updates at Scale

    OTA is where many IoT projects fall apart. Here's the architecture:

  1. Version manifest stored in S3, served via CloudFront
  2. ESP32 polls manifest URL every 6 hours
  3. Rollback mechanism: two firmware slots, boot from known-good on failure
  4. Staged rollout: deploy to 5% of fleet, monitor error rates, then roll out
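
    #include <HTTPClient.h>
    #include <ArduinoJson.h>

    // CURRENT_VERSION and performOTA() are defined elsewhere in the firmware
    void checkOTA() {
      HTTPClient http;
      http.begin("https://cdn.codecaracal.dev/firmware/manifest.json");

      if (http.GET() == 200) {
        StaticJsonDocument<512> manifest;
        DeserializationError err = deserializeJson(manifest, http.getString());
        const char* latestVersion = manifest["version"];

        // strcmp() orders strings lexicographically ("v2.10.0" sorts before "v2.9.0"),
        // so this check is only safe while every version component stays single-digit
        if (!err && latestVersion && strcmp(latestVersion, CURRENT_VERSION) > 0) {
          Serial.printf("OTA: %s → %s\n", CURRENT_VERSION, latestVersion);
          performOTA(manifest["url"]);
        }
      }
      http.end();  // release the connection
    }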
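Two pieces of this flow are worth sketching separately. First, comparing version strings with `strcmp` is lexicographic, which misorders `v2.10.0` and `v2.9.0`; a numeric comparison avoids that. Second, a staged rollout needs a stable way to decide whether a given device is in the first 5%. The code below is one possible approach, assuming plain `major.minor.patch` versions and a rollout percentage delivered alongside the manifest. `isNewerVersion`, `inRollout`, and the percentage field are hypothetical and not part of the manifest format shown above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Parses "v2.3.1" or "2.3.1" into {major, minor, patch}.
// Returns false on malformed input, including trailing junk such as "-beta".
bool parseVersion(const char* text, std::array<int, 3>& out) {
  if (text == nullptr) return false;
  if (*text == 'v' || *text == 'V') ++text;
  char trailing = 0;
  // The final %c only matches if something follows the patch number.
  return std::sscanf(text, "%d.%d.%d%c", &out[0], &out[1], &out[2], &trailing) == 3;
}

// True only when both versions parse and `candidate` is numerically greater.
bool isNewerVersion(const char* candidate, const char* current) {
  std::array<int, 3> a{}, b{};
  if (!parseVersion(candidate, a) || !parseVersion(current, b)) return false;
  return a > b;  // std::array compares element by element
}

// FNV-1a 32-bit hash: deterministic, so a device always lands in the same bucket.
uint32_t fnv1a(const std::string& s) {
  uint32_t hash = 2166136261u;
  for (unsigned char c : s) {
    hash ^= c;
    hash *= 16777619u;
  }
  return hash;
}

// True when the device falls inside the first `percent` of 100 buckets.
bool inRollout(const std::string& deviceId, unsigned percent) {
  return fnv1a(deviceId) % 100 < percent;
}
```

Because the bucket depends only on the device ID, raising the percentage from 5 to 50 keeps every device from the first wave in the rollout and only adds new ones. Refusing to update on an unparseable version is a deliberately conservative choice for this sketch.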

    Step 4: Scalable Backend

    At 10,000 devices publishing every 5 seconds, you're handling 2,000 messages/second. Your backend needs to be ready.
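The arithmetic behind that figure is worth keeping next to your capacity plan. The constants below restate it, together with the daily volume and a load-test target at twice the steady rate. This is a back-of-the-envelope sketch that assumes evenly spread publishes; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Steady-state fleet message rate, assuming publishes are spread evenly.
// Integer division: a fractional remainder is dropped.
constexpr uint64_t messagesPerSecond(uint64_t devices, uint64_t intervalSeconds) {
  return devices / intervalSeconds;
}

constexpr uint64_t kFleetRate = messagesPerSecond(10000, 5);  // 2,000 msg/s
constexpr uint64_t kMessagesPerDay = kFleetRate * 86400;      // 172,800,000 msg/day
constexpr uint64_t kLoadTestRate = 2 * kFleetRate;            // 4,000 msg/s

static_assert(kFleetRate == 2000, "10,000 devices every 5 s is 2,000 msg/s");
```

Real fleets are burstier than this, for example after a regional outage when thousands of devices reconnect at once, so treat the steady rate as a floor for sizing.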

    Time-Series Data with InfluxDB

    InfluxDB is purpose-built for IoT telemetry:

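    // Write telemetry to InfluxDB (assumes the official @influxdata/influxdb-client package;
    // writeApi comes from new InfluxDB({ url, token }).getWriteApi(org, bucket))
    const { Point } = require('@influxdata/influxdb-client')

    const point = new Point('sensor_reading')
      .tag('device_id', deviceId)
      .tag('location',  device.location)
      .floatField('temperature', data.temperature)
      .floatField('humidity',    data.humidity)
      .timestamp(new Date())

    // writePoint() is synchronous: it buffers the point and the client sends batches.
    // Call writeApi.close() on shutdown to flush anything still buffered.
    writeApi.writePoint(point)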

    InfluxDB handles billions of data points efficiently, with retention policies that expire old data automatically and downsampling through continuous queries (InfluxDB 1.x) or tasks (InfluxDB 2.x).

    Production Checklist

    Before going live with any IoT system, verify:

  • Device certificates rotated annually (automate the rotation with AWS IoT Jobs; an AWS IoT Device Defender audit can flag certificates nearing expiry)
  • Device attestation — every device has unique credentials
  • Rate limiting on MQTT topics — prevent misbehaving devices from flooding broker
  • Dead letter queue for failed messages
  • Alerting on device heartbeat — detect silent failures
  • Firmware rollback tested and verified
  • Load tested at 2× expected peak
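The heartbeat item is the one teams most often leave vague, so here is a minimal sketch of the idea: record when each device was last heard from, and flag any device that has been silent longer than a timeout. `HeartbeatMonitor` is a hypothetical name, and the 15-second timeout in the usage note (three missed 5-second publishes) is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class HeartbeatMonitor {
 public:
  explicit HeartbeatMonitor(uint64_t timeoutSeconds) : timeout_(timeoutSeconds) {}

  // Call whenever any message arrives from the device.
  void recordSeen(const std::string& deviceId, uint64_t nowSeconds) {
    lastSeen_[deviceId] = nowSeconds;
  }

  // Devices whose last message is older than the timeout, sorted by ID.
  std::vector<std::string> silentDevices(uint64_t nowSeconds) const {
    std::vector<std::string> silent;
    for (const auto& entry : lastSeen_) {
      // Guard against clock skew: a "future" timestamp is never treated as silent.
      if (nowSeconds > entry.second && nowSeconds - entry.second > timeout_) {
        silent.push_back(entry.first);
      }
    }
    return silent;
  }

 private:
  uint64_t timeout_;
  std::map<std::string, uint64_t> lastSeen_;
};
```

With a 15-second timeout, a device last seen at t=100 is reported silent at t=116 but not at t=115. In a real backend you would run `silentDevices()` on a timer and feed the result into your alerting channel.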

    Conclusion

    Production IoT engineering is 20% clever firmware and 80% boring reliability engineering. The teams who ship reliable IoT systems are the ones who've thought deeply about failure modes, security from day one, and observability at every layer.

    If you're building a system like this, [reach out to us](/contact) — we've shipped this stack dozens of times and can help you avoid the landmines.

    Written by CodeCaracal Engineering

    We write from production experience — every technique in our articles has been deployed to real clients. No academic theory.
