Deploying IoT Backends on AWS: ECS Fargate vs Lambda vs EC2 Decision Guide

ECS Fargate, Lambda, and EC2 each solve different IoT backend problems. Picking the wrong one means paying for idle capacity, fighting cold-start latency, or managing servers you should never have touched. Here is how we decide.

August 25, 2024
12 min read
AWS · ECS Fargate · Lambda · EC2

The three AWS compute options (ECS Fargate, Lambda, and EC2) are not interchangeable. Each one has a workload profile it is designed for, and IoT backends are diverse enough that a single product might use all three simultaneously. Getting this decision wrong is expensive: Lambda's cold starts will kill your real-time WebSocket latency; EC2 will charge you for idle CPU at 3 AM when your devices are sleeping; Fargate will frustrate you when you actually need a GPU or large persistent local disk.

This guide walks through the decision framework we use at Code Caracal for every IoT backend we deploy.

The Three Workload Archetypes

Before comparing services, define your workload:

Always-on, stateful connections: MQTT bridges, WebSocket servers, real-time notification dispatchers. These processes hold open connections; they cannot restart mid-stream.

Event-driven, short-duration processing: Parsing a telemetry message, writing to DynamoDB, sending an alert email, running a data validation step. These fire, execute, and finish.

Heavy or long-running compute: ML model inference, video transcoding from a camera feed, large ETL jobs on historical sensor data. These need consistent CPU/memory for minutes to hours.

ECS Fargate: The Right Choice for Always-On IoT Services

If your IoT backend includes a WebSocket server, a persistent MQTT bridge, or any long-lived process that holds state in memory, ECS Fargate is the answer.

Lambda will not work here. AWS Lambda has a maximum execution duration of 15 minutes and cannot hold a WebSocket connection open across invocations. Putting Lambda behind a WebSocket front end (API Gateway WebSocket APIs, for example) forces you to push all state to DynamoDB or ElastiCache on every message, adding 20–50 ms of latency per round trip and significant cost at scale.

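Here is an ECS task definition for a Node.js MQTT bridge service. Before using it in production, pin the image to a version tag or digest rather than latest, and confirm that curl is installed in the image, because the container health check depends on it: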

{
  "family": "iot-mqtt-bridge",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/iotMqttBridgeRole",
  "containerDefinitions": [
    {
      "name": "mqtt-bridge",
      "image": "ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/mqtt-bridge:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 3000, "protocol": "tcp" },
        { "containerPort": 1883, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" },
        { "name": "MQTT_BROKER_URL", "value": "mqtt://localhost:1883" }
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:iot/db-password"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 10
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mqtt-bridge",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
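
The health check in the task definition above calls http://localhost:3000/health, so the container has to answer on that path. Below is a minimal sketch of such an endpoint using only Node's built-in http module. The isBrokerConnected callback and the choice to return 503 when the broker link is down are assumptions made for illustration; they are not taken from the original service.

```javascript
const http = require('http');

// Builds a request handler for the /health path used by the ECS health check.
// isBrokerConnected is a hypothetical callback that reports the MQTT link state.
function createHealthHandler(isBrokerConnected) {
  return (req, res) => {
    if (req.method !== 'GET' || req.url !== '/health') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }
    const healthy = isBrokerConnected();
    // curl -f exits non-zero on HTTP status >= 400, so 503 marks the container unhealthy.
    res.writeHead(healthy ? 200 : 503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: healthy ? 'ok' : 'degraded' }));
  };
}

// Returns a server that is not yet listening; the caller decides the port.
function createHealthServer(isBrokerConnected) {
  return http.createServer(createHealthHandler(isBrokerConnected));
}
```

In the service entry point this would be started with something like createHealthServer(() => client.connected).listen(3000), where client stands in for whatever MQTT client object the bridge uses.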

Auto-scaling on Fargate: Attach an Application Auto Scaling policy to your ECS service. For WebSocket servers, scale on a custom CloudWatch metric such as ActiveConnections, published by the service itself, rather than on CPU. A well-written WebSocket server is async and will show low CPU even with thousands of connections, so a CPU-based policy reacts too late.
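
To make connection-based scaling concrete, here is a small sketch of the two pieces involved: the payload a service would send to CloudWatch PutMetricData, and a back-of-the-envelope estimate of how many tasks a given connection count needs. The namespace IoT/WebSocket, the ServiceName dimension, and the per-task target are assumptions for illustration. The estimate is a simplified model for capacity planning, not a description of how Application Auto Scaling computes its decisions.

```javascript
// Builds the parameter object for a CloudWatch PutMetricData call.
// Namespace and dimension names are illustrative assumptions.
function buildConnectionMetric(serviceName, activeConnections, now = new Date()) {
  return {
    Namespace: 'IoT/WebSocket',
    MetricData: [
      {
        MetricName: 'ActiveConnections',
        Dimensions: [{ Name: 'ServiceName', Value: serviceName }],
        Timestamp: now,
        Unit: 'Count',
        Value: activeConnections,
      },
    ],
  };
}

// Estimates the task count needed to keep each task at or below a connection
// target, clamped to the service's minimum and maximum task counts.
function desiredTaskCount(totalConnections, targetPerTask, minTasks, maxTasks) {
  const needed = Math.ceil(totalConnections / targetPerTask);
  return Math.min(maxTasks, Math.max(minTasks, needed));
}

console.log(desiredTaskCount(12000, 5000, 2, 10)); // 3
```

Each task would publish its own connection count on a timer, and the scaling policy would track the average across tasks against the per-task target.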

Lambda: The Right Choice for Event-Driven IoT Processing

Lambda excels at the processing layer behind AWS IoT Core. The Rules Engine publishes a message to an SQS queue or invokes Lambda directly; Lambda transforms the payload, writes to DynamoDB or InfluxDB, and terminates. Total execution time: 50–300 ms.

// Lambda handler: process telemetry from IoT Rules Engine
// Uses AWS SDK for JavaScript v2 (the .promise() call style). Node.js 18+ Lambda
// runtimes bundle only SDK v3, so package aws-sdk with the function there.
const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB.DocumentClient();
const sns = new AWS.SNS();

exports.handler = async (event) => {
  const { deviceId, temperature, humidity, timestamp } = event;

  // Validate range (also rejects missing or non-numeric readings)
  if (!Number.isFinite(temperature) || temperature < -40 || temperature > 125) {
    console.warn(`Out-of-range reading from ${deviceId}: ${temperature}°C`);
    return { statusCode: 400, body: 'Invalid reading' };
  }

  // Write to DynamoDB
  await dynamodb.put({
    TableName: process.env.TABLE_NAME,
    Item: {
      pk: `DEVICE#${deviceId}`,
      sk: `TS#${timestamp}`,
      temperature,
      humidity,
      ttl: Math.floor(Date.now() / 1000) + 7776000, // 90-day TTL
    },
  }).promise();

  // Trigger alert if threshold breached
  if (temperature > 80) {
    await sns.publish({
      TopicArn: process.env.ALERT_TOPIC_ARN,
      Message: JSON.stringify({ deviceId, temperature, timestamp }),
      Subject: 'High temperature alert',
    }).promise();
  }

  return { statusCode: 200 };
};
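
The handler above assumes direct invocation, where the event is the reading itself. When the Rules Engine publishes to an SQS queue instead, Lambda receives a batch in event.Records, with each record carrying the message as a JSON string in body. The sketch below normalizes both shapes into one array; extractReadings is a hypothetical helper name, and collecting the messageId of unparseable records is one possible way to handle bad input, not something the original code does.

```javascript
// Normalizes the two delivery paths into a list of readings:
// - direct invocation: the event object is a single reading
// - SQS event source: event.Records[] holds JSON strings in record.body
function extractReadings(event) {
  if (!Array.isArray(event.Records)) {
    return { readings: [event], failedMessageIds: [] };
  }
  const readings = [];
  const failedMessageIds = [];
  for (const record of event.Records) {
    try {
      readings.push(JSON.parse(record.body));
    } catch (err) {
      // Keep the ID so the caller can report or dead-letter the bad message.
      failedMessageIds.push(record.messageId);
    }
  }
  return { readings, failedMessageIds };
}

const sample = extractReadings({
  Records: [
    { messageId: 'a1', body: '{"deviceId":"d-1","temperature":21.5}' },
    { messageId: 'a2', body: 'not json' },
  ],
});
console.log(sample.readings.length, sample.failedMessageIds); // 1 [ 'a2' ]
```

The per-reading validation and DynamoDB write would then run in a loop over readings, so one malformed message does not fail the whole batch.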

The cold start problem: Cold starts for a Node.js Lambda function typically add 200–400 ms. For IoT telemetry processing this is acceptable, because devices do not notice processing latency. Cold starts become unacceptable on API endpoints that your Flutter app or React dashboard calls directly. Enable provisioned concurrency for those functions, or route them through ECS instead.
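
Before paying for provisioned concurrency, it helps to measure how often cold starts actually happen. One common way to do that relies on the fact that module-level code runs once per execution environment, so a module-scope flag is true only on the first invocation. The wrapper below is a sketch of that idea; withColdStartLog is a hypothetical helper name, and the logged durationMs covers handler time only, not the runtime's initialization time.

```javascript
// Module scope runs once per execution environment, so this flag is true
// only for the first invocation handled by a freshly started environment.
let isColdStart = true;

// Wraps a handler and logs one JSON line per invocation with the cold-start flag.
function withColdStartLog(handler, log = console.log) {
  return async (event, context) => {
    const cold = isColdStart;
    isColdStart = false;
    const startedAt = Date.now();
    try {
      return await handler(event, context);
    } finally {
      log(JSON.stringify({ coldStart: cold, durationMs: Date.now() - startedAt }));
    }
  };
}
```

Usage would look like exports.handler = withColdStartLog(async (event) => { ... }); a log query that counts lines with coldStart set to true then gives the cold-start rate per function.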

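At 1 million Lambda invocations per month, request charges come to approximately $0.20/month ($0.20 per million requests, before duration charges and the free tier), which is essentially free. For scale, a 500-device fleet sending data every 30 seconds generates about 43 million invocations per month (500 devices × 86,400 messages each), or roughly $8.64/month in request charges.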
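
Invocation volume is easy to misjudge, so it is worth computing directly from fleet size and reporting interval. The sketch below does that and adds the duration component that a request-only figure leaves out. The prices are assumptions: $0.20 per million requests and $0.0000166667 per GB-second, the published x86 rates in us-east-1 at the time of writing, with the free tier ignored. The 100 ms duration and 128 MB memory in the example are also illustrative.

```javascript
const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // 30-day month

// Messages per month for a fleet where every device reports on a fixed interval.
function monthlyInvocations(deviceCount, intervalSeconds) {
  return deviceCount * (SECONDS_PER_MONTH / intervalSeconds);
}

// Assumed list prices (x86, us-east-1); check current pricing before relying on them.
const PRICE_PER_MILLION_REQUESTS = 0.20;
const PRICE_PER_GB_SECOND = 0.0000166667;

function lambdaMonthlyCost(invocations, avgDurationMs, memoryMb) {
  const requestCost = (invocations / 1e6) * PRICE_PER_MILLION_REQUESTS;
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const durationCost = gbSeconds * PRICE_PER_GB_SECOND;
  return { requestCost, durationCost, total: requestCost + durationCost };
}

const fleetInvocations = monthlyInvocations(500, 30);
console.log(fleetInvocations); // 43200000

const fleetCost = lambdaMonthlyCost(fleetInvocations, 100, 128);
console.log(fleetCost.total.toFixed(2)); // 17.64
```

Under these assumptions duration charges are about the same size as request charges, which is why a request-only estimate should be treated as a lower bound.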

EC2: When You Actually Need It

EC2 earns its place for heavy, long-running compute that exceeds Lambda's 15-minute limit or requires persistent local storage. Common IoT use cases:

  • ML inference on a large model (> 1 GB) that cannot tolerate Lambda's cold start
  • InfluxDB or TimescaleDB self-hosted to avoid managed database costs at scale
  • Custom MQTT broker (Mosquitto, EMQX) when AWS IoT Core per-message pricing becomes prohibitive above 100M messages/day

For EC2, use Reserved Instances for baseline capacity and Spot Instances for batch workloads. A t3.medium Reserved Instance (1-year, no upfront) runs approximately $22/month versus $35/month on-demand.

Cost Comparison at Scale

| Scale | Lambda (request charges) | ECS Fargate (0.5 vCPU / 1 GB) | EC2 t3.medium |
|---|---|---|---|
| 100K msgs/month | ~$0.02/month | $25/month | $35/month |
| 1M msgs/month | ~$0.20/month | $25/month | $35/month |
| 10M msgs/month | ~$2/month | $50/month (scaled) | $70/month |
| 100M msgs/month | ~$20/month | $200/month (scaled) | $140/month |

The Lambda column reflects request charges only, at $0.20 per million requests; duration charges come on top and depend on memory size and execution time.

At very high message volumes, Lambda remains cheapest for pure processing. Fargate becomes expensive only if you over-provision task count. EC2 wins for sustained heavy compute but carries operational overhead.

Our Decision Rule

Apply this in order:

1. Does the service need persistent connections (WebSocket, MQTT)? → ECS Fargate
2. Is it triggered by events, and does it complete in < 15 minutes? → Lambda
3. Does it run continuously, need large local disk, or require GPU? → EC2

Most IoT backends use all three: Fargate for the real-time layer, Lambda for the processing layer, EC2 (or RDS) for the database layer.
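
The rule is simple enough to express as a function, which can be handy as a checklist in architecture reviews. This is only a sketch: the field names on the workload object are hypothetical, and the 'undecided' fallback for workloads that match none of the three questions is an addition for completeness.

```javascript
// Applies the three questions in order and returns the first match.
// Field names on the workload object are hypothetical, chosen for this sketch.
function chooseCompute(workload) {
  if (workload.persistentConnections) {
    return 'ECS Fargate';
  }
  if (workload.eventDriven && workload.maxRuntimeMinutes < 15) {
    return 'Lambda';
  }
  if (workload.runsContinuously || workload.needsLargeLocalDisk || workload.needsGpu) {
    return 'EC2';
  }
  return 'undecided';
}

console.log(chooseCompute({ persistentConnections: true }));               // ECS Fargate
console.log(chooseCompute({ eventDriven: true, maxRuntimeMinutes: 1 }));   // Lambda
console.log(chooseCompute({ eventDriven: true, maxRuntimeMinutes: 90, needsGpu: true })); // EC2
```

Because the checks run in order, a service that both holds connections and reacts to events lands on Fargate, which matches the priority the rule gives to persistent connections.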

---

Not sure which compute mix fits your IoT product? [Contact Code Caracal](/contact) and we will design your AWS architecture in a free scoping session.

Written by CodeCaracal Engineering

We write from production experience. Every technique in our articles has been deployed to real clients. No academic theory.
