IoT Engineering

IoT Fleet Management at Scale: AWS IoT Core Device Registry and Provisioning

Managing hundreds of IoT devices manually doesn't scale — AWS IoT Core's Device Registry, fleet provisioning templates, and Jobs API automate the entire lifecycle. This guide shows engineers how to implement it correctly.

February 5, 2024
13 min read
AWS IoT Core · Fleet Management · Device Provisioning · OTA

When you have 10 devices, a spreadsheet works. When you have 10,000 devices, you need automated provisioning, group-based policy management, remote job execution, and real-time fleet health visibility. AWS IoT Core's fleet management features handle all of this — if you configure them correctly from day one.

The Device Registry: Your Source of Truth

The AWS IoT Core Device Registry is a managed database for every device in your fleet. Each entry (called a *Thing*) stores:

  • A unique device name (typically your serial number or MAC address)
  • Attributes: arbitrary key-value metadata (firmware version, hardware revision, location)
  • Thing type: a template that defines expected attributes
  • Group membership: one or more logical groups

Attributes are queryable through Fleet Indexing, which makes them critical for operational visibility.

    // Node.js: register a new device at the factory
    const { IoTClient, CreateThingCommand } = require('@aws-sdk/client-iot')

    const iot = new IoTClient({ region: 'us-east-1' })

    async function registerDevice(serialNumber, hwRevision, location) {
      const thingName = `device-${serialNumber}`

      await iot.send(new CreateThingCommand({
        thingName,
        thingTypeName: 'EnvironmentalSensor', // the Thing type must already exist
        attributePayload: {
          // Attribute values cannot contain spaces
          attributes: {
            hwRevision,
            location,
            firmwareVersion: '1.0.0',
            provisionedAt: new Date().toISOString(),
          },
        },
      }))

      console.log(`Registered: ${thingName}`)
      return thingName
    }

Best practice: derive the Thing name from the device's physical serial number. It creates a durable 1:1 mapping between hardware and cloud identity that survives firmware reflashes and certificate rotations.
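
One way to protect that mapping is to validate the serial number before it becomes a Thing name. The sketch below assumes the `device-` prefix from the registration code; the helper name `thingNameFromSerial` is hypothetical. It relies on the documented Thing name rules (letters, digits, colon, underscore, and hyphen, up to 128 characters).

```javascript
// Characters AWS IoT accepts in a Thing name
const SERIAL_PATTERN = /^[a-zA-Z0-9:_-]+$/
const MAX_THING_NAME_LENGTH = 128

// Hypothetical helper: turn a factory serial number into a Thing name,
// rejecting anything that could not be registered as-is.
function thingNameFromSerial(serialNumber, prefix = 'device-') {
  const serial = String(serialNumber).trim()
  if (!SERIAL_PATTERN.test(serial)) {
    throw new Error(`Serial number contains unsupported characters: "${serial}"`)
  }
  const thingName = prefix + serial
  if (thingName.length > MAX_THING_NAME_LENGTH) {
    throw new Error(`Thing name exceeds ${MAX_THING_NAME_LENGTH} characters: ${thingName}`)
  }
  return thingName
}

console.log(thingNameFromSerial('SN-00042')) // device-SN-00042
```

Rejecting a bad serial, rather than silently rewriting it, avoids two different serial numbers collapsing into the same Thing name.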

Fleet Provisioning Templates: Zero-Touch at Scale

Manually creating certificates for each device doesn't scale past a few hundred units. Fleet Provisioning Templates let each device request its own unique certificate and register automatically at first boot.

The flow:

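  1. The device ships with a single shared *provisioning claim certificate*. Its policy is low-privilege: it can only call CreateKeysAndCertificate (or CreateCertificateFromCsr) and RegisterThing.
  2. At first boot, the device connects with the claim certificate, requests a unique certificate and key pair, and registers itself through the template. If the private key must never leave the device, generate the key pair locally and call CreateCertificateFromCsr instead.
  3. The template creates the Thing, activates the permanent certificate, attaches it to the Thing, and attaches the correct policy.
  4. The device stores the permanent certificate in flash and never uses the claim certificate again.

    {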
      "templateBody": {
        "Parameters": {
          "SerialNumber": { "Type": "String" },
          "HardwareRevision": { "Type": "String" },
          "AWS::IoT::Certificate::Id": { "Type": "String" }
        },
        "Resources": {
          "thing": {
            "Type": "AWS::IoT::Thing",
            "Properties": {
              "ThingName": { "Fn::Join": ["-", ["device", { "Ref": "SerialNumber" }]] },
              "ThingTypeName": "EnvironmentalSensor",
              "AttributePayload": {
                "hwRevision": { "Ref": "HardwareRevision" },
                "firmwareVersion": "1.0.0"
              }
            }
          },
          "certificate": {
            "Type": "AWS::IoT::Certificate",
            "Properties": {
              "CertificateId": { "Ref": "AWS::IoT::Certificate::Id" },
              "Status": "ACTIVE"
            }
          },
          "policy": {
            "Type": "AWS::IoT::Policy",
            "Properties": {
              "PolicyName": "SensorDevicePolicy"
            }
          }
        }
      }
    }
    

This template runs when the device first boots and wires everything together automatically. No human intervention is needed after the factory programs the claim certificate.
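
On the device side, the flow maps to two reserved MQTT request topics, each with `accepted` and `rejected` reply topics. The following is a minimal sketch of the topic names and the RegisterThing payload. The template name `SensorFleetTemplate`, the sample parameter values, and both helper names are placeholders that do not come from the article, and the MQTT client itself is left out.

```javascript
// Reserved MQTT topics used by fleet provisioning (by claim)
function provisioningTopics(templateName) {
  const createCert = '$aws/certificates/create/json'
  const register = `$aws/provisioning-templates/${templateName}/provision/json`
  const withReplies = (topic) => ({
    request: topic,
    accepted: `${topic}/accepted`,
    rejected: `${topic}/rejected`,
  })
  return {
    createCertificate: withReplies(createCert),
    registerThing: withReplies(register),
  }
}

// Build the RegisterThing request from the CreateKeysAndCertificate response.
// The parameter names must match the template's "Parameters" section;
// AWS::IoT::Certificate::Id is filled in by the service, not by the device.
function buildRegisterThingPayload(createCertResponse, parameters) {
  const { certificateOwnershipToken } = createCertResponse
  if (!certificateOwnershipToken) {
    throw new Error('Missing certificateOwnershipToken in the certificate response')
  }
  return JSON.stringify({ certificateOwnershipToken, parameters })
}

const topics = provisioningTopics('SensorFleetTemplate') // placeholder template name
const payload = buildRegisterThingPayload(
  { certificateOwnershipToken: 'token-from-accepted-response' }, // placeholder token
  { SerialNumber: 'SN-00042', HardwareRevision: 'rev-c' }
)

console.log(topics.registerThing.request)
console.log(payload)
```

The device should subscribe to the `accepted` and `rejected` topics before publishing each request, so that a reply is not missed.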

Device Groups: Organizing Your Fleet

Thing Groups let you apply policies, jobs, and logging rules to logical subsets of your fleet. Groups are hierarchical, which mirrors real-world deployments:

    FleetRoot
    ├── Building-A
    │   ├── Floor-1
    │   └── Floor-2
    ├── Building-B
    └── Staging
        └── QA-Devices
    

A device inherits the policies attached to every group in its ancestry. Jobs follow the same hierarchy: a firmware update that targets Building-A reaches Floor-1 and Floor-2 without touching Building-B or Staging.
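
Because a child group names its parent at creation time, parents have to exist first. As a rough sketch, the hierarchy can be flattened into an ordered list of inputs, each shaped like the input of `CreateThingGroupCommand` (`thingGroupName`, `parentGroupName`). The function name `groupCreationOrder` and the nested-object representation of the tree are illustrative choices, not part of the AWS SDK.

```javascript
// The hierarchy from the diagram, as nested objects
const fleetTree = {
  FleetRoot: {
    'Building-A': { 'Floor-1': {}, 'Floor-2': {} },
    'Building-B': {},
    Staging: { 'QA-Devices': {} },
  },
}

// Depth-first walk that emits every parent before its children
function groupCreationOrder(tree, parentGroupName) {
  const inputs = []
  for (const [thingGroupName, children] of Object.entries(tree)) {
    inputs.push(parentGroupName ? { thingGroupName, parentGroupName } : { thingGroupName })
    inputs.push(...groupCreationOrder(children, thingGroupName))
  }
  return inputs
}

const creationOrder = groupCreationOrder(fleetTree)
for (const input of creationOrder) {
  // With the SDK, each input would be sent as: iot.send(new CreateThingGroupCommand(input))
  console.log(`${input.parentGroupName || '(root)'} -> ${input.thingGroupName}`)
}
```

Group names are unique across the account rather than per parent, so two buildings cannot both contain a group called Floor-1; a prefix such as `A-Floor-1` is one way around that.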

Dynamic groups use Fleet Indexing queries instead of static membership. Devices automatically join or leave based on their attributes:

    // Automatically group all devices still running 1.x firmware.
    // Version strings such as "1.4.2" are not numeric, so a range comparison
    // would not order them as versions; match the major version with a wildcard.
    const { CreateDynamicThingGroupCommand } = require('@aws-sdk/client-iot')

    async function createLegacyFirmwareGroup() {
      await iot.send(new CreateDynamicThingGroupCommand({
        thingGroupName: 'LegacyFirmware',
        queryString: 'attributes.firmwareVersion:1.*',
      }))
    }

AWS IoT Jobs: Coordinated OTA Updates

The Jobs API orchestrates any operation across a group of devices: firmware updates, configuration changes, certificate rotations. Each job tracks per-device status: queued → in-progress → succeeded/failed.

    // Create a firmware OTA job for a device group
    const { CreateJobCommand } = require('@aws-sdk/client-iot')

    async function createOtaJob(targetGroup, firmwareVersion, s3Url) {
      // Job IDs cannot contain dots, so "2.1.0" becomes "2-1-0"
      const jobId = `ota-${firmwareVersion.replace(/\./g, '-')}-${Date.now()}`

      await iot.send(new CreateJobCommand({
        jobId,
        // Placeholder account ID: substitute your own 12-digit ID and region
        targets: [`arn:aws:iot:us-east-1:123456789012:thinggroup/${targetGroup}`],
        document: JSON.stringify({
          operation: 'firmware_update',
          firmwareVersion,
          url: s3Url,
          checksum: 'sha256:abc123...',
        }),
        jobExecutionsRolloutConfig: {
          maximumPerMinute: 50, // upper bound on the rollout rate
          exponentialRate: {
            baseRatePerMinute: 5,
            incrementFactor: 2,
            rateIncreaseCriteria: { numberOfSucceededThings: 20 },
          },
        },
        abortConfig: {
          criteriaList: [{
            action: 'CANCEL',
            failureType: 'FAILED',
            minNumberOfExecutedThings: 10,
            thresholdPercentage: 20, // abort once 20% of executions have failed
          }],
        },
        timeoutConfig: { inProgressTimeoutInMinutes: 30 },
        description: `Firmware upgrade to ${firmwareVersion}`,
      }))

      return jobId
    }

The exponential rollout and automatic abort are critical in production. A bad firmware build that bricks 20% of the first 10 devices should not reach the remaining 9,990.
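
To get a feel for these settings, here is a simplified local model of the two mechanisms. It is only an approximation of how the service behaves: it assumes the rate is multiplied by `incrementFactor` each time the increase criterion is met and capped at `maximumPerMinute`, and that the abort rule fires when the failure percentage reaches the threshold after the minimum number of executions. The function names are hypothetical.

```javascript
// Successive per-minute rates under the simplified model described above
function rolloutRateSteps({ baseRatePerMinute, incrementFactor, maximumPerMinute }) {
  if (incrementFactor <= 1) throw new Error('incrementFactor must be greater than 1')
  const steps = []
  let rate = baseRatePerMinute
  while (rate < maximumPerMinute) {
    steps.push(rate)
    rate *= incrementFactor
  }
  steps.push(maximumPerMinute) // the rate never exceeds the cap
  return steps
}

// Abort check: integer arithmetic avoids floating-point edge cases at the threshold
function shouldAbort({ executed, failed }, { minNumberOfExecutedThings, thresholdPercentage }) {
  if (executed < minNumberOfExecutedThings) return false
  return failed * 100 >= thresholdPercentage * executed
}

const rateSteps = rolloutRateSteps({ baseRatePerMinute: 5, incrementFactor: 2, maximumPerMinute: 50 })
console.log(rateSteps) // [ 5, 10, 20, 40, 50 ]

const abortRule = { minNumberOfExecutedThings: 10, thresholdPercentage: 20 }
console.log(shouldAbort({ executed: 10, failed: 2 }, abortRule)) // true
console.log(shouldAbort({ executed: 9, failed: 9 }, abortRule)) // false: too few executions so far
```

Under this model a bad build is stopped while the rollout is still at its slowest rate, which is the point of pairing the two settings.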

Fleet Indexing and Search

Fleet Indexing indexes Thing attributes, connectivity status, and shadow state in near-real-time. This turns operational questions into simple queries. Per-device job results are not part of the index, so read them from the Jobs API instead:

    const { SearchIndexCommand, ListJobExecutionsForJobCommand } = require('@aws-sdk/client-iot')

    async function fleetReport() {
      // Find all offline devices in Building-A (first page of results only)
      const offline = await iot.send(new SearchIndexCommand({
        queryString: 'connectivity.connected:false AND thingGroupNames:Building-A',
        maxResults: 250,
      }))

      // Find devices that failed an OTA job; pass the job ID returned by createOtaJob
      const failedOta = await iot.send(new ListJobExecutionsForJobCommand({
        jobId: 'ota-2-1-0',
        status: 'FAILED',
      }))

      console.log(`Offline devices: ${offline.things.length}`)
      console.log(`OTA failures: ${failedOta.executionSummaries.length}`)
    }
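
A raw count is rarely enough for triage. As an illustration, the `things` array from a search response can be summarised per group. The sample documents below are made up, and the helper `offlineByGroup` is hypothetical; it assumes each document carries `thingGroupNames` and `connectivity.connected`, as search results do when connectivity indexing is enabled.

```javascript
// Count offline devices per Thing Group from SearchIndex-style documents.
// Devices with no connectivity record are counted as offline.
function offlineByGroup(things) {
  const counts = {}
  for (const thing of things) {
    if (thing.connectivity && thing.connectivity.connected) continue
    for (const group of thing.thingGroupNames || []) {
      counts[group] = (counts[group] || 0) + 1
    }
  }
  return counts
}

// Made-up sample data in the shape of a search response
const sampleThings = [
  { thingName: 'device-SN-1', thingGroupNames: ['Floor-1'], connectivity: { connected: false } },
  { thingName: 'device-SN-2', thingGroupNames: ['Floor-1'], connectivity: { connected: true } },
  { thingName: 'device-SN-3', thingGroupNames: ['Floor-2'], connectivity: { connected: false } },
  { thingName: 'device-SN-4', thingGroupNames: ['Floor-1'] },
]

const offlineCounts = offlineByGroup(sampleThings)
console.log(offlineCounts) // { 'Floor-1': 2, 'Floor-2': 1 }
```

Treating a missing connectivity record as offline is a deliberate choice here: a device that has never connected deserves the same attention as one that dropped off.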

Enable indexing in your IoT Core settings: set thingIndexingMode to REGISTRY_AND_SHADOW (REGISTRY alone skips shadow data) and thingConnectivityIndexingMode to STATUS. The cost is minimal ($0.25 per million indexed updates) and the operational value is enormous.

Monitoring Fleet Health

Connect IoT Core metrics to CloudWatch for alerting:

  • Connected device count: the current online count. AWS IoT Core does not publish this as a built-in metric; a Fleet Indexing fleet metric on the query connectivity.connected:true can emit it to CloudWatch.
  • PublishIn.Success / PublishIn.ClientError (AWS/IoT namespace): message rate and error ratio
  • Subscribe.Success: successful subscribe requests
  • Job execution success/failure rates via EventBridge events

Set a CloudWatch alarm when the connected device count drops more than 10% in 5 minutes. That's your early warning of a regional network issue or a broken firmware build.
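
The 10% rule is simple enough to prototype before wiring up an alarm. This sketch compares two samples of the connected device count taken five minutes apart; the function names and the sample counts are hypothetical, and in production the comparison would live in the alarm definition rather than in application code.

```javascript
// Percentage drop between two samples (0 when the count rose or stayed flat)
function dropPercent(previousCount, currentCount) {
  if (previousCount <= 0) return 0
  return Math.max(0, ((previousCount - currentCount) / previousCount) * 100)
}

// True when the drop is strictly greater than the threshold.
// Integer arithmetic keeps the boundary case exact.
function shouldAlarm(previousCount, currentCount, thresholdPercent = 10) {
  if (previousCount <= 0) return false
  return (previousCount - currentCount) * 100 > thresholdPercent * previousCount
}

console.log(dropPercent(10000, 7500)) // 25
console.log(shouldAlarm(10000, 8900)) // true: an 11% drop
console.log(shouldAlarm(10000, 9000)) // false: exactly 10% is not "more than 10%"
```

A relative threshold scales with the fleet, so the same rule keeps working as the device count grows.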

Cost Considerations

AWS IoT Core pricing has three components:

  1. Connectivity: $0.042 per million device minutes connected
  2. Messaging: $1.00 per million messages (first 1B/month)
  3. Device Shadow: $1.25 per million operations

At 10,000 devices sending one reading per minute, a 30-day month comes to 432 million connected minutes and 432 million messages. At the rates above, that is about $18/month for connectivity and about $432/month for messaging, so roughly $450/month for the broker layer. Messaging dominates the bill, which makes the reporting interval the main cost lever. For many teams this is still cheaper than running and maintaining your own MQTT cluster.

Device Shadow costs add up if you update shadow state on every telemetry publish. Only update the shadow when device *state* changes (firmware version updated, configuration changed), not on every sensor reading.
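
A back-of-envelope estimator makes the effect of shadow update frequency visible. The rates are the list prices quoted in this section and should be treated as assumptions; check current regional pricing before relying on them. The sketch also assumes devices stay connected around the clock and that every reading is one billable message. The function name and its parameters are hypothetical.

```javascript
// Rates as quoted in this article (assumptions, not authoritative pricing)
const RATES = {
  connectivityPerMillionMinutes: 0.042,
  messagingPerMillionMessages: 1.0,
  shadowPerMillionOperations: 1.25,
}
const MINUTES_PER_DAY = 24 * 60
const roundToCents = (usd) => Math.round(usd * 100) / 100

function estimateMonthlyCost({ devices, messagesPerDevicePerMinute, shadowOpsPerDevicePerDay = 0, days = 30 }) {
  const deviceMinutes = devices * days * MINUTES_PER_DAY
  const messages = deviceMinutes * messagesPerDevicePerMinute
  const shadowOps = devices * days * shadowOpsPerDevicePerDay

  const connectivity = (deviceMinutes / 1e6) * RATES.connectivityPerMillionMinutes
  const messaging = (messages / 1e6) * RATES.messagingPerMillionMessages
  const shadow = (shadowOps / 1e6) * RATES.shadowPerMillionOperations

  return {
    connectivity: roundToCents(connectivity),
    messaging: roundToCents(messaging),
    shadow: roundToCents(shadow),
    total: roundToCents(connectivity + messaging + shadow),
  }
}

const fleet = { devices: 10000, messagesPerDevicePerMinute: 1 }
// Shadow updated on every reading (1,440 per day) versus only on state changes (say 2 per day)
const shadowOnEveryReading = estimateMonthlyCost({ ...fleet, shadowOpsPerDevicePerDay: 1440 })
const shadowOnStateChange = estimateMonthlyCost({ ...fleet, shadowOpsPerDevicePerDay: 2 })

console.log(shadowOnEveryReading)
console.log(shadowOnStateChange)
```

Comparing the two `shadow` figures shows why shadow writes belong on state changes rather than on the telemetry path.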

For the full end-to-end architecture connecting firmware to this fleet management layer, see the [ESP32 MQTT AWS IoT Core Production Guide](/blog/esp32-mqtt-aws-iot-core-production-guide).

Also pair this with [IoT Architecture Patterns](/blog/iot-architecture-patterns-2024) for guidance on how fleet management fits into your overall system topology.

Need help with IoT fleet management at scale? [Contact Code Caracal](/contact). We've shipped these systems for clients across 15+ countries.

Written by CodeCaracal Engineering

We write from production experience: every technique in our articles has been deployed for real clients. No academic theory.
