Edge Computing in IoT: When to Process On-Device vs In the Cloud

"Just send everything to the cloud and process it there" sounds simple, but it creates systems that are slow, expensive, and fragile. Every byte you transmit costs money and power. Every cloud round-trip adds latency. Every connectivity hiccup risks a system failure.

The right question isn't "cloud or edge?" — it's "what belongs where?" The answer is different for every data type, use case, and business constraint.

The Decision Framework

Ask these four questions for each processing task:

1. Latency: How fast must action be taken?

< 10ms required   → On-device only
10ms – 100ms      → Edge gateway
100ms – 2 seconds → Cloud feasible (good connectivity)
> 2 seconds       → Cloud fine, batch processing also viable

A conveyor belt safety system that must stop the motor if a hand is detected has a latency budget of ~10ms — no cloud round-trip can meet this. The inference must run on-device.

A temperature alert that sends a notification when a server room overheats has a budget of several seconds. Cloud processing is fine.
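When cataloguing many processing tasks, the latency tiers above can be encoded as a small lookup. The following is a minimal sketch, assuming the thresholds listed above; the function name `placementForLatency` and the returned labels are illustrative and not part of any library.

```javascript
// Map a latency budget (in milliseconds) to the processing tier suggested above.
// placementForLatency and its labels are hypothetical names used for illustration.
function placementForLatency(budgetMs) {
  if (!Number.isFinite(budgetMs) || budgetMs <= 0) {
    throw new RangeError('budgetMs must be a positive number')
  }
  if (budgetMs < 10) return 'on-device'
  if (budgetMs <= 100) return 'edge-gateway'
  if (budgetMs <= 2000) return 'cloud'
  return 'cloud-or-batch'
}

console.log(placementForLatency(5))    // on-device
console.log(placementForLatency(5000)) // cloud-or-batch
```

Budgets that sit right on a boundary deserve a closer look at measured round-trip times rather than a table lookup.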

2. Bandwidth: How much data does the raw stream generate?

| Sensor type | Raw data rate | Cloud-viable? |
|-------------|---------------|---------------|
| Temperature (1 Hz) | 8 bytes/s | Yes |
| GPS (1 Hz) | 20 bytes/s | Yes |
| Audio (16kHz, 16-bit) | 32 KB/s | Marginal |
| Vibration (10kHz, 3-axis, 4-byte samples) | 120 KB/s | No, process at edge |
| HD video (720p) | 2 MB/s | No, run inference at edge |

High-frequency sensors almost always require edge processing to reduce data to meaningful events or aggregates before cloud upload.
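The raw rate of an uncompressed stream is simply sample rate × channels × bytes per sample. A small sketch of that arithmetic follows; `rawBytesPerSecond` is a hypothetical helper, and the 4-byte vibration sample size is an assumption taken from the cost calculation later in this article (8-byte samples would double the result to 240 KB/s).

```javascript
// Raw data rate of an uncompressed sensor stream, in bytes per second.
// rawBytesPerSecond is a hypothetical helper, not a library function.
function rawBytesPerSecond({ sampleRateHz, channels = 1, bytesPerSample }) {
  return sampleRateHz * channels * bytesPerSample
}

// 16 kHz mono audio with 16-bit (2-byte) samples
const audio = rawBytesPerSecond({ sampleRateHz: 16000, bytesPerSample: 2 })
// 10 kHz, 3-axis vibration, assuming 4-byte samples
const vibration = rawBytesPerSecond({ sampleRateHz: 10000, channels: 3, bytesPerSample: 4 })

console.log(audio)     // 32000  (32 KB/s)
console.log(vibration) // 120000 (120 KB/s)
```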

3. Privacy: Can this data leave the device/facility?

Healthcare patient monitoring, industrial process data with trade secret implications, and consumer home monitoring all have privacy requirements that may prohibit transmitting raw data to the cloud. Edge processing extracts only the necessary inference result (event detected: yes/no) without exposing raw sensor streams.

4. Cost: What's the marginal cost of processing each unit?

Cloud compute and data transfer costs add up at scale. Run the math:

1,000 devices × 10 kHz vibration sensor × 3 axes × 4 bytes × 86,400 seconds = 10,368 GB/day raw data

Cloud: AWS data transfer at $0.09/GB × 10,368 GB/day ≈ $933/day ≈ $340,000/year

Edge: gateway + edge processing at ~$50/device one-time + $5/month/device connectivity = $50,000 capex + $60,000/year opex

On bandwidth-heavy sensors the edge wins clearly: roughly 3× cheaper in the first year ($110,000 vs. $340,000) and more than 5× cheaper in every year after.
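To rerun this comparison with your own fleet size and prices, the arithmetic fits in a few lines. This is a sketch only: `compareCosts` and its parameter names are hypothetical, and the prices are the example figures used in this article, not current quotes.

```javascript
// Reproduce the cloud-vs-edge cost arithmetic for a fleet of streaming sensors.
// compareCosts and its parameter names are hypothetical; prices are example figures.
function compareCosts({ devices, bytesPerSecond, transferPerGb, edgeCapexPerDevice, edgeMonthlyPerDevice }) {
  const gbPerDay = devices * bytesPerSecond * 86400 / 1e9 // decimal GB
  const cloudPerYear = gbPerDay * transferPerGb * 365
  const edgeCapex = devices * edgeCapexPerDevice
  const edgeOpexPerYear = devices * edgeMonthlyPerDevice * 12
  return { gbPerDay, cloudPerYear, edgeCapex, edgeOpexPerYear }
}

const costs = compareCosts({
  devices: 1000,
  bytesPerSecond: 10000 * 3 * 4, // 10 kHz × 3 axes × 4 bytes
  transferPerGb: 0.09,
  edgeCapexPerDevice: 50,
  edgeMonthlyPerDevice: 5,
})

console.log(costs.gbPerDay)                 // 10368
console.log(Math.round(costs.cloudPerYear)) // 340589
console.log(costs.edgeCapex)                // 50000
console.log(costs.edgeOpexPerYear)          // 60000
```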

On-Device ML: TensorFlow Lite on ESP32

TensorFlow Lite for Microcontrollers (TFLM) runs quantized neural networks on devices with as little as 256KB of RAM. The ESP32 has 520KB SRAM and optionally 8MB PSRAM — enough for keyword detection, gesture recognition, and simple anomaly detection.

// ESP32: run TFLM inference for vibration anomaly detection
#include <cmath>
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "model_vibration_anomaly.h" // quantized INT8 model, generated offline
// Sampling and publishing helpers, assumed to be defined elsewhere in the firmware
void collectVibrationFFT(float* features, int count);
void publishEvent(const char* type, unsigned long timestampMs);
namespace {
  constexpr int kTensorArenaSize = 60 * 1024; // 60KB arena
  alignas(16) uint8_t tensor_arena[kTensorArenaSize]; // 16-byte aligned for TFLM
  tflite::MicroMutableOpResolver<4> resolver; // capacity: 4 registered ops
  tflite::MicroInterpreter* interpreter = nullptr;
  TfLiteTensor* input = nullptr;
  TfLiteTensor* output = nullptr;
}
void setupInference() {
  // Register only the ops the model actually uses
  resolver.AddConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();
  static tflite::MicroInterpreter static_interpreter(
    tflite::GetModel(g_vibration_model_data), resolver,
    tensor_arena, kTensorArenaSize
  );
  interpreter = &static_interpreter;
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    interpreter = nullptr; // arena too small or unsupported op: inference stays disabled
    return;
  }
  input  = interpreter->input(0);
  output = interpreter->output(0);
}
bool detectAnomaly(const float* fftFeatures, int featureCount) {
  if (interpreter == nullptr) return false; // setup failed
  // Quantize float features to INT8 and copy them into the input tensor
  for (int i = 0; i < featureCount; i++) {
    float q = std::round(fftFeatures[i] / input->params.scale) + input->params.zero_point;
    if (q < -128.0f) q = -128.0f; // clamp to the INT8 range
    if (q > 127.0f) q = 127.0f;
    input->data.int8[i] = static_cast<int8_t>(q);
  }
  if (interpreter->Invoke() != kTfLiteOk) return false;
  // Output: [normal_score, anomaly_score], dequantized back to float
  float anomalyScore = (output->data.int8[1] - output->params.zero_point)
                       * output->params.scale;
  return anomalyScore > 0.85f; // 85% confidence threshold
}
void setup() {
  setupInference();
}
void loop() {
  float fftFeatures[64];
  collectVibrationFFT(fftFeatures, 64); // sample + FFT on device
  if (detectAnomaly(fftFeatures, 64)) {
    // Only publish event to cloud, NOT raw FFT data
    publishEvent("vibration_anomaly", millis());
  }
  // Raw data never leaves the device
}

TFLM model constraints on ESP32:

Model size: < 1MB (fits in flash)

Arena (RAM for activations): < 300KB (arenas that do not fit in internal SRAM must live in the slower PSRAM)

Inference time: 10–200ms depending on model complexity

INT8 quantization strongly recommended (float32 models run on the ESP32, but are about 4× larger and slower)
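The INT8 conversion applied to the model's input and output tensors is an affine mapping defined by a scale and a zero point. The sketch below shows that mapping in JavaScript so it can be tried off-device; `quantizeInt8` and `dequantizeInt8` are hypothetical helpers, and the scale of 1/256 with zero point -128 are example values, not parameters read from a real model.

```javascript
// Affine INT8 quantization: q = round(x / scale) + zeroPoint, clamped to [-128, 127].
// Hypothetical helpers for illustration; real parameters come from the model's tensors.
function quantizeInt8(x, scale, zeroPoint) {
  const q = Math.round(x / scale) + zeroPoint
  return Math.max(-128, Math.min(127, q))
}

// Inverse mapping: x ≈ (q - zeroPoint) * scale
function dequantizeInt8(q, scale, zeroPoint) {
  return (q - zeroPoint) * scale
}

const scale = 1 / 256   // example value
const zeroPoint = -128  // example value

console.log(quantizeInt8(0.85, scale, zeroPoint))  // 90
console.log(dequantizeInt8(90, scale, zeroPoint))  // 0.8515625
console.log(quantizeInt8(5, scale, zeroPoint))     // 127 (clamped)
```

The round trip from 0.85 to 90 and back to 0.8515625 shows the precision lost to quantization, which is worth keeping in mind when choosing a threshold close to a decision boundary.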

Edge Gateway Processing

The Raspberry Pi or industrial gateway PC is the natural home for more complex processing that doesn't fit on a microcontroller.

// Node.js gateway: sliding window anomaly detection
const { InferenceSession, Tensor } = require('onnxruntime-node')
const WINDOW_SIZE = 256 // LSTM needs 256 time steps of 3-axis data
let session = null
async function loadModel() {
  session = await InferenceSession.create('/opt/models/vibration_lstm.onnx')
}
// Sliding window buffer per device
const windows = new Map()
async function processReading(deviceId, readings) {
  if (!session) throw new Error('Model not loaded: call loadModel() first')
  if (!windows.has(deviceId)) windows.set(deviceId, [])
  const window = windows.get(deviceId)
  window.push(...readings)
  // Wait for a full window, then keep only the newest samples
  if (window.length < WINDOW_SIZE) return null
  if (window.length > WINDOW_SIZE) window.splice(0, window.length - WINDOW_SIZE)
  // Flatten [{x, y, z}, ...] into a [1, 256, 3] float32 tensor
  const inputData = new Float32Array(window.flatMap(r => [r.x, r.y, r.z]))
  const tensor = new Tensor('float32', inputData, [1, WINDOW_SIZE, 3])
  // 'input' and 'output' must match the tensor names in the exported ONNX model
  const results = await session.run({ input: tensor })
  const anomalyScore = results.output.data[0]
  if (anomalyScore > 0.85) {
    return {
      deviceId,
      type: 'vibration_anomaly',
      score: anomalyScore,
      ts: Date.now(),
    }
  }
  return null
}

The gateway's advantages over on-device:

Full Node.js or Python environment — use any ONNX or TensorFlow SavedModel

4–8 GB RAM, so model size is rarely the limiting factor

Can process data from 10–100 sensors in parallel

Model updates via file copy, no firmware flash required

Cloud Processing: What Belongs There

Not everything should move to the edge. Cloud processing is the right choice for:

Fleet-level analytics: Detecting that 5% of a 10,000-device fleet has a subtle drift in calibration requires comparing data across all devices simultaneously — a task that requires a database and compute that no edge node can provide.

Model retraining: The cloud trains the next version of your edge model on accumulated labeled data, then pushes the updated model back to edge gateways and devices. The edge runs inference; the cloud does training.

Historical analysis: "What was the average efficiency of production line 3 for the last 6 months?" requires historical data storage and query — that's cloud territory.

Regulatory archiving: Long-term data retention for compliance (SCADA records, energy metering) belongs in cloud storage with access controls and audit logs.
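As an illustration of the fleet-level case, calibration drift can be spotted by comparing each device against the fleet median, something no single device or gateway can compute alone. This is a minimal sketch, assuming each device reports a mean reading taken under comparable conditions; `findDriftingDevices`, the record shape, and the 0.5 tolerance are all hypothetical.

```javascript
// Median of a list of numbers (NaN for an empty list).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Flag devices whose mean reading deviates from the fleet median by more than `tolerance`.
// The { deviceId, mean } record shape is an assumption made for this example.
function findDriftingDevices(deviceMeans, tolerance) {
  const fleetMedian = median(deviceMeans.map(d => d.mean))
  return deviceMeans
    .filter(d => Math.abs(d.mean - fleetMedian) > tolerance)
    .map(d => d.deviceId)
}

const fleet = [
  { deviceId: 'a', mean: 20.1 },
  { deviceId: 'b', mean: 19.9 },
  { deviceId: 'c', mean: 20.0 },
  { deviceId: 'd', mean: 21.5 },
]
console.log(findDriftingDevices(fleet, 0.5)) // [ 'd' ]
```

The median is used rather than the mean so that a few badly drifting devices do not shift the reference they are compared against.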

The Fog Computing Pattern

Fog computing is the formal name for a hierarchy of processing layers — not just cloud and device, but cloud → regional server → site gateway → device. Each layer processes what it can locally and only escalates what it must.

Device (ESP32)
  → filter: only non-zero readings pass
  → inference: anomaly detection (TFLM)
  → transmit: events only

Site Gateway (Raspberry Pi)
  → aggregate: 1-second summaries of raw sensor values
  → inference: multi-sensor correlation (ONNX)
  → transmit: summaries + anomaly events

Regional Server (EC2 / on-prem)
  → batch analytics: shift-level production reports
  → model serving: high-accuracy models too large for Pi
  → transmit: reports + aggregated events

Cloud (AWS)
  → fleet analytics: cross-site comparisons
  → model training: update TFLM and ONNX models
  → dashboards + APIs: enterprise reporting
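The gateway's aggregation step, collapsing raw values into 1-second summaries before anything is escalated, might look like the sketch below. It assumes each reading has the shape `{ ts, value }` with `ts` in milliseconds; `summarizePerSecond` and the summary fields are hypothetical names chosen for this example.

```javascript
// Collapse raw readings into one summary per whole second.
// Each reading is assumed to look like { ts: <ms timestamp>, value: <number> }.
function summarizePerSecond(readings) {
  const buckets = new Map()
  for (const { ts, value } of readings) {
    const second = Math.floor(ts / 1000)
    const b = buckets.get(second) ?? { second, count: 0, sum: 0, min: Infinity, max: -Infinity }
    b.count += 1
    b.sum += value
    b.min = Math.min(b.min, value)
    b.max = Math.max(b.max, value)
    buckets.set(second, b)
  }
  // Map preserves insertion order, so summaries come out in arrival order
  return [...buckets.values()].map(({ second, count, sum, min, max }) => ({
    second, count, mean: sum / count, min, max,
  }))
}

const summaries = summarizePerSecond([
  { ts: 1000, value: 1 },
  { ts: 1500, value: 3 },
  { ts: 2100, value: 10 },
])
console.log(summaries.length)  // 2
console.log(summaries[0].mean) // 2
```

Three raw readings become two summary records here; at 10 kHz the same step turns 10,000 samples per axis into a single record per second.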

For the architecture patterns that define how these layers connect, see [IoT Architecture Patterns: Hub-and-Spoke, Mesh, and Edge-Cloud Hybrid](/blog/iot-architecture-patterns-2024).

For implementing the gateway layer, see [Building a Production IoT Gateway with Raspberry Pi and Node.js](/blog/raspberry-pi-iot-gateway-nodejs).

For the security implications of processing data at the edge, see [IoT Security: Zero Trust for Embedded Systems](/blog/iot-security-zero-trust-embedded-systems).

Need help designing your edge computing strategy? [Contact Code Caracal](/contact) — we've shipped these systems for clients across 15+ countries.
