Predictive Maintenance with IoT Sensor Data: From Threshold to Machine Learning

Rule-based alerts tell you a machine has already failed. Machine learning on IoT sensor data tells you it will fail in 72 hours. Here is how to build both and when to upgrade.

July 1, 2024
Tags: Predictive Maintenance, Machine Learning, IoT, SageMaker

A factory floor has 200 motors. Each motor has a vibration sensor, a temperature sensor, and a current clamp. Without any analytics, maintenance is scheduled: every 90 days, a technician checks every motor. Failures between visits are surprises — expensive surprises.

With threshold alerts, you catch failures as they happen. The motor temperature hits 85°C, an alarm fires, a technician responds within the hour. Better, but still reactive.

With machine learning on IoT sensor data, you can catch failure signatures 48–96 hours before a motor fails. The maintenance team schedules a planned replacement during a shift change instead of scrambling for an emergency repair during peak production. That difference can be worth six figures annually in a mid-sized manufacturing facility.

This guide covers the full progression: from sensor selection to rule-based detection to ML models, with real implementation code at each stage.

Sensor Selection for Predictive Maintenance

The sensors you choose determine what failure modes you can detect.

| Sensor | Failure modes detected | Sampling rate |
|---|---|---|
| Vibration (accelerometer) | Bearing wear, imbalance, misalignment, looseness | 1–10 kHz |
| Temperature (thermistor/IR) | Overloading, bearing failure, cooling blockage | 1–10 Hz |
| Current clamp | Winding faults, overloading, mechanical load changes | 1 kHz |
| Acoustic emission | Early bearing defects, gear tooth cracks | 100 kHz+ |

For most industrial motor monitoring, a three-axis accelerometer (ADXL345 or MPU6050) combined with a temperature sensor (DS18B20) gives you roughly 80% of the failure detection capability at roughly 20% of the cost.

Critical sampling consideration: Vibration analysis requires a sampling frequency well above the fault frequencies of interest. A motor at 1450 RPM (about 24 Hz) has bearing fault frequencies typically in the range of 100–500 Hz. By the Nyquist theorem, the sampling rate must exceed twice the highest frequency you want to resolve, so 1 kHz is the bare minimum for a 500 Hz ceiling; sample faster if you need margin at the top of that range.
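To make the sampling arithmetic concrete, here is a small Python sketch. The function names and the `margin` parameter are illustrative assumptions; the inputs (1450 RPM, a 500 Hz ceiling, a 256-sample window at 1 kHz) are the figures used in this article.

```python
def shaft_frequency_hz(rpm: float) -> float:
    """Rotation frequency of the shaft in Hz."""
    return rpm / 60.0


def min_sampling_rate_hz(max_fault_freq_hz: float, margin: float = 1.0) -> float:
    """Nyquist minimum (2x the highest frequency), scaled by an optional safety margin."""
    return 2.0 * max_fault_freq_hz * margin


def fft_bin_resolution_hz(sampling_freq_hz: float, n_samples: int) -> float:
    """Width of one FFT bin in Hz for a window of n_samples."""
    return sampling_freq_hz / n_samples


print(f"shaft frequency: {shaft_frequency_hz(1450):.2f} Hz")
print(f"minimum sampling rate for 500 Hz: {min_sampling_rate_hz(500):.0f} Hz")
print(f"bin width at 1 kHz / 256 samples: {fft_bin_resolution_hz(1000.0, 256)} Hz")
```

A bin width of about 3.9 Hz is what limits how closely two fault frequencies can sit before they merge in the spectrum; a longer window narrows the bins at the cost of a slower update rate.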

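```cpp
// ESP32: high-frequency vibration sampling with FFT
// Header names are inferred from the classes used below (arduinoFFT v1.x API,
// Adafruit ADXL345 unified driver); the original include targets were lost.
#include <arduinoFFT.h>
#include <Adafruit_ADXL345_U.h>

const uint16_t FFT_SAMPLES = 256;
const double SAMPLING_FREQ = 1000.0; // 1 kHz
const unsigned long SAMPLE_PERIOD_US = (unsigned long)(1000000.0 / SAMPLING_FREQ);

double vReal[FFT_SAMPLES];
double vImag[FFT_SAMPLES];
arduinoFFT FFT = arduinoFFT(vReal, vImag, FFT_SAMPLES, SAMPLING_FREQ);

Adafruit_ADXL345_Unified accel = Adafruit_ADXL345_Unified(12345);

// readTemperature() and mqttClient are assumed to be defined elsewhere in the sketch.
// Note: the ADXL345 defaults to a 100 Hz output data rate; raise it in setup()
// (e.g. accel.setDataRate(ADXL345_DATARATE_1600_HZ)) or consecutive reads repeat.
void sampleAndPublish() {
  // Collect 256 samples at 1 kHz (256 ms window).
  // Pace against micros() so the I2C read time does not stretch the sample period.
  unsigned long nextSample = micros();
  for (int i = 0; i < FFT_SAMPLES; i++) {
    sensors_event_t event;
    accel.getEvent(&event);
    vReal[i] = event.acceleration.z; // axial vibration
    vImag[i] = 0.0;
    nextSample += SAMPLE_PERIOD_US;
    while ((long)(micros() - nextSample) < 0) { /* wait for the next 1 ms slot */ }
  }

  // Compute FFT
  FFT.DCRemoval();
  FFT.Windowing(FFT_WIN_TYP_HAMMING, FFT_FORWARD);
  FFT.Compute(FFT_FORWARD);
  FFT.ComplexToMagnitude();

  // Extract features: RMS of the spectral magnitudes, peak frequency, peak magnitude
  double rms = 0, peakMag = 0, peakFreq = 0;
  for (int i = 1; i < FFT_SAMPLES / 2; i++) {
    rms += vReal[i] * vReal[i];
    if (vReal[i] > peakMag) {
      peakMag = vReal[i];
      peakFreq = (i * SAMPLING_FREQ) / FFT_SAMPLES;
    }
  }
  rms = sqrt(rms / (FFT_SAMPLES / 2));

  // Publish features (not raw waveform) to save bandwidth
  char payload[256];
  snprintf(payload, sizeof(payload),
    "{\"rms\":%.4f,\"peakFreq\":%.1f,\"peakMag\":%.4f,\"temp\":%.2f,\"ts\":%lu}",
    rms, peakFreq, peakMag, readTemperature(), millis() / 1000
  );
  mqttClient.publish("machines/motor01/vibration/features", payload, 0);
}
```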

Publishing FFT features instead of the raw waveform cuts bandwidth sharply (up to about 99%, depending on sample width and how often features are published) while retaining the information needed for fault detection.
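How large the saving is depends on the publish cadence. The sketch below works through the arithmetic in Python under stated assumptions: one axis sampled at 1 kHz with 2 bytes per raw sample, and a JSON feature payload of about 100 bytes. Those byte counts and the two publish intervals are illustrative, not measurements.

```python
def bandwidth_reduction(raw_bytes_per_sec: float,
                        feature_payload_bytes: float,
                        publish_interval_sec: float) -> float:
    """Fraction of bandwidth saved by sending one feature payload per interval
    instead of streaming the raw samples captured during that interval."""
    raw_bytes = raw_bytes_per_sec * publish_interval_sec
    return 1.0 - feature_payload_bytes / raw_bytes


RAW_BYTES_PER_SEC = 1000 * 2   # assumed: 1 kHz, one axis, 2 bytes per sample
PAYLOAD_BYTES = 100            # assumed: approximate size of the JSON features

# One payload per 256 ms window vs. one payload every 10 seconds
print(f"per window: {bandwidth_reduction(RAW_BYTES_PER_SEC, PAYLOAD_BYTES, 0.256):.1%}")
print(f"every 10 s: {bandwidth_reduction(RAW_BYTES_PER_SEC, PAYLOAD_BYTES, 10):.1%}")
```

Under these assumptions, publishing every window saves about 80%, while publishing every 10 seconds saves about 99.5%, so the publish interval matters more than the payload format.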

Stage 1: Rule-Based Detection

Start here. Rule-based thresholds are interpretable, require no training data, and can be deployed in a day.

```typescript
// Lambda: rule-based alert processing
interface VibrationFeatures {
  deviceId: string
  rms: number
  peakFreq: number
  peakMag: number
  temp: number
  ts: number
}

// The RMS limits were labelled as ISO 10816 Zone B/C and C/D boundaries.
// ISO 10816 zones are specified as vibration velocity (mm/s RMS), so treat
// these acceleration limits as starting points and calibrate them per machine.
const THRESHOLDS = {
  rmsWarning: 2.5,   // m/s² RMS
  rmsCritical: 4.5,  // m/s² RMS
  tempWarning: 75,   // °C
  tempCritical: 90,  // °C
}

// sendAlert and storeFeatures are assumed to be implemented elsewhere.
export const handler = async (event: VibrationFeatures) => {
  const alerts: string[] = []

  if (event.rms >= THRESHOLDS.rmsCritical) {
    alerts.push(`CRITICAL: Vibration RMS ${event.rms.toFixed(2)} m/s² exceeds ${THRESHOLDS.rmsCritical}`)
  } else if (event.rms >= THRESHOLDS.rmsWarning) {
    alerts.push(`WARNING: Vibration RMS ${event.rms.toFixed(2)} m/s²`)
  }

  if (event.temp >= THRESHOLDS.tempCritical) {
    alerts.push(`CRITICAL: Temperature ${event.temp}°C`)
  } else if (event.temp >= THRESHOLDS.tempWarning) {
    // tempWarning was defined but never checked in the original handler
    alerts.push(`WARNING: Temperature ${event.temp}°C`)
  }

  if (alerts.length > 0) {
    await sendAlert(event.deviceId, alerts)
  }

  // Always store for ML training data accumulation
  await storeFeatures(event)
}
```

Rule-based detection catches gross failures reliably. Its weakness: it cannot detect subtle early-stage degradation where values stay within normal bounds but patterns change.

Stage 2: Statistical Anomaly Detection

Before deploying ML models, statistical approaches catch many subtle anomalies:

  • Rolling Z-score: Flag readings more than 3σ from a rolling baseline.
  • Trend detection: RMS vibration that increases 0.1 m/s² per week is a slow degradation signal invisible to static thresholds.
  • Spectral band energy: Track energy in the bearing fault frequency band (BPFI/BPFO) specifically.
```typescript
// Anomaly score using rolling statistics (runs in Lambda)
// getRecentFeatures, computeZScore and computeTrendScore are assumed helpers.
async function computeAnomalyScore(
  deviceId: string,
  current: VibrationFeatures
): Promise<number> {
  const history = await getRecentFeatures(deviceId, { hours: 72 })

  if (history.length < 100) return 0 // insufficient data

  const rmsValues = history.map((h) => h.rms)
  const mean = rmsValues.reduce((a, b) => a + b, 0) / rmsValues.length
  const std = Math.sqrt(
    rmsValues.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / rmsValues.length
  )

  // Weighted combination of multiple signals
  const rmsZScore = Math.abs((current.rms - mean) / (std || 0.001))
  const tempZScore = computeZScore(history.map((h) => h.temp), current.temp)
  // Last 48 readings: a 4 h trend if features arrive every 5 minutes
  const trendScore = computeTrendScore(rmsValues.slice(-48))

  return (rmsZScore * 0.5) + (tempZScore * 0.3) + (trendScore * 0.2)
}
```
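The scoring function relies on two helpers, `computeZScore` and `computeTrendScore`, that are not shown. The Python sketch below is one possible way to define them, not the article's implementation: the z-score is the absolute distance from the history mean in standard deviations, and the trend score is the least-squares rise across the window divided by the window's standard deviation, clamped at zero so only rising trends count. The `min_std` floor is an assumption that mirrors the `std || 0.001` guard.

```python
import math
from typing import Sequence


def compute_z_score(history: Sequence[float], current: float,
                    min_std: float = 1e-3) -> float:
    """Absolute z-score of `current` against the history window."""
    if not history:
        return 0.0
    mean = sum(history) / len(history)
    variance = sum((x - mean) ** 2 for x in history) / len(history)
    std = max(math.sqrt(variance), min_std)
    return abs(current - mean) / std


def compute_trend_score(values: Sequence[float], min_std: float = 1e-3) -> float:
    """Fitted rise over the window (least-squares slope times window length),
    normalised by the window's standard deviation. Falling trends score 0."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    rise = (sxy / sxx) * (n - 1)
    std = max(math.sqrt(sum((y - y_mean) ** 2 for y in values) / n), min_std)
    return max(0.0, rise / std)


# A steady ramp scores high; a flat or falling series scores zero.
print(compute_trend_score([0.0, 1.0, 2.0, 3.0, 4.0]))
print(compute_trend_score([4.0, 3.0, 2.0, 1.0, 0.0]))
```

Normalising the trend by the window's own spread keeps it on a scale comparable to the two z-scores, which matters because the three terms are added with fixed weights.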

An anomaly score above 3.0 triggers a "watch" state: no alarm yet, but increased sampling rate and logging for the device.
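The spectral band energy signal from the list above can be sketched as follows. The outer-race frequency uses the standard kinematic formula BPFO = (n/2) · f_r · (1 − (d/D) · cos φ); the bearing geometry in the example (9 balls, 7.94 mm ball diameter, 39.04 mm pitch diameter) is an illustrative assumption, so substitute the values from your bearing's datasheet. The function names are hypothetical.

```python
import math
from typing import Sequence


def bpfo_hz(n_balls: int, shaft_hz: float, ball_d: float, pitch_d: float,
            contact_angle_deg: float = 0.0) -> float:
    """Ball pass frequency, outer race, from bearing geometry."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_angle_deg))
    return (n_balls / 2.0) * shaft_hz * (1.0 - ratio)


def band_energy(magnitudes: Sequence[float], sampling_freq_hz: float,
                n_samples: int, band_lo_hz: float, band_hi_hz: float) -> float:
    """Sum of squared FFT magnitudes for bins whose centre frequency lies
    inside [band_lo_hz, band_hi_hz]. Bin 0 (DC) is skipped."""
    bin_hz = sampling_freq_hz / n_samples
    total = 0.0
    for i in range(1, n_samples // 2):
        if band_lo_hz <= i * bin_hz <= band_hi_hz:
            total += magnitudes[i] ** 2
    return total


# Synthetic spectrum: one peak near 101.6 Hz, one near 390.6 Hz
spectrum = [0.0] * 128
spectrum[26] = 2.0
spectrum[100] = 3.0

centre = bpfo_hz(9, 24.0, 7.94, 39.04)
print(f"BPFO at 24 Hz shaft speed: {centre:.2f} Hz")
print(band_energy(spectrum, 1000.0, 256, 90.0, 110.0))
```

Tracking this one number per fault band over time gives a trend that is far less noisy than watching the overall RMS.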

Stage 3: SageMaker ML Models

After 3–6 months of collected sensor data, you can train a proper ML model. SageMaker Random Cut Forest is a strong baseline for IoT anomaly detection: it is unsupervised (no labeled failures required), handles multivariate input well, and produces interpretable anomaly scores.

```python
# SageMaker training job: Random Cut Forest
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = 'arn:aws:iam::123456789:role/SageMakerRole'  # placeholder ARN

# Training data: 6 months of normal operation features
# Format: CSV with columns rms, peakFreq, peakMag, temp
# The file name is a hypothetical local export; record_set expects a float32 NumPy array.
training_data = np.loadtxt('motor-vibration-features.csv', delimiter=',', dtype='float32')

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    data_location='s3://iot-ml-data/training/motor-vibration/',
    output_path='s3://iot-ml-models/motor-vibration/',
    num_samples_per_tree=256,
    num_trees=50,
    eval_metrics=['accuracy', 'precision_recall_fscore'],
)

rcf.fit(rcf.record_set(training_data))
predictor = rcf.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
```

Deploy the trained endpoint and call it from your Lambda processor:

```typescript
// Lambda: call SageMaker endpoint for real-time scoring
import { SageMakerRuntimeClient, InvokeEndpointCommand } from '@aws-sdk/client-sagemaker-runtime'

const sagemaker = new SageMakerRuntimeClient({ region: 'us-east-1' })

// VibrationFeatures is the interface defined in the rule-based handler.
async function getMLAnomalyScore(features: VibrationFeatures): Promise<number> {
  // Column order must match the training CSV: rms, peakFreq, peakMag, temp
  const csvInput = `${features.rms},${features.peakFreq},${features.peakMag},${features.temp}`

  const response = await sagemaker.send(
    new InvokeEndpointCommand({
      EndpointName: 'motor-vibration-rcf-v2',
      Body: Buffer.from(csvInput),
      ContentType: 'text/csv',
      Accept: 'application/json',
    })
  )

  const result = JSON.parse(Buffer.from(response.Body!).toString())
  return result.scores[0].score // RCF anomaly score
}
```

ROI Calculation

Before pitching predictive maintenance to a client, calculate the ROI concretely:

Unplanned failure cost:
  Production downtime: 4 hours × $8,000/hour = $32,000
  Emergency repair: $3,500 (parts + overtime)
  Total per incident: $35,500

Planned maintenance cost (caught by PdM):
  Planned downtime: 1 hour (shift change) × $8,000 = $8,000
  Standard repair: $1,800
  Total per incident: $9,800

Savings per avoided failure: $25,700

Sensor + cloud infrastructure cost per motor: $85/month
Payback: 1 avoided failure pays for about 25 years of monitoring on one motor ($25,700 ÷ $85/month ≈ 302 months)

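For a 200-motor facility with historically 8 unplanned failures per year, predictive maintenance catching 75% of them avoids 6 failures and saves $154,200 per year. Monitoring all 200 motors costs about $204,000 per year (200 × $85 × 12), so at these numbers the program falls roughly $49,800 short of break-even, which takes about 8 avoided failures per year. That balance improves as the fleet grows and the model matures.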
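The fleet-level arithmetic is easy to get wrong in a pitch, so it helps to script it. The Python sketch below reproduces the figures from this section; the function name and the returned dictionary keys are illustrative choices.

```python
def annual_roi(motors: int, cost_per_motor_month: float,
               failures_per_year: float, catch_rate: float,
               unplanned_cost: float, planned_cost: float) -> dict:
    """Annual savings, monitoring cost, net result and break-even failure count."""
    savings_per_failure = unplanned_cost - planned_cost
    annual_savings = failures_per_year * catch_rate * savings_per_failure
    annual_cost = motors * cost_per_motor_month * 12
    return {
        "savings_per_failure": savings_per_failure,
        "annual_savings": annual_savings,
        "annual_cost": annual_cost,
        "net": annual_savings - annual_cost,
        # Avoided failures per year needed to cover the monitoring cost
        "break_even_failures": annual_cost / savings_per_failure,
    }


result = annual_roi(
    motors=200,
    cost_per_motor_month=85,
    failures_per_year=8,
    catch_rate=0.75,
    unplanned_cost=35_500,
    planned_cost=9_800,
)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

Running the same function with your client's downtime cost and failure history shows quickly whether monitoring the whole fleet pays off, or whether it makes more sense to start with the most critical motors.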

Real industrial IoT is a long game. The sensor data you collect today trains the models that prevent failures in year three.


Written by CodeCaracal Engineering

