Predictive Maintenance with IoT Sensor Data: From Threshold to Machine Learning

A factory floor has 200 motors. Each motor has a vibration sensor, a temperature sensor, and a current clamp. Without any analytics, maintenance is scheduled: every 90 days, a technician checks every motor. Failures between visits are surprises — expensive surprises.

With threshold alerts, you catch failures as they happen. The motor temperature hits 85°C, an alarm fires, a technician responds within the hour. Better, but still reactive.

With ML-based predictive maintenance on IoT sensor data, you can catch failure signatures 48–96 hours before a motor fails. The maintenance team schedules a planned replacement during a shift change instead of scrambling for an emergency repair during peak production. In a mid-sized manufacturing facility, that difference can be worth six figures annually.

This guide covers the full progression: from sensor selection to rule-based detection to ML models, with real implementation code at each stage.

Sensor Selection for Predictive Maintenance

The sensors you choose determine what failure modes you can detect.

| Sensor | Failure modes detected | Sampling rate |
|---|---|---|
| Vibration (accelerometer) | Bearing wear, imbalance, misalignment, looseness | 1–10 kHz |
| Temperature (thermistor/IR) | Overloading, bearing failure, cooling blockage | 1–10 Hz |
| Current clamp | Winding faults, overloading, mechanical load changes | 1 kHz |
| Acoustic emission | Early bearing defects, gear tooth cracks | 100 kHz+ |

For most industrial motor monitoring, a three-axis accelerometer (ADXL345 or MPU6050) combined with a temperature sensor (DS18B20) covers the most common failure modes at a fraction of the cost of a full sensor suite, roughly an 80/20 trade-off.

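Critical sampling consideration: Vibration analysis requires a sampling frequency well above the fault frequencies of interest. A motor at 1450 RPM (about 24 Hz shaft frequency) has bearing fault frequencies typically in the range of 100–500 Hz. By the Nyquist theorem, the sampling rate must be more than twice the highest frequency you want to resolve, so 1 kHz is the practical minimum for signatures up to 500 Hz. Sample faster when the hardware allows, to leave margin.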
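The sampling arithmetic above can be sketched as a small helper. This is a minimal illustration, not part of the original firmware: `planSampling` and the `SamplingPlan` fields are hypothetical names, and the 256-point FFT size is taken from the ESP32 example in this section.

```typescript
// Sketch: sampling requirements for FFT-based vibration analysis.
// All names here are illustrative assumptions.
interface SamplingPlan {
  shaftHz: number;         // shaft rotation frequency (RPM / 60)
  minSampleRateHz: number; // Nyquist lower bound: 2 x highest fault frequency
  binWidthHz: number;      // frequency resolution of one FFT bin
  windowMs: number;        // time needed to collect one FFT window
}

function planSampling(
  rpm: number,
  maxFaultHz: number,
  sampleRateHz: number,
  fftSize: number
): SamplingPlan {
  return {
    shaftHz: rpm / 60,
    minSampleRateHz: 2 * maxFaultHz,
    binWidthHz: sampleRateHz / fftSize,
    windowMs: (fftSize / sampleRateHz) * 1000,
  };
}

// 1450 RPM motor, fault signatures up to 500 Hz, 1 kHz sampling, 256-point FFT
const plan = planSampling(1450, 500, 1000, 256);
console.log(plan);
```

With a 256-point FFT at 1 kHz, each bin is about 3.9 Hz wide, so fault frequencies closer together than that will land in the same bin. A larger FFT or a longer window improves resolution at the cost of memory and latency.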

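// ESP32: high-frequency vibration sampling with FFT
// Uses the arduinoFFT 1.x API and the Adafruit ADXL345 driver.
// readTemperature() and mqttClient are assumed to be defined elsewhere in the sketch.
#include <arduinoFFT.h>
#include <Adafruit_ADXL345_U.h>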
const uint16_t FFT_SAMPLES = 256;
const double SAMPLING_FREQ = 1000.0; // 1 kHz
double vReal[FFT_SAMPLES];
double vImag[FFT_SAMPLES];
arduinoFFT FFT = arduinoFFT(vReal, vImag, FFT_SAMPLES, SAMPLING_FREQ);
Adafruit_ADXL345_Unified accel = Adafruit_ADXL345_Unified(12345);
void sampleAndPublish() {
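  // Collect 256 samples at 1 kHz (256 ms window).
  // Pace against micros() so the sensor read time does not stretch the sample period.
  // Note: the ADXL345 output data rate must also be set to 1 kHz or higher in setup().
  const unsigned long periodUs = (unsigned long)(1000000.0 / SAMPLING_FREQ);
  unsigned long nextSample = micros();
  for (int i = 0; i < FFT_SAMPLES; i++) {
    while ((long)(micros() - nextSample) < 0) { /* wait for the next sample slot */ }
    nextSample += periodUs;
    sensors_event_t event;
    accel.getEvent(&event);
    vReal[i] = event.acceleration.z; // axial vibration
    vImag[i] = 0.0;
  }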
  // Compute FFT
  FFT.DCRemoval();
  FFT.Windowing(FFT_WIN_TYP_HAMMING, FFT_FORWARD);
  FFT.Compute(FFT_FORWARD);
  FFT.ComplexToMagnitude();
  // Extract features: RMS, peak frequency, dominant bands
  double rms = 0, peakMag = 0, peakFreq = 0;
  for (int i = 1; i < FFT_SAMPLES / 2; i++) {
    rms += vReal[i] * vReal[i];
    if (vReal[i] > peakMag) {
      peakMag  = vReal[i];
      peakFreq = (i * SAMPLING_FREQ) / FFT_SAMPLES;
    }
  }
  rms = sqrt(rms / (FFT_SAMPLES / 2));

  // Publish features (not raw waveform) to save bandwidth
  char payload[256];
  snprintf(payload, sizeof(payload),
    "{\"rms\":%.4f,\"peakFreq\":%.1f,\"peakMag\":%.4f,\"temp\":%.2f,\"ts\":%lu}",
    rms, peakFreq, peakMag, readTemperature(), millis() / 1000
  );
  mqttClient.publish("machines/motor01/vibration/features", payload, 0);
}

Publishing FFT features instead of the raw waveform cuts bandwidth by well over 90% while retaining the information needed for fault detection: each 256-sample window is reduced to a JSON payload on the order of 100 bytes.

Stage 1: Rule-Based Detection

Start here. Rule-based thresholds are interpretable, require no training data, and can be deployed in a day.

// Lambda: rule-based alert processing
interface VibrationFeatures {
  deviceId: string
  rms: number
  peakFreq: number
  peakMag: number
  temp: number
  ts: number
}
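const THRESHOLDS = {
  // Illustrative values. ISO 10816 defines its zones in mm/s velocity RMS,
  // so calibrate these acceleration limits against your own baseline data.
  rmsWarning:  2.5,  // m/s² RMS, warning level (analogous to the Zone B/C boundary)
  rmsCritical: 4.5,  // m/s² RMS, critical level (analogous to the Zone C/D boundary)
  tempWarning:  75,  // °C
  tempCritical: 90,  // °C
}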
export const handler = async (event: VibrationFeatures) => {
  const alerts: string[] = []
  if (event.rms >= THRESHOLDS.rmsCritical) {
    alerts.push(`CRITICAL: Vibration RMS ${event.rms.toFixed(2)} m/s² exceeds ${THRESHOLDS.rmsCritical}`)
  } else if (event.rms >= THRESHOLDS.rmsWarning) {
    alerts.push(`WARNING: Vibration RMS ${event.rms.toFixed(2)} m/s²`)
  }
  if (event.temp >= THRESHOLDS.tempCritical) {
    alerts.push(`CRITICAL: Temperature ${event.temp}°C`)
  } else if (event.temp >= THRESHOLDS.tempWarning) {
    alerts.push(`WARNING: Temperature ${event.temp}°C`)
  }
  if (alerts.length > 0) {
    await sendAlert(event.deviceId, alerts)
  }

  // Always store for ML training data accumulation
  await storeFeatures(event)
}

Rule-based detection catches gross failures reliably. Its weakness: it cannot detect subtle early-stage degradation where values stay within normal bounds but patterns change.

Stage 2: Statistical Anomaly Detection

Before deploying ML models, statistical approaches catch many subtle anomalies:

Rolling Z-score: Flag readings more than 3σ from a rolling baseline.

Trend detection: RMS vibration that increases 0.1 m/s² per week is a slow degradation signal invisible to static thresholds.

Spectral band energy: Track energy in the bearing fault frequency band (BPFI/BPFO) specifically.
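To make the spectral band energy idea concrete, here is a hedged sketch using the standard bearing fault frequency formulas for the outer race (BPFO) and inner race (BPFI). The function names, the `BearingGeometry` fields, and the example geometry are illustrative assumptions; take real geometry from the bearing datasheet.

```typescript
// Sketch: locate bearing fault frequencies and measure spectral energy around them.
interface BearingGeometry {
  rollingElements: number;  // number of balls or rollers
  ballDiameterMm: number;
  pitchDiameterMm: number;
  contactAngleDeg: number;
}

// Standard kinematic formulas:
//   BPFO = (n / 2) * f_shaft * (1 - (d / D) * cos(angle))
//   BPFI = (n / 2) * f_shaft * (1 + (d / D) * cos(angle))
function bearingFaultFrequencies(
  shaftHz: number,
  g: BearingGeometry
): { bpfo: number; bpfi: number } {
  const ratio =
    (g.ballDiameterMm / g.pitchDiameterMm) *
    Math.cos((g.contactAngleDeg * Math.PI) / 180);
  return {
    bpfo: (g.rollingElements / 2) * shaftHz * (1 - ratio),
    bpfi: (g.rollingElements / 2) * shaftHz * (1 + ratio),
  };
}

// Sum of squared magnitudes for FFT bins whose centre frequency lies in [lowHz, highHz].
// `magnitudes` holds the first half of the spectrum (bin 0 = DC, skipped).
function bandEnergy(
  magnitudes: number[],
  sampleRateHz: number,
  fftSize: number,
  lowHz: number,
  highHz: number
): number {
  const binWidth = sampleRateHz / fftSize;
  let energy = 0;
  for (let i = 1; i < magnitudes.length; i++) {
    const freq = i * binWidth;
    const m = magnitudes[i] ?? 0;
    if (freq >= lowHz && freq <= highHz) energy += m * m;
  }
  return energy;
}

// Hypothetical geometry for illustration only
const geometry: BearingGeometry = {
  rollingElements: 9,
  ballDiameterMm: 7.94,
  pitchDiameterMm: 39.04,
  contactAngleDeg: 0,
};
const faults = bearingFaultFrequencies(1450 / 60, geometry);
console.log(faults);
```

Tracking `bandEnergy` in a narrow window around each fault frequency (and its harmonics) over time gives a trend that reacts to bearing damage well before overall RMS moves. Note that computing it in the cloud requires publishing the relevant spectrum bins, not just the summary features.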

// Anomaly score using rolling statistics — runs in Lambda
async function computeAnomalyScore(
  deviceId: string,
  current: VibrationFeatures
): Promise<number> {
  const history = await getRecentFeatures(deviceId, { hours: 72 })
  if (history.length < 100) return 0 // insufficient data
  const rmsValues = history.map((h) => h.rms)
  const mean = rmsValues.reduce((a, b) => a + b) / rmsValues.length
  const std = Math.sqrt(
    rmsValues.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / rmsValues.length
  )
  // Weighted combination of multiple signals
  const rmsZScore   = Math.abs((current.rms - mean) / (std || 0.001))
  const tempZScore  = computeZScore(history.map((h) => h.temp), current.temp)
  const trendScore  = computeTrendScore(rmsValues.slice(-48)) // last 4h trend, assuming one reading every 5 minutes

  return (rmsZScore * 0.5) + (tempZScore * 0.3) + (trendScore * 0.2)
}
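The function above calls `computeZScore` and `computeTrendScore` without showing them. One possible sketch of those helpers follows. These implementations are assumptions, not the original code: the trend score here is the least-squares slope expressed as the total rise across the window in units of the window's standard deviation, with flat or falling trends clamped to zero.

```typescript
// Z-score of `current` against the mean and standard deviation of `values`.
function computeZScore(values: number[], current: number): number {
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) * (b - mean), 0) / values.length;
  const std = Math.sqrt(variance) || 0.001; // guard against a zero-variance window
  return Math.abs((current - mean) / std);
}

// Trend score: fit a least-squares line through the window (x = sample index),
// then scale the fitted rise over the window by the window's standard deviation.
function computeTrendScore(values: number[]): number {
  const n = values.length;
  if (n < 2) return 0;
  const xMean = (n - 1) / 2;
  const yMean = values.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - xMean) * (y - yMean);
    den += (x - xMean) * (x - xMean);
  });
  const slope = num / den;
  const std =
    Math.sqrt(values.reduce((a, b) => a + (b - yMean) * (b - yMean), 0) / n) || 0.001;
  return Math.max(0, (slope * (n - 1)) / std);
}
```

Scaling the trend by the standard deviation keeps it on roughly the same footing as the z-scores it is combined with, so the 0.5/0.3/0.2 weights stay meaningful. Other normalizations are equally defensible.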

An anomaly score above 3.0 triggers a "watch" state — no alarm yet, but increased sampling rate and logging for the device.

Stage 3: SageMaker ML Models

After 3–6 months of collected feature data, you can train a proper ML model. SageMaker Random Cut Forest is a strong baseline for IoT anomaly detection: it is unsupervised (no labeled failures required), handles multivariate input well, and produces interpretable anomaly scores.

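# SageMaker training job: Random Cut Forest
import boto3
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = 'arn:aws:iam::123456789:role/SageMakerRole'  # placeholder; use your own role ARN

# Training data: 6 months of normal operation features
# Format: CSV with columns rms, peakFreq, peakMag, temp
# 'motor_features.csv' is a hypothetical local export of the stored features (one header row)
training_data = np.loadtxt('motor_features.csv', delimiter=',', skiprows=1).astype('float32')

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    data_location='s3://iot-ml-data/training/motor-vibration/',
    output_path='s3://iot-ml-models/motor-vibration/',
    num_samples_per_tree=256,
    num_trees=50,
    # Evaluation metrics are only computed when a labeled test channel is supplied
    eval_metrics=['accuracy', 'precision_recall_fscore'],
)

# record_set() uploads the array to data_location in the format RCF expects
rcf.fit(rcf.record_set(training_data))
predictor = rcf.deploy(initial_instance_count=1, instance_type='ml.t2.medium')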

Deploy the trained endpoint and call it from your Lambda processor:

// Lambda: call SageMaker endpoint for real-time scoring
import { SageMakerRuntimeClient, InvokeEndpointCommand } from '@aws-sdk/client-sagemaker-runtime'
const sagemaker = new SageMakerRuntimeClient({ region: 'us-east-1' })
async function getMLAnomalyScore(features: VibrationFeatures): Promise<number> {
  // Column order must match the training CSV: rms, peakFreq, peakMag, temp
  const csvInput = `${features.rms},${features.peakFreq},${features.peakMag},${features.temp}`
  const response = await sagemaker.send(
    new InvokeEndpointCommand({
      EndpointName: 'motor-vibration-rcf-v2',
      Body: Buffer.from(csvInput),
      ContentType: 'text/csv',
    })
  )

  const result = JSON.parse(Buffer.from(response.Body!).toString())
  return result.scores[0].score // RCF anomaly score
}

ROI Calculation

Before pitching predictive maintenance to a client, calculate the ROI concretely:

Unplanned failure cost:

- Production downtime: 4 hours × $8,000/hour = $32,000
- Emergency repair: $3,500 (parts + overtime)
- Total per incident: $35,500

Planned maintenance cost (caught by PdM):

- Planned downtime: 1 hour (shift change) × $8,000 = $8,000
- Standard repair: $1,800
- Total per incident: $9,800

Savings per avoided failure: $25,700

Sensor + cloud infrastructure cost per motor: $85/month ($1,020/year)

Payback: one avoided failure ($25,700) pays for about 25 years of monitoring on a single motor.

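For a 200-motor facility with historically 8 unplanned failures per year, predictive maintenance that catches 75% of them avoids 6 failures and delivers $154,200 in annual savings against a monitoring cost of about $204,000/year (200 × $85 × 12). At these numbers monitoring costs $49,800/year more than it saves, and break-even needs roughly 8 avoided failures per year, but the balance improves as the fleet grows and the model matures.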
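The ROI arithmetic in this section can be reproduced with a small calculator, which is handy for plugging in a client's own numbers. This is a sketch: `annualRoi` and the `RoiInputs` field names are illustrative, and the inputs are the figures quoted in this section.

```typescript
// Sketch: annual ROI of predictive maintenance for a motor fleet.
interface RoiInputs {
  motors: number;
  monthlyCostPerMotor: number;      // sensor + cloud infrastructure, USD
  unplannedFailuresPerYear: number; // historical baseline
  catchRate: number;                // fraction of failures caught ahead of time
  unplannedCostPerIncident: number; // USD
  plannedCostPerIncident: number;   // USD
}

function annualRoi(i: RoiInputs) {
  const savingsPerAvoidedFailure =
    i.unplannedCostPerIncident - i.plannedCostPerIncident;
  const annualSavings =
    i.unplannedFailuresPerYear * i.catchRate * savingsPerAvoidedFailure;
  const annualMonitoringCost = i.motors * i.monthlyCostPerMotor * 12;
  return {
    savingsPerAvoidedFailure,
    annualSavings,
    annualMonitoringCost,
    net: annualSavings - annualMonitoringCost,
    // Avoided failures per year needed to cover the monitoring bill
    breakEvenAvoidedFailures: annualMonitoringCost / savingsPerAvoidedFailure,
  };
}

const roi = annualRoi({
  motors: 200,
  monthlyCostPerMotor: 85,
  unplannedFailuresPerYear: 8,
  catchRate: 0.75,
  unplannedCostPerIncident: 35500,
  plannedCostPerIncident: 9800,
});
console.log(roi);
```

With the figures above, `net` comes out negative and `breakEvenAvoidedFailures` is just under 8, so the most sensitive inputs are the per-motor monthly cost and the downtime cost per hour. Those are the numbers worth validating with the client first.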

Real industrial IoT is a long game. The sensor data you collect today trains the models that prevent failures in year three.

Need help? [Contact Code Caracal](/contact) — we've shipped these systems for clients across 15+ countries.
