Predictive Maintenance with IoT Sensor Data: From Threshold to Machine Learning
A factory floor has 200 motors. Each motor has a vibration sensor, a temperature sensor, and a current clamp. Without any analytics, maintenance is scheduled: every 90 days, a technician checks every motor. Failures between visits are surprises — expensive surprises.
With threshold alerts, you catch failures as they happen. The motor temperature hits 85°C, an alarm fires, a technician responds within the hour. Better, but still reactive.
With machine-learning-based predictive maintenance, you can catch failure signatures 48–96 hours before a motor fails. The maintenance team schedules a planned replacement during a shift change instead of scrambling for an emergency repair during peak production. That difference can be worth six figures annually in a mid-sized manufacturing facility.
This guide covers the full progression: from sensor selection to rule-based detection to ML models, with real implementation code at each stage.
Sensor Selection for Predictive Maintenance
The sensors you choose determine what failure modes you can detect.
| Sensor | Failure modes detected | Sampling rate |
|---|---|---|
| Vibration (accelerometer) | Bearing wear, imbalance, misalignment, looseness | 1–10 kHz |
| Temperature (thermistor/IR) | Overloading, bearing failure, cooling blockage | 1–10 Hz |
| Current clamp | Winding faults, overloading, mechanical load changes | 1 kHz |
| Acoustic emission | Early bearing defects, gear tooth cracks | 100 kHz+ |
For most industrial motor monitoring, a three-axis accelerometer (ADXL345 or MPU6050) combined with a temperature sensor (DS18B20) gives you 80% of the failure detection capability at 20% of the cost.
Critical sampling consideration: vibration analysis requires a sampling frequency well above the fault frequencies of interest. A motor at 1450 RPM (about 24 Hz) has bearing fault frequencies typically in the range of 100–500 Hz. By the Nyquist theorem, the sampling rate must exceed twice the highest frequency you want to resolve, so 1 kHz covers signatures up to just under 500 Hz. Sample faster if you need margin at the top of that range.
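The arithmetic behind that rule can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the 2.56× margin is a common rule of thumb assumed here, not a figure from this guide.

```python
def shaft_frequency_hz(rpm: float) -> float:
    """Shaft rotation frequency in Hz."""
    return rpm / 60.0


def max_resolvable_hz(sampling_hz: float) -> float:
    """Nyquist limit: content at or above this frequency aliases."""
    return sampling_hz / 2.0


def min_sampling_rate_hz(max_fault_hz: float, margin: float = 2.56) -> float:
    """Sampling rate with headroom; margin=2.0 is the bare Nyquist bound.

    The 2.56 default is an assumed rule-of-thumb value.
    """
    return max_fault_hz * margin


def fft_bin_resolution_hz(sampling_hz: float, n_samples: int) -> float:
    """Width of one FFT bin for a window of n_samples."""
    return sampling_hz / n_samples


shaft_hz = shaft_frequency_hz(1450)               # motor from the example above
nyquist_hz = max_resolvable_hz(1000)              # limit when sampling at 1 kHz
required_hz = min_sampling_rate_hz(500)           # rate with headroom for 500 Hz faults
resolution_hz = fft_bin_resolution_hz(1000, 256)  # 256-sample window at 1 kHz
```

With a 256-sample window at 1 kHz, each FFT bin is a little under 4 Hz wide, which is fine for separating fault bands but too coarse to resolve closely spaced sidebands.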
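// ESP32: high-frequency vibration sampling with FFT (arduinoFFT v1.x API)
#include <arduinoFFT.h>
#include <Adafruit_Sensor.h>
#include <Adafruit_ADXL345_U.h>

const uint16_t FFT_SAMPLES = 256;
const double SAMPLING_FREQ = 1000.0; // 1 kHz
const uint32_t SAMPLE_PERIOD_US = 1000; // 1 ms = 1 kHz
double vReal[FFT_SAMPLES];
double vImag[FFT_SAMPLES];
arduinoFFT FFT = arduinoFFT(vReal, vImag, FFT_SAMPLES, SAMPLING_FREQ);
Adafruit_ADXL345_Unified accel = Adafruit_ADXL345_Unified(12345);
// mqttClient and readTemperature() are assumed to be defined elsewhere in the sketch.
// In setup(), raise the ADXL345 output data rate above its 100 Hz default,
// e.g. accel.setDataRate(ADXL345_DATARATE_1600_HZ), and use a fast I2C clock.

void sampleAndPublish() {
  // Collect 256 samples at 1 kHz (256 ms window).
  // Pace the loop with micros() so the I2C read time does not stretch the sample period.
  uint32_t nextSample = micros();
  for (int i = 0; i < FFT_SAMPLES; i++) {
    while ((int32_t)(micros() - nextSample) < 0) { /* wait for the next sample slot */ }
    nextSample += SAMPLE_PERIOD_US;
    sensors_event_t event;
    accel.getEvent(&event);
    vReal[i] = event.acceleration.z; // axial vibration in m/s² (depends on sensor mounting)
    vImag[i] = 0.0;
  }

  // Remove the DC offset (gravity), then compute RMS in the time domain
  // so the value stays in m/s² rather than in unscaled FFT magnitude units.
  FFT.DCRemoval();
  double rms = 0;
  for (int i = 0; i < FFT_SAMPLES; i++) {
    rms += vReal[i] * vReal[i];
  }
  rms = sqrt(rms / FFT_SAMPLES);

  // Compute FFT
  FFT.Windowing(FFT_WIN_TYP_HAMMING, FFT_FORWARD);
  FFT.Compute(FFT_FORWARD);
  FFT.ComplexToMagnitude();

  // Extract spectral features: peak frequency and its magnitude (bin 0 is DC, skip it)
  double peakMag = 0, peakFreq = 0;
  for (int i = 1; i < FFT_SAMPLES / 2; i++) {
    if (vReal[i] > peakMag) {
      peakMag = vReal[i];
      peakFreq = (i * SAMPLING_FREQ) / FFT_SAMPLES;
    }
  }

  // Publish features (not raw waveform) to save bandwidth.
  // "ts" is seconds since boot, not wall-clock time.
  char payload[256];
  snprintf(payload, sizeof(payload),
    "{\"rms\":%.4f,\"peakFreq\":%.1f,\"peakMag\":%.4f,\"temp\":%.2f,\"ts\":%lu}",
    rms, peakFreq, peakMag, readTemperature(), millis() / 1000
  );
  mqttClient.publish("machines/motor01/vibration/features", payload, 0);
}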
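Publishing FFT features instead of raw waveform data cuts bandwidth sharply while retaining the information needed for fault detection: a 256-sample window of raw 32-bit floats is about 1 KB, while the feature payload is under 100 bytes, a reduction of roughly 90%, and more if the raw samples would otherwise be sent as JSON text.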
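To sanity-check what the firmware publishes, it can help to compute the same kind of features offline from a known test signal. The sketch below is an assumption-laden stand-in, not the firmware's exact math: it uses a naive DFT to stay dependency-free (`numpy.fft.rfft` would be the usual choice), and the function name and synthetic signal are made up for illustration.

```python
import cmath
import math


def extract_features(samples, sampling_hz):
    """Time-domain RMS plus the dominant spectral peak of one sample window."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                 # remove DC offset (gravity)
    rms = math.sqrt(sum(v * v for v in x) / n)      # same units as the input

    # Hamming window to reduce spectral leakage before the transform
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [v * w for v, w in zip(x, window)]

    # Naive DFT over bins 1..n/2-1 (bin 0 is DC); fine for short windows
    peak_bin, peak_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        if abs(acc) > peak_mag:
            peak_bin, peak_mag = k, abs(acc)

    return {"rms": rms, "peakFreq": peak_bin * sampling_hz / n, "peakMag": peak_mag}


# Synthetic test signal: gravity offset plus a 125 Hz tone of amplitude 2 m/s²
fs, n = 1000.0, 256
signal = [9.81 + 2.0 * math.sin(2 * math.pi * 125.0 * i / fs) for i in range(n)]
features = extract_features(signal, fs)
```

A 125 Hz tone was chosen because it falls exactly on an FFT bin for this window size. Real fault frequencies rarely do, so expect the reported peak to be off by up to half a bin width.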
Stage 1: Rule-Based Detection
Start here. Rule-based thresholds are interpretable, require no training data, and can be deployed in a day.
// Lambda: rule-based alert processing
// sendAlert() and storeFeatures() are assumed to be defined elsewhere.
interface VibrationFeatures {
  deviceId: string
  rms: number
  peakFreq: number
  peakMag: number
  temp: number
  ts: number
}

// Example limits: calibrate them against each machine's own baseline.
// ISO 10816 defines its zone boundaries in vibration velocity (mm/s RMS),
// so acceleration-based limits like these are not taken directly from the standard.
const THRESHOLDS = {
  rmsWarning: 2.5, // m/s² RMS, warning level
  rmsCritical: 4.5, // m/s² RMS, critical level
  tempWarning: 75, // °C
  tempCritical: 90, // °C
}

export const handler = async (event: VibrationFeatures) => {
  const alerts: string[] = []

  if (event.rms >= THRESHOLDS.rmsCritical) {
    alerts.push(`CRITICAL: Vibration RMS ${event.rms.toFixed(2)} m/s² exceeds ${THRESHOLDS.rmsCritical}`)
  } else if (event.rms >= THRESHOLDS.rmsWarning) {
    alerts.push(`WARNING: Vibration RMS ${event.rms.toFixed(2)} m/s²`)
  }

  if (event.temp >= THRESHOLDS.tempCritical) {
    alerts.push(`CRITICAL: Temperature ${event.temp}°C`)
  } else if (event.temp >= THRESHOLDS.tempWarning) {
    alerts.push(`WARNING: Temperature ${event.temp}°C`)
  }

  if (alerts.length > 0) {
    await sendAlert(event.deviceId, alerts)
  }

  // Always store for ML training data accumulation
  await storeFeatures(event)
}
Rule-based detection catches gross failures reliably. Its weakness: it cannot detect subtle early-stage degradation where values stay within normal bounds but patterns change.
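That weakness is easy to demonstrate by replaying stored features through the same rules offline. The sketch below reuses the threshold values from the handler above. The record layout, the temperature warning branch, and the synthetic drift series are assumptions for illustration only.

```python
THRESHOLDS = {
    "rms_warning": 2.5,    # m/s² RMS
    "rms_critical": 4.5,   # m/s² RMS
    "temp_warning": 75.0,  # °C
    "temp_critical": 90.0, # °C
}


def evaluate_rules(sample):
    """Return the alert severities raised by one feature record."""
    alerts = []
    if sample["rms"] >= THRESHOLDS["rms_critical"]:
        alerts.append("CRITICAL")
    elif sample["rms"] >= THRESHOLDS["rms_warning"]:
        alerts.append("WARNING")
    if sample["temp"] >= THRESHOLDS["temp_critical"]:
        alerts.append("CRITICAL")
    elif sample["temp"] >= THRESHOLDS["temp_warning"]:  # assumed warning branch
        alerts.append("WARNING")
    return alerts


# Synthetic drift: RMS climbs steadily from 1.0 to about 2.4 m/s², temperature stays flat.
drift = [{"rms": 1.0 + 0.05 * i, "temp": 60.0} for i in range(29)]
alerts_raised = sum(len(evaluate_rules(s)) for s in drift)
rise_ratio = drift[-1]["rms"] / drift[0]["rms"]
```

In this synthetic series the vibration level more than doubles, yet every sample stays under the warning limit and no alert fires. That gap is what the next stage targets.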
Stage 2: Statistical Anomaly Detection
Before deploying ML models, statistical approaches catch many subtle anomalies:
// Anomaly score using rolling statistics, runs in Lambda
// getRecentFeatures(), computeZScore() and computeTrendScore() are assumed to be defined elsewhere.
async function computeAnomalyScore(
  deviceId: string,
  current: VibrationFeatures
): Promise<number> {
  const history = await getRecentFeatures(deviceId, { hours: 72 })
  if (history.length < 100) return 0 // insufficient data

  const rmsValues = history.map((h) => h.rms)
  const mean = rmsValues.reduce((a, b) => a + b, 0) / rmsValues.length
  const std = Math.sqrt(
    rmsValues.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / rmsValues.length
  )

  // Weighted combination of multiple signals
  const rmsZScore = Math.abs((current.rms - mean) / (std || 0.001))
  const tempZScore = computeZScore(history.map((h) => h.temp), current.temp)
  // Last 48 records: a 4 h trend if features arrive every 5 minutes
  const trendScore = computeTrendScore(rmsValues.slice(-48))

  return (rmsZScore * 0.5) + (tempZScore * 0.3) + (trendScore * 0.2)
}
An anomaly score above 3.0 triggers a "watch" state — no alarm yet, but increased sampling rate and logging for the device.
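The snippet above calls `computeZScore` and `computeTrendScore` without showing them. One plausible shape for those helpers, sketched in Python for offline prototyping, is below. Everything here is an assumption: the trend score is defined as the least-squares rise over the window in units of the baseline standard deviation, which is only one of several reasonable choices, and all names are illustrative.

```python
import statistics


def z_score(history, current, min_std=1e-3):
    """Absolute z-score of `current` against the history (population std)."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return abs(current - mean) / max(std, min_std)


def trend_score(window, baseline_std, min_std=1e-3):
    """Upward rise of a least-squares line over the window, in baseline std units."""
    n = len(window)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(window)
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    rise = (sxy / sxx) * (n - 1)       # fitted change from first to last sample
    return max(0.0, rise) / max(baseline_std, min_std)


def anomaly_score(rms_history, temp_history, rms_now, temp_now, trend_window=48):
    """Weighted combination using the 0.5 / 0.3 / 0.2 weights from the text."""
    rms_z = z_score(rms_history, rms_now)
    temp_z = z_score(temp_history, temp_now)
    trend = trend_score(rms_history[-trend_window:], statistics.pstdev(rms_history))
    return 0.5 * rms_z + 0.3 * temp_z + 0.2 * trend


def device_state(score, watch_threshold=3.0):
    """Scores above the threshold put the device into the 'watch' state."""
    return "watch" if score > watch_threshold else "normal"


# Synthetic baseline: RMS alternating around 1.1 m/s², temperature around 51 °C
rms_history = [1.0, 1.2] * 50
temp_history = [50.0, 52.0] * 50
state = device_state(anomaly_score(rms_history, temp_history, rms_now=2.0, temp_now=51.0))
```

Only upward trends are scored here, on the assumption that falling vibration is not a degradation signal. A real deployment may want to flag sudden drops too, since they can indicate a loose sensor.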
Stage 3: SageMaker ML Models
After 3–6 months of accumulated feature data, most of it from normal operation, you can train a proper ML model. SageMaker Random Cut Forest is a strong baseline for IoT anomaly detection: it is unsupervised (no labeled failures required), handles multivariate input well, and produces interpretable anomaly scores.
# SageMaker training job: Random Cut Forest
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = 'arn:aws:iam::123456789:role/SageMakerRole'  # placeholder ARN

# Training data: 6 months of normal operation features
# Format: CSV with columns rms, peakFreq, peakMag, temp
# record_set() expects a float32 NumPy array of shape (n_records, 4).
# The local file name and the header row are assumptions.
training_data = np.loadtxt(
    'motor-vibration.csv', delimiter=',', skiprows=1, dtype=np.float32
)

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    data_location='s3://iot-ml-data/training/motor-vibration/',
    output_path='s3://iot-ml-models/motor-vibration/',
    num_samples_per_tree=256,
    num_trees=50,
    # Metrics are only reported when a labeled test channel is supplied
    eval_metrics=['accuracy', 'precision_recall_fscore'],
)

rcf.fit(rcf.record_set(training_data))
predictor = rcf.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
Once the endpoint is deployed, call it from your Lambda processor:
// Lambda: call SageMaker endpoint for real-time scoring
import { SageMakerRuntimeClient, InvokeEndpointCommand } from '@aws-sdk/client-sagemaker-runtime'

const sagemaker = new SageMakerRuntimeClient({ region: 'us-east-1' })

async function getMLAnomalyScore(features: VibrationFeatures): Promise<number> {
  // Column order must match the training CSV: rms, peakFreq, peakMag, temp
  const csvInput = `${features.rms},${features.peakFreq},${features.peakMag},${features.temp}`
  const response = await sagemaker.send(
    new InvokeEndpointCommand({
      EndpointName: 'motor-vibration-rcf-v2',
      Body: Buffer.from(csvInput),
      ContentType: 'text/csv',
      Accept: 'application/json',
    })
  )
  const result = JSON.parse(Buffer.from(response.Body!).toString())
  return result.scores[0].score // RCF anomaly score
}
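An RCF score has no fixed scale, so it needs a cutoff derived from the scores the model assigns to known-normal data. The AWS documentation describes treating scores more than three standard deviations above the mean as anomalous, and the sketch below applies that rule. The function names and the sample scores are made up for illustration.

```python
import statistics


def score_cutoff(baseline_scores, n_sigma=3.0):
    """Cutoff = mean + n_sigma * std of scores from normal operation."""
    mean = statistics.fmean(baseline_scores)
    std = statistics.pstdev(baseline_scores)
    return mean + n_sigma * std


def is_anomalous(score, cutoff):
    """True when a new RCF score exceeds the baseline-derived cutoff."""
    return score > cutoff


# Hypothetical scores returned by the endpoint for normal-operation records
baseline_scores = [0.8, 1.0, 1.2, 1.0]
cutoff = score_cutoff(baseline_scores)
flagged = is_anomalous(2.1, cutoff)
```

Recompute the cutoff whenever the model is retrained, since scores from different model versions are not directly comparable.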
ROI Calculation
Before pitching predictive maintenance to a client, calculate the ROI concretely:
Unplanned failure cost:
  Production downtime: 4 hours × $8,000/hour = $32,000
  Emergency repair: $3,500 (parts + overtime)
  Total per incident: $35,500

Planned maintenance cost (caught by PdM):
  Planned downtime: 1 hour (shift change) × $8,000 = $8,000
  Standard repair: $1,800
  Total per incident: $9,800

Savings per avoided failure: $25,700
Sensor + cloud infrastructure cost per motor: $85/month
Payback: 1 avoided failure pays for about 25 years of monitoring on that one motor ($25,700 / $85 per month ≈ 302 months)
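For a 200-motor facility with historically 8 unplanned failures per year, predictive maintenance catching 75% of them (6 failures) delivers $154,200 in annual savings. Fleet-wide monitoring at $85 per motor per month costs about $204,000 per year, so at these numbers the program runs roughly $49,800 short of break-even, which takes about 8 avoided failures per year. That picture improves as per-motor costs fall with fleet size and as the model matures and catches more failures.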
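The figures above are easy to rerun with a client's own numbers. The sketch below reproduces the calculation using the values from this section as defaults. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    downtime_cost_per_hour: float = 8_000
    unplanned_downtime_hours: float = 4
    emergency_repair: float = 3_500
    planned_downtime_hours: float = 1
    standard_repair: float = 1_800
    monitoring_cost_per_motor_month: float = 85
    motors: int = 200
    unplanned_failures_per_year: float = 8
    catch_rate: float = 0.75


def summarize(p: RoiInputs) -> dict:
    """Per-incident savings, annual totals, and break-even failure count."""
    unplanned = p.unplanned_downtime_hours * p.downtime_cost_per_hour + p.emergency_repair
    planned = p.planned_downtime_hours * p.downtime_cost_per_hour + p.standard_repair
    savings_per_failure = unplanned - planned
    annual_savings = p.unplanned_failures_per_year * p.catch_rate * savings_per_failure
    annual_monitoring = p.motors * p.monitoring_cost_per_motor_month * 12
    return {
        "savings_per_failure": savings_per_failure,
        "annual_savings": annual_savings,
        "annual_monitoring": annual_monitoring,
        "net_annual": annual_savings - annual_monitoring,
        "break_even_failures": annual_monitoring / savings_per_failure,
        "payback_years_single_motor": savings_per_failure
        / (p.monitoring_cost_per_motor_month * 12),
    }


summary = summarize(RoiInputs())
```

Changing `motors`, `catch_rate`, or the monthly cost shows quickly which assumption the business case is most sensitive to.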
Real industrial IoT is a long game. The sensor data you collect today trains the models that prevent failures in year three.
Need help? [Contact Code Caracal](/contact) — we've shipped these systems for clients across 15+ countries.