Digital Twins for IoT: Building a Virtual Mirror of Your Physical Devices
A digital twin is a virtual representation of a physical device — not just its current state, but its history, expected behavior, and predicted future. When done well, a digital twin lets you debug a device without touching it, simulate a failure before it happens, and test new firmware against a faithful replica of a deployed fleet.
The term gets overloaded. A cached JSON blob of the last sensor reading is not a digital twin. A model that reflects the device's current state, responds to the same commands, tracks historical state transitions, and can run independently from the physical device — that is a digital twin.
This guide goes from the AWS IoT Device Shadow (the simplest starting point) to a production-grade twin with state history, synchronization guarantees, and real use cases.
AWS IoT Device Shadow: The Foundation
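AWS IoT Core provides a built-in digital twin primitive: the Device Shadow. It is a JSON document with three sections:
- desired: the state the backend wants the device to reach
- reported: the state the device last reported
- delta: the difference between desired and reported, computed by AWS IoT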
This gap-tracking model is elegant. When a device is offline, you write to desired. When it reconnects, it receives the delta and acts on it. No message queuing required on your side.
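To make the gap tracking concrete, here is a minimal sketch of how a delta can be derived from the desired and reported sections. AWS IoT computes the real delta for you; `computeDelta` and the `ShadowState` types are hypothetical names used only for illustration, and the sketch ignores arrays and shadow metadata.

```typescript
// Hypothetical illustration of the delta concept; AWS IoT computes the real delta.
type ShadowState = { [key: string]: ShadowValue }
type ShadowValue = string | number | boolean | null | ShadowState

function isObject(value: ShadowValue | undefined): value is ShadowState {
  return typeof value === 'object' && value !== null
}

// Returns the desired keys whose values differ from, or are missing in, reported.
function computeDelta(desired: ShadowState, reported: ShadowState): ShadowState {
  const delta: ShadowState = {}
  for (const [key, want] of Object.entries(desired)) {
    const have = reported[key]
    if (isObject(want) && isObject(have)) {
      // Both sides are objects: recurse and keep only the differing parts
      const nested = computeDelta(want, have)
      if (Object.keys(nested).length > 0) delta[key] = nested
    } else if (isObject(want) || want !== have) {
      delta[key] = want
    }
  }
  return delta
}
```

With desired `{ tempSetpoint: 21, mode: 'heat' }` and reported `{ tempSetpoint: 19, mode: 'heat' }`, only `tempSetpoint` ends up in the delta. Keys that exist only in reported never appear, which is why a device can report extra fields without creating a gap.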
// Backend: update desired state for a device
import { IoTDataPlaneClient, UpdateThingShadowCommand } from '@aws-sdk/client-iot-data-plane'

const iotData = new IoTDataPlaneClient({ region: 'us-east-1' })

async function setDeviceSetpoint(deviceId: string, tempSetpoint: number) {
  const payload = {
    state: {
      desired: {
        tempSetpoint,
        updatedBy: 'dashboard',
        updatedAt: new Date().toISOString(),
      },
    },
  }
  await iotData.send(
    new UpdateThingShadowCommand({
      thingName: deviceId,
      payload: Buffer.from(JSON.stringify(payload)),
    })
  )
}
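// Firmware side: receive desired state and report back
// Assumes ArduinoJson v6 (#include <ArduinoJson.h>); mqttClient, applySetpoint, and DEVICE_ID are defined elsewhere
void onShadowDeltaReceived(const char* payload) {
  StaticJsonDocument<512> doc;
  if (deserializeJson(doc, payload)) {
    return; // malformed payload: ignore it
  }
  // The delta lists only the keys that differ, so tempSetpoint may be absent
  JsonVariant setpoint = doc["state"]["tempSetpoint"];
  if (setpoint.isNull()) {
    return;
  }
  float newSetpoint = setpoint.as<float>();
  applySetpoint(newSetpoint);
  // Report the new state back to close the delta
  StaticJsonDocument<256> report;
  report["state"]["reported"]["tempSetpoint"] = newSetpoint;
  report["state"]["reported"]["status"] = "applied";
  char buf[256];
  serializeJson(report, buf);
  // Build the topic explicitly; C++ string literals do not expand ${DEVICE_ID}
  char topic[128];
  snprintf(topic, sizeof(topic), "$aws/things/%s/shadow/update", DEVICE_ID);
  mqttClient.publish(topic, buf);
}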
The Device Shadow handles the current state well. But a production digital twin needs more.
Building a Richer Digital Twin
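Beyond the Device Shadow, a full digital twin also stores state history and derives computed attributes from it, such as uptime, anomaly scores, and failure predictions.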
State History with DynamoDB Time-Series
// Lambda: IoT Rule triggers this on every shadow update
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb'
import { marshall } from '@aws-sdk/util-dynamodb'

const db = new DynamoDBClient({ region: 'us-east-1' })

interface ShadowUpdateEvent {
  deviceId: string
  state: {
    reported: Record<string, unknown>
  }
  timestamp: number
  version: number
}

export const handler = async (event: ShadowUpdateEvent) => {
  // Store full state snapshot with TTL for retention policy
  const ttl = Math.floor(Date.now() / 1000) + 90 * 24 * 60 * 60 // 90 days
  await db.send(
    new PutItemCommand({
      TableName: 'DeviceTwinHistory',
      Item: marshall({
        pk: `DEVICE#${event.deviceId}`,
        sk: `STATE#${event.timestamp}`,
        deviceId: event.deviceId,
        state: event.state.reported,
        shadowVersion: event.version,
        ttl,
      }),
    })
  )
}
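The handler above assumes the IoT Rule already delivers events shaped like `ShadowUpdateEvent`. If the rule instead forwards the raw message from the classic shadow's `update/documents` topic, a small adapter could do the mapping. The sketch below is one possible approach: `toShadowUpdateEvent` and `ShadowDocumentsMessage` are hypothetical names, only the fields used here are modelled, and named shadows (which use a longer topic) are not handled.

```typescript
// Subset of a message published on $aws/things/<thingName>/shadow/update/documents
interface ShadowDocumentsMessage {
  current: {
    state: { reported?: Record<string, unknown> }
    version: number
  }
  timestamp: number // epoch seconds
}

interface ShadowUpdateEvent {
  deviceId: string
  state: { reported: Record<string, unknown> }
  timestamp: number
  version: number
}

// Hypothetical adapter: derives the device ID from the topic and flattens the message.
function toShadowUpdateEvent(topic: string, message: ShadowDocumentsMessage): ShadowUpdateEvent {
  // Topic segments: $aws / things / <thingName> / shadow / update / documents
  const [root, things, thingName] = topic.split('/')
  const isDocumentsTopic = topic.endsWith('/shadow/update/documents')
  if (root !== '$aws' || things !== 'things' || !thingName || !isDocumentsTopic) {
    throw new Error(`Unexpected shadow topic: ${topic}`)
  }
  return {
    deviceId: thingName,
    // A shadow may have no reported section yet; store an empty snapshot in that case
    state: { reported: message.current.state.reported ?? {} },
    timestamp: message.timestamp,
    version: message.current.version,
  }
}
```

Keeping this mapping in one place means the storage code does not need to know how the rule is configured.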
DynamoDB table design for the twin:
- pk DEVICE#<deviceId>, sk STATE#<timestamp>: time-ordered state history
- deviceType + timestamp (secondary index): fleet-wide queries

The Twin API
Expose the digital twin as a REST API that your dashboard, mobile app, and other services consume:
// GET /twins/:deviceId — returns current + recent history
async function getTwin(deviceId: string) {
  // Current shadow from AWS IoT
  const shadow = await getShadow(deviceId)
  // Recent state history from DynamoDB
  const history = await queryStateHistory(deviceId, {
    from: Date.now() - 24 * 60 * 60 * 1000, // last 24h
    limit: 1000,
  })
  // Computed attributes
  const computed = {
    uptimeHours: calculateUptime(history),
    avgTemperature: average(history.map((h) => h.state.temperature)),
    anomalyScore: runAnomalyModel(history),
    predictedFailureDate: runRULModel(history), // Remaining Useful Life
  }
  return {
    deviceId,
    currentState: shadow.state.reported,
    desiredState: shadow.state.desired,
    lastSeen: shadow.metadata.reported.temperature?.timestamp,
    history,
    computed,
  }
}
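The `queryStateHistory` helper used above is not shown. One way it might build its request against the DEVICE#/STATE# key schema is sketched below. `buildHistoryQuery` and `HistoryQueryInput` are hypothetical names; the returned object follows the low-level attribute-value shape that `QueryCommand` from `@aws-sdk/client-dynamodb` accepts. Because sort keys compare as strings, the sketch assumes `from` and `to` use the same unit and digit count as the timestamps the Lambda stored.

```typescript
// Hypothetical request shape: a subset of the DynamoDB Query input
interface HistoryQueryInput {
  TableName: string
  KeyConditionExpression: string
  ExpressionAttributeValues: Record<string, { S: string }>
  ScanIndexForward: boolean
  Limit: number
}

// Builds a time-range query for one device's state history.
function buildHistoryQuery(
  deviceId: string,
  from: number,
  to: number,
  limit: number
): HistoryQueryInput {
  return {
    TableName: 'DeviceTwinHistory',
    // The partition key selects the device; the sort key range selects the time window
    KeyConditionExpression: 'pk = :pk AND sk BETWEEN :from AND :to',
    ExpressionAttributeValues: {
      ':pk': { S: `DEVICE#${deviceId}` },
      ':from': { S: `STATE#${from}` },
      ':to': { S: `STATE#${to}` },
    },
    ScanIndexForward: false, // newest snapshots first
    Limit: limit,
  }
}
```

Returning a plain object keeps the builder easy to unit test; the caller would wrap it in a `QueryCommand` and unmarshall the resulting items.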
Synchronization Patterns
The trickiest part of a digital twin is keeping it in sync with the physical device across unreliable networks.
Pattern 1: Shadow-driven (simplest)
Device publishes state → Shadow updates → Lambda records history. The twin always converges on the shadow's state, but its history lags the device slightly (eventual consistency). Acceptable for monitoring use cases.
Pattern 2: Direct publish + shadow
Device publishes to both telemetry (high-frequency, QoS 0) and updates the shadow (low-frequency, QoS 1). Twin stores telemetry in time-series DB and shadow for current state. Best of both: rich history and guaranteed current state.
Pattern 3: Event sourcing
Every state change is an immutable event. The twin is rebuilt by replaying events. Maximum fidelity, highest complexity. Worth it for regulated industries where audit trails are mandatory.
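The replay step in Pattern 3 can be sketched as a fold over the event log. This is a minimal illustration, not a full event store: `StateChangeEvent` and `replay` are hypothetical names, events are assumed to carry a monotonically increasing `sequence`, and each event is treated as a shallow patch of the previous state.

```typescript
// Hypothetical event shape: each event records the fields that changed
interface StateChangeEvent {
  sequence: number
  changes: Record<string, unknown>
}

// Rebuilds state by applying events in sequence order, optionally stopping at upTo.
function replay(events: StateChangeEvent[], upTo: number = Infinity): Record<string, unknown> {
  return [...events] // copy so the caller's log is never mutated
    .sort((a, b) => a.sequence - b.sequence)
    .filter((e) => e.sequence <= upTo)
    .reduce<Record<string, unknown>>((state, e) => ({ ...state, ...e.changes }), {})
}
```

Passing `upTo` reconstructs the twin as it was at an earlier point, which is the property that makes this pattern attractive for audits.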
For most projects, Pattern 2 hits the right balance.
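As a rough sketch of Pattern 2 on the device side, the function below decides what to publish for each sensor reading: telemetry always goes out at QoS 0, and a shadow update goes out at QoS 1 only when the reported setpoint has changed. `planPublishes`, the `Reading` shape, and the `devices/<id>/telemetry` topic are assumptions made for illustration; only the `$aws/things/<id>/shadow/update` topic comes from AWS IoT.

```typescript
interface OutboundMessage {
  topic: string
  qos: 0 | 1
  payload: string
}

// Hypothetical reading shape for a thermostat-style device
interface Reading {
  temperature: number
  tempSetpoint: number
  timestamp: number
}

// Plans the MQTT publishes for one reading without performing any I/O.
function planPublishes(
  deviceId: string,
  reading: Reading,
  lastReportedSetpoint: number | undefined
): OutboundMessage[] {
  const messages: OutboundMessage[] = [
    {
      // High-frequency telemetry: losing an occasional sample is acceptable, so QoS 0
      topic: `devices/${deviceId}/telemetry`,
      qos: 0,
      payload: JSON.stringify({ temperature: reading.temperature, ts: reading.timestamp }),
    },
  ]
  if (reading.tempSetpoint !== lastReportedSetpoint) {
    messages.push({
      // Low-frequency state change: must arrive, so QoS 1
      topic: `$aws/things/${deviceId}/shadow/update`,
      qos: 1,
      payload: JSON.stringify({ state: { reported: { tempSetpoint: reading.tempSetpoint } } }),
    })
  }
  return messages
}
```

Separating the decision from the actual MQTT client call keeps the routing logic testable on a development machine.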
Use Case: Simulation and Testing
A digital twin that accurately models device behavior lets you test new firmware without hardware:
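// Twin simulator: generate realistic sensor data for integration tests
interface DeviceState {
  temperature: number
  timestamp: number // epoch milliseconds
  uptime: number // seconds
}

class DeviceTwinSimulator {
  private state: DeviceState
  private history: DeviceState[]

  constructor(readonly deviceId: string, seedHistory: DeviceState[]) {
    if (seedHistory.length === 0) throw new Error('seedHistory must not be empty')
    this.history = [...seedHistory]
    this.state = seedHistory[seedHistory.length - 1]
  }

  // Simple stand-in for a learned model: mean of past values seen at this hour of day
  private learnedBaseline(field: 'temperature', hourOfDay: number): number {
    const samples = this.history.filter((s) => new Date(s.timestamp).getHours() === hourOfDay)
    if (samples.length === 0) return this.state[field]
    return samples.reduce((sum, s) => sum + s[field], 0) / samples.length
  }

  // Generate next state based on learned patterns
  tick(elapsedSeconds: number): DeviceState {
    // Advance simulated time rather than reading the wall clock, so fast test runs still follow the daily cycle
    const timestamp = this.state.timestamp + elapsedSeconds * 1000
    const hourOfDay = new Date(timestamp).getHours()
    // Temperature follows daily cycle learned from history
    const baseTemp = this.learnedBaseline('temperature', hourOfDay)
    const noise = (Math.random() - 0.5) * 0.5
    this.state = {
      ...this.state,
      temperature: baseTemp + noise,
      timestamp,
      uptime: this.state.uptime + elapsedSeconds,
    }
    this.history.push(this.state)
    return this.state
  }
}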
In CI/CD, spin up twin simulators for 50 virtual devices, run your entire backend pipeline against them, and verify data flows correctly — without a single physical device in the loop.
Use Case: Predictive Maintenance
The twin accumulates enough state history to run predictive models. A vibration sensor twin that has tracked motor vibration for six months can flag when vibration patterns deviate from the learned baseline — before the motor fails.
This is covered in depth in our [predictive maintenance guide](/blog/predictive-maintenance-iot-ml), but the key point architecturally: the twin is the data source. The ML model queries the twin API, not the raw telemetry stream. This decoupling means you can improve the model without touching the ingestion pipeline.
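To illustrate what "deviates from the learned baseline" can mean in its simplest form, here is a z-score check over historical readings. This is a deliberately basic stand-in for a real predictive model, not the approach from the linked guide; `baselineStats`, `isAnomalous`, and the default threshold of 3 standard deviations are assumptions chosen for the example.

```typescript
// Mean and sample standard deviation of the baseline readings
function baselineStats(samples: number[]): { mean: number; stdDev: number } {
  if (samples.length < 2) throw new Error('need at least two baseline samples')
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length
  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samples.length - 1)
  return { mean, stdDev: Math.sqrt(variance) }
}

// Flags a reading that sits more than `threshold` standard deviations from the baseline mean.
function isAnomalous(baseline: number[], reading: number, threshold: number = 3): boolean {
  const { mean, stdDev } = baselineStats(baseline)
  // A perfectly flat baseline has no spread, so any different value counts as a deviation
  if (stdDev === 0) return reading !== mean
  return Math.abs(reading - mean) / stdDev > threshold
}
```

In the architecture described here, `baseline` would come from the twin API's history rather than from the raw telemetry stream.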
Implementation Checklist
The digital twin pattern pays dividends throughout the product lifecycle — faster debugging, safer testing, and eventually the foundation for ML-driven predictive features.
Need help? [Contact Code Caracal](/contact) — we've shipped these systems for clients across 15+ countries.