Embedded Systems

STM32 HAL vs Low-Level Drivers: When the Abstraction Costs You Too Much

STM32 HAL makes peripheral initialization easy but adds measurable overhead in interrupt latency and CPU cycles. In motor control, audio processing, and high-speed SPI, that overhead is unacceptable. Here is when to drop down to LL drivers — and how.

April 18, 2024
13 min read
STM32 · HAL · Low-Level Drivers · Performance


ST's HAL (Hardware Abstraction Layer) is one of the most controversial topics in the STM32 community. Some engineers swear by it for its readability and portability. Others consider it bloated and unsuitable for production. The truth is more nuanced: HAL is excellent for 80% of use cases and genuinely unsuitable for the remaining 20%.

This guide shows you exactly where that line is, with real overhead numbers and code examples showing the LL alternative.

What HAL Does Under the Hood

HAL adds several layers on top of direct register access:

  1. Parameter validation (NULL handle checks, plus assert_param checks when USE_FULL_ASSERT is defined)
  2. State machine management (HAL_SPI_STATE_BUSY, etc.)
  3. Timeout handling via HAL_GetTick()
  4. Callback mechanisms for interrupts

For a simple GPIO toggle, HAL_GPIO_TogglePin compiles to roughly 8 instructions. The LL equivalent, LL_GPIO_TogglePin, inlines to roughly 2 to 3 instructions: it reads ODR and writes the toggled value back (BSRR itself is write-only, so there is no read-modify-write on it). At 168 MHz, that is on the order of 47 ns versus 12 to 18 ns, which is negligible for most work.
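To make the read-then-write concrete, here is a host-side model of a BSRR-based toggle. It is a sketch under stated assumptions, not ST's source: `toggle_bsrr_word` and `apply_bsrr` are hypothetical helper names, and the only hardware behaviour modelled is that the low half of BSRR sets pins and the high half resets them. Recent LL releases toggle through BSRR in roughly this way, while older ones XOR ODR directly, so check the header in your own Cube package.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the word a BSRR-based toggle would write.
 * Pins in `mask` that are currently high go to the reset half (bits 31..16);
 * pins that are currently low go to the set half (bits 15..0). */
static uint32_t toggle_bsrr_word(uint32_t odr, uint32_t mask)
{
    assert((mask & 0xFFFF0000u) == 0u);   /* a port has 16 pins */
    return ((odr & mask) << 16) | (~odr & mask);
}

/* Hypothetical model of the hardware side: apply one BSRR write to ODR. */
static uint32_t apply_bsrr(uint32_t odr, uint32_t bsrr)
{
    uint32_t set   = bsrr & 0xFFFFu;
    uint32_t reset = (bsrr >> 16) & 0xFFFFu;
    return ((odr & ~reset) | set) & 0xFFFFu;   /* set wins if both bits are 1 */
}
```

Because the write to BSRR only touches the pins named in the mask, an interrupt that changes a different pin between the read and the write is not clobbered, which is the usual argument for the BSRR variant over XOR-ing ODR.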

Where it matters is inside ISRs and tight control loops.

ISR Latency: HAL vs LL

HAL timer interrupt handlers call HAL_TIM_IRQHandler, which tests each capture/compare, update, break, and trigger flag in turn before calling your callback. On an STM32F4 at 168 MHz:

  • Cortex-M4 interrupt entry: ~12 cycles (register stacking)
  • HAL_TIM_IRQHandler overhead: ~40–80 cycles
  • Your callback execution: variable
  • Total minimum latency to first user instruction: ~52–92 cycles (~310–550 ns)

With LL and a direct ISR implementation:

    // Direct ISR: no HAL dispatch overhead
    void TIM2_IRQHandler(void) {
        if (LL_TIM_IsActiveFlag_UPDATE(TIM2)) {
            LL_TIM_ClearFlag_UPDATE(TIM2);

            // Your control loop code here: first instruction at ~18 cycles from IRQ assert
            run_control_loop();
        }
    }

  • Cortex-M4 interrupt entry: ~12 cycles
  • LL flag check + clear: ~6 cycles
  • Total minimum latency: ~18 cycles (~107 ns)

For a 20 kHz motor control PWM (50 µs period), up to 550 ns of latency is 1.1% of the period. For a 100 kHz servo update loop (10 µs period), it is 5.5%, which is unacceptable.
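The conversions behind these figures are easy to reproduce for your own clock and loop rate. The sketch below is illustrative only: `cycles_to_ns` and `latency_permille` are hypothetical helper names, and the cycle counts you feed in should come from your own measurements rather than from the estimates in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds, rounded to nearest. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz != 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000000ull + cpu_hz / 2u) / cpu_hz);
}

/* Latency as a share of one control period, in tenths of a percent
 * (11 means 1.1%). Rounded to nearest. */
static uint32_t latency_permille(uint32_t latency_ns, uint32_t loop_hz)
{
    assert(loop_hz != 0u);
    uint64_t period_ns = 1000000000ull / loop_hz;
    return (uint32_t)(((uint64_t)latency_ns * 1000u + period_ns / 2u) / period_ns);
}
```

With a 168 MHz core, `cycles_to_ns(18, 168000000)` gives 107 and `cycles_to_ns(92, 168000000)` gives 548; `latency_permille(550, 20000)` gives 11 (1.1%) and `latency_permille(550, 100000)` gives 55 (5.5%).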

SPI Performance: HAL Blocking vs LL DMA

HAL's blocking SPI transfer (HAL_SPI_Transmit) polls the TXE flag in a loop, so the CPU does nothing else for the duration of the transfer. For a 256-byte SPI flash page write at 10 MHz:

  • HAL blocking: CPU is tied up for ~205 µs (256 bytes × 8 bits = 2048 SPI clocks at 10 MHz), roughly 34,000 CPU cycles at 168 MHz
  • LL DMA: CPU issues DMA start, returns immediately, interrupt fires on completion

    // LL DMA SPI transmit: non-blocking
    void spi_transmit_dma_ll(const uint8_t *data, uint16_t len) {
        // Configure DMA stream for SPI1 TX (the stream must be disabled here;
        // peripheral address, direction, and channel are assumed set during init)
        LL_DMA_SetMemoryAddress(DMA2, LL_DMA_STREAM_3, (uint32_t)data);
        LL_DMA_SetDataLength(DMA2, LL_DMA_STREAM_3, len);

        // Enable DMA transfer complete interrupt
        LL_DMA_EnableIT_TC(DMA2, LL_DMA_STREAM_3);

        // Start SPI DMA
        LL_SPI_EnableDMAReq_TX(SPI1);
        LL_DMA_EnableStream(DMA2, LL_DMA_STREAM_3);
        LL_SPI_Enable(SPI1);
    }

    // DMA transfer complete ISR
    void DMA2_Stream3_IRQHandler(void) {
        if (LL_DMA_IsActiveFlag_TC3(DMA2)) {
            LL_DMA_ClearFlag_TC3(DMA2);
            LL_DMA_DisableStream(DMA2, LL_DMA_STREAM_3);
            LL_SPI_DisableDMAReq_TX(SPI1);

            // TC fires when the last byte reaches the SPI data register, not when it
            // has left the wire: wait for TXE set and BSY clear before disabling SPI
            while (!LL_SPI_IsActiveFlag_TXE(SPI1)) { }
            while (LL_SPI_IsActiveFlag_BSY(SPI1)) { }
            LL_SPI_Disable(SPI1);

            // Signal completion via semaphore
            BaseType_t woken = pdFALSE;
            xSemaphoreGiveFromISR(spi_done_sem, &woken);
            portYIELD_FROM_ISR(woken);
        }
    }

During the transfer, CPU utilization drops from ~100% (HAL polling) to ~2% (LL DMA with interrupt).
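The cost of the blocking call in the 256-byte example follows directly from the wire time: 256 bytes × 8 bits is 2048 SPI clock periods, and the core keeps running at its own clock while it waits for them. A minimal sketch of that arithmetic, assuming back-to-back bytes with no inter-byte gaps (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Wire time for `bytes` bytes at `spi_hz`, in nanoseconds
 * (8 bits per byte, no inter-byte gaps assumed). */
static uint64_t spi_wire_time_ns(uint32_t bytes, uint32_t spi_hz)
{
    assert(spi_hz != 0u);
    return ((uint64_t)bytes * 8u * 1000000000ull) / spi_hz;
}

/* CPU cycles that elapse during `ns` nanoseconds at `cpu_hz`. */
static uint64_t cycles_during_ns(uint64_t ns, uint32_t cpu_hz)
{
    return (ns * cpu_hz) / 1000000000ull;
}
```

For 256 bytes at 10 MHz this gives 204,800 ns on the wire, which is about 34,400 cycles of a 168 MHz core spent polling if the transfer is blocking.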

Profiling Techniques

Before switching to LL, measure. Use a GPIO toggle and an oscilloscope or logic analyzer:

    // Profile an ISR or function with GPIO toggle
    void TIM2_IRQHandler(void) {
        LL_GPIO_SetOutputPin(GPIOC, LL_GPIO_PIN_13);  // Rising edge = ISR start

        if (LL_TIM_IsActiveFlag_UPDATE(TIM2)) {
            LL_TIM_ClearFlag_UPDATE(TIM2);
            run_control_loop();
        }

        LL_GPIO_ResetOutputPin(GPIOC, LL_GPIO_PIN_13); // Falling edge = ISR end
    }

Capture this on a logic analyzer. Measure pulse width (ISR execution time) and period jitter (scheduling consistency). This gives you hard data to justify the switch to LL.
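If your logic analyzer can export rising-edge timestamps, the period jitter can be reduced to a few numbers offline. The following is a minimal host-side sketch, assuming timestamps in nanoseconds that increase monotonically; `period_stats_t` and `period_stats` are hypothetical names, and peak-to-peak is only one of several reasonable jitter definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_ns;      /* shortest observed period */
    uint32_t max_ns;      /* longest observed period */
    uint32_t jitter_ns;   /* peak-to-peak: max_ns - min_ns */
} period_stats_t;

/* Compute min/max period and peak-to-peak jitter from rising-edge
 * timestamps. Needs at least two edges. */
static period_stats_t period_stats(const uint64_t *rising_ns, size_t count)
{
    assert(rising_ns != NULL && count >= 2u);
    period_stats_t s = { UINT32_MAX, 0u, 0u };
    for (size_t i = 1; i < count; ++i) {
        uint32_t period = (uint32_t)(rising_ns[i] - rising_ns[i - 1]);
        if (period < s.min_ns) s.min_ns = period;
        if (period > s.max_ns) s.max_ns = period;
    }
    s.jitter_ns = s.max_ns - s.min_ns;
    return s;
}
```

Running the same capture and the same calculation before and after moving an ISR to LL gives a like-for-like comparison instead of relying on estimated cycle counts.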

Mixing HAL and LL: The Practical Approach

You do not have to choose globally. Use HAL for initialization (peripheral clock enable, GPIO mode config, NVIC setup), where it is excellent. Use LL for the hot paths inside ISRs and tight loops.

    void MX_SPI1_Init(void) {
        // HAL for initialization: readable, and HAL_SPI_Init() calls
        // HAL_SPI_MspInit(), where the clock and GPIO setup lives
        hspi1.Instance               = SPI1;
        hspi1.Init.Mode              = SPI_MODE_MASTER;
        hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_4;
        hspi1.Init.DataSize          = SPI_DATASIZE_8BIT;
        // Remaining Init fields (Direction, CLKPolarity, NSS, ...) omitted for brevity
        if (HAL_SPI_Init(&hspi1) != HAL_OK) {
            Error_Handler();
        }
        // HAL_SPI_Init() leaves the peripheral disabled; enable it before LL transfers
        LL_SPI_Enable(SPI1);
    }

    // Runtime transfers use LL directly, with no HAL overhead.
    // Requires SPI1 to be enabled; note these polling loops have no timeout.
    uint8_t spi_transfer_byte_ll(uint8_t data) {
        while (!LL_SPI_IsActiveFlag_TXE(SPI1));
        LL_SPI_TransmitData8(SPI1, data);
        while (!LL_SPI_IsActiveFlag_RXNE(SPI1));
        return LL_SPI_ReceiveData8(SPI1);
    }

When LL Matters in Production

Use LL drivers (or direct register access) for:

  • Motor control: BLDC/PMSM control loops at 10–100 kHz require deterministic ISR latency under 1 µs
  • Audio codecs: I2S streaming with DMA at 44.1/48 kHz; a DMA underrun causes audible glitches
  • High-speed SPI: Displays, ADCs, or SPI flash at >10 MHz with DMA for maximum throughput
  • USB full-speed: USB protocol timing is extremely tight; ST's HAL PCD driver already builds on a low-level USB layer internally

For everything else (UART debug, I2C sensor reads, slow GPIO control) HAL is the right tool. Its readability, portability across STM32 families, and integration with STM32CubeMX code generation save far more engineering time than LL would recover.

For projects requiring both real-time control (LL) and cloud connectivity, pair your STM32 with an ESP32 as a WiFi co-processor. See our [ESP32 vs STM32 comparison](/esp32-vs-stm32-microcontroller-comparison) for the dual-chip architecture pattern.

[Contact Code Caracal](/contact): we build production firmware for clients across 15+ countries.

Written by CodeCaracal Engineering

We write from production experience. Every technique in our articles has been deployed to real clients. No academic theory.
