STM32 HAL vs Low-Level Drivers: When the Abstraction Costs You Too Much
ST's HAL (Hardware Abstraction Layer) is one of the most controversial topics in the STM32 community. Some engineers swear by it for its readability and portability. Others consider it bloated and unsuitable for production. The truth is more nuanced: HAL is excellent for 80% of use cases and genuinely unsuitable for the remaining 20%.
This guide shows you exactly where that line is, with real overhead numbers and code examples showing the LL alternative.
What HAL Does Under the Hood
HAL adds several layers on top of direct register access, among them timeout handling based on HAL_GetTick().
For a simple GPIO toggle, HAL_GPIO_TogglePin compiles to roughly 8 instructions. The LL equivalent, LL_GPIO_TogglePin, compiles to about 2 (a read of ODR followed by a write to BSRR). At 168 MHz and roughly one cycle per instruction, that is about 47 ns versus 12 ns, which is negligible for most work.
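To make the register-level toggle concrete, here is a minimal host-side sketch of a BSRR-based toggle. It assumes the approach used in recent versions of ST's LL headers (older versions XOR ODR directly). `fake_gpio_t`, `fake_gpio_write_bsrr`, and `toggle_pins` are hypothetical names for illustration and are not part of the LL API. On real hardware the BSRR write takes effect in the port itself.

```c
#include <stdint.h>

/* Host-side stand-in for the two GPIO registers involved (hypothetical type).
   On a real target these are fields of GPIO_TypeDef from the device header. */
typedef struct {
    uint32_t ODR;   /* output data register */
    uint32_t BSRR;  /* bit set/reset register (write-only on hardware) */
} fake_gpio_t;

/* Emulates the hardware reaction to a BSRR write:
   low 16 bits set pins, high 16 bits reset pins, set wins on conflict. */
void fake_gpio_write_bsrr(fake_gpio_t *gpio, uint32_t value)
{
    gpio->BSRR = value;
    gpio->ODR &= ~(value >> 16);
    gpio->ODR |= (value & 0xFFFFu);
}

/* Toggle: read ODR once, then one BSRR write that resets the masked pins
   that were high and sets the masked pins that were low. */
void toggle_pins(fake_gpio_t *gpio, uint32_t pin_mask)
{
    uint32_t odr = gpio->ODR;
    fake_gpio_write_bsrr(gpio, ((odr & pin_mask) << 16) | (~odr & pin_mask));
}
```

Because the final step is a single write to BSRR rather than a write-back to ODR, pins outside the mask are never rewritten.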
Where it matters is inside ISRs and tight control loops.
ISR Latency: HAL vs LL
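HAL timer interrupt handlers call HAL_TIM_IRQHandler, which tests each possible interrupt source (capture/compare channels, update, break, trigger) in turn before calling your callback. On an STM32F4 at 168 MHz, that dispatch adds latency and jitter before your code runs.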
With LL and direct ISR implementation:
// Direct ISR: no HAL dispatch overhead
void TIM2_IRQHandler(void) {
    if (LL_TIM_IsActiveFlag_UPDATE(TIM2)) {
        LL_TIM_ClearFlag_UPDATE(TIM2);
        // Your control loop code here: first instruction at ~15 cycles from IRQ assert
        run_control_loop();
    }
}
For a 20 kHz motor control PWM (50 µs period), 550 ns jitter is 1.1% of the period. For a 100 kHz servo update loop, it is 5.5% — unacceptable.
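The percentages above follow from dividing the jitter by the loop period. A minimal sketch of that arithmetic, with `jitter_percent_of_period` as a hypothetical helper name:

```c
/* Jitter expressed as a percentage of the loop period.
   jitter_ns: measured jitter in nanoseconds.
   loop_hz:   loop update rate in hertz. */
double jitter_percent_of_period(double jitter_ns, double loop_hz)
{
    double period_ns = 1e9 / loop_hz;       /* e.g. 20 kHz -> 50000 ns */
    return 100.0 * jitter_ns / period_ns;
}
```

Calling it with 550 ns at 20 kHz and at 100 kHz reproduces the 1.1% and 5.5% figures. Plugging in your own measured jitter and loop rate gives a quick check against whatever budget your control loop tolerates.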
SPI Performance: HAL Blocking vs LL DMA
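HAL's blocking SPI transfer (HAL_SPI_Transmit) polls the TXE flag in a loop, wasting CPU cycles. For a 256-byte SPI flash page write at 10 MHz, the CPU spins for the entire transfer. The LL DMA version below starts the transfer and returns immediately: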
// LL DMA SPI transmit (non-blocking)
// Assumes the stream's channel, direction, and peripheral address (SPI1->DR)
// were configured once at init; only the per-transfer fields are set here.
void spi_transmit_dma_ll(const uint8_t *data, uint16_t len) {
    // Configure DMA stream for SPI1 TX
    LL_DMA_SetMemoryAddress(DMA2, LL_DMA_STREAM_3, (uint32_t)data);
    LL_DMA_SetDataLength(DMA2, LL_DMA_STREAM_3, len);
    // Enable DMA transfer complete interrupt
    LL_DMA_EnableIT_TC(DMA2, LL_DMA_STREAM_3);
    // Start SPI DMA
    LL_SPI_EnableDMAReq_TX(SPI1);
    LL_DMA_EnableStream(DMA2, LL_DMA_STREAM_3);
    LL_SPI_Enable(SPI1);
}
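// DMA transfer complete ISR
void DMA2_Stream3_IRQHandler(void) {
    if (LL_DMA_IsActiveFlag_TC3(DMA2)) {
        LL_DMA_ClearFlag_TC3(DMA2);
        LL_DMA_DisableStream(DMA2, LL_DMA_STREAM_3);
        LL_SPI_DisableDMAReq_TX(SPI1);
        // TC fires when the last byte reaches the SPI data register, not when
        // it has left the shifter. Wait for the bus to go idle (a byte or two
        // of SCK time) so that disabling SPI does not truncate the transfer.
        while (!LL_SPI_IsActiveFlag_TXE(SPI1)) { }
        while (LL_SPI_IsActiveFlag_BSY(SPI1)) { }
        LL_SPI_Disable(SPI1);
        // Signal completion via semaphore
        BaseType_t woken = pdFALSE;  // must be initialized before the FromISR call
        xSemaphoreGiveFromISR(spi_done_sem, &woken);
        portYIELD_FROM_ISR(woken);
    }
}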
CPU utilization drops from ~100% (HAL polling) to ~2% (LL DMA with interrupt).
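For a sense of scale, the time a polling transfer keeps the CPU busy is at least the time the bits spend on the wire. A minimal sketch of that lower bound, with `spi_wire_time_us` as a hypothetical helper name; it ignores inter-byte gaps, chip-select handling, and call overhead, so real figures will be somewhat higher:

```c
#include <stdint.h>

/* Minimum time on the wire, in microseconds, for `bytes` bytes
   clocked out at an SCK frequency of `sck_hz`. */
double spi_wire_time_us(uint32_t bytes, uint32_t sck_hz)
{
    return (double)bytes * 8.0 * 1e6 / (double)sck_hz;
}
```

For the 256-byte page at 10 MHz this works out to 204.8 µs, which is the window the DMA version gives back to the rest of the system.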
Profiling Techniques
Before switching to LL, measure. Use a GPIO toggle and an oscilloscope or logic analyzer:
// Profile an ISR or function with GPIO toggle
void TIM2_IRQHandler(void) {
    LL_GPIO_SetOutputPin(GPIOC, LL_GPIO_PIN_13);   // Rising edge = ISR start
    if (LL_TIM_IsActiveFlag_UPDATE(TIM2)) {
        LL_TIM_ClearFlag_UPDATE(TIM2);
        run_control_loop();
    }
    LL_GPIO_ResetOutputPin(GPIOC, LL_GPIO_PIN_13); // Falling edge = ISR end
}
Capture this on a logic analyzer. Measure pulse width (ISR execution time) and period jitter (scheduling consistency). This gives you hard data to justify the switch to LL.
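If your logic analyzer can export edge timestamps, the period jitter can be computed offline rather than read off the screen. The following is a rough sketch that assumes the rising-edge timestamps have already been loaded into an array in ascending order. `period_jitter_pp` is a hypothetical helper, and the unit of the result is whatever unit the timestamps use.

```c
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak period jitter: longest minus shortest interval between
   successive rising edges. Returns 0 if fewer than three edges
   (fewer than two periods) are supplied. */
uint32_t period_jitter_pp(const uint32_t *rising_edges, size_t count)
{
    if (count < 3) {
        return 0;
    }
    uint32_t min_period = UINT32_MAX;
    uint32_t max_period = 0;
    for (size_t i = 1; i < count; ++i) {
        uint32_t period = rising_edges[i] - rising_edges[i - 1];
        if (period < min_period) min_period = period;
        if (period > max_period) max_period = period;
    }
    return max_period - min_period;
}
```

Peak-to-peak is the conservative choice for a hard real-time budget. A standard deviation would hide the rare worst case that actually breaks a control loop.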
Mixing HAL and LL: The Practical Approach
You do not have to choose globally. Use HAL for initialization (peripheral clock enable, GPIO mode config, NVIC setup) — it is excellent at this. Use LL for the hot paths inside ISRs and tight loops.
void MX_SPI1_Init(void) {
    // HAL for initialization: readable, handles clock enables automatically
    hspi1.Instance = SPI1;
    hspi1.Init.Mode = SPI_MODE_MASTER;
    hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_4;
    hspi1.Init.DataSize = SPI_DATASIZE_8BIT;
    HAL_SPI_Init(&hspi1);
    // HAL_SPI_Init configures the peripheral but leaves it disabled (HAL sets
    // SPE inside its transfer functions), so enable it before using LL
    LL_SPI_Enable(SPI1);
    // Now SPI1 is configured: switch to LL for transfers
}

// Runtime transfers use LL directly, with no HAL overhead
uint8_t spi_transfer_byte_ll(uint8_t data) {
    while (!LL_SPI_IsActiveFlag_TXE(SPI1));   // wait for room in the TX register
    LL_SPI_TransmitData8(SPI1, data);
    while (!LL_SPI_IsActiveFlag_RXNE(SPI1));  // wait for the byte clocked back in
    return LL_SPI_ReceiveData8(SPI1);
}
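One thing the HAL transfer functions provide that a bare polling loop does not is a timeout. If that matters for your design, a bounded spin is a cheap middle ground. The sketch below is an assumption-laden illustration, not part of the LL API: `wait_for_bits` is a hypothetical helper that polls any status register through a pointer. On a target you would pass something like `&SPI1->SR` and `SPI_SR_RXNE` from the CMSIS device header, and `max_spins` would need tuning for your clock and bit rate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll *reg until all bits in mask are set, giving up after max_spins polls.
   Returns true if the bits were seen, false on timeout.
   With max_spins == 0 the register is never read and the result is false. */
bool wait_for_bits(const volatile uint32_t *reg, uint32_t mask, uint32_t max_spins)
{
    while (max_spins-- > 0u) {
        if ((*reg & mask) == mask) {
            return true;
        }
    }
    return false;
}
```

Counting polls instead of reading a tick counter keeps the helper free of any dependency on HAL_GetTick(), at the cost of a timeout that scales with core clock speed.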
When LL Matters in Production
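Use LL drivers (or direct register access) for the hot paths covered above: latency-sensitive ISRs, tight control loops, and DMA-driven bulk transfers.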
For everything else — UART debug, I2C sensor reads, slow GPIO control — HAL is the right tool. Its readability, portability across STM32 families, and integration with STM32CubeMX code generation save far more engineering time than LL would recover.
For projects requiring both real-time control (LL) and cloud connectivity, pair your STM32 with an ESP32 as a WiFi co-processor. See our [ESP32 vs STM32 comparison](/esp32-vs-stm32-microcontroller-comparison) for the dual-chip architecture pattern.
[Contact Code Caracal](/contact) — we build production firmware for clients across 15+ countries.