The Question

It is a question every lithium cell eventually gets asked, and almost never answered honestly: how much are you worth, and how long will you last?

Good answers are expensive. A laboratory impedance spectrometer will tell you everything, and it costs more than the product. A datasheet will tell you very little, and it costs nothing. What most products do is somewhere in between: a fuel gauge IC that integrates current over time, drifts silently for a week, and occasionally insists — with perfect confidence — that a clearly flat cell is at 47%.

We wanted the middle ground, done well, on a chip that fits on a postage stamp. An ESP32-S3 with a thermistor, a shunt, a pair of relays, and a small collection of trained models that would tell you four things about the cell it was watching: where it is in its charge cycle, how much capacity it still has, roughly how many cycles before it needs replacing, and — for a brand-new cell the device has never seen — a quick read of its state from a 30-second characterisation pulse.

Four Models, One Cell

One model would have been tidier. Four is what the problem actually needed.

The reason is that the useful questions live on different timescales. State-of-Charge during a discharge is a continuous regression over sampled voltage and current; State-of-Health is a per-cycle classification that only updates when a full charge/discharge has happened; Remaining-Useful-Life is a regression against the same cycle-scale features with a different loss; and a plug-in estimate — "I just met this cell, tell me something useful before I wait an hour" — needs a model trained on rest-to-load transients, not full cycles.

[FIG 01: pipeline diagram. Sensors (V via INA219, I via a 0.1 Ω shunt, T via a 10 kΩ NTC) feed a rolling window of derived features (Savitzky–Golay smoothing, IR drop, dV/dt, coulomb count) into four forests: SoC active (20 trees · depth 8 · 8 features), SoC plug-in (15 trees · depth 6 · 5 features), SoH 3-class (15 trees · depth 6 · 12 features), RUL regression (15 trees · depth 6 · 12 features). Outputs: continuous SoC % every 500 ms; plug-in SoC in under 30 s on a fresh cell; a good / degraded / end-of-life class; RUL in cycles until 60% of rated Ah. Trees → C code → 1.8 MB flash.]
FIG 01 · FOUR-MODEL PIPELINE · RANDOM FORESTS FROM SCIKIT-LEARN, EXPORTED AS STANDALONE C

The active SoC model runs continuously during a test, predicting where in its charge cycle the cell is from eight real-time features. The plug-in SoC model is a simpler sibling that works on the first 30 seconds a cell is connected, using only the features you can derive from a single load pulse — OCV, temperature, IR drop, a short dV/dt. The SoH classifier buckets the cell into good / degraded / end-of-life once per cycle, and the RUL regressor estimates how many cycles remain before it falls below 60% of rated capacity.

All four ingest different slices of the same telemetry. All four are random forests — small enough to fit, fast enough to run in under a millisecond, and, critically, exportable to standalone C with no dependencies at all.

Features That Actually Move

The sensors give us three signals: voltage from the INA219 bus input, current from the voltage across the 0.1 Ω sense shunt, and temperature from a 10 kΩ NTC in a voltage divider on the ADC. Everything else is computed.

The most load-bearing feature, in the model-importance plots, is not voltage or current. It is IR drop — the instantaneous voltage delta when current switches on, divided by the current. It captures the cell’s internal resistance, which rises monotonically with age and is surprisingly tolerant of noise. On a cell with 200 milliohms of accumulated impedance, the IR drop at a 500 mA load is a clean 100 mV signal; on a fresh cell it’s half that. The model reads it and knows.
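
The same feature is computed offline during training. A minimal sketch — the function name, threshold default, and window layout are ours, not the shipped firmware's:

```python
import numpy as np

def ir_drop_ohms(voltage, current, i_on_threshold=0.05):
    """Internal-resistance estimate from a load-on transient.

    voltage, current: arrays sampled at a fixed cadence, spanning the
    moment the load switches on. i_on_threshold is the rest-detection
    threshold in amps (mirrors I_REST_THRESHOLD from the parameters).
    """
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    # First sample where the load is clearly on.
    on = np.flatnonzero(np.abs(current) > i_on_threshold)
    if len(on) == 0 or on[0] == 0:
        return float("nan")  # no rest-to-load transient in this window
    k = on[0]
    dv = voltage[k - 1] - voltage[k]  # instantaneous sag when the load hits
    di = abs(current[k])              # load current
    return dv / di

# The 200 mΩ example from the text: 100 mV of sag at 0.5 A.
# ir_drop_ohms([3.70, 3.60], [0.00, 0.50]) ≈ 0.2
```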

Second is dV/dt during constant-current charge. As cells age, the flat part of the charge curve stops being flat. The voltage climbs a little faster for the same current because the cell has less capacity to fill. It’s not a huge signal — a few millivolts per second difference between a healthy cell and a degraded one — but it shows up cleanly in a Savitzky-Golay derivative with an 11-sample window.
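
The smoothed derivative falls straight out of SciPy's Savitzky–Golay filter with the same 11-sample window; this training-side sketch assumes the 500 ms cadence from the parameters table (the function name and polynomial order are our choices):

```python
import numpy as np
from scipy.signal import savgol_filter

SAMPLE_INTERVAL_S = 0.5  # 500 ms sampling cadence from the text

def dv_dt(voltage, window=11, polyorder=2):
    """Smoothed voltage derivative in V/s.

    deriv=1 asks the filter for the first derivative of the local
    polynomial fit; delta converts from per-sample to per-second.
    """
    v = np.asarray(voltage, dtype=float)
    return savgol_filter(v, window_length=window, polyorder=polyorder,
                         deriv=1, delta=SAMPLE_INTERVAL_S)
```

On the CC plateau a healthy cell sits near zero; a degraded one drifts a few millivolts per second higher, which is exactly the separation the SoH features lean on.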

Third is time-to-CV: how many seconds of constant-current charging before the cell hits its termination voltage and the charger transitions to constant-voltage taper. A fresh cell takes a while; a degraded one gets there fast. This was the single best feature for separating the middle class — degraded — from the two extremes.
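
Extracting time-to-CV from a logged charge is a one-liner over the telemetry; a sketch under the constants from the parameters table (the helper name and rest threshold are ours):

```python
import numpy as np

V_CHARGE_TERMINATION = 4.20  # CC→CV transition voltage
SAMPLE_INTERVAL_S = 0.5      # 500 ms sampling cadence

def time_to_cv_s(voltage, current, i_min=0.05):
    """Seconds of constant-current charging before the cell first
    reaches the termination voltage. Returns nan if it never does."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    charging = i > i_min  # CC phase: real charge current flowing
    hit = np.flatnonzero(charging & (v >= V_CHARGE_TERMINATION))
    if len(hit) == 0:
        return float("nan")
    start = np.flatnonzero(charging)[0]
    return (hit[0] - start) * SAMPLE_INTERVAL_S
```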

“The voltage tells you where the cell is. The derivative of the voltage tells you who the cell is.”
— stuck on the lab whiteboard, uncredited

Why Not an LSTM

Every time someone says "predict something from a time series," half the room reaches for an LSTM and the other half reaches for a transformer. We did neither. Four reasons.

Quantisation. A random forest is exact. A decision tree compares a feature against a threshold and picks a branch; there is no floating-point weight to quantise and no accumulated error to bound. A neural net, shrunk to INT8, introduces quantisation noise on every multiply. On 2.0 Ah batteries where 60 mAh is the difference between "good" and "degraded," we were not in a mood to accept that noise.

Export. The forests convert to C through micromlgen: each tree becomes a nested if/else walk, each leaf is a constant. No TensorFlow Lite Micro runtime, no interpreter, no ops-table, no CMSIS-NN dependencies. One header file per model, included directly:

C++ · inference shape
// Feature scaler exported alongside each model.
// MinMaxScaler fit on training cells, fixed at build time.
static const float SOC_MIN[8] = { 2.80f, -3.00f, -20.0f, /* ... */ };
static const float SOC_MAX[8] = { 4.20f,  3.00f,  60.0f, /* ... */ };

static inline float clampf(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

float runSocActive(float V, float I, float T, /* five more */) {
    float x[8] = { V, I, T, /* ... */ };
    for (int i = 0; i < 8; i++) {
        x[i] = (x[i] - SOC_MIN[i]) / (SOC_MAX[i] - SOC_MIN[i]);
    }
    float soc = SoCActiveModel.predict(x);  // compiled trees
    return clampf(soc, 0.0f, 100.0f);
}

Interpretability. When a field cell gets flagged end-of-life, we can say precisely which features drove the call — feature-importance plots are essentially free for tree models. With an LSTM, we’d be pointing at attention heatmaps and hoping.

Determinism. Trees produce the same output, bit-for-bit, every run. On an MCU with nothing better to do, that makes bench validation trivial — you record the features on the device, run the scikit-learn model against them on a laptop, and confirm zero divergence. With a quantised neural net you’re chasing rounding differences for two days.
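
The bench check is a comparison, not a test harness. A sketch, assuming a hypothetical log layout of features plus the firmware's own prediction in the last column (the tolerance absorbs float32-vs-float64 arithmetic; the function name is ours):

```python
import numpy as np

def divergence_count(device_log, model):
    """Run the reference scikit-learn model over the exact features the
    device recorded, and count rows where the predictions disagree.

    device_log: shape (n, n_features + 1) — features, then the
    prediction the firmware emitted for those features.
    """
    X = device_log[:, :-1]
    device_pred = device_log[:, -1]
    ref_pred = model.predict(X)
    # Trees are exact; anything beyond float-width rounding here
    # means the C export diverged from the trained model.
    return int(np.sum(~np.isclose(ref_pred, device_pred, atol=1e-4)))
```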

Getting It Onto a Chip

The build pipeline, compressed into one diagram, is: train on a laptop, export to a header, include the header, compile. No model weights file to copy, no runtime to link, no chance of a checkpoint-version mismatch in the field.
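
The export step is micromlgen's job in our pipeline, but the mechanism fits in a page. A miniature emitter over scikit-learn's fitted tree structure — our own illustrative sketch, not micromlgen's actual code — shows the shape: internal nodes become if/else, leaves become constants:

```python
def tree_to_c(tree, name="tree0"):
    """Emit one fitted sklearn decision tree as a nested if/else C
    function. tree: a fitted DecisionTreeRegressor."""
    t = tree.tree_
    lines = [f"float {name}(const float *x) {{"]

    def walk(node, depth):
        pad = "    " * depth
        if t.children_left[node] == -1:  # leaf: a constant
            lines.append(f"{pad}return {t.value[node][0][0]:.6f}f;")
        else:  # internal node: threshold comparison, then a branch
            f, thr = t.feature[node], t.threshold[node]
            lines.append(f"{pad}if (x[{f}] <= {thr:.6f}f) {{")
            walk(t.children_left[node], depth + 1)
            lines.append(f"{pad}}} else {{")
            walk(t.children_right[node], depth + 1)
            lines.append(f"{pad}}}")

    walk(0, 1)
    lines.append("}")
    return "\n".join(lines)
```

A forest is a sum (or vote) over twenty of these, which is why model_1a.h runs to 28,811 lines and zero dependencies.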

Sizes of the four generated headers:

TBL 01 · Generated C headers and what’s in them
Header        Size      Lines    Content
model_1a.h    ~1.5 MB   28,811   SoC active · 20 trees × depth 8, 8 features
model_1b.h    ~75 KB    1,731    SoC plug-in · 15 trees × depth 6, 5 features
model_2.h     ~51 KB    1,251    SoH classifier · 15 trees × depth 6, 12 features
model_rul.h   ~189 KB   4,227    RUL regressor · 15 trees × depth 6, 12 features

All four fit comfortably in the ESP32-S3’s app partition — we’re using the huge_app.csv layout, which gives the application roughly 3 MB, and consuming roughly 1.8 MB of it for inference code. Per-inference cost is under a millisecond for the smaller forests and a couple of milliseconds for the active-SoC model; the 500 ms sampling loop has plenty of headroom.

Scaler constants are exported alongside — two float[] arrays per model, min and max, fit on the training cells and frozen at build time. There is no scaler fit to the field data; we decided early that inference-time re-fit was a bug surface we didn’t want.
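
Freezing the scaler is a few lines at export time; a sketch that emits the SOC_MIN / SOC_MAX arrays from the inference header (the function name and 4-decimal precision are our choices):

```python
from sklearn.preprocessing import MinMaxScaler

def scaler_to_c(scaler, prefix="SOC"):
    """Emit a fitted MinMaxScaler as two C float arrays, in the
    SOC_MIN / SOC_MAX layout the firmware expects."""
    def arr(name, vals):
        body = ", ".join(f"{v:.4f}f" for v in vals)
        return f"static const float {prefix}_{name}[{len(vals)}] = {{ {body} }};"
    # data_min_ / data_max_ are the per-feature extremes seen in training.
    return "\n".join([arr("MIN", scaler.data_min_),
                      arr("MAX", scaler.data_max_)])
```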

The Cell-Out Rule

Here is the single biggest thing we got right, and it is not the model.

The datasets — NASA Ames PCoE #5 for cycle-aggregate aging data, #11 for the randomized-usage timestep data — contain multiple cells each. The lazy split is: shuffle all samples across cells, take 80% for training, 20% for test. The model posts excellent numbers. It is also lying. When the same cell appears in both training and test sets, the model is learning the cell, not the chemistry.

We used cell-out cross-validation instead. For every fold, one cell is held out entirely; the model never sees it during training, and it’s the only cell used for evaluation. Performance drops visibly compared to the naive split — and then it becomes honest. The MAE you see on a held-out cell is the MAE you can expect on a cell you ship to a customer. That’s the number that matters.1
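
In scikit-learn terms, cell-out is LeaveOneGroupOut with cell IDs as the groups. A sketch of the evaluation loop — the function name and forest settings are illustrative, not our exact training script:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut

def cell_out_mae(X, y, cell_ids, **forest_kw):
    """One fold per cell: train on every other cell, evaluate only on
    the held-out one. Returns {cell_id: MAE} — the honest numbers."""
    maes = {}
    for tr, te in LeaveOneGroupOut().split(X, y, groups=cell_ids):
        m = RandomForestRegressor(random_state=0, **forest_kw)
        m.fit(X[tr], y[tr])
        held_out = cell_ids[te][0]  # every test row is the same cell
        maes[held_out] = mean_absolute_error(y[te], m.predict(X[te]))
    return maes
```

The shuffled 80/20 split is one `train_test_split` call; the honest version is barely more code, which makes it hard to excuse skipping.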

Our targets, measured this way: SoC active model MAE ≤ 3%, plug-in MAE ≤ 6%, SoH weighted F1 ≥ 0.90 with special attention to end-of-life recall. Missing an end-of-life cell is a much worse failure than calling a degraded cell end-of-life early, and the class weights reflect that.
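
The asymmetric cost lands in the classifier as class weights. A sketch with the forest dimensions from the table above; the weight values here are illustrative, not the shipped ones:

```python
from sklearn.ensemble import RandomForestClassifier

# Classes: 0 = good, 1 = degraded, 2 = end-of-life.
# Up-weighting EoL makes a missed dead cell the expensive mistake,
# pushing the trees toward higher EoL recall at the cost of some
# early (conservative) EoL calls on degraded cells.
soh_clf = RandomForestClassifier(
    n_estimators=15,
    max_depth=6,
    class_weight={0: 1.0, 1: 2.0, 2: 4.0},  # illustrative weights
    random_state=0,
)
```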

Parameters

The cell profile and a handful of timing constants define the behaviour:

TBL 02 · Cell profile and sampling constants
Name                   Value       What it pins
Q_RATED                2.0 Ah      Nominal capacity for a fresh 18650; denominator for SoH and RUL.
V_CHARGE_TERMINATION   4.20 V      Transition from constant-current to constant-voltage phase.
V_DISCHARGE_CUTOFF     2.80 V      0% SoC anchor.
I_REST_THRESHOLD       0.05 A      Below this, the cell is considered at rest (enables OCV read).
OCV_SETTLE_MS          15 000 ms   Rest duration before sampling open-circuit voltage.
OCV_LOAD_PULSE_MS      500 ms      Brief load pulse used to compute IR drop and plug-in features.
SAMPLE_INTERVAL_MS     500 ms      Sensor-read cadence; all rolling windows are sized from this.

There’s a nice hardware trick hiding in that table. Most naive BMS designs wire the load resistor permanently in series with the cell, which makes a true open-circuit voltage reading impossible — there is always a shunt voltage. We put the load on a relay. When the state machine wants an OCV read, it opens the relay, waits OCV_SETTLE_MS, and the voltage the INA219 reports is actually the unloaded cell. A 100 kΩ pull-down on V- keeps the ADC reference sane when the load is disconnected. That relay, and the 15 seconds it adds to any plug-in read, is what makes the plug-in model honest.

What We’d Change

Three directions for a second iteration.

1. A real temperature compensation step.

Cells at 0 °C behave like different cells entirely. The current model includes temperature as a feature, which helps, but below 10 °C the IR-drop signal is dominated by temperature effects and the health signal gets buried. The fix is a per-temperature-band sub-model, or a temperature-dependent normalisation before feature extraction. We deferred it because our bench is air-conditioned. A shipped product isn’t.

2. An online re-fit step.

The MinMaxScaler is frozen at build time. In principle, you could ship the device, let it see a few hundred cycles of a specific cell, and re-centre the scaler around that cell’s operating range. Done badly this is a source of drift; done well it sharpens the SoH classifier for an individual cell’s aging trajectory. We have a design sketched for it and no deployment yet to motivate finishing.

3. A data logger feeding back to training.

Every inference the device makes could be logged — features, prediction, eventual outcome — and shipped back periodically. With enough cells in the field, you get a training set that looks like your production fleet, and you can retrain on that instead of a NASA test bench. This is the single highest-leverage thing a team with this device in the field can do, and it is always the last thing that gets built.2

Footnotes
  1. We wish someone had told us this, instead of us having to learn it at 11pm on the Tuesday before a customer demo.
  2. This is because by the time a fleet is big enough to matter, the team is too busy putting out fires to build the thing that would, eventually, stop most of the fires.