Why this page matters

This page explains how troubleshooting fits into the wider ZeroKernel execution model, what problem it solves, and what trade-off you accept when you apply it to production firmware. Troubleshooting is not an isolated step: it depends on bounded scheduling, queue discipline, fault visibility, and profile selection, and it only makes sense when you understand how those pieces interact.

Read this topic as an operational contract. Start from the smallest working path, wire it into a lean profile first, and only expand into richer routing, diagnostics, or transport state after you can prove that the timing outcome is still worth the extra flash and RAM. That mindset is what keeps ZeroKernel useful on small boards instead of turning it into another bloated abstraction.

The safest pattern is always the same: define the runtime boundary, keep the hot path short, measure the effect with compare scripts, and only then scale complexity. The examples below are not filler; they show the smallest repeatable patterns you can lift into real firmware when you need clean integration instead of ad-hoc loops.

Three practical patterns

Full validation sequence

Use this when you need a credible regression pass before publishing numbers or changing docs.

    bash scripts/run_desktop_tests.sh
    bash scripts/run_desktop_benchmark.sh --enforce-performance
    bash scripts/run_resource_matrix.sh --enforce-budget
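
If you run this sequence often, it can help to stop at the first failing step so a later pass does not hide an earlier failure. The following is a minimal sketch, not part of ZeroKernel: `run_steps` is a hypothetical helper name, and it assumes each script signals failure through a non-zero exit code.

```shell
# Hypothetical helper (not shipped with ZeroKernel): run each command line
# in order and stop at the first one that exits non-zero.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> ${step}"
        # Each argument is one full command line, executed by a child shell.
        if ! sh -c "${step}"; then
            echo "FAILED: ${step}" >&2
            return 1
        fi
    done
    echo "all steps passed"
}

# Usage with the three scripts from the sequence above:
# run_steps \
#     "bash scripts/run_desktop_tests.sh" \
#     "bash scripts/run_desktop_benchmark.sh --enforce-performance" \
#     "bash scripts/run_resource_matrix.sh --enforce-budget"
```

Passing each step as a single string keeps the log line identical to the command that ran, which makes a failed pass easier to reproduce by hand.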

Hardware compare pass

Run a focused hardware compare instead of guessing whether a change helped or hurt.

    bash scripts/run_esp32_modules_compare.sh /dev/ttyUSB1
    bash scripts/run_esp32_real_project_demo.sh /dev/ttyUSB1
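
A compare run that fails because the board is not attached can look like a firmware problem. As a small pre-flight sketch, the hypothetical `port_ready` helper below only checks that the given path is a character device before the compare scripts are invoked; it assumes a Linux-style device node such as `/dev/ttyUSB1`.

```shell
# Hypothetical pre-flight check (not part of the ZeroKernel scripts):
# succeed only if the serial port exists as a character device.
port_ready() {
    local port="$1"
    if [ ! -c "${port}" ]; then
        echo "serial port ${port} not found; is the board connected?" >&2
        return 1
    fi
}

# Usage with the compare scripts shown above:
# PORT="${1:-/dev/ttyUSB1}"
# port_ready "${PORT}" \
#     && bash scripts/run_esp32_modules_compare.sh "${PORT}" \
#     && bash scripts/run_esp32_real_project_demo.sh "${PORT}"
```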

Lean build guard

Lock the build into the intended profile before treating a benchmark or compare as authoritative.

    -DZEROKERNEL_PROFILE_LEAN_NET
    -DZEROKERNEL_ENABLE_DIAGNOSTICS=0
    -DZEROKERNEL_ENABLE_LEGACY_LABEL_API=0
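
One way to enforce this guard is to refuse to run a benchmark unless the build flags contain all three defines. The sketch below assumes your build system can expose its compiler flags as a single space-separated string; the `lean_profile_locked` function and the `BUILD_FLAGS` variable are hypothetical names, not ZeroKernel tooling.

```shell
# Hypothetical guard: check that a space-separated flag string contains
# every define of the lean profile listed above.
lean_profile_locked() {
    local flags="$1" required
    for required in \
        "-DZEROKERNEL_PROFILE_LEAN_NET" \
        "-DZEROKERNEL_ENABLE_DIAGNOSTICS=0" \
        "-DZEROKERNEL_ENABLE_LEGACY_LABEL_API=0"; do
        # Padding with spaces forces a whole-token match, so
        # ...DIAGNOSTICS=01 or a longer define name does not pass.
        case " ${flags} " in
            *" ${required} "*) ;;
            *) echo "missing flag: ${required}" >&2; return 1 ;;
        esac
    done
}

# Usage (BUILD_FLAGS is an assumed variable from your own build setup):
# lean_profile_locked "${BUILD_FLAGS}" \
#     && bash scripts/run_desktop_benchmark.sh --enforce-performance
```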

What to verify while you use it

Validate timing before you validate aesthetics. A cleaner API is not a win if fast misses rise.
Prefer the smallest profile that still matches the workload, then add optional modules only when the measured payoff is obvious.
Keep callbacks and transport steps bounded so watchdog, panic flow, and queue limits remain meaningful.

Common mistakes that make results misleading

Do not copy a demo pattern into production firmware without measuring it on the real board and real build profile you plan to ship.
Do not read success counters without reading queue depth, timing, and workload label next to them.
Do not enable heavier diagnostics and compatibility flags in a lean target just because the defaults looked convenient.

Recommended working sequence

Start from the smallest valid path

Boot the runtime, register the minimum useful task set, and prove that the baseline timing is clean before adding optional layers.

Add one layer, then measure it

Introduce routing, diagnostics, or transport one layer at a time so the cost and payoff remain obvious.

Publish only repeatable results

Update docs, charts, or public claims only after the same workload survives the same validation path more than once.
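
Repeatability can be checked mechanically by running the same validation command several times and accepting the result only if every run passes. This is a minimal sketch under that assumption; `require_repeatable` is a hypothetical helper, and the run count of 3 in the usage line is an arbitrary example, not a ZeroKernel recommendation.

```shell
# Hypothetical helper: run the same command N times and succeed only
# if every single run exits with status 0.
require_repeatable() {
    local runs="$1"
    shift
    local i=1
    while [ "${i}" -le "${runs}" ]; do
        echo "run ${i}/${runs}: $*"
        if ! "$@"; then
            echo "run ${i} failed; result is not repeatable" >&2
            return 1
        fi
        i=$((i + 1))
    done
}

# Usage:
# require_repeatable 3 bash scripts/run_desktop_benchmark.sh --enforce-performance
```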

When timing looks worse than expected

Check whether the workload is truly non-blocking, not just shorter than before.
Check whether queue drain budgets are too small for the publication rate.
Check whether compare output is using the same profile and board clock on both runs.

When transport fail counts look scary

Verify whether the demo injects synthetic failures on purpose.
Read success and fail counts together with total attempts and queue depth.
Judge module quality against a realistic workload page, not only a synthetic stress page.
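
To make the second check concrete, the sketch below puts the counters side by side on one line. It assumes you have already extracted the success count, fail count, and queue depth from your own demo or log output; `transport_summary` is a hypothetical helper, and the percentage is integer arithmetic, so it truncates toward zero.

```shell
# Hypothetical helper: print fail counts in the context of total attempts
# and queue depth. Arguments: <success count> <fail count> <queue depth>.
transport_summary() {
    local ok="$1" fail="$2" depth="$3"
    local total=$((${ok} + ${fail}))
    if [ "${total}" -eq 0 ]; then
        echo "attempts=0 (no data)"
        return 0
    fi
    # Integer percentage of failed attempts, truncated.
    local pct=$((${fail} * 100 / ${total}))
    echo "attempts=${total} ok=${ok} fail=${fail} fail_pct=${pct} queue_depth=${depth}"
}

transport_summary 980 20 3
# prints: attempts=1000 ok=980 fail=20 fail_pct=2 queue_depth=3
```

A fail count of 20 reads very differently next to 1000 attempts than next to 25, which is the reason to never report it alone.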

When serial output looks broken

Confirm first whether the board is actually crashing, or whether the serial monitor simply attached mid-stream. Hardware logging issues can look like runtime issues if you trust the raw output too quickly.
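
One way to separate the two cases is to capture the serial output to a file and look for reset or panic markers. The sketch below assumes an ESP32 target, where the ROM bootloader prints a line containing `rst:0x` on every reset and the ESP-IDF panic handler prints `Guru Meditation Error`; other boards, or a custom panic flow, may print different text. `classify_serial_log` is a hypothetical helper name.

```shell
# Hypothetical helper: classify a captured serial log by counting
# ESP32 reset banners and panic messages (assumed marker strings).
classify_serial_log() {
    local log="$1" boots panics
    [ -r "${log}" ] || { echo "cannot read ${log}" >&2; return 2; }
    # grep -c exits 1 when the count is 0, so '|| true' keeps the "0".
    boots=$(grep -c 'rst:0x' "${log}" || true)
    panics=$(grep -c 'Guru Meditation Error' "${log}" || true)
    # One reset banner is a normal boot; more than one, or any panic,
    # suggests the board restarted during the capture.
    if [ "${panics}" -gt 0 ] || [ "${boots}" -gt 1 ]; then
        echo "likely-crash boots=${boots} panics=${panics}"
    else
        echo "no-crash-evidence boots=${boots} panics=${panics}"
    fi
}

# Usage (capture.log is an assumed file name for your saved serial output):
# classify_serial_log capture.log
```

A log with garbled leading lines but no markers is consistent with a monitor that attached mid-stream, although it does not prove the firmware is healthy.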

Troubleshooting FAQ

What should I check first when numbers change unexpectedly?

Check the build profile, the exact board target, and the workload definition before changing code.

Should I fix everything in the application first?

No. If the problem is generic (scheduler timing, state logic, or queue behavior), fix it in ZeroKernel so every project benefits.

What is the safest way to validate this page on real hardware?

Start from the leanest profile that still matches the topic, run the narrowest compare script for this behavior, and only then move to heavier mixed workloads. Do not jump straight to a fully loaded build if the base timing is not yet proven.