optimize by rijnb · Pull Request #36 · mapcode-foundation/mapcode-cpp

rijnb · May 28, 2026

No description provided.

- Replace memcmp-based NaN/Inf detection with isnan()/isinf() from <math.h> (portable, correct for all NaN bit patterns, and now also checks latDeg for Inf)
- Fix off-by-one in encodeLatLonToSelectedMapcode: the index bound check used > where it should use >=
- Remove an always-false ASSERT after *s=0 in encodeExtension
- Fix convertFromAbjad/convertToAbjad to null-check the strchr return value before doing pointer arithmetic
- Allow TERRITORY_NONE/TERRITORY_UNKNOWN in encodeLatLonToSingleMapcode, as documented
- Fix convertUtf16ToUtf8 to return the start pointer instead of the end pointer past the terminating NUL
- Explicitly initialize GLOBAL_MAKEISO_PTR (it was previously correct only by accident, via implicit NULL)
- Replace sprintf with snprintf in convertToRoman (mapcode_legacy.c)
- Change UWORD from unsigned short int to uint16_t; add #include <stdint.h>
- Replace magic constant 128 with MAX_MAPCODE_RESULT_ASCII_LEN in encoderEngine
- Add regression tests for all fixed bugs in testBugFixes()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
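A minimal C99 sketch of three of the fix patterns listed above: classifying NaN/Inf with isnan()/isinf(), a `>=` index bound, and null-checking strchr before pointer arithmetic. The helper names (isValidLatLon, isValidIndex, indexInAlphabet) are hypothetical and do not come from the mapcode sources; they only illustrate the checks described.

```c
#include <math.h>
#include <string.h>

/* Hypothetical helper: 1 if both coordinates are finite, 0 if either is
 * NaN or +/-Inf. isnan()/isinf() classify every NaN bit pattern, unlike a
 * memcmp against one stored NaN value. */
static int isValidLatLon(double latDeg, double lonDeg) {
    if (isnan(latDeg) || isnan(lonDeg)) {
        return 0;
    }
    if (isinf(latDeg) || isinf(lonDeg)) {
        return 0;
    }
    return 1;
}

/* Hypothetical helper: valid indices are 0..count-1, so an index equal to
 * count must be rejected (index >= count), not only index > count. */
static int isValidIndex(int index, int count) {
    return index >= 0 && index < count;
}

/* Hypothetical helper: position of c in alphabet, or -1 if absent.
 * strchr() returns NULL for a missing character, so the pointer is checked
 * before subtracting the base address. */
static int indexInAlphabet(const char *alphabet, char c) {
    const char *p;
    if (c == '\0') {
        return -1; /* strchr would otherwise match the terminator */
    }
    p = strchr(alphabet, c);
    if (p == NULL) {
        return -1;
    }
    return (int) (p - alphabet);
}
```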

Two-phase plan (A: local hot-path cleanups; B: precomputed companion tables) targeting a 20-50% wall-time reduction on `time ./unittest` at -O3, while preserving bit-exact output, strictly portable C99/C11, and no runtime/heap growth. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Step-by-step plan for Tasks 1-10 (baseline → A1-A5 → B3-B5 → results), each with explicit file/line targets, code blocks, expected outputs, and per-step commits. Also fixes a small inaccuracy in the spec: RECORD_CODEX stores the computed coDex value (10*(c/5)+(c%5+1)), not the raw flags & 31. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
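To make the spec correction concrete, here is a small sketch of the coDex formula quoted above. The 32-bit flags type and the function name codexFromFlags are assumptions for illustration. Note that for `flags & 31 == 14` the stored value is 25, not 14, which is exactly the difference the correction describes.

```c
#include <stdint.h>

/* Hypothetical helper: derive the coDex value from the low 5 bits of a
 * record's flags word, using the formula from the plan:
 *   c = flags & 31;  coDex = 10 * (c / 5) + (c % 5 + 1)
 * The uint32_t flags type is an assumption for this sketch. */
static int codexFromFlags(uint32_t flags) {
    int c = (int) (flags & 31u); /* only the low 5 bits matter */
    return 10 * (c / 5) + (c % 5 + 1);
}
```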

Measured `time ./unittest` on the -O3 build (best of 3 runs): user time T0 = 114.13s. All subsequent perf commits on this branch quote their best user time and the cumulative delta vs. this baseline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
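The cumulative delta can be written as a one-line formula. This sketch follows the sign convention used in the A-series commits (positive = faster than T0, negative = a regression); the function name is hypothetical.

```c
/* Cumulative delta vs. the baseline T0, in percent.
 * Positive = faster than baseline, negative = slower (a regression).
 * Example: T0 = 114.13 s, T = 111.78 s gives about +2.06 %. */
static double cumulativeDeltaPct(double baselineSec, double userSec) {
    return 100.0 * (baselineSec - userSec) / baselineSec;
}
```

Under this convention, 121.10s works out to about -6.11% and 99.73s to about +12.6%.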

In the encoderEngine and decoderEngine inner loops, read TERRITORY_BOUNDARIES[i].flags once per iteration into a const local and extract the bit fields from it, instead of using flag-extraction macros that each re-dereference the same memory. Same change in firstNamelessRecord and countNamelessRecords. Bit-exact: the macros stay defined and are still used in cold paths; only the inner-loop call sites were rewritten to read the local. time ./unittest (best of 3, user): baseline (T0) = 114.13s; after A1 (T2) = 111.78s; delta = 2.06% cumulative. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
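A sketch of the A1 pattern under stated assumptions: the Record struct, the bit positions, and both function names are hypothetical and do not match the real mapcode layout. The "before" version extracts each field through a macro that dereferences the record again; the "after" version loads `flags` once into a const local. At -O3 a compiler can often merge such loads itself, so any gain depends on the surrounding code.

```c
#include <stdint.h>

/* Hypothetical record layout and bit fields, for illustration only. */
typedef struct {
    uint32_t flags;
} Record;

#define REC_TYPE(r) (((r)->flags >> 7) & 3u) /* each use reads (r)->flags */
#define REC_KIND(r) (((r)->flags >> 9) & 7u)

/* Before: every macro use dereferences records[i] again. */
static unsigned countMatchingMacros(const Record *records, int n,
                                    unsigned wantType, unsigned wantKind) {
    unsigned count = 0;
    int i;
    for (i = 0; i < n; i++) {
        if (REC_TYPE(&records[i]) == wantType &&
            REC_KIND(&records[i]) == wantKind) {
            count++;
        }
    }
    return count;
}

/* After: one load of flags per iteration; fields come from the local. */
static unsigned countMatching(const Record *records, int n,
                              unsigned wantType, unsigned wantKind) {
    unsigned count = 0;
    int i;
    for (i = 0; i < n; i++) {
        const uint32_t flags = records[i].flags; /* single load */
        const unsigned recType = (flags >> 7) & 3u;
        const unsigned kind = (flags >> 9) & 7u;
        if (recType == wantType && kind == wantKind) {
            count++;
        }
    }
    return count;
}
```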

Most territory rectangles are narrower in longitude than in latitude relative to their bounding ranges, so testing longitude first should short-circuit sooner in the typical reject case. time ./unittest (best of 3, user): baseline = 114.13s; after A2 = 121.10s; delta = -6.11% cumulative (regression vs. baseline). Note: the result is slower than A1 (T2=112.21s); the lon-first ordering did not yield a speedup on this benchmark, likely because isInRange carries more overhead than the simple latitude comparisons it now precedes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Reverts b9b3f2c. isInRange() wraps longitude and has higher overhead than simple comparisons; testing it first regressed user time by ~7.9s vs. A1. Reverting to lat-first ordering. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
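A sketch of the restored lat-first ordering. The Rect type, the microdegree units, the half-open bounds, and lonInRangeWrapped (a stand-in for a wrapping longitude test such as isInRange) are all assumptions made for illustration; the real isInRange signature is not shown in this PR.

```c
/* Hypothetical rectangle in microdegrees, half-open bounds [min, max). */
typedef struct {
    long minLat, maxLat, minLon, maxLon;
} Rect;

/* Hypothetical stand-in for a wrapping longitude test: normalizes lon into
 * [-180e6, 180e6) before comparing, so it costs more than plain compares. */
static int lonInRangeWrapped(long lon, long minLon, long maxLon) {
    const long full = 360000000L;
    while (lon < -180000000L) {
        lon += full;
    }
    while (lon >= 180000000L) {
        lon -= full;
    }
    return lon >= minLon && lon < maxLon;
}

/* Lat-first: the cheap latitude compares run first, so the costlier
 * wrapped longitude test only runs for points inside the latitude band. */
static int fitsInside(long lat, long lon, const Rect *r) {
    if (lat < r->minLat || lat >= r->maxLat) {
        return 0;
    }
    return lonInRangeWrapped(lon, r->minLon, r->maxLon);
}
```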

strcpy + strcat + strcat: each strcat re-scans the destination from the start to find the NUL terminator. Replace them with an explicit strlen on each source plus a single memcpy to copy the result (including the NUL). Output bytes are unchanged. time ./unittest (best of 3, user): baseline = 114.13s; after A3 = 113.19s; delta = 0.82% cumulative. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
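A sketch of the same idea with a hypothetical three-piece concatenation helper. The commit describes one memcpy of the assembled result; this variant instead copies each piece directly to its known offset, which avoids the re-scan in the same way. The name concat3 and the caller-sized buffer are assumptions.

```c
#include <string.h>

/* Hypothetical helper: write a + b + c into dst as a NUL-terminated string.
 * Each source length is measured once with strlen, and each piece is copied
 * to a known offset with memcpy, so nothing re-scans dst for its terminator
 * the way strcat does. The caller must ensure dst holds
 * strlen(a) + strlen(b) + strlen(c) + 1 bytes. */
static char *concat3(char *dst, const char *a, const char *b, const char *c) {
    const size_t la = strlen(a);
    const size_t lb = strlen(b);
    const size_t lc = strlen(c);
    memcpy(dst, a, la);
    memcpy(dst + la, b, lb);
    memcpy(dst + la + lb, c, lc + 1); /* +1 copies the terminating NUL */
    return dst;
}
```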

Compute the quotient once and derive the remainder via subtraction, so the loop performs one division-class operation per character instead of two. time ./unittest (best of 3, user): baseline = 114.13s; after A4 = 114.35s; delta = -0.19% cumulative (within the noise floor). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
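A sketch of the A4 digit loop, using a hypothetical fixed-width base-N encoder (encodeBase and its parameters are assumptions, not the mapcode function). Each iteration does one division and derives the remainder as `value - q * base`.

```c
/* Hypothetical base-N encoder: writes `digits` characters of value into out
 * (most significant first) using alphabet[0..base-1], then a NUL.
 * One division per character; the remainder is derived by subtraction
 * instead of a separate value % base. */
static void encodeBase(char *out, unsigned long value, int digits,
                       const char *alphabet, unsigned base) {
    int i;
    out[digits] = '\0';
    for (i = digits - 1; i >= 0; i--) {
        const unsigned long q = value / base;
        const unsigned long r = value - q * base; /* equals value % base */
        out[i] = alphabet[r];
        value = q;
    }
}
```

Optimizing compilers often already combine `/` and `%` on the same operands into a single division, which is consistent with A4 measuring within noise at -O3.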

Static tables RECORD_CODEX / RECORD_REC_TYPE / RECORD_KIND / RECORD_HEADER_LETTER / RECORD_SMART_DIV are precomputed once from TERRITORY_BOUNDARIES.flags. The per-territory tables TERRITORY_FIRST_NAMELESS / TERRITORY_NAMELESS_COUNT will later replace the linear nameless scans. This commit only defines the tables and calls initCompanionTables at the top of encodeLatLonToMapcodes_internal and decoderEngine. Hot loops still use the existing macros; the switch happens in B4. Memory footprint: ~74 KB of additional static data (.bss). time ./unittest (best of 3, user): baseline = 114.13s; after B3 = 109.99s; delta = 3.6% cumulative. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
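A sketch of a once-only companion-table initializer in the spirit of B3. The four-entry RECORD_FLAGS array, the `_SKETCH` table, and the ready flag are hypothetical stand-ins; the coDex formula is the one quoted in the plan. As written it is not thread-safe.

```c
#include <stdint.h>

#define NUM_RECORDS 4 /* hypothetical; the real table is much larger */

/* Hypothetical source data: one 32-bit flags word per record. */
static const uint32_t RECORD_FLAGS[NUM_RECORDS] = {0u, 5u, 14u, 31u};

/* Byte-sized companion table, filled once from RECORD_FLAGS. */
static uint8_t RECORD_CODEX_SKETCH[NUM_RECORDS];
static int companionTablesReady = 0;

/* Idempotent initializer, cheap enough to call at the top of each entry
 * point. Not thread-safe as written: concurrent first calls would need a
 * once-guard (e.g. C11 call_once) or eager initialization. */
static void initCompanionTablesSketch(void) {
    int i;
    if (companionTablesReady) {
        return;
    }
    for (i = 0; i < NUM_RECORDS; i++) {
        const int c = (int) (RECORD_FLAGS[i] & 31u);
        RECORD_CODEX_SKETCH[i] = (uint8_t) (10 * (c / 5) + (c % 5 + 1));
    }
    companionTablesReady = 1;
}
```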

Replace per-iteration mask/shift extractions in encoderEngine, decoderEngine, firstNamelessRecord, and countNamelessRecords with direct reads from RECORD_KIND / RECORD_CODEX / RECORD_REC_TYPE / RECORD_HEADER_LETTER. The companion tables are byte-sized, so each field is a single byte load with friendlier cache behavior than re-deriving it from the 4-byte flags field on every iteration. The inner j-loops in decoderEngine also switch from the IS_RESTRICTED(j) macro to RECORD_KIND[j] & KIND_BIT_RESTRICTED. Values are derived from the same macros they replace (see the B3 init); output is bit-exact. time ./unittest (best of 3, user): baseline = 114.13s; after B4 = 99.40s; delta = 12.9% cumulative. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
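A sketch of a B4-style inner loop that tests a kind bit from a byte-sized table. KIND_BIT_RESTRICTED_SKETCH and its value are hypothetical; the real KIND_BIT_RESTRICTED constant is not shown in this PR.

```c
#include <stdint.h>

/* Hypothetical bit value; the real KIND_BIT_RESTRICTED may differ. */
#define KIND_BIT_RESTRICTED_SKETCH 0x01u

/* Count restricted records in [first, last) with one byte load per record
 * from a precomputed kind table, instead of masking a 4-byte flags word. */
static int countRestricted(const uint8_t *recordKind, int first, int last) {
    int n = 0;
    int j;
    for (j = first; j < last; j++) {
        if (recordKind[j] & KIND_BIT_RESTRICTED_SKETCH) {
            n++;
        }
    }
    return n;
}
```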

Under -DDEBUG, initCompanionTables asserts that every precomputed value matches the macro-derived one: a free correctness guard during development, with zero cost in release builds. Verified: both the -O3 and the -O0 -DDEBUG builds pass the unit suite. time ./unittest (-O3 build, best of 3, user): baseline = 114.13s; after B5 = 99.73s; delta = 12.6% cumulative. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
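A sketch of the B5 debug cross-check. CODEX_FROM_FLAGS, the flag values, and the table names are hypothetical. The verification loop sits inside `#ifdef DEBUG`, so release builds compile it out entirely. In this toy the check recomputes the same expression and is therefore tautological; it only shows the shape of the guard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro-derived field (the coDex formula from the plan). */
#define CODEX_FROM_FLAGS(f) \
    (10 * (((int) ((f) & 31u)) / 5) + (((int) ((f) & 31u)) % 5 + 1))

#define N_SKETCH 3
static const uint32_t FLAGS_SKETCH[N_SKETCH] = {2u, 9u, 30u};
static uint8_t CODEX_TABLE_SKETCH[N_SKETCH];

static void initAndVerify(void) {
    int i;
    for (i = 0; i < N_SKETCH; i++) {
        CODEX_TABLE_SKETCH[i] = (uint8_t) CODEX_FROM_FLAGS(FLAGS_SKETCH[i]);
    }
#ifdef DEBUG
    /* Debug builds only: cross-check every precomputed entry against the
     * macro it replaces. Compiled out when DEBUG is not defined. */
    for (i = 0; i < N_SKETCH; i++) {
        assert(CODEX_TABLE_SKETCH[i] == CODEX_FROM_FLAGS(FLAGS_SKETCH[i]));
    }
#endif
}
```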

Appends the results table to the design spec. Final speedup: ~12.6% on `time ./unittest` at -O3, short of the 20-50% target. Primary driver: B4 (companion-table reads in the hot loops). A2 was reverted (regression); A4/A5 were no-ops at -O3. Binary size delta: +2240 bytes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Per-commit timing table for feat/optimize branch with notes on why each optimization landed, regressed, or was a no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rijnb and others added 15 commits May 28, 2026 08:16

updated version

ea028ff

docs: add perf results table to docs/superpowers

d5778ab


rijnb closed this May 28, 2026

rijnb wants to merge 15 commits into mapcode-foundation/mapcode-cpp:master from mapcode-foundation/mapcode-cpp:feat/optimize
