optimize #36

Closed
rijnb wants to merge 15 commits into mapcode-foundation/mapcode-cpp:master from mapcode-foundation/mapcode-cpp:feat/optimize

Conversation

@rijnb (Member) commented May 28, 2026

No description provided.

rijnb and others added 15 commits May 28, 2026 08:16
- Replace memcmp-based NaN/Inf detection with isnan()/isinf() from <math.h>
  (portable, correct for all NaN bit patterns, and now checks latDeg for Inf too)
- Fix off-by-one in encodeLatLonToSelectedMapcode: > should be >= for index bound
- Remove always-false ASSERT after *s=0 in encodeExtension
- Fix convertFromAbjad/convertToAbjad to null-check strchr return before arithmetic
- Allow TERRITORY_NONE/TERRITORY_UNKNOWN in encodeLatLonToSingleMapcode per docs
- Fix convertUtf16ToUtf8 to return start pointer instead of post-null end pointer
- Explicitly initialize GLOBAL_MAKEISO_PTR (was accidentally correct via NULL)
- Replace sprintf with snprintf in convertToRoman (mapcode_legacy.c)
- Change UWORD from unsigned short int to uint16_t; add #include <stdint.h>
- Replace magic constant 128 with MAX_MAPCODE_RESULT_ASCII_LEN in encoderEngine
- Add regression tests for all fixed bugs in testBugFixes()
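
Two of the fixes listed above follow patterns that can be sketched in isolation. The helpers below are hypothetical (`coordinatesAreFinite` and `indexInAlphabet` are not names from the library); they only illustrate classifying a double with the C99 `isnan()`/`isinf()` macros and checking the `strchr` result before doing pointer arithmetic on it.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both coordinates are finite numbers. */
static int coordinatesAreFinite(double latDeg, double lonDeg) {
    /* isnan()/isinf() classify the value, so every NaN bit pattern is
       caught, not just one specific encoding. */
    if (isnan(latDeg) || isnan(lonDeg)) {
        return 0;
    }
    if (isinf(latDeg) || isinf(lonDeg)) {
        return 0;
    }
    return 1;
}

/* Hypothetical helper: index of ch in alphabet, or -1 if it is absent. */
static int indexInAlphabet(const char *alphabet, char ch) {
    const char *p;
    if (ch == '\0') {
        return -1; /* strchr would otherwise match the terminator */
    }
    p = strchr(alphabet, ch);
    if (p == NULL) {
        return -1; /* check before subtracting pointers */
    }
    return (int)(p - alphabet);
}
```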

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-phase plan (A: local hot-path cleanups; B: precomputed companion
tables) targeting 20-50% wall-time reduction on `time ./unittest` at -O3,
preserving bit-exact output, strict portable C99/C11, and no runtime/heap
growth.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step-by-step plan for Tasks 1-10 (baseline → A1-A5 → B3-B5 → results),
each with explicit file/line targets, code blocks, expected outputs,
and per-step commits.

Also fixes a small inaccuracy in the spec: RECORD_CODEX stores the
computed coDex value (10*(c/5)+(c%5+1)), not the raw flags & 31.
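
As a worked instance of that formula, here is a minimal sketch. It assumes `c` is the low five bits of the flags word (`flags & 31`), which is how the sentence above reads; the function name `coDexFromFlags` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: derive the coDex value from a flags word,
   assuming c is the low five bits (flags & 31). */
static int coDexFromFlags(uint32_t flags) {
    const int c = (int)(flags & 31u);
    /* Tens digit is c / 5, units digit is (c % 5) + 1. */
    return 10 * (c / 5) + (c % 5 + 1);
}
```

For example, `c = 7` gives `10 * 1 + (2 + 1) = 13`, and higher flag bits do not affect the result.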

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Measured `time ./unittest` on -O3 build (best of 3 runs):
  user time = T0 = 114.13s

All subsequent perf commits on this branch quote their best user time
and the cumulative delta vs. this baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In encoderEngine and decoderEngine inner loops, read
TERRITORY_BOUNDARIES[i].flags once per iteration into a const local and
extract bit fields from it, instead of using flag-extraction macros that
each re-dereference the same memory. Same change in firstNamelessRecord
and countNamelessRecords.

Bit-exact: macros stay defined and used in cold paths; only the inner
loop body call sites were rewritten to local reads.
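
A minimal sketch of the pattern, not the library's code: the record layout, the bit positions, and the function name below are all assumptions chosen for illustration. The point is only that `flags` is loaded once per iteration and every field is extracted from that local.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout and flag bits, for illustration only. */
typedef struct {
    int32_t minLon, maxLon, minLat, maxLat;
    uint32_t flags;
} BoundarySketch;

#define SKETCH_NAMELESS_BIT   64u
#define SKETCH_RESTRICTED_BIT 512u

/* Counts records that are nameless, not restricted, and have non-zero
   low five bits. */
static int countNamelessUnrestricted(const BoundarySketch *records, size_t count) {
    int hits = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        /* One load per iteration; all fields come from this local. */
        const uint32_t flags = records[i].flags;
        const int codexBits = (int)(flags & 31u);
        const int isNameless = (flags & SKETCH_NAMELESS_BIT) != 0;
        const int isRestricted = (flags & SKETCH_RESTRICTED_BIT) != 0;
        if (isNameless && !isRestricted && codexBits != 0) {
            hits++;
        }
    }
    return hits;
}
```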

  time ./unittest (best of 3, user):
    baseline (T0) = 114.13s
    after A1 (T2) = 111.78s
    delta         = 2.06% cumulative

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Most territory rectangles are narrower in longitude than in latitude
relative to their bounding ranges, so testing longitude first short-
circuits faster on the typical reject case.

  time ./unittest (best of 3, user):
    baseline = 114.13s
    after A2  = 121.10s
    delta     = -6.11% cumulative (regression vs baseline)

Note: result is slower than A1 (T2=112.21s); the lon-first ordering
did not yield a speedup on this benchmark — likely because isInRange
carries more overhead than the simple comparison it replaces.
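
To make the cost difference concrete, here is a sketch of a wrap-aware longitude test next to a plain latitude comparison. It assumes coordinates in integer microdegrees and a half-open range; `lonInRangeWrapped`, `rectContains`, and `RectSketch` are illustrative names, not the library's `isInRange` itself.

```c
#include <stdint.h>

/* Hypothetical rectangle in microdegrees; ranges are half-open. */
typedef struct {
    int32_t minLon, maxLon, minLat, maxLat;
} RectSketch;

/* Longitude test that retries once after shifting by a full turn, so a
   rectangle that crosses the antimeridian still matches. */
static int lonInRangeWrapped(int32_t lon, int32_t minLon, int32_t maxLon) {
    if (minLon <= lon && lon < maxLon) {
        return 1;
    }
    if (lon < minLon) {
        lon += 360000000;
    } else {
        lon -= 360000000;
    }
    return minLon <= lon && lon < maxLon;
}

/* Lat-first ordering: the cheap two-comparison test rejects most
   candidates before the wrap-aware longitude test runs. */
static int rectContains(const RectSketch *r, int32_t lat, int32_t lon) {
    if (lat < r->minLat || lat >= r->maxLat) {
        return 0;
    }
    return lonInRangeWrapped(lon, r->minLon, r->maxLon);
}
```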

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reverts b9b3f2c. isInRange() wraps longitude and has higher overhead
than simple comparisons; testing it first regressed user time by ~7.9s
vs A1 baseline. Reverting to lat-first ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In a strcpy + strcat + strcat chain, each strcat re-scans the destination
from the start to find the null terminator. Replace the chain with an
explicit strlen on each source plus a single memcpy to copy the result
(including NUL).

Output bytes are unchanged.
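
One way such a concatenation can look is sketched below. This is an assumption about the shape of the change, not the committed code: it uses one memcpy per source at a computed offset, and the name `joinThree` and the bounds check are additions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes a + b + c into dst without re-scanning dst.
   Returns the resulting length, or 0 if the result does not fit. */
static size_t joinThree(char *dst, size_t dstSize,
                        const char *a, const char *b, const char *c) {
    const size_t la = strlen(a);
    const size_t lb = strlen(b);
    const size_t lc = strlen(c);
    if (la + lb + lc + 1 > dstSize) {
        if (dstSize > 0) {
            dst[0] = '\0';
        }
        return 0;
    }
    memcpy(dst, a, la);
    memcpy(dst + la, b, lb);
    memcpy(dst + la + lb, c, lc + 1); /* +1 copies the terminating NUL */
    return la + lb + lc;
}
```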

  time ./unittest (best of 3, user):
    baseline = 114.13s
    after A3 = 113.19s
    delta    = 0.82% cumulative

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Compute quotient once, derive remainder via subtraction so the loop
has one division-class op per character rather than two.
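
The quotient-then-subtract pattern can be sketched as follows. The base, the digit alphabet, and the function name are assumptions for illustration; only the loop shape (one division, remainder by multiply and subtract) reflects the description above.

```c
/* Hypothetical base-31 digit alphabet, for illustration only. */
static const char SKETCH_ALPHABET[] = "0123456789BCDFGHJKLMNPQRSTVWXYZ";

/* Writes nrChars base-31 digits of value into out, most significant
   first, followed by a NUL. out must hold nrChars + 1 bytes. */
static void encodeBase31Sketch(unsigned int value, int nrChars, char *out) {
    int i;
    out[nrChars] = '\0';
    for (i = nrChars - 1; i >= 0; i--) {
        const unsigned int quotient = value / 31u;
        /* Remainder derived from the quotient instead of a second '%'. */
        const unsigned int remainder = value - quotient * 31u;
        out[i] = SKETCH_ALPHABET[remainder];
        value = quotient;
    }
}
```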

  time ./unittest (best of 3, user):
    baseline = 114.13s
    after A4  = 114.35s
    delta     = -0.19% cumulative (within noise floor)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Static tables RECORD_CODEX / RECORD_REC_TYPE / RECORD_KIND /
RECORD_HEADER_LETTER / RECORD_SMART_DIV precomputed once from
TERRITORY_BOUNDARIES.flags. Per-territory TERRITORY_FIRST_NAMELESS /
TERRITORY_NAMELESS_COUNT replace linear nameless scans later.

This commit only defines the tables and calls initCompanionTables at
the top of encodeLatLonToMapcodes_internal and decoderEngine. Hot loops
still use the existing macros; the switch happens in B4.

Memory footprint: ~74 KB of additional static data (.bss).
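
A reduced sketch of the idea, under stated assumptions: the table names mirror the ones in this commit message, but the four-entry flags array, the bit position of the restricted flag, and the ready-guard variable are invented stand-ins. The guard shown here is not thread-safe.

```c
#include <stdint.h>

#define SKETCH_RECORD_COUNT 4            /* stand-in; the real table is far larger */
#define SKETCH_FLAG_RESTRICTED 512u      /* hypothetical bit position */
#define KIND_BIT_RESTRICTED 0x01u

/* Stand-in for TERRITORY_BOUNDARIES[i].flags. */
static const uint32_t SKETCH_FLAGS[SKETCH_RECORD_COUNT] = {7u, 14u | 512u, 0u, 31u};

/* Byte-sized companion tables, zero-initialized static data. */
static uint8_t RECORD_CODEX[SKETCH_RECORD_COUNT];
static uint8_t RECORD_KIND[SKETCH_RECORD_COUNT];
static int companionTablesReady = 0;

/* Fills the tables on the first call; later calls return immediately. */
static void initCompanionTables(void) {
    int i;
    if (companionTablesReady) {
        return;
    }
    for (i = 0; i < SKETCH_RECORD_COUNT; i++) {
        const uint32_t flags = SKETCH_FLAGS[i];
        const int c = (int)(flags & 31u);
        RECORD_CODEX[i] = (uint8_t)(10 * (c / 5) + (c % 5 + 1));
        RECORD_KIND[i] = (flags & SKETCH_FLAG_RESTRICTED) ? KIND_BIT_RESTRICTED : 0u;
    }
    companionTablesReady = 1;
}
```

After initialization, a hot loop can test `RECORD_KIND[j] & KIND_BIT_RESTRICTED` with a single byte load instead of masking the flags word again.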

  time ./unittest (best of 3, user):
    baseline = 114.13s
    after B3 = 109.99s
    delta    = 3.6% cumulative (improvement vs baseline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace per-iteration mask/shift extractions in encoderEngine,
decoderEngine, firstNamelessRecord, countNamelessRecords with direct
reads from RECORD_KIND / RECORD_CODEX / RECORD_REC_TYPE /
RECORD_HEADER_LETTER. The companion tables are byte-sized so each
field is a single byte load with friendlier cache behavior than
re-deriving from the 4-byte flags field each iteration.

Inner j-loops in decoderEngine also switch to RECORD_KIND[j] &
KIND_BIT_RESTRICTED instead of the IS_RESTRICTED(j) macro.

Values are derived from the same macros they replace (see B3 init);
output is bit-exact.

  time ./unittest (best of 3, user):
    baseline = 114.13s
    after B4 = 99.40s
    delta    = 12.9% cumulative (improvement vs baseline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Under -DDEBUG, initCompanionTables asserts that every precomputed value
matches the macro-derived one. Free correctness guard during development;
zero cost in release.
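
The debug-only guard can be sketched like this. The `SKETCH_VERIFY` macro, the sample data, and the function name are assumptions for illustration; the sketch also returns a mismatch count so the check can be exercised in builds without `-DDEBUG`.

```c
#include <assert.h>
#include <stdint.h>

/* Compiles to an assert under -DDEBUG and to nothing otherwise. */
#ifdef DEBUG
#define SKETCH_VERIFY(cond) assert(cond)
#else
#define SKETCH_VERIFY(cond) ((void)0)
#endif

/* Stand-in for the macro the table values are derived from. */
#define SKETCH_CODEX(flags) \
    (10 * (int)(((flags) & 31u) / 5u) + (int)(((flags) & 31u) % 5u) + 1)

static const uint32_t SAMPLE_FLAGS[3] = {7u, 14u, 31u};
static uint8_t SAMPLE_CODEX[3];

/* Fills the table, then re-derives each value and compares.
   Returns the number of mismatches (expected to be 0). */
static int initAndVerifySampleTable(void) {
    int i;
    int mismatches = 0;
    for (i = 0; i < 3; i++) {
        SAMPLE_CODEX[i] = (uint8_t)SKETCH_CODEX(SAMPLE_FLAGS[i]);
    }
    for (i = 0; i < 3; i++) {
        SKETCH_VERIFY(SAMPLE_CODEX[i] == SKETCH_CODEX(SAMPLE_FLAGS[i]));
        if (SAMPLE_CODEX[i] != SKETCH_CODEX(SAMPLE_FLAGS[i])) {
            mismatches++;
        }
    }
    return mismatches;
}
```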

Verified: both -O3 and -O0 -DDEBUG builds pass the unit suite.

  time ./unittest -O3 (best of 3, user):
    baseline = 114.13s
    after B5 = 99.73s
    delta    = 12.6% cumulative (improvement vs baseline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Appends results table to design spec. Final speedup: ~12.6% on
`time ./unittest` at -O3, short of the 20-50% target.
Primary driver: B4 companion-table hot-loop reads. A2 was reverted
(regression); A4 and A5 were no-ops at -O3. Binary size delta: +2240 bytes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per-commit timing table for feat/optimize branch with notes on why
each optimization landed, regressed, or was a no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rijnb closed this May 28, 2026