MaxWolf-01/LNDP


An attempt at replicating Lifelong Neural Developmental Programs¹ in PyTorch.

Visualization of a rollout in cartpole (preceded by a spontaneous activity phase):

visualization.mp4

... not quite there yet, regarding performance.²

What I tried to fix it, my suspicions, and how I would debug it further:

  • tried a bunch of hyperparameter configs and ablations of config values that were unclear from the paper & original implementation (the two partly conflict)
  • logged a lot of intermediate values (but couldn't find anything too suspicious, except an unusually high number of edges with some settings)
  • vibe debugging

As for why it doesn't learn properly, I have some guesses:

  • RNG goofed up somewhere
  • misinterpretation of hyperparameters from original implementation & paper
  • subtle bugs around masking, timing (esp add/prune), indexing, initialization, etc.
  • goofed up something specific to the cartpole env?
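To illustrate the masking/timing class of bug (a generic sketch, not code from this repo): mutating an edge list during the prune step while iterating over that same list silently skips the neighbour of each removed entry, which is exactly the kind of subtle add/prune timing error that survives casual inspection.

```python
# Hypothetical prune-step sketch -- "edges" and the even-edge prune rule
# are made up for illustration; nothing here is from the actual repo.
edges = [0, 2, 4, 1, 3]

# Buggy: removing elements while iterating the same list shifts the
# remaining items left, so the iterator jumps over the element that
# slid into the removed slot.
buggy = list(edges)
for e in buggy:
    if e % 2 == 0:  # "prune" even edges
        buggy.remove(e)
# buggy ends up as [2, 1, 3]: an even edge survived the prune.

# Correct: build the filtered list without mutating the one being read.
correct = [e for e in edges if e % 2 != 0]
# correct is [1, 3]
```

The same failure mode appears with boolean masks when the mask is computed from a tensor that is then modified in place before the mask is applied.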

How I would debug it further:

  • actually think a bit harder about the behavior of the model (visualization, statistics)
  • train in a different env from the paper (e.g. with discrete actions)
  • but first, I would rewrite it in JAX (my understanding has made leaps since I wrote this): ecosystem, parallelism, explicit RNG, speed, and overall harder to goof up
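The RNG point can be made concrete with a stdlib-only sketch (no JAX needed): with a hidden global RNG, reordering two unrelated draws silently changes both results, whereas explicitly threading separate generators, in the spirit of JAX's key splitting, removes that coupling.

```python
import random

# Implicit global RNG: swapping the order of two unrelated draws
# silently changes both values, even with identical seeding.
random.seed(0)
a1 = random.random(); b1 = random.random()
random.seed(0)
b2 = random.random(); a2 = random.random()
# a1 != a2 -- the call order leaked into the result.

# Explicit generators (the spirit of jax.random key splitting):
# each consumer owns its stream, so call order no longer matters.
ra, rb = random.Random(1), random.Random(2)
a3 = ra.random(); b3 = rb.random()
rb2, ra2 = random.Random(2), random.Random(1)
b4 = rb2.random(); a4 = ra2.random()
# a3 == a4 and b3 == b4 regardless of ordering.
```

In JAX this discipline is enforced by the API: every `jax.random` call takes an explicit key, so an order-dependence bug like the first half of this sketch cannot compile itself in unnoticed.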

Mean fitness 50 vs. max fitness 350 on cartpole. That seems like a big discrepancy, actually? Maybe it points to a bug in the optimization rather than in the architecture implementation?

Some lessons learned:

  • no premature optimization: first make the code run, then make it pretty/efficient/modular/thoroughly typed
  • don't try to be smart: add your own ideas after you have a working baseline
  • put your (own) mind to it, or let it be: even the best LLMs [2025] fail hard at vibe-debugging nontrivial ML code (hallucinations, getting lost in dead ends, etc.)

Setup:

```bash
git clone git@github.com:MaxWolf-01/LNDP.git
cd LNDP
uv sync --all-extras
source .venv/bin/activate
```

Run cartpole with default hparams and wandb logging:

```bash
python experiments/cartpole_paper.py --wandb.enabled true
```

Footnotes

  1. In case you don't know what it is, here's a TLDR. For the conceptual picture, I recommend reading the paper, as my note mostly focuses on implementation details.

  2. Ultimately, I left it here since I had gained a deeper understanding by getting my hands dirty, and I've since been busy collecting stepping stones towards successors that break with some of LNDP's assumptions; getting this to work wouldn't be worth the effort. What's more, I worked on this during conscription, with limited time, attention, and patience.
