jaedmunt/mozaic-tech-test


BNG Triage  ·  Nature Intelligence Micro-Tool

Pre-screening tool for biodiversity net gain (BNG) in England. Drop a pin, draw a boundary, or search an address to instantly surface habitat type, statutory designations (SSSI, SAC, SPA), priority habitats, and flood risk - before commissioning a formal survey.

BNG Triage demo


Housekeeping

  • Task description is gitignored; it lives in /docs.
  • See Getting Started for deployment instructions.
  • Requires a GROQ_API_KEY in .env for the AI summary drawer (copy .env.example).

Deliverables

  • Git repo
  • Frontend
  • Backend
  • AI feature
  • README.md with:
    • Idea
    • Thinking
    • Reflections
    • Setup

Design philosophy and use of agents

I use AI to search and summarise research effectively. I tend to read and explore a problem myself, plan the architecture, scaffold and contribute the early steps by hand, and then gradually ramp up AI usage. In that sense, I prefer to use AI for 'colouring in' rather than defining the boundaries to be coloured, metaphorically.

I use it as much as I can for generating tests and thinking of edge cases. In tandem, I am starting to use AI for code reviews, but this project is short enough not to warrant a full review pipeline (CodeRabbit, stacked PRs via Graphite, etc.), which pays off more when working in a team or with multiple agents. My CI/CD process means all code is tested from the get-go, first with local build/test and then with GitHub Actions.

Moreover, I think LLMs are great at speeding up documentation. I also have snippets configured in my IDE so I can quickly template class and function documentation: docc for class docs and docf for function docs.

When exploring external API responses I use jnv to interactively filter and inspect JSON in the terminal before writing any parsing logic.

Once I know what problem I am solving, I start by building an early prototype, whether a CLI tool or a containerised API constituting the backend and integrating external services. The purpose is to get it working first at the simplest possible level, however ugly or rudimentary. I take extra care during the subsequent refactors, as going from driving raw binaries to API access can leave security vulnerabilities.

Then I build UI/UX on top to match user needs and expectations. Depending on the end user's needs and requirements, I try to opt for low-dependency UI frameworks. An ever-growing list of CVEs and large, complex dependency footprints make maintainability and security harder; I also don't like waiting, unnecessarily, for dependencies to download and builds to complete. This isn't just personal preference or laziness but is critical to developer momentum and speed. The CVE/maintenance concern is relevant to handover and collaboration: this project should ideally survive with minimal maintenance, in perpetuity. One example choice is to use a no-dependency library like Oat.Ink instead of ShadCN.

Ideation

I want to keep this task grounded in Mozaic's typical focus: find a problem in a similar space that I can decompose and build for, with a narrow, focused scope. I don't want to think AI-first, but rather explore the problem and see where AI can add value. I prefer to keep deterministic what can be deterministic, and use LLMs/AI where they solve a real problem.

Scope:

  • Nature Tech, Climate Adaptation, or Geospatial Data.
  • Clean functional UI
  • Functional API
  • AI integration

The scope focus should be:

  • Site-level
  • AI/CV/ML capability with real value.
  • Strategic choices > scale/perfection

Tool use:

  • Leverage AI tools/services.
  • Document how I guided an AI agent to build a cohesive system.
  • What is the user-AI interaction like?

Idea

Problem

Since February 2024, the UK Environment Act 2021 has mandated that all new developments deliver at least 10% biodiversity net gain. The statutory metric tool exists, but it requires ecological expertise and significant manual data gathering. Given my initial interview in which we discussed BNG, I think this is a good issue to explore and build for. It has a regulatory aspect, a clear scope and gap to address, and a wide user base that a tool could be useful for.

Small developers, including architects, planning consultants, and rural landowners, face a compliance burden they cannot readily interpret. Existing tools like BioGain automate reports but don't give granular, site‑specific insight into likely habitat sensitivity, nearby ecological constraints, and BNG complexity in a way that non‑specialists can understand and act on. The research below makes the constraints around official tool use clear, and this is factored into how the tool is designed and presented.

Solution

An easy-to-use initial desktop triage. The way I think about it: this triage tool is to a formal BNG calculator what a desktop valuation from Zoopla/Rightmove is to a paid survey.

Product flow:

  • Let the user enter a site boundary of up to 1 hectare
    • Optionally, look up an approximate parcel/title boundary (bounded by API access/cost)
  • On the backend, intersect that polygon with open datasets/APIs to answer:
    • What habitat types are likely on this site?
    • Does it overlap priority habitats or protected sites (SSSI etc.)?
    • Are there any obvious planning / flood constraints?
    • Is this site likely low / medium / high complexity from a BNG/ecology point of view?
  • Use an LLM to give a short, plain-English summary
  • Explicitly state that this is not a statutory calculator, and link to useful official material
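The low / medium / high call in the flow above could be made with a simple precedence rule over the upstream signals. This is a hedged sketch, not the project's actual logic; the type and field names (SiteFlags, OnDesignatedSite, etc.) are illustrative.

```go
package main

import "fmt"

// SiteFlags captures the binary signals gathered from the upstream
// datasets. All field names here are illustrative, not the project's
// actual types.
type SiteFlags struct {
	OnDesignatedSite  bool // overlaps SSSI / SAC / SPA
	OnPriorityHabitat bool // overlaps a Priority Habitats Inventory polygon
	InFloodRiskZone   bool // intersects a flood-risk-zone entity
}

// Complexity applies a simple precedence: any designated-site overlap is
// high, priority habitat or flood risk is medium, otherwise low.
func Complexity(f SiteFlags) string {
	switch {
	case f.OnDesignatedSite:
		return "high"
	case f.OnPriorityHabitat || f.InFloodRiskZone:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	fmt.Println(Complexity(SiteFlags{OnPriorityHabitat: true})) // medium
}
```

A rule-based classification like this stays deterministic and auditable; the LLM only narrates it afterwards.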

Research

Collected snippets

Key points:

  • You must use the statutory biodiversity metric tool for mandatory BNG. Do not use previous versions (4.0 or before) as these calculations will not be accepted in planning applications.
  • If you are not able to meet BNG requirements by making on-site or off-site gains, you can buy statutory biodiversity credits as a last resort.
  • The metric tool is not intended to be a one-off step in your design and planning process. It’s advisable to use it repeatedly as you refine your plans.
  • For example, the metric tool might calculate that developing on the woodland on your site would cost you 20 units, plus you would need to generate 2 units to achieve a 10% net gain. But developing on modified grassland would only cost you 8 units, plus you would need to generate 0.8 units to achieve a 10% net gain.
  • Small developments are required to achieve BNG.
  • A small development is defined as:
    • Residential development:
      • 1–9 dwellings on a site of area ≤ 1 hectare, or
      • If dwelling count is unknown, a site area < 0.5 hectares
    • Commercial development:
      • Floor space created < 1,000 m² or total site area < 1 hectare
    • Development that is not:
      • Winning and working of minerals or use of land for mineral‑working deposits
      • Waste development
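The residential thresholds listed above can be encoded directly. This is purely illustrative (the function name is mine, and the statutory definition remains the authority), but it shows how the two branches, known vs unknown dwelling count, fit together:

```go
package main

import "fmt"

// isSmallResidential encodes the residential thresholds listed above:
// 1-9 dwellings on a site of at most 1 ha, or, where the dwelling count
// is unknown, a site under 0.5 ha. Illustrative only; check the
// statutory definition before relying on this.
func isSmallResidential(dwellings int, dwellingsKnown bool, siteHa float64) bool {
	if dwellingsKnown {
		return dwellings >= 1 && dwellings <= 9 && siteHa <= 1.0
	}
	return siteHa < 0.5
}

func main() {
	fmt.Println(isSmallResidential(4, true, 0.8))  // true: 4 dwellings, 0.8 ha
	fmt.Println(isSmallResidential(0, false, 0.6)) // false: unknown count, 0.6 ha
}
```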

Data Sources

I used Perplexity to collect the open-access APIs available and their documentation so I could manually explore their offerings and interfaces.

Key datasets I care about:

  • Living England Habitat Map
    National habitat / land-cover map that gives a best-guess habitat class for each location in England (e.g. arable, woodland, grassland). Used here to infer likely baseline habitat types inside a site polygon.

  • Priority Habitats Inventory (England)
    Polygons of priority habitats (high‑value for biodiversity). Used here to flag when a site overlaps or is close to sensitive habitat that will likely increase BNG complexity and ecological scrutiny.

  • Protected and designated sites (e.g. SSSI, SAC, SPA)
    Boundaries of Sites of Special Scientific Interest and other designated sites. Used here to detect if a site is on/near a protected area, which is a strong “handle with care” signal.


https://magic.defra.gov.uk/
MAGIC (Defra). A public web map that aggregates hundreds of environmental layers from Natural England, Defra, the Environment Agency and others. In practice, MAGIC points you back to the same underlying datasets and services as the Natural England geoportal, but in a single map UI. I treat it mainly as a reference / validation map rather than as a direct API.


https://www.planning.data.gov.uk/docs
Planning Data API for England. Open JSON/GeoJSON API for planning-related datasets (title boundaries, conservation areas, flood‑risk zones, etc.). I use this where I need live planning context or a rough parcel boundary.

Relevant usage patterns:

  • Title boundary lookup
    Given a location or identifier, return indicative title/parcel geometry. Used as a starting point when the user doesn’t have their own boundary file.

  • Planning constraint overlays
    Query datasets like flood‑risk‑zone, conservation‑area, etc. near the site centroid to add simple planning context (these are secondary to the habitat / designation data).


https://www.api.gov.uk/ea/flood-monitoring/
Environment Agency Flood Monitoring API. Open JSON API with flood warnings, alert areas, stations and measurements. This is optional “extra colour”: I can check if the site falls within any active flood area or near relevant stations, but it’s not core to the BNG logic.

Implications

  • The statutory metric is mandatory for formal BNG submissions; the guidance is very clear that the official metric must be used in planning applications, which constrains how I describe and scope the tool. Therefore this tool is positioned as pre‑screening only, not a replacement calculator.
  • Small developments are explicitly in‑scope for BNG, and are exactly the segment that struggles most with cost and complexity.
  • The iterative use of the metric suggests a workflow where a pre‑screen can guide early layout decisions before more expensive analysis.

Data / API notes

Natural England Open Data Geoportal

URL:
https://naturalengland-defra.opendata.arcgis.com/

Docs: https://developers.arcgis.com/rest/

What it is:
Public ArcGIS-based portal for Natural England datasets (mostly under the Open Government Licence). You can:

  • Download static files (GeoJSON, Shapefile, FileGDB), or
  • Query hosted ArcGIS REST/feature services.

Example datasets:

  • Living England Habitat Map
    National habitat / land-cover map giving a best-guess “broad habitat” class for each location in England (e.g. arable, woodland, grassland). Good for answering: “what habitat types intersect this polygon?”

  • Priority Habitats Inventory (England)
    Polygons of priority habitats (e.g. lowland meadows, ancient woodland). Attributes include habitat type and other metadata.

  • Protected and designated sites (e.g. SSSI, SAC, SPA)
    Polygons for Sites of Special Scientific Interest and other designations, with fields like site name and designation type.

Interface shape (ArcGIS REST):

  • Query endpoint pattern (simplified):

    .../FeatureServer/0/query?geometry=<your-geometry>&geometryType=esriGeometryPolygon&spatialRel=esriSpatialRelIntersects&outFields=*&f=json
  • Key params:

    • geometry – your site polygon or bbox (often Esri JSON or WKT)
    • geometryType – esriGeometryPolygon, esriGeometryPoint, etc.
    • spatialRel – usually esriSpatialRelIntersects
    • outFields – * or a comma-separated list of attributes
    • f – json
  • Response (simplified):

    {
      "features": [
        {
          "geometry": { "...": "..." },
          "attributes": {
            "HABITAT_CODE": "Arable",
            "OBJECTID": 123,
            "..." : "..."
          }
        }
      ]
    }

MAGIC (Defra map)

URL:
https://magic.defra.gov.uk/

What it is:
Public web map viewer that aggregates hundreds of environmental layers from Defra, Natural England, the Environment Agency and others. It mainly provides a human UI over the same underlying datasets/services.

Interface shape:

  • Map UI only; you choose layers and view them.
  • For programmatic use you normally go to:
    • Natural England Open Data (ArcGIS services), or
    • Other specific data portals referenced from MAGIC.

Planning Data API (planning.data.gov.uk)

Docs:
https://www.planning.data.gov.uk/docs

What it is:
Open HTTP API exposing planning-related datasets for England in JSON/CSV/GeoJSON. It lets you query “entities” (title boundaries, flood zones, conservation areas, etc.) by dataset and filters.

Example datasets:

  • Title boundaries – indicative property/title polygons.
  • Planning/environment overlays – e.g. flood-risk-zone, conservation-area, listed buildings, and many more.

Interface shape:

  • Base pattern: call /entity with query params for dataset and filters.

  • Example patterns (simplified):

    GET /entity.json?dataset=title-boundary&point=<lat,long>
    GET /entity.geojson?dataset=flood-risk-zone&point=<lat,long>
    
  • Typical response (JSON):

    {
      "entities": [
        {
          "geometry": {
            "type": "Polygon",
            "coordinates": [[...]]
          },
          "entries": {
            "dataset": "title-boundary",
            "title_number": "AB123456",
            "local_authority": "Some Council",
            "...": "..."
          }
        }
      ]
    }
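Decoding the simplified response above in Go only needs structs for the fields you care about; encoding/json ignores the rest. The struct shapes here mirror the simplified sample, not a guaranteed schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entityResponse mirrors the simplified sample above. Real responses
// carry more fields; unknown keys are silently ignored by encoding/json.
type entityResponse struct {
	Entities []struct {
		Entries map[string]any `json:"entries"`
	} `json:"entities"`
}

// titleNumber pulls title_number out of the first returned entity.
func titleNumber(raw []byte) (string, error) {
	var r entityResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", err
	}
	if len(r.Entities) == 0 {
		return "", fmt.Errorf("no entities in response")
	}
	v, _ := r.Entities[0].Entries["title_number"].(string)
	return v, nil
}

func main() {
	sample := `{"entities":[{"entries":{"dataset":"title-boundary","title_number":"AB123456"}}]}`
	tn, err := titleNumber([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(tn) // AB123456
}
```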

Environment Agency Flood Monitoring API

Docs:
https://www.api.gov.uk/ea/flood-monitoring/
(Background: https://defradigital.blog.gov.uk/2015/03/24/near-real-time-flood-data-api/)

What it is:
REST API for live flood-related data in England:

  • Flood warnings and alerts
  • Flood areas (polygons)
  • Monitoring stations
  • Measures (water levels/flows)

Interface shape:

  • Main endpoints (simplified):

    • /floods – current warnings and alerts
    • /stations – monitoring station metadata
    • /measures – measurement series (levels/flows)
  • Typical filters:

    • lat, long, dist – spatial filters
    • station, riverName, etc. – ID/name filters
  • Example:

    GET /floods
    
  • Example response (simplified):

    {
      "items": [
        {
          "fwdCode": "123456",
          "description": "Flood alert for River X",
          "severity": "Flood Alert",
          "severityLevel": 3,
          "timeRaised": "2026-05-04T10:30:00Z",
          "floodArea": {
            "polygon": "MULTIPOLYGON(((...)))",
            "riverOrSea": "River X"
          }
        }
      ]
    }

Thinking

Starting with the data. Before writing any handler code I curled every documented endpoint and piped the responses through jnv to understand the actual shape of the data. That step caught things like ArcGIS returning attributes.NAME (not name), and the EA stations endpoint nesting river names inside items[].riverName. Building on wrong assumptions about a JSON shape is a much slower failure mode than spending 20 minutes reading real responses first.

Thin proxy, safety catches. The Go server is stdlib only (net/http), no frameworks, one handler file per upstream service. For the simple pass-through endpoints (planning data, EA floods) I don't deserialise at all; the handler just forwards the upstream body and Content-Type verbatim. Deserialising introduces a place to get the schema wrong, and skipping it also removes an allocation and a copy on every request, which compounds at scale. The exception is where safety requires it: the 15s client timeout and 10 MB body limit exist regardless, and the /analyse endpoint has to deserialise so it can fan out, reshape, and merge the results. The frontend/backend separation also means both sides can evolve without waiting on each other - which matters more in a team than in a solo prototype. Using an OpenAI-compatible endpoint for the LLM keeps the Rust service portable too: if Groq becomes a bottleneck or cost issue, the inference can move to a local model without touching anything else in the stack.

ArcGIS URL discipline. Natural England publishes datasets on an ArcGIS portal, but guessing FeatureServer URLs is how you end up silently querying the wrong layer or getting 400s. I manually verified each URL with curl before wiring it, which is why the SAC/SPA field names are SAC_NAME/SPA_NAME (not Name or name) and why Living England is Layer/0 not Layer/1. The confirmed URLs are documented in docs/api/external_apis.md so the next person doesn't have to repeat that.

Parallel fan-out. The /analyse endpoint calls 8 upstreams simultaneously with a sync.WaitGroup. One slow or failing upstream shouldn't block the whole response, so errors go into a mutex-protected map and come back in the response body as errors.{key} rather than returning a 500. The client gets whatever data arrived; partial results are more useful than nothing.

Geometry vs centroid. When the user draws a polygon or sends an address radius, the full GeoJSON ring is forwarded to ArcGIS (which has proper spatial intersection), but the centroid is used for the point-based services (planning.data.gov.uk, EA monitoring) since those only support ?point=. The centroid is computed as the average of the open ring vertices, excluding the closing repeat. That's a good enough approximation for triage purposes; for precise intersection on those services a proper spatial join would be needed.
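The centroid approximation above, the average of the open ring vertices with the closing repeat excluded, fits in a few lines:

```go
package main

import "fmt"

// centroid averages the vertices of a GeoJSON-style ring, expressed
// here as [lat, lon] pairs. If the ring is closed (first == last), the
// closing repeat is dropped first so it isn't double-counted.
func centroid(ring [][2]float64) (lat, lon float64) {
	n := len(ring)
	if n > 1 && ring[0] == ring[n-1] {
		ring = ring[:n-1]
		n--
	}
	for _, p := range ring {
		lat += p[0]
		lon += p[1]
	}
	return lat / float64(n), lon / float64(n)
}

func main() {
	// Closed unit square: the vertex average lands at (0.5, 0.5).
	lat, lon := centroid([][2]float64{{0, 0}, {0, 1}, {1, 1}, {1, 0}, {0, 0}})
	fmt.Println(lat, lon) // 0.5 0.5
}
```

As the prose says, this is a triage-grade approximation: a vertex average drifts from the true area centroid on irregular polygons, which is acceptable when it's only feeding ?point= lookups.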

Living England excluded from geometry returns. Living England is a raster-derived dataset with potentially thousands of small polygons per area query. Returning geometries for it would slow the response significantly and produce a cluttered map. The sidebar already shows the dominant habitat type, which is what matters for a triage tool. SSSI, SAC, SPA, and priority habitats are polygon designations with typically a handful of features per site, so returning their geometries is practical.

Geocoding through the backend. Nominatim requires a User-Agent header; browsers sending requests directly don't include one and get rejected. I added a /api/v1/geocode endpoint in Go that sets the header and routes UK postcodes to postcodes.io (which validates them properly) and everything else to Nominatim. This also avoids CORS issues. The frontend shows a clean "not found" message on 404 without the "check the API server" hint that makes sense for other errors.
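The routing half of that endpoint can be sketched as below. The postcode regex, endpoint paths, and User-Agent string are illustrative; postcodes.io does the real validation, as noted above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"regexp"
)

// ukPostcode is a loose routing pattern only; postcodes.io performs
// proper validation server-side.
var ukPostcode = regexp.MustCompile(`(?i)^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$`)

// newGeocodeRequest routes UK postcodes to postcodes.io and everything
// else to Nominatim, always setting the User-Agent that Nominatim
// requires. The UA value here is a placeholder.
func newGeocodeRequest(query string) (*http.Request, error) {
	var target string
	if ukPostcode.MatchString(query) {
		target = "https://api.postcodes.io/postcodes/" + url.PathEscape(query)
	} else {
		target = "https://nominatim.openstreetmap.org/search?format=json&q=" + url.QueryEscape(query)
	}
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "bng-triage/0.1") // placeholder value
	return req, nil
}

func main() {
	req, _ := newGeocodeRequest("SW1A1AA")
	fmt.Println(req.Header.Get("User-Agent"), req.URL.Host) // bng-triage/0.1 api.postcodes.io
}
```

Building the request server-side also sidesteps CORS entirely, since the browser only ever talks to the Go API.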

Vanilla JS frontend. There are no frameworks and no build step when used outside Docker, just Leaflet from a CDN. The entire Oat library is ~8 kB, so the frontend stays small and incredibly fast to load. No dependencies means we minimise the maintenance associated with React/Next.js and their many dependencies' inevitable CVEs (the kind that seem to get published every week). Since the frontend only drives the API and displays results, there is a case for Oat over other UI libraries: it doesn't need to be complex. Three draw modes (pin, draw, address) are all state in a single mode variable. The draw preview uses circleMarker for a single vertex, a dashed polyline for two, and a dashed polygon for three or more, so the user can see the shape closing before they commit. After analysis, SSSI, SAC, SPA, and priority habitat features are rendered as coloured GeoJSON overlays on the map so the user can see exactly where each constraint sits within their selected area.

Tests. The test suite splits into unit tests for pure functions (parsePoint, parseGeometry in package handlers) and E2E tests that hit the real upstreams via httptest.NewServer. The E2E tests validate that each endpoint returns the expected status code and that the /analyse response contains all expected fields. Real upstream calls mean the tests are slower but they catch integration failures that mocks wouldn't.

AI summary: streaming over batch. After analysis the app calls a separate Rust service (summarise/) that proxies a streaming request to Groq. The choice to stream rather than batch is deliberate: the user has just clicked analyse and is sitting at the map waiting. Streaming gets the first tokens into the drawer in under a second and gives the interface a sense of activity. A batch call that returns everything at once after 3-4 seconds would feel like a hang. The result quality is identical; the latency perception is not.

Groq via Rig for low-latency inference. Groq's inference hardware runs LLaMA 3.3 70B at hundreds of tokens per second, which is the only reason streaming feels responsive at this model size. A standard hosted API at similar model scale would produce tokens slowly enough that streaming would offer little advantage over batch. Rig (the Rust LLM framework from our friends at 0xPlaygrounds) provides the typed client and SSE handling, keeping the Rust service small while getting the plumbing right. The system prompt is a const baked into the binary - the analysis JSON the user sends can't modify it.

Map bounded to the UK. The tool is England-specific (EA flood data, Natural England designations, planning.data.gov.uk all cover England only). Allowing the user to pan to arbitrary global coordinates and get empty results is confusing. Leaflet's maxBounds with maxBoundsViscosity: 1.0 and minZoom: 5 keep the viewport inside UK extents without any visual treatment - the constraint feels natural rather than imposed, since BNG is England-specific policy anyway.

Guidance reference page. The in-app docs.html page serves official definitions for flood zones, SSSI, SAC, SPA, priority habitats, and the mandatory BNG regime, drawn from GOV.UK, Natural England, and primary legislation. Having it in the app means the AI system prompt and the user-facing reference use the same source material, which reduces the chance of the summary contradicting what a user reads when they click through. The AI disclaimer at the foot of every summary links directly to the official GOV.UK BNG guidance page.

Container footprint and scalability. The total Docker image footprint for the complete stack is approximately 17-18 MB. The Go binary is statically linked and tiny; the Rust summarise service compiles to a static musl binary running on Alpine with no interpreter or runtime. No Node, no JVM, no Python. At that size the whole stack could be replicated many times over on a single node, and horizontal scaling behind a load balancer would be trivial: there is no shared mutable state between instances, so a second replica is a second docker run away.

Reflection

Assumptions the spec left open. The target user is clear - planning consultants, architects, small developers navigating BNG compliance - but the spec intentionally left operating constraints undefined. That's fine for a prototype, but working through a few numbers early would have shaped some choices. Eight upstreams per analysis is a lot; at even modest traffic (say 50 concurrent users each triggering an analysis) you're looking at 400 simultaneous outbound requests. Natural England's ArcGIS services are public and rate-limited; the EA flood API has usage policies; Groq's free tier has per-minute token limits. None of those constraints are catastrophic for a triage tool used by a handful of people, but mapping them out at the start would have prompted a caching layer for repeat queries to the same geometry - something that is conspicuously absent right now and would reduce upstream load significantly.

Upstream dependency risk. The tool works well when all eight upstreams respond quickly, which they mostly do. The fragility is that this is eight third-party services over which there's no control. Any one of them changing an endpoint, field name, or rate limit policy breaks part of the output silently. The ArcGIS URLs for Natural England data are particularly fragile - they're specific FeatureServer paths that aren't versioned in any formal sense. A monitoring job that pings each upstream daily and alerts on schema drift would be cheap to add and would mean breakage is caught before users see it rather than after.

Authentication and client context. Auth is the most obvious missing piece. Anyone who runs the stack can use it without limit. For a real product this matters for rate limiting per user (each analysis hits 8 upstreams), storing a history of sites, and any monetisation path. The right moment to decide on auth is before the API contract is finalised, not after - adding a JWT middleware to Go is not much work but retrofitting it changes the frontend flow. It was a reasonable call to skip it for a prototype; it should be the first thing added before any wider rollout.

Limitations of the AI integration. The summary is only as good as what the analysis returns. If an upstream times out or returns empty data, the model still writes fluent prose that reads authoritative - there's no signal to the user that a dataset is missing. Surfacing upstream errors more visibly in the results panel (a small warning badge per dataset) would let users read the summary with appropriate context. Structured JSON output for the factual fields - flood zone, SSSI overlap, SAC/SPA status - and streamed prose only for the narrative interpretation would also be more reliable than trusting the model to follow style instructions consistently.

Rate limits and failure modes. A 429 from Groq currently surfaces as a vague failure. A specific message and a fallback to a shorter prompt on a smaller model would handle this gracefully. Similarly, the parallel fan-out tolerates ArcGIS timeouts but the AI summary quietly works around the resulting gap rather than flagging it. Most of these failure paths are cheap to make explicit and pay back in user trust.

What held up. Streaming over batch was the right call for user experience. Parallel fan-out with partial-result tolerance was the right reliability trade-off. A static binary stack with no runtimes was the right infrastructure choice. These are the decisions that determine whether a tool holds up under real usage.

Commercial potential. This sits at a genuinely useful intersection: it's environmentally grounded (BNG is mandatory, the compliance burden is real, the target users are underserved) but it's also commercially viable. A tool like this could be monetised straightforwardly - per-report credits, a subscription for consultants, white-labelled for planning software vendors. The current iteration is a solid prototype for a micro-app: enough to demo, enough to put in front of real users and learn what they actually need before building further. The next meaningful step would be a few weeks of user research with planning consultants to find out which part of the output they trust, which they ignore, and what question it still doesn't answer - then building from that rather than from assumptions.

Getting Started

Prerequisites

  • Go 1.22+
  • Rust (stable, for the AI summary service)
  • Docker + Docker Compose (recommended)
  • Task / go-task (optional task runner)
  • Groq developer key

Environment

The AI summary drawer streams results from Groq. You can pick up a key from the Groq developer console.

Before starting any services, copy .env.example to .env and add your key:

cp .env.example .env
# then edit .env and set GROQ_API_KEY=gsk_...

The Go API and frontend run without this key; only the right-hand AI summary drawer requires it.

Run with Docker Compose (recommended)

Starts the Go API on :8080, the Rust summary service on :8090, and the frontend on :3000:

docker compose up -d --build

Then open http://localhost:3000.

Run with Task

If you have Task installed, run the default target to see all available commands:

task

Run manually

# Go API
cd api && go run .

# Rust summary service (requires GROQ_API_KEY in environment)
cd summarise && GROQ_API_KEY=your_key cargo run

# Frontend (any static server)
cd frontend && npx serve .

Guidance reference

The in-app guidance page (frontend/docs.html, also linked from the hero) covers official definitions for flood risk zones, SSSI, SAC, SPA, priority habitats, and mandatory BNG requirements under the Environment Act 2021.

Sample data

Sample API responses from each service are stored in api/samples/ for reference and offline development.
