Yasli - Municipal Address Search

A public utility that turns undocumented municipal data into reliable address search

Client: Public Utility (Personal)
Role: Creator & Full-stack Engineer
Timeline: 1 week
Address rows: 49.8k
Institutions: 77
In-city district coverage: 98.5%

Challenge

Parents in Varna need one practical answer: which nursery, kindergarten, or preparatory group applies to their home address. The official source does have the data, but it is spread across municipal portals that were built for administration, not discovery. The pages are slow, hard to search, and organized around one institution at a time.

Most of the engineering work came from the data itself. The main portal is an Angular SPA backed by undocumented JSON endpoints. Its institution pages are windows-1251 HTML. Some fetches silently truncate unless the content length is checked. Standalone nurseries moved to a second portal and do not publish street-level catchments at all. The project needed to turn all of that into a production-ready public utility, not just a scrape.
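The truncation and encoding problems can be handled in one small step before parsing. The following is a minimal sketch, not the project's actual scraper code: the names decode_verified and TruncatedResponse are hypothetical, and it assumes the caller already has the raw response bytes and the Content-Length header value.

```python
from typing import Optional


class TruncatedResponse(Exception):
    """Raised when the body length differs from the declared Content-Length."""


def decode_verified(body: bytes, content_length: Optional[str]) -> str:
    """Compare the raw body with Content-Length, then decode windows-1251.

    Hypothetical helper: the real scraper may structure this differently.
    """
    if content_length is not None:
        expected = int(content_length)
        if len(body) != expected:
            raise TruncatedResponse(
                f"expected {expected} bytes, got {len(body)}"
            )
    # "cp1251" is Python's codec name for windows-1251.
    return body.decode("cp1251")
```

If the HTTP client transparently decompresses responses, the comparison has to use the byte count as received on the wire, since Content-Length describes the encoded body.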

My Approach

I split the system into three implementation repos and one spec repo, with the contracts made explicit between them:

  • The scraper fetches the municipal sources, validates a versioned snapshot, and writes it to Cloudflare R2.
  • The backend ingests the latest snapshot into Postgres, stamps district data from GRAO reference files, and exposes a typed FastAPI surface.
  • The frontend is a static Astro site with a React search island. It loads the address reference data once, builds exact-address suggestions in the browser, and calls the backend only after the user selects a known address.

Figure: Yasli architecture. Municipal portals flow into a scraper, R2 snapshots, backend ingest, Postgres and GRAO district data, then FastAPI and the Astro frontend.

Contract-first pipeline

The scraper and backend only share one durable contract: a snapshot JSON file in R2. The scraper writes a timestamped object first and latest.json second, so a partial failure leaves the backend reading the previous good snapshot. The backend vendors the same Pydantic contract and rejects unknown schema versions before writing anything to the database.
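The write order and the version gate can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: a plain dict stands in for the R2 bucket, the snapshots/ key layout and the schema_version field name are hypothetical, and a simple set check stands in for the Pydantic contract.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Assumption: the versions this backend build understands.
SUPPORTED_SCHEMA_VERSIONS = {1}


def publish_snapshot(store: dict, snapshot: dict,
                     now: Optional[datetime] = None) -> str:
    """Write the timestamped object first, then repoint latest.json."""
    now = now or datetime.now(timezone.utc)
    body = json.dumps(snapshot, ensure_ascii=False).encode("utf-8")
    key = f"snapshots/{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    store[key] = body            # step 1: immutable, timestamped copy
    store["latest.json"] = body  # step 2: only now move the pointer
    return key


def load_latest(store: dict) -> dict:
    """Read latest.json and refuse unknown versions before any DB write."""
    snapshot = json.loads(store["latest.json"])
    version = snapshot.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return snapshot
```

If step 1 fails, latest.json is never touched, so a reader still sees the previous snapshot. If step 2 fails, the new timestamped object exists but nothing points at it yet.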

The product avoids ambiguous free-text matching. The backend exposes compact bulk dumps for streets and addresses, both with content-derived ETags. The frontend joins them into an exact-address suggestion index and requires the user to select a known address before matching. That keeps the user experience fast and makes stale local data detectable: if an address_id no longer exists, the client can reload its reference data instead of showing a wrong answer.
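A rough sketch of the two ideas in this paragraph, a content-derived ETag and an exact-address suggestion index, is shown below. The real index is built in the browser; Python is used here only for illustration, and the field names (id, name, street_id, number) and the hashing choice are assumptions.

```python
import hashlib
import json


def content_etag(rows: list) -> str:
    """Derive an ETag from the serialized rows, so equal dumps share a tag."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'


def build_suggestions(streets: list, addresses: list) -> dict:
    """Join streets and addresses into a label -> address_id lookup."""
    street_names = {s["id"]: s["name"] for s in streets}
    index = {}
    for a in addresses:
        label = f'{street_names[a["street_id"]]} {a["number"]}'
        # casefold() gives case-insensitive matching for Cyrillic labels too.
        index[label.casefold()] = a["id"]
    return index
```

Because only labels present in the index can be selected, the match request always carries a known address_id rather than free text.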

District routing where the source is incomplete

Kindergartens have street-level catchment rows, so they route through the address_institutions junction table. Standalone nurseries do not publish those rows. For them, the system uses district-level routing: nurseries carry a source district code, addresses are stamped against the GRAO address classifier, and /api/match combines street-level and district-level results in one response.
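The combined lookup can be sketched with in-memory dicts standing in for the Postgres tables. The function name, the Match type, and the data shapes are hypothetical; only the two routing paths come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    institution_id: int
    route: str  # "street" or "district"


def match_address(address_id, address_district, address_institutions,
                  nursery_district):
    """Return street-level and district-level matches for one address.

    address_district:     address_id -> district code, or None if unstamped
    address_institutions: address_id -> set of institution ids (junction rows)
    nursery_district:     nursery id -> source district code
    """
    if address_id not in address_district:
        # Unknown id: the client's reference data is probably stale.
        raise KeyError(address_id)
    street_ids = sorted(address_institutions.get(address_id, ()))
    results = [Match(i, "street") for i in street_ids]
    district = address_district[address_id]
    if district is not None:
        results += [
            Match(i, "district")
            for i, d in sorted(nursery_district.items())
            if d == district
        ]
    return results
```

An address without a district stamp still gets its street-level matches; it simply has no district-routed results.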

Preschools (the preparatory groups) use the district path too. In this dataset they are primary schools hosting preparatory classes, so the building address can differ from the catchment district. The backend derives their district from the majority district of their catchment addresses instead of trusting the physical school address.
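Catchment-majority derivation amounts to a vote over the districts of the catchment addresses. A minimal sketch follows, assuming each catchment address already has a stamped district code (or None); the tie-break toward the lowest code is an assumption made here for determinism, not something the source specifies.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_district(
    catchment_districts: Iterable[Optional[str]],
) -> Optional[str]:
    """Pick the most common district among an institution's catchment rows."""
    counts = Counter(d for d in catchment_districts if d is not None)
    if not counts:
        return None  # no stamped addresses, so no district can be derived
    # Highest count wins; ties go to the lowest district code (assumption).
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
```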

Results & Impact

Yasli is production-ready as a public utility for Varna address search. The local production-style database currently contains:

  • 77 institutions: 53 kindergartens, 12 nurseries, 12 preparatory groups
  • 49,800 address rows and 205,674 address-to-institution coverage edges
  • 2,289 street rows indexed for address suggestion search
  • 47,579 GRAO reference rows for district stamping
  • 98.5% district coverage for in-city Varna addresses

The implementation is intentionally operational, not just functional. It has five Alembic migrations, a weekly schedule for the scraper and backend ingest, a quarterly GRAO refresh runbook, root-level just recipes for local development, and environment-backed CORS for deployment.

Validation

The full validation suite currently passes across all three implementation repos:

  • Backend: 238 pytest tests
  • Scraper: 91 pytest tests
  • Frontend: 29 Vitest tests
  • Lint: Ruff for Python repos, ESLint for frontend

The build process was also spec-driven. The work is captured across 15 archived OpenSpec changes, from scraper bootstrap and snapshot contract through backend ingest, district routing, frontend search, and local orchestration.

What This Demonstrates

This was not an exercise in building a clean demo around clean data. The useful part was taking brittle public sources, undocumented endpoints, legacy encodings, source vocabulary drift, and incomplete nursery catchments, then shipping a small product that hides that complexity from the user.

The same pattern applies to business automation work: find the real source of truth, make the contracts explicit, automate the repeatable parts, and build the thin interface people actually need.

Tech Stack

Python, FastAPI, Postgres, Astro, React, Railway, Cloudflare R2