Yasli - Municipal Address Search
A public utility that turns undocumented municipal data into reliable address search
Challenge
Parents in Varna need one practical answer: which nursery, kindergarten, or preparatory group applies to their home address. The official source does have the data, but it is spread across municipal portals that were built for administration, not discovery. The pages are slow, hard to search, and organized around one institution at a time.
Most of the engineering work came from the data itself. The main portal is an Angular SPA backed by undocumented JSON endpoints. Its institution pages are windows-1251 HTML. Some fetches truncate silently, which only shows up when the received body is checked against the declared content length. Standalone nurseries moved to a second portal and do not publish street-level catchments at all. The project needed to turn that into a production-ready public utility, not just a scrape.
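The two fetch hazards named above (silent truncation and windows-1251 pages) can be illustrated with a minimal sketch. This is not the project's scraper code: the function name, the exception type, and the idea of passing the declared length in as an integer are assumptions made for illustration. It also assumes the body is compared as received, before any transfer decompression.

```python
from typing import Optional


class TruncatedResponseError(Exception):
    """Raised when fewer (or more) bytes arrive than the server declared."""


def decode_institution_page(body: bytes, declared_length: Optional[int]) -> str:
    # Hypothetical helper: `declared_length` is the parsed Content-Length
    # header, or None when the server did not send one.
    if declared_length is not None and len(body) != declared_length:
        raise TruncatedResponseError(
            f"expected {declared_length} bytes, received {len(body)}"
        )
    # "cp1251" is Python's codec name for windows-1251. Decoding strictly
    # (the default) surfaces corrupt bytes instead of hiding them.
    return body.decode("cp1251")
```

Failing loudly on a length mismatch lets the caller retry the fetch rather than parse half a page into a plausible-looking but incomplete record.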
My Approach
I split the system into three implementation repos and one spec repo, with the contracts made explicit between them:
- The scraper fetches the municipal sources, validates a versioned snapshot, and writes it to Cloudflare R2.
- The backend ingests the latest snapshot into Postgres, stamps district data from GRAO reference files, and exposes a typed FastAPI surface.
- The frontend is a static Astro site with a React search island. It loads the address reference data once, builds exact-address suggestions in the browser, and calls the backend only after the user selects a known address.
Contract-first pipeline
The scraper and backend share only one durable contract: a snapshot JSON file
in R2. The scraper writes a timestamped object first and latest.json second, so
a partial failure leaves the backend reading the previous good snapshot. The
backend vendors the same Pydantic contract and rejects unknown schema versions
before writing anything to the database.
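The write ordering and the version gate can be sketched as follows. This is an illustration under stated assumptions, not the project's code: a plain dict stands in for the R2 bucket, plain dicts stand in for the Pydantic models, and the key layout, the schema_version field name, and the supported version set are all hypothetical.

```python
import json
from datetime import datetime, timezone

SUPPORTED_SCHEMA_VERSIONS = {1}  # hypothetical: versions this backend understands


class UnsupportedSchemaVersion(Exception):
    """Raised before any database write when the snapshot version is unknown."""


def publish_snapshot(store: dict, snapshot: dict, now: datetime) -> str:
    payload = json.dumps(snapshot, ensure_ascii=False, sort_keys=True).encode("utf-8")
    key = f"snapshots/{now.strftime('%Y%m%dT%H%M%SZ')}.json"  # hypothetical layout
    # Step 1: the immutable, timestamped object.
    store[key] = payload
    # Step 2: only now move the pointer. If step 1 fails, latest.json is never
    # touched; if step 2 fails, latest.json still holds the previous snapshot.
    store["latest.json"] = payload
    return key


def load_latest(store: dict) -> dict:
    snapshot = json.loads(store["latest.json"])
    version = snapshot.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchemaVersion(f"unknown schema_version: {version!r}")
    return snapshot


if __name__ == "__main__":
    bucket = {}
    publish_snapshot(
        bucket,
        {"schema_version": 1, "institutions": []},
        datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(sorted(bucket))
```

Because the version check runs before any ingest step, a scraper that ships a newer contract than the backend knows about results in a refused ingest rather than a half-written database.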
Exact-address search
The product avoids ambiguous free-text matching. The backend exposes compact
bulk dumps for streets and addresses, both with content-derived ETags. The
frontend joins them into an exact-address suggestion index and requires the user
to select a known address before matching. That keeps the user experience fast
and makes stale local data detectable: if an address_id no longer exists, the
client can reload its reference data instead of showing a wrong answer.
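The join and the content-derived ETag described here can be sketched in a few lines. The real frontend is a React island, so this Python version only illustrates the logic. The field names (street_id, name, number), the label format, and the choice of a truncated SHA-256 over canonical JSON are assumptions, not the project's actual dump format.

```python
import hashlib
import json
from typing import Dict, List


def content_etag(rows: List[dict]) -> str:
    # Canonical serialization: the same rows always hash to the same tag,
    # so the tag changes only when the content changes.
    canonical = json.dumps(
        rows, ensure_ascii=False, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return '"' + hashlib.sha256(canonical).hexdigest()[:16] + '"'


def build_suggestion_index(streets: List[dict], addresses: List[dict]) -> Dict[str, int]:
    street_names = {s["street_id"]: s["name"] for s in streets}
    index: Dict[str, int] = {}
    for address in addresses:
        name = street_names.get(address["street_id"])
        if name is None:
            continue  # address points at a street missing from the streets dump
        label = f"{name} {address['number']}".casefold()
        index[label] = address["address_id"]
    return index


def needs_reload(known_address_ids: set, address_id: int) -> bool:
    # A selected id that the current data no longer contains means the
    # locally cached reference data is stale and should be fetched again.
    return address_id not in known_address_ids
```

Keying suggestions by a normalized label and returning only an address_id keeps the match request unambiguous: the backend never has to interpret free text.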
District routing where the source is incomplete
Kindergartens have street-level catchment rows, so they route through the
address_institutions junction table. Standalone nurseries do not publish those
rows. For them, the system uses district-level routing: nurseries carry a source
district code, addresses are stamped against the GRAO address classifier, and
/api/match combines street-level and district-level results in one response.
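A minimal sketch of how one response can merge both routing paths, assuming in-memory stand-ins for the tables. The function name, argument shapes, and the street_level / district_level response keys are hypothetical; only the two-path idea comes from the description above.

```python
from typing import Dict, List, Optional, Tuple


def match_address(
    address_id: int,
    address_institutions: List[Tuple[int, int]],   # (address_id, institution_id) edges
    address_district: Dict[int, Optional[str]],    # district code stamped per address
    district_institutions: Dict[int, str],         # institution_id -> source district code
) -> Dict[str, List[int]]:
    # Path 1: explicit street-level catchment rows from the junction table.
    street_level = sorted(
        {inst for addr, inst in address_institutions if addr == address_id}
    )
    # Path 2: district routing. An address without a stamped district gets
    # no district-level results instead of a guessed one.
    district = address_district.get(address_id)
    district_level = sorted(
        inst
        for inst, code in district_institutions.items()
        if district is not None and code == district
    )
    return {"street_level": street_level, "district_level": district_level}
```

Keeping the two lists separate in the response lets the interface show which results come from published catchments and which come from district assignment.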
Preschools use the district path too. In this dataset they are primary schools hosting preparatory classes, so the building address can differ from the catchment district. The backend derives their district from catchment majority instead of trusting the physical school address.
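Deriving a district by catchment majority amounts to a frequency count over the districts of the addresses an institution covers. The sketch below assumes the input is the list of district codes stamped on those addresses, with None for unstamped ones. The function name and the tie-break rule (lowest code wins) are assumptions added to keep the result deterministic.

```python
from collections import Counter
from typing import Iterable, Optional


def district_by_catchment_majority(
    catchment_districts: Iterable[Optional[str]],
) -> Optional[str]:
    # Ignore addresses that could not be stamped with a district.
    counts = Counter(code for code in catchment_districts if code is not None)
    if not counts:
        return None  # no evidence: leave the district unset rather than guess
    # Highest count wins; ties fall back to the lowest district code.
    code, _count = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return code
```

For example, a school whose catchment addresses are stamped "03", "03", and "01" resolves to "03", regardless of which district its own building stands in.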
Results & Impact
Yasli is production-ready as a public utility for Varna address search. The local production-style database currently contains:
- 77 institutions: 53 kindergartens, 12 nurseries, 12 preparatory groups
- 49,800 address rows and 205,674 address-to-institution coverage edges
- 2,289 street rows indexed for address suggestion search
- 47,579 GRAO reference rows for district stamping
- 98.5% district coverage for in-city Varna addresses
The implementation is intentionally operational, not just functional. It has
five Alembic migrations, a weekly scraper and backend-ingest cadence, a
quarterly GRAO refresh runbook, root-level just workflows for local
development, and environment-backed CORS for deployment.
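Environment-backed CORS usually means the allowed origins are read from configuration rather than hard-coded. A minimal sketch, assuming a comma-separated variable whose name (CORS_ALLOWED_ORIGINS) is hypothetical:

```python
import os
from typing import List, Mapping


def cors_origins_from_env(
    env: Mapping[str, str] = os.environ,
    var: str = "CORS_ALLOWED_ORIGINS",  # hypothetical variable name
) -> List[str]:
    raw = env.get(var, "")
    # Split on commas, drop empty entries, and strip trailing slashes,
    # since browsers send Origin values without one.
    return [part.strip().rstrip("/") for part in raw.split(",") if part.strip()]


# In a FastAPI app the resulting list would typically be passed as
# allow_origins to fastapi.middleware.cors.CORSMiddleware.
```

An unset variable yields an empty list, so a deployment that forgets the setting allows no cross-origin callers instead of all of them.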
Validation
The full validation suite currently passes across all three implementation repos:
- Backend: 238 pytest tests
- Scraper: 91 pytest tests
- Frontend: 29 Vitest tests
- Lint: Ruff for Python repos, ESLint for frontend
The build process was also spec-driven. The work is captured across 15 archived OpenSpec changes, from scraper bootstrap and snapshot contract through backend ingest, district routing, frontend search, and local orchestration.
What This Demonstrates
This was not an exercise in building a clean demo around clean data. The useful part was taking brittle public sources, undocumented endpoints, legacy encodings, source vocabulary drift, and incomplete nursery catchments, then shipping a small product that hides that complexity from the user.
The same pattern applies to business automation work: find the real source of truth, make the contracts explicit, automate the repeatable parts, and build the thin interface people actually need.