Sep 16, 2025 Thoughts

Learning About Municipal Open Data with a Cambridge Workshop

Information wants to be free and the city of Cambridge wants you to have access.

A couple months ago I wanted to find parcel datasets for land ownership in Arlington, MA. I found the janky Property Assessment Data form, but you can only search for one property at a time. In the process of looking for other sources I found Open Data for Arlington, MA which notes:

This website is not affiliated with town government in Arlington, MA, and is run by volunteers. We are collaborating with the town to obtain relevant data sets and increase the accessibility of information about local government to facilitate greater citizen engagement. Our long-term hope is that the town will capitalize on this work to build an official open data portal for the future, similar to Cambridge or Somerville.

The project is up on GitHub, and really cool. It's a bit of a shame though that the town doesn't have an official offering.

The mention of other towns got me looking though. I found out that the City of Cambridge was offering an Open Data Workshop, hosted in the wonderful Cambridge Public Library. [^ Public services like libraries are hugely important! Not only did they host the event, but also the library had laptops people could check out to use for the workshop.] The workshop was designed to help equip a non-technical audience with the skills needed to start working with the data assets the city has opened up. Alex Epstein & Reinhard Engels, both former librarians, ran the session. To paraphrase them: "something about librarians makes them like collecting, categorizing and sharing data."

On Open vs Public Data

The presenters did a great job summarizing the difference between public and open data. The idea is that government records should be free and accessible to everyone. Public Data might technically be available, but it could be stored on paper in the basement of some building. Open Data is about making the information as usable as possible.

I wanted to know more about this and found A Guide to the Massachusetts Public Records Law. Some takeaways from the document:

Every government record in Massachusetts is presumed to be public unless it may be withheld under a specifically stated exemption.

... each person has a right of access to public information. This right of access includes the right to inspect, copy or have a copy of records provided upon the payment of a reasonable fee, if any.

The Public Records Law broadly defines “public records” to include “all books, papers, maps, photographs, recorded tapes, financial statements, statistical tabulations, or other documentary materials or data, regardless of physical form or characteristics, made or received by any officer or employee” of any Massachusetts governmental entity.

There are strictly and narrowly construed exemptions and common law privileges to the broad definition of “public records.”

So by the letter of the law you should be able to access these records, but often the process is not easy. Open data aims to reduce the types of friction you can imagine with access:

Using systems so the data can be provided without cost
Automatically making data assets and updates available. That way you do not have to make requests and wait on receiving the information.
Providing the information in commonly used, standard formats

Working with the property dataset

In the workshop we accessed the data assets through the Cambridge Open Data Portal. The presenters mentioned the portal uses the same vendor that cities like NYC and Chicago use. I'll come back to this topic in a bit.

We used the site's interface to work with the data and I think that was the best approach for the wide variety of people that were in the audience. They noted that if you were inclined you could grab the assets as CSVs to work with in a spreadsheet, or you could programmatically access them through an API.
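For the programmatic route, here is a minimal sketch of what a query URL might look like. It assumes the portal is served from data.cambridgema.gov and speaks Socrata's SODA API (the `/resource/<id>.json` path with `$where`, `$order` and `$limit` parameters). The dataset ID and the column names are hypothetical placeholders, so look up the real ones on the dataset's page before using this.

```python
from urllib.parse import urlencode

# Assumption: the portal lives at this domain and exposes the SODA API.
PORTAL_DOMAIN = "data.cambridgema.gov"
# Hypothetical placeholder: the real ID is shown on each dataset's page.
DATASET_ID = "xxxx-xxxx"


def build_query_url(dataset_id, where=None, order=None, limit=10, fmt="json"):
    """Build a SODA query URL. fmt is 'json' or 'csv'."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    if order:
        params["$order"] = order
    # urlencode percent-escapes the values (and the leading "$" of each key).
    return f"https://{PORTAL_DOMAIN}/resource/{dataset_id}.{fmt}?{urlencode(params)}"


url = build_query_url(
    DATASET_ID,
    where="land_use = 'SNGL-FAM-RES'",  # hypothetical column name
    order="assessed_value DESC",        # hypothetical column name
    limit=5,
)
print(url)
```

The block only builds the URL. Fetching it is then a matter of passing `url` to something like `urllib.request.urlopen` once the placeholders are replaced.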

One asset we looked at is the Cambridge Property Dataset. Each property is categorized by land use. They are encoded in esoteric abbreviations, but after the session I was able to find a table of land use categories.

One demo from the class was to:

filter for assessment year of 2025
filter for SNGL-FAM-RES to find single family residential properties
sort by assessed value, descending
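If you would rather do those steps on a CSV export than in the portal's interface, they could be sketched in Python roughly like this. The sample rows are made up, with placeholder addresses and a placeholder second land use code, and the column names are assumptions rather than the real export's headers.

```python
import csv
import io

# Made-up sample rows. Column names are assumptions: check them against
# the header row of the real export. OTHER-CODE is a placeholder land use.
SAMPLE_CSV = """address,assessment_year,land_use,assessed_value
1 Example St,2025,SNGL-FAM-RES,"$1,250,000.00"
2 Example St,2025,OTHER-CODE,"$900,000.00"
3 Example St,2024,SNGL-FAM-RES,"$3,000,000.00"
4 Example St,2025,SNGL-FAM-RES,"$2,400,000.00"
"""


def parse_dollars(text):
    """Turn a string like '$1,250,000.00' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def top_single_family(rows, year="2025", land_use="SNGL-FAM-RES"):
    """Filter by assessment year and land use, then sort by value, descending."""
    matches = [
        r for r in rows
        if r["assessment_year"] == year and r["land_use"] == land_use
    ]
    return sorted(
        matches,
        key=lambda r: parse_dollars(r["assessed_value"]),
        reverse=True,
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
top = top_single_family(rows)
for r in top:
    print(r["address"], r["assessed_value"])
```

To run it against a real download, swap the `io.StringIO(SAMPLE_CSV)` for an opened file and adjust the column names.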

These are the top rows:

Address	Land Area	Assessed Value	Sale Price	Sale Date	Owner Name
51-61 Highland St	77,261	$20,142,300.00	$0.00	07/20/2015	NOWISZEWSKI, DANIEL & ANNETTE NOWISZEWSKI TRUSTEE
168R Brattle St	71,531	$18,532,700.00	$0.00	11/07/2006	FIRST NATIONAL BANK OF BOSTON, TR.
157 Brattle St	17,901	$16,582,000.00	$0.00	03/15/2021	QUAGLIAROLI, JAMES J. & KIMBERLY A. HENRY TR.
88 Appleton St	29,649	$15,208,600.00	$1.00	01/31/2018	RIVIERE, MATTHIEU R.
12 Lakeview Ave	30,236	$15,152,500.00	$15,100,000.00	10/31/2018	CASE, JEFFERSON M.
153 Brattle St	22,505	$14,904,200.00	$10,600,000.00	07/12/2011	GINER, A. SILVANA, TRUSTEE OF
1 Reservoir St	30,806	$14,797,800.00	$1.00	03/19/2020	DONOHUE, ROBERT TRUSTEE
12 Hubbard Park Rd	14,217	$14,428,500.00	$15,000,000.00	01/25/2019	PALANDJIAN, PETER, TRUSTEE
163 Brattle St	26,841	$14,348,400.00	$1.00	08/05/2008	MANUS, DEBORAH J. TRUSTEE OF
89 Appleton St	36,412	$13,388,000.00	$1.00	08/27/2021	GROSS-LOH DAVID B
1 Highland St	17,856	$13,075,400.00	$5,750,000.00	09/30/2005	HIGGINS, ROBERT F.
71 Appleton St	27,712	$13,029,300.00	$10.00	05/13/2016	71 APPLETON LLC
70 Sparks St	31,703	$12,812,000.00	$100.00	09/05/2018	MASON, GEOFFREY M., TRUSTEE

Two interesting details the presenters noted about these rows:

You can see the owners' names. People are often surprised by that, but the info is on the records and there is no exemption in the law to remove them.
Many of the houses have a sale price of $0 or a small nominal amount. This is an example of generational wealth: people inheriting the property from their parents.
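To put a number on that second point, here is a quick tally of the sale prices from the thirteen rows above. The $100 cutoff for what counts as "nominal" is my own assumption, not something the presenters defined.

```python
# Sale prices copied from the thirteen rows above, in order.
sale_prices = [
    0, 0, 0, 1, 15_100_000, 10_600_000, 1,
    15_000_000, 1, 1, 5_750_000, 10, 100,
]

# Assumption: treat anything at or below $100 as a nominal transfer.
NOMINAL_THRESHOLD = 100

nominal = [p for p in sale_prices if p <= NOMINAL_THRESHOLD]
share = len(nominal) / len(sale_prices)
print(f"{len(nominal)} of {len(sale_prices)} sales are nominal ({share:.0%})")
```

With that cutoff, nine of the thirteen top rows changed hands for $100 or less.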

Another fun detail from exploring the dataset is just how much property MIT and Harvard own in Cambridge. In fact, if you look for the highest assessed value across all property types you'll find 1341 Mass Ave, assessed at $1,604,103,500.00, which is Harvard Yard. Another quirk of the dataset is that each row is per building, so there are actually many entries for Harvard Yard since they are all assessed as a block.

All in all I really enjoyed these examples since we used a similar dataset to what I was originally looking for in Arlington.

GenAI data exploration demo

For all the problems that come with generative AI, in some use cases the tools can be democratizing. The presenters demonstrated how you could attach a CSV of a dataset to a chat and do some data exploration. Most people are not equipped to make visualizations or explore trends in datasets, and chat tools can help.

I also really appreciated how the presenters were very clear about the limitations of chat tools and warned to be critical when reviewing their output, noting "this is the fun part of the demo because every month it is different and they always get something wrong".

They used an example prompt like "Can you tell me something interesting about this dataset? Any trends, surprises? Please use charts and tables to help me understand." and walked through the output of the model. They took time to detail what the reasoning section was, and noted that "every time we've done one of these there is an inconsistency between the story and the graphs".

I think they really struck a nice balance in this section.

The irony of closed monopoly platforms for open data access.

Back to the vendor they use. There was an audience member that wanted to know how long of a contract the city had with the vendor. They wanted to integrate with the API, but were worried that if the city changed vendors they would have to change everything they built. The presenters noted that there is not a very lucrative market for this type of product. There are only three contenders, none of which are great. The pain of switching is very high and until someone makes something better than their current platform, the city is unlikely to change vendors.

During the workshop I had also been trying to better understand who the vendor was. In the portal I found a developers link to a page on the Socrata website which said:

Socrata was acquired by Tyler Technologies in 2018 and is now the Data and Insights division of Tyler. The platform is still powered by the same software formerly known as Socrata but you will see references to Data & Insights going forward.

A Bloomberg article describes Tyler Technologies as "a libertarian’s brain worm" and a near monopoly on certain types of municipal software. It also notes that they are facing multiple lawsuits across the country.

Also, remember that janky form I found for Arlington? Patriot Properties, Inc created the form, though they have since consolidated into Catalis, which was formed through 30 acquisitions of other companies.

What really struck me after reading all of this is how familiar the scenario is. There is no sane top-down policy to help municipalities handle access to their datasets. So small private companies stepped in to try and create solutions. The market is not profitable, so they get rolled up into large companies that leverage their captive audiences to extract as much rent as possible.

There are nice projects like the non-profit Code for America aiming to support civic hacking. [^ I see they have an event at the end of the month that I'll try and attend: https://formfest.org/ ] I really appreciate them, however I feel like things are broken on a fundamental level: the systems towns use will always be patchwork at best if we never collectively decide to do something better.

I haven't had a chance to do this yet, but I've been working to recontextualize what I see in life based on how other countries approach situations. I will often realize that something I assumed was normal is actually distinctly American, and that we can learn a lot from others.

Who decides what data is in the portal?

A note on this section: this is based on the conversations between the audience and the presenters. I've cleaned up the dialog into paragraphs, but I don't want to misrepresent this as purely my own thoughts. [^ I wish I was as eloquent as the speakers.]

When an audience member asked how Cambridge stacks up to other towns the presenters mentioned that while they are not the bleeding edge, they are definitely better than most and they work hard to make new datasets available.

However, they are not the only team that makes the decisions. Other people help clarify what data doesn't go up. Legal, privacy issues, and PII can all prevent records from public access. At the same time the law goes both ways, as we saw with the fact that assessment data with PII has to be public by law. Ultimately individual departments are also part of making the decision.

There is also a hesitance by some departments that don't want to publish data that makes them look bad. That is understandable from their perspective, but it's not for the greater good. Transparency helps those departments get better. As an example, there was an internal 311 database for tracking whether they meet their SLAs. The department set a goal of meeting them 80% of the time. While they started off very far from their goal, after years of getting closer they felt good enough to make the dataset public.

Another issue is that if you go far enough back, you get to paper records. While the town has some digitization projects like the GIS department scanning in old maps, those take time and money. They will not reasonably be able to process all the records in storage.

The main takeaway for me is that we as citizens can really help out here. The open data team truly strives to make new datasets available, but they can only make requests so many times before other departments stop listening. One thing that we can do ourselves is to continually ask departments to release their datasets:

We would love for you to all clamor to the different departments. That makes a difference and they have already heard from us enough.

Other thoughts

I think the presenters did a really good job explaining this to true non-technical citizens. That was super cool. Also the fact that so many people turned out to fill the workshop was inspiring to see.

If you want to learn about the content of the workshop, the slides are available on the website.

If you want to do a workshop or learn more, the team has a newsletter.
