Seattle's municipal and civic digital archives hold hundreds of thousands of images accumulated over more than two decades of digitization projects, and a significant share of them are exact or near-exact duplicates. That redundancy is not a minor housekeeping issue. It is burning through storage budgets, slowing searches in public-facing portals, and complicating records requests filed under Washington State's Public Records Act.
The problem is drawing fresh attention in mid-2026 as city departments undertake a broader digital infrastructure review, part of an ongoing push by the Seattle Information Technology department to modernize systems ahead of a data-center migration scheduled for the first quarter of 2027. Duplicate image files sit at the center of that effort because they inflate storage costs and interfere with the automated cataloguing tools the city has been piloting since late 2024.
What the Data Actually Shows
Industry benchmarks for large municipal image repositories suggest that unmanaged collections typically carry a duplicate rate of between 20 and 35 percent of total stored files, according to guidelines published by the Digital Preservation Coalition. In Seattle, the Office of Arts and Culture alone manages a digitized collection spanning thousands of public-art documentation photographs taken at sites from the Pike Place Market to the Beacon Hill light-rail station. Applied to citywide holdings in the hundreds of thousands of images, even a conservative 20 percent duplication rate represents tens of thousands of redundant files.
Storage costs clarify why this matters financially. Commercial cloud storage at enterprise rates commonly runs between $0.02 and $0.05 per gigabyte per month. A photographic archive of 500,000 high-resolution image files, a plausible scale for a combined city department collection, can occupy upward of 50 terabytes. At those rates, eliminating a 25 percent duplication load from a 50-terabyte archive would cut storage spending by roughly $3,000 to $7,500 per year, freeing budget that departments such as Seattle Public Utilities and the Seattle Department of Transportation have historically allocated to basic data maintenance rather than new capability.
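The arithmetic behind that estimate can be checked with a short sketch. It uses only the figures quoted above (a 50-terabyte archive, a 25 percent duplicate share, and prices of $0.02 to $0.05 per gigabyte per month) and assumes decimal terabytes (1 TB = 1,000 GB) and flat pricing with no tiering or retrieval fees. The function name is illustrative.

```python
def annual_duplicate_cost(archive_tb, duplicate_share, price_per_gb_month):
    """Yearly cost of storing the duplicated portion of an archive.

    Assumes 1 TB = 1,000 GB and a flat per-gigabyte monthly price.
    """
    duplicate_gb = archive_tb * 1000 * duplicate_share
    return duplicate_gb * price_per_gb_month * 12


# Figures taken from the paragraph above.
low = annual_duplicate_cost(50, 0.25, 0.02)
high = annual_duplicate_cost(50, 0.25, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Larger archives, higher-cost storage tiers, or backup copies of the same files would scale the figure up proportionally.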
The Seattle Public Library's digital collections program, which maintains digitized historical photographs through its Special Collections division on Fourth Avenue, has publicly documented its own deduplication work as part of collection-management reports. Libraries nationally have found that deduplication combined with consistent metadata tagging can cut retrieval times for staff handling public records requests by 30 to 40 percent, according to figures cited in American Library Association professional guidance published in 2023.
Neighbourhoods, Projects and the Human Cost of Bad Data
The practical stakes show up in specific places. The Central Seattle Greenway project, which has generated hundreds of documentation photographs of corridor conditions between the University District and Columbia City, is one example where overlapping photo submissions from multiple contractors created a cataloguing mess that staff had to untangle manually. Similar issues have arisen in documentation archives tied to the ongoing redevelopment around South Lake Union, where construction-phase photography from different vendors landed in shared city repositories without deduplication protocols in place.
Residents who file public records requests — a process managed through Seattle's public records portal — sometimes receive responses slowed by staff time spent sorting through redundant files to identify responsive records. Washington's Public Records Act sets a statutory five-business-day acknowledgment requirement, but fulfilment timelines for image-heavy requests routinely stretch far longer, in part because of catalogue disorder.
Seattle IT's current pilot program is testing automated perceptual hashing tools, which reduce each image to a compact fingerprint of its visual content and compare those fingerprints rather than the raw files. That allows near-duplicate detection even when file names, metadata, resolution or compression differ. The pilot, running across selected datasets within the Seattle Department of Construction and Inspections, is expected to produce a methodology report by the end of the third quarter of 2026.
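The pilot's actual tooling has not been described publicly in detail, so the following is only an illustration of the general idea. It implements an "average hash", one of the simplest perceptual hashing schemes: shrink the image to an 8-by-8 grayscale grid, mark each cell as brighter or darker than the grid's mean, and compare the resulting 64-bit fingerprints by counting differing bits. Images are represented here as plain lists of grayscale rows and are assumed to be at least 8 pixels on each side. A real pipeline would decode files with an imaging library, and the function names are hypothetical.

```python
def shrink(pixels, size=8):
    """Downsample a grayscale grid (rows of 0-255 values) by block averaging."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit fingerprint: 1 where a cell is above the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Synthetic test images: a horizontal gradient, a brighter copy,
# a copy at double resolution, and an unrelated vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(v + 3, 255) for v in row] for row in original]
upscaled = [[(x // 2) * 8 for x in range(64)] for _ in range(64)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

base = average_hash(original)
print(hamming(base, average_hash(brighter)))   # small distance: near-duplicate
print(hamming(base, average_hash(upscaled)))   # small distance: near-duplicate
print(hamming(base, average_hash(different)))  # large distance: different image
```

In practice a threshold decides what counts as a near-duplicate, for example a distance of five bits or fewer out of 64. That cut-off is an assumption here and would need tuning against a sample of the real collection.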
Departments that want to get ahead of the problem without waiting for city-wide policy can start with consistent file-naming conventions at the point of ingestion, mandatory metadata fields for project name and date, and a quarterly deduplication review cycle. The cost of not acting compounds over time: every month of inaction adds more files, raises the cost of eventual cleanup, and increases the risk that the 2027 data-center migration slips behind schedule.
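Those three steps can be sketched in a few lines. The example below is a hypothetical illustration, not a city standard: it rejects records that lack the two mandatory fields, builds a file name in an assumed date_project_sequence pattern, and groups byte-for-byte identical files by SHA-256 digest, which is the kind of check a quarterly review could run. The field names, naming pattern, and function names are all assumptions.

```python
import hashlib
import re
from collections import defaultdict
from datetime import date

REQUIRED_FIELDS = ("project", "date")  # the two mandatory fields named above


def canonical_name(meta, sequence, ext="jpg"):
    """Build a consistent file name, refusing records with missing metadata."""
    missing = [f for f in REQUIRED_FIELDS if not meta.get(f)]
    if missing:
        raise ValueError(f"missing mandatory metadata: {', '.join(missing)}")
    taken = date.fromisoformat(meta["date"])  # raises ValueError if malformed
    slug = re.sub(r"[^a-z0-9]+", "-", meta["project"].lower()).strip("-")
    return f"{taken:%Y%m%d}_{slug}_{sequence:04d}.{ext}"


def exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


name = canonical_name({"project": "South Lake Union", "date": "2026-05-14"}, 7)
print(name)

dupes = exact_duplicates({"a.jpg": b"same", "b.jpg": b"same", "c.jpg": b"other"})
print(dupes)
```

Digest matching only catches exact copies. Resized or recompressed versions of the same photograph need a perceptual comparison instead.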