CB Pigeons

NYC Community Boards Dashboard

Dashboard Creation Process

This dashboard was built to surface trends in NYC Community Board transparency, demographics, and policy priorities using public documents that are rarely analyzed together. It draws on three separate data pipelines, each targeting a different aspect of community board activity.

Digital Records Scorecard

Each of the 59 community boards across all 5 boroughs was manually audited for the presence of key digital resources on their official websites. The goal was to assess baseline transparency and online accessibility.

What was checked (14 categories):

  • Meeting records: Calendar, Minutes, Agendas, Resolutions
  • Contact information: Staff directory or contact form
  • Social media: Instagram, X (Twitter), Facebook, YouTube
  • Community communications: Newsletters, News/Events
  • Governance docs: By-Laws
  • Civic processes: Permits & Licenses (cannabis, liquor, landmarks, block parties, etc.)

Each category received one of three ratings: present on site, exists but not linked, or not found. Social media accounts found via search but not linked from the CB's own site were given partial credit.
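
One way the three-level rating could be turned into a per-board score is sketched below. The numeric weights (1.0, 0.5, 0.0) and the function name are assumptions for illustration; the audit only states that unlinked resources earn partial credit, not how much.

```python
from enum import Enum


class Rating(Enum):
    PRESENT = "present on site"
    UNLINKED = "exists but not linked"
    NOT_FOUND = "not found"


# Hypothetical weights: the source does not say how much "partial credit" is worth.
WEIGHTS = {Rating.PRESENT: 1.0, Rating.UNLINKED: 0.5, Rating.NOT_FOUND: 0.0}


def scorecard_score(ratings: dict) -> float:
    """Return the share of available credit earned, from 0.0 to 1.0."""
    if not ratings:
        return 0.0
    return sum(WEIGHTS[r] for r in ratings.values()) / len(ratings)


# Illustrative ratings for four of the audited categories (not real audit data).
example = {
    "Minutes": Rating.PRESENT,
    "Agendas": Rating.PRESENT,
    "Instagram": Rating.UNLINKED,
    "By-Laws": Rating.NOT_FOUND,
}
print(scorecard_score(example))  # 0.625
```

Keeping the ratings as an enum rather than raw numbers means the weights can be changed later without re-auditing any board.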

Needs Assessment Analysis

Community boards submit annual needs statements to the Department of City Planning as part of the capital and expense budget process. This pipeline collects, extracts, and compares those statements across three fiscal years (FY2024, FY2025, FY2026) for all 18 Brooklyn community boards.

  1. PDF Collection: Needs statement PDFs were scraped from the NYC DCP website
  2. Text Extraction: pdfplumber extracted full text from each PDF into a SQLite database
  3. Year-Over-Year Comparison: GPT-4o compared each board's three documents, identifying what appeared, disappeared, or shifted in emphasis
  4. Theme Extraction: A second LLM pass identified 4–6 recurring policy themes per board and scored their depth (0–5) in each fiscal year
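
Steps 1 and 2 above might look roughly like the following sketch. The table name, column names, and board label are assumptions, not the project's actual schema; the pdfplumber calls (`pdfplumber.open`, `pdf.pages`, `page.extract_text`) are the library's standard text-extraction interface.

```python
import sqlite3


def extract_pdf_text(pdf_path: str) -> str:
    """Concatenate the text of every page in a PDF using pdfplumber."""
    import pdfplumber  # imported lazily so the storage helpers run without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per board per fiscal year.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS needs_statements (
               board TEXT NOT NULL,
               fiscal_year TEXT NOT NULL,
               full_text TEXT NOT NULL,
               PRIMARY KEY (board, fiscal_year)
           )"""
    )


def store_statement(conn: sqlite3.Connection, board: str, fiscal_year: str, text: str) -> None:
    # INSERT OR REPLACE keeps re-runs of the pipeline idempotent.
    conn.execute(
        "INSERT OR REPLACE INTO needs_statements VALUES (?, ?, ?)",
        (board, fiscal_year, text),
    )
    conn.commit()


def statements_for_board(conn: sqlite3.Connection, board: str) -> dict:
    """Return {fiscal_year: full_text} for one board, oldest year first."""
    rows = conn.execute(
        "SELECT fiscal_year, full_text FROM needs_statements "
        "WHERE board = ? ORDER BY fiscal_year",
        (board,),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
init_db(conn)
store_statement(conn, "BK01", "FY2024", "placeholder text")
store_statement(conn, "BK01", "FY2026", "placeholder text")
print(list(statements_for_board(conn, "BK01")))  # ['FY2024', 'FY2026']
```

In a real run, `store_statement(conn, board, fy, extract_pdf_text(path))` would be called once per downloaded PDF, and the per-board dictionary would then be handed to the comparison step.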

Theme depth scoring (0–5):

  • 0 = Not mentioned
  • 1 = Brief mention only
  • 2 = Mentioned with examples
  • 3 = Policy asks or commitments requested from city agencies
  • 4 = Data or evidence cited
  • 5 = Structured recommendations or frameworks proposed
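
As a rough illustration of how the rubric above could be used downstream, the sketch below validates scores and computes the change between consecutive available years. The function names and the example scores are invented for illustration; a year with no statement is represented as `None` and skipped.

```python
from typing import Optional

DEPTH_RUBRIC = {
    0: "Not mentioned",
    1: "Brief mention only",
    2: "Mentioned with examples",
    3: "Policy asks or commitments requested from city agencies",
    4: "Data or evidence cited",
    5: "Structured recommendations or frameworks proposed",
}


def validate_scores(scores: dict) -> None:
    """Raise ValueError if any score falls outside the 0-5 rubric."""
    for fiscal_year, score in scores.items():
        if score is not None and score not in DEPTH_RUBRIC:
            raise ValueError(f"{fiscal_year}: {score!r} is not a valid depth score")


def depth_changes(scores: dict) -> list:
    """Return (from_year, to_year, delta) for consecutive years that have a score."""
    validate_scores(scores)
    available = [(fy, s) for fy, s in sorted(scores.items()) if s is not None]
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(available, available[1:])]


# Invented scores for a single theme, one with a missing middle year.
full: dict = {"FY2024": 1, "FY2025": 3, "FY2026": 4}
gap: dict = {"FY2024": 2, "FY2025": None, "FY2026": 5}
print(depth_changes(full))  # [('FY2024', 'FY2025', 2), ('FY2025', 'FY2026', 1)]
print(depth_changes(gap))   # [('FY2024', 'FY2026', 3)]
```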

FY2025 needs statements for CB6 and CB12 were not available on the DCP website, so the year-over-year analysis for those two boards covers FY2024 and FY2026 only.

Demographics Report

The Brooklyn Borough President publishes annual demographic reports on all community board appointees (2022–2025). Each report is a PDF combining narrative text and visual charts. Because the charts cannot be read by standard text extraction tools, a hybrid pipeline was used.

  1. Text extraction: pdfplumber extracted narrative pages; 2024/2025 PDFs had doubled-character rendering artifacts that were normalized programmatically
  2. Vision extraction: Pages with charts were converted to PNG images and passed to GPT-4o with a strict no-hallucination instruction — if a number was not clearly visible, it was marked as unavailable rather than estimated
  3. Structured compilation: A final GPT-4o pass synthesized only the verified data points into the structured JSON powering the demographics page
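
The doubled-character cleanup in step 1 could be approximated with a heuristic like the one below. This is a guess at the technique rather than the project's actual code: it assumes the artifact duplicates every character in a token (for example "BBrrooookkllyynn"), and it only rewrites tokens where that pattern holds for every pair.

```python
import re


def dedouble_token(token: str) -> str:
    """Halve a token if it consists entirely of identical adjacent pairs."""
    # Tokens shorter than 4 characters are left alone so that short,
    # legitimately doubled strings such as "11" are not collapsed.
    if len(token) < 4 or len(token) % 2:
        return token
    if all(token[i] == token[i + 1] for i in range(0, len(token), 2)):
        return token[::2]
    return token


def dedouble_text(text: str) -> str:
    """Apply dedouble_token to every whitespace-separated token, keeping whitespace."""
    return re.sub(r"\S+", lambda m: dedouble_token(m.group()), text)


print(dedouble_text("BBrrooookkllyynn BBoorroouugghh President 22002244"))
# Brooklyn Borough President 2024
```

The heuristic can misfire on genuine tokens made only of pairs (a figure such as "1100" would become "10"), so it is safest to apply it only to pages already known to show the artifact.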

Data caveat: Race/ethnicity breakdowns for 2022, 2024, and 2025 exist only as bar charts in the source PDFs and could not be reliably extracted. The 2023 data (37% Black, 15% Caribbean, 12% Hispanic, 10% Asian) comes from text in that year's report. Gender and first-time appointee figures for all four years were successfully extracted from narrative text.
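
One possible way to carry this caveat into the structured JSON is to store an explicit null for years where figures could not be extracted, instead of omitting the year or estimating a value. The dictionary layout and function name below are hypothetical; only the 2023 percentages come from the report text quoted above.

```python
import json

# Hypothetical shape: year -> breakdown in percent, or None when unavailable.
race_ethnicity = {
    "2022": None,
    "2023": {"Black": 37, "Caribbean": 15, "Hispanic": 12, "Asian": 10},
    "2024": None,
    "2025": None,
}


def years_with_data(series: dict) -> list:
    """Return the years that have an extracted value, in insertion order."""
    return [year for year, value in series.items() if value is not None]


print(json.dumps(race_ethnicity["2022"]))  # null
print(years_with_data(race_ethnicity))     # ['2023']
```

An explicit null lets the frontend distinguish "not extractable" from "zero" and render a gap or a note rather than an empty bar.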

Technology Stack

Data Pipeline

  • Python 3 — scraping, extraction, orchestration
  • pdfplumber — PDF text extraction
  • pdf2image — PDF page to PNG conversion
  • OpenAI GPT-4o — LLM analysis and vision
  • SQLite — needs statement storage
  • requests — web scraping

Frontend

  • Vanilla HTML, CSS, JavaScript — no framework
  • Chart.js — bar, line, and radar charts
  • D3.js + d3-sankey — Sankey flow diagrams
  • Static JSON files — all analysis pre-computed
  • GitHub Pages — hosting

Data Sources