# VOTD — Vote Sourcing Workflow

How votes get from the real world into the platform. Covers the automated sourcing pipeline, editorial review tiers, source trust management, and the admin tooling required to support it all.

---

## Overview

Vote content enters the platform through two parallel pipelines:

1. **Sourced pipeline** — Claude scans government agendas, legislative trackers, and news outlets to find active civic items, then drafts vote records with supporting links. These land in the admin queue as drafts for editorial review.
2. **Proposed pipeline** — Verified users submit vote ideas through the mobile app. These land in the Proposals queue (already built) and follow the existing `pending → accepted/rejected` flow.

Both pipelines converge at the same place: a `draft` vote record in the `votes` table, assigned to a jurisdiction, with at least one entry in `vote_sources`. From there the editorial workflow is identical.

```
┌─────────────────────────┐     ┌─────────────────────────┐
│   SOURCED PIPELINE      │     │   PROPOSED PIPELINE     │
│                         │     │                         │
│  Claude Pass 1: Source  │     │  User submits proposal  │
│         ↓               │     │         ↓               │
│  Claude Pass 2: Verify  │     │  Editor reviews         │
│         ↓               │     │  accepted → vote draft  │
│  Draft vote + sources   │     │         ↓               │
└────────────┬────────────┘     └────────────┬────────────┘
             │                               │
             ▼                               ▼
     ┌───────────────────────────────────────────────┐
     │           EDITORIAL REVIEW QUEUE              │
     │                                               │
     │  approve  /  reject  /  hold                  │
     │  ↓           ↓           ↓                    │
     │  published   archived    stays in queue       │
     │  (open for   (with       (pending further     │
     │   voting)    reason)      action)             │
     └───────────────────────────────────────────────┘
```

---

## Sourcing pipeline — two-pass model

### Pass 1: Source

Claude scans configured source feeds for a jurisdiction and produces candidate vote records. Each candidate includes:

- `poll_question` — the yes/no question (max 85 chars)
- `proposal_text` — background context
- `topic` — classified using the topic taxonomy (see DESIGN_DECISIONS.md)
- `jurisdiction_id` — mapped from the source
- `sources[]` — at least one URL with OG metadata and trust level
- `scraper_source_url` — the original agenda or bill page
- `scraper_item_id` — external reference ID (Legistar GUID, bill number, etc.)

Output is written to the `votes` table with `status: draft` and `created_by: NULL` (indicates automated sourcing, not manual editor creation).

**Deduplication:** Before inserting, Claude checks `scraper_item_id` and `poll_question` similarity against existing records to avoid duplicates. Items that match an existing vote (any status) are skipped and logged.
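
The duplicate check could be sketched as follows; the token-set Jaccard measure and the 0.8 threshold are illustrative assumptions, not settled choices:

```typescript
// Sketch of the Pass 1 duplicate check. The Jaccard token similarity and
// the 0.8 threshold are illustrative, not production-tuned values.

interface ExistingVote {
  scraperItemId: string | null;
  pollQuestion: string;
}

// Token-set Jaccard similarity over lowercased words.
function questionSimilarity(a: string, b: string): number {
  const tok = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const ta = tok(a), tb = tok(b);
  if (ta.size === 0 && tb.size === 0) return 1;
  let overlap = 0;
  for (const t of ta) if (tb.has(t)) overlap++;
  return overlap / (ta.size + tb.size - overlap);
}

// A candidate is a duplicate if its external ID matches an existing vote
// (any status) or its question is near-identical to one already stored.
function isDuplicate(
  candidate: { scraperItemId: string | null; pollQuestion: string },
  existing: ExistingVote[],
  threshold = 0.8,
): boolean {
  return existing.some(
    (v) =>
      (candidate.scraperItemId !== null &&
        v.scraperItemId === candidate.scraperItemId) ||
      questionSimilarity(candidate.pollQuestion, v.pollQuestion) >= threshold,
  );
}
```

Matched candidates would be skipped and logged against the sourcing run rather than inserted.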

### Pass 2: Verify

A second pass runs over all `draft` records from Pass 1:

- Visits each URL in `vote_sources` and confirms it resolves (HTTP 200, not a paywall or dead link)
- Checks that the page content is topically relevant to the vote item
- Validates OG metadata (title, description) against actual page content
- Flags any source that fails verification with a `verification_status` field
- Attempts to find a better or additional source for items with only one link

Records that pass verification are marked `verification_status: verified`. Records that fail are marked `verification_status: needs_review` and surfaced to editors with the specific failure reason.
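
A sketch of how Pass 2 might classify a single link, assuming the fetch results have already been gathered; the 50% title-overlap heuristic is an invented placeholder for the real relevance check:

```typescript
// Sketch of Pass 2 classification for one source link. The inputs model what
// a fetch of the URL observed; the rules mirror the checks above, but the
// title-overlap heuristic is an illustrative stand-in.

interface LinkObservation {
  httpStatus: number; // final status after redirects
  pageTitle: string;  // <title> or og:title actually served
  paywalled: boolean; // detected paywall interstitial
}

type VerificationStatus = "verified" | "needs_review";

function classifySource(
  storedTitle: string,
  obs: LinkObservation,
): { status: VerificationStatus; note?: string } {
  if (obs.httpStatus !== 200)
    return { status: "needs_review", note: `link returned HTTP ${obs.httpStatus}` };
  if (obs.paywalled)
    return { status: "needs_review", note: "source is paywalled" };
  // Crude relevance check: the stored OG title should share words with the live page.
  const words = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const stored = words(storedTitle);
  let shared = 0;
  for (const w of words(obs.pageTitle)) if (stored.has(w)) shared++;
  if (stored.size > 0 && shared / stored.size < 0.5)
    return { status: "needs_review", note: "page content no longer matches stored metadata" };
  return { status: "verified" };
}
```

The `note` field maps onto `verification_note`, giving editors the specific failure reason.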

### Runtime: pilot vs. scale

| Aspect | Pilot | Scale |
|---|---|---|
| **Trigger** | Manual — editor runs a Claude Code session or scheduled task | Automated — Supabase edge function on a cron (daily or twice daily) |
| **Scope** | Sacramento County + California state feeds | All active jurisdictions with configured source feeds |
| **Pass 1** | Single Claude session processes all configured feeds | Parallelized by jurisdiction tier — federal, state, county batches |
| **Pass 2** | Same session, sequential | Separate edge function, can run independently |
| **Output** | Drafts written directly to `votd-prototype` | Drafts written to `votd-live` with audit trail |
| **Cost control** | Manual invocation = natural throttle | Token budget per run, configurable per jurisdiction |

For the pilot, a Claude Code scheduled task is the right fit: simple to set up, easy to iterate on, and the single-editor team can trigger it on demand. At scale, this moves to a Supabase edge function calling the Claude API with structured tool use, running on a cron schedule.

---

## Jurisdiction review tiers

Review tiers exist so that editors can be scoped to the jurisdictions they know and can manage. But tier assignment is driven by volume and expertise, not a rigid 1:1 mapping. One person might comfortably handle all federal items plus a handful of states, while another might focus on a few counties and all the districts within them. The system needs to support both patterns.

### Tier model

| Tier | Examples | Typical reviewer profile |
|---|---|---|
| **Federal** | Congress, federal agencies | National editor — low volume, high expertise |
| **State** | State legislature, state agencies (CARB, DHCS, CDE) | State editor — moderate volume, policy expertise |
| **County** | Board of Supervisors, county planning | Regional editor — moderate volume, local knowledge |
| **Municipal** | City council, city commissions | Regional editor — higher volume, community context |
| **District** | School boards, utility districts, transit authorities | Regional editor — niche expertise, lower volume |

### How assignments actually work

Editors are not locked to a single tier. The `allowed_jurisdiction_ids` array on `admin_users` can contain jurisdictions at any mix of tiers:

- **Pilot (now):** One or two people manage everything — federal through district for Sacramento County and California. All editors are `superuser` or `editor` with unrestricted access.
- **Early growth:** One editor handles federal + 3–4 states. Another handles Sacramento County + Elk Grove + Rancho Cordova + all Sacramento districts. Assignments are based on what each person can handle, not formal tier boundaries.
- **At scale:** A federal/state team of 2–3 editors covers national and multi-state items. Regional editors each own a metro area (county + cities + districts within it). Volume determines headcount per region, not tier.

The infrastructure supports this because `allowed_jurisdiction_ids` is a flat array — it doesn't care about tier hierarchy. Assigning an editor to a state jurisdiction automatically gives them visibility into all counties, cities, and districts below it (via the recursive `parent_id` query). Assigning them to a county gives them that county plus everything below. Or they can be assigned to a specific list of individual jurisdictions across tiers if the workload calls for it.

### How it maps to `admin_users`

The existing schema already supports this:

- **Phase 1 (pilot):** All editors are `role: editor` or `superuser` with `allowed_jurisdiction_ids: NULL` (unrestricted). One person sees everything.
- **Phase 2 (multi-state):** Editors split into `editor_live` / `editor_proto`. Jurisdiction filtering still off.
- **Phase 3 (regional scale):** `editor_regional` accounts get `allowed_jurisdiction_ids` populated. The admin queue filters items by the editor's allowed jurisdictions. A county editor for Sacramento sees only Sacramento County votes. A state editor for California sees state-level items plus all counties within California (via the `parent_id` hierarchy in `jurisdictions`).

### Queue filtering logic

When an `editor_regional` loads the Votes queue:

```sql
-- Editor's allowed jurisdictions + all children in the hierarchy
WITH RECURSIVE editor_jurisdictions AS (
  SELECT id FROM jurisdictions
  WHERE id = ANY(current_editor.allowed_jurisdiction_ids)
  UNION  -- UNION (not ALL) dedupes when one allowed id is an ancestor of another
  SELECT j.id FROM jurisdictions j
  JOIN editor_jurisdictions ej ON j.parent_id = ej.id
)
SELECT v.* FROM votes v
WHERE v.jurisdiction_id IN (SELECT id FROM editor_jurisdictions)
  AND v.status = 'draft'
ORDER BY v.created_at DESC;
```

This means assigning a state editor to "California" automatically gives them visibility into all CA counties, cities, and districts below it in the hierarchy. A county editor assigned to "Sacramento County" sees the county board plus all cities and districts within the county.

---

## Editorial review workflow

### Vote states in the sourcing context

Extending the existing vote status flow with sourcing-specific states:

```
                    ┌──── sourced (Pass 1) ────┐
                    ▼                          │
  draft ──→ draft:verified (Pass 2 OK) ──→ published ──→ closed
    │              │
    │              ├──→ draft:needs_review (Pass 2 flagged)
    │              │         │
    │              │         ├──→ editor fixes → draft:verified
    │              │         └──→ rejected → archived
    │              │
    ├──→ hold (editor defers) ──→ back to draft
    └──→ rejected → archived (with reason)
```

Implementation note: `draft:verified` and `draft:needs_review` can be tracked with a new `sourcing_status` column on the `votes` table rather than overloading the existing `vote_status` enum. This keeps the user-facing status flow clean.

### Editor actions on draft votes

| Action | Effect | When to use |
|---|---|---|
| **Approve** | Status → `published`, `published_at` set, appears in feed | Item is accurate, well-sourced, and relevant |
| **Approve with edits** | Editor modifies question/text/topic, then publishes | Item is valid but needs polish |
| **Hold** | Stays in queue, tagged with hold reason | Waiting for more information, timing not right |
| **Reject** | Status → `archived`, reason recorded in `audit_log` | Off-topic, duplicate, or not actionable as a yes/no vote |
| **Request better source** | Triggers Claude to search for alternative/additional sources | Link is weak, paywalled, or not authoritative enough |

### Queue recalibration

The draft queue is ordered dynamically based on editorial actions:

- **Approved items** leave the queue
- **Rejected items** leave the queue (archived)
- **Held items** sink to an "On Hold" section at the bottom
- **Needs review items** (Pass 2 flagged) float to the top with a warning badge
- **New drafts** enter below needs-review items but above verified drafts and the on-hold section
- **Time pressure** — items with approaching relevance deadlines (e.g., a city council vote happening next week) get a "closing soon" badge and sort higher

Sort order: `needs_review` > `new drafts (by urgency)` > `verified drafts` > `on hold`
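
That ordering can be expressed as a comparator; the field names here are assumptions about the row shape, and "new drafts" are modeled as items still at `sourcing_status: sourced`:

```typescript
// Comparator sketch for the draft queue ordering described above. Rank bands
// follow the stated sort order; field names are assumed, not from the schema.

interface QueueItem {
  sourcingStatus: "needs_review" | "sourced" | "verified";
  onHold: boolean;
  closesAt: number | null; // epoch ms of the relevance deadline, if known
  createdAt: number;       // epoch ms
}

function queueRank(item: QueueItem): number {
  if (item.onHold) return 3;                            // sinks to "On Hold"
  if (item.sourcingStatus === "needs_review") return 0; // floats to top
  if (item.sourcingStatus === "sourced") return 1;      // new drafts
  return 2;                                             // verified drafts
}

function compareQueue(a: QueueItem, b: QueueItem): number {
  const byRank = queueRank(a) - queueRank(b);
  if (byRank !== 0) return byRank;
  // Within a band: approaching deadlines first ("closing soon"), then newest.
  const aDeadline = a.closesAt ?? Number.POSITIVE_INFINITY;
  const bDeadline = b.closesAt ?? Number.POSITIVE_INFINITY;
  if (aDeadline !== bDeadline) return aDeadline - bDeadline;
  return b.createdAt - a.createdAt;
}
```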

---

## Source trust management

### The source list

A new `source_domains` table tracks editorial trust decisions about link sources. Claude consults this list during both sourcing passes:

```sql
create table source_domains (
  id          uuid primary key default uuid_generate_v4(),
  domain      text not null unique,       -- "leginfo.legislature.ca.gov"
  trust_level text not null default 'neutral',  -- whitelist / greylist / blacklist / neutral
  notes       text,                       -- editor's reason for classification
  added_by    uuid references admin_users(id),
  created_at  timestamptz not null default now(),
  updated_at  timestamptz not null default now()
);
```

### Trust levels

| Level | Meaning | Claude behavior | Editor action |
|---|---|---|---|
| **whitelist** | Trusted source — government sites, official legislative trackers, established news outlets | Prefer these sources. Auto-verify links from whitelisted domains. | Green badge in admin. One-click to whitelist. |
| **neutral** | Unknown — not yet classified | Use normally. Full verification in Pass 2. | No badge. Default state for new domains. |
| **greylist** | Suspect — paywalled, unreliable, or frequently changes URLs | Avoid as primary source. Flag for editor review if used. | Yellow badge. One-click to greylist. |
| **blacklist** | Banned — spam, misinformation, or consistently dead links | Never use. Skip during sourcing. | Red badge. One-click to blacklist. |

### Editor source tools

These are the actions available to an editor when reviewing a vote's sources in the admin:

1. **Ask Claude for better source** — Editor clicks a button next to a source link. Claude searches for an alternative URL covering the same item from a higher-trust domain. Result appears inline for editor approval.

2. **Whitelist domain** — One click on a source's domain badge. Adds the domain to `source_domains` with `trust_level: whitelist`. All future sourcing runs prefer this domain.

3. **Greylist domain** — One click. Marks domain as suspect. Claude will avoid it as a primary source and flag it if no alternative exists.

4. **Blacklist domain** — One click. Domain is banned from all future sourcing. Existing sources from this domain across all votes are flagged for replacement.

5. **Add source manually** — Editor pastes a URL. System fetches OG metadata automatically. Editor can set it as primary source. Added with `added_by` set to the editor's ID.

6. **Check link** — Editor clicks to verify a source URL is still live and resolving. Shows HTTP status, redirect chain, and whether page content matches the stored metadata. This is a manual trigger for the same check Pass 2 does automatically.

### Learning loop

The whitelist/greylist/blacklist is a persistent learning mechanism:

- Every sourcing run reads the current `source_domains` table before starting
- Whitelisted domains are weighted higher in search results
- Blacklisted domains are excluded from search queries entirely
- When a domain is blacklisted, all existing `vote_sources` rows from that domain are flagged `needs_review` across the database so editors can replace them
- The source list is shared across all jurisdictions — a domain blacklisted for California applies nationally

### Discovery balance — whitelist is a preference, not a constraint

The whitelist must not become a closed loop. If Claude only looks at whitelisted sources, the platform misses new government portals, newly launched legislative trackers, and emerging civic journalism. The rule:

- **Whitelisted domains get preference**, not exclusivity. During Pass 1, Claude should allocate roughly 70% of its search effort to whitelisted/known sources and 30% to open discovery (broader web searches, following links from known sources, exploring adjacent government portals).
- **New domains discovered during open discovery enter as `neutral`** — they surface in the editorial queue with a grey badge. If an editor approves items from a neutral domain two or more times, the admin should prompt them to whitelist it.
- **Sourcing run stats track discovery health.** Each run logs how many items came from whitelisted vs. neutral vs. new domains. If the ratio of whitelisted sources exceeds 90% for three consecutive runs, the system flags it as a discovery drought and the next run shifts to 50/50 known vs. open search.
- **Periodic discovery sweeps.** Once a week (or on-demand), a dedicated sourcing pass runs in pure discovery mode — searching broadly for new civic data sources, agenda portals, and community news outlets across active jurisdictions. Any new domains found get added as neutral for editorial triage.
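
The drought rule reduces to a small function; the run-stats shape is an assumption, while the 90%-over-three-runs trigger and the 70/30 and 50/50 splits come from the rules above:

```typescript
// Sketch of the discovery-drought rule. Thresholds come from the rules above;
// the run-stats shape is assumed.

interface RunStats {
  fromWhitelisted: number;
  fromOther: number; // neutral + newly discovered domains
}

// Fraction of search effort the next run spends on whitelisted/known sources;
// the remainder goes to open discovery.
function knownSourceBudget(recentRuns: RunStats[]): number {
  const lastThree = recentRuns.slice(-3);
  const drought =
    lastThree.length === 3 &&
    lastThree.every((r) => {
      const total = r.fromWhitelisted + r.fromOther;
      return total > 0 && r.fromWhitelisted / total > 0.9;
    });
  return drought ? 0.5 : 0.7; // 50/50 during a drought, else the 70/30 default
}
```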

---

## Engagement scoring — the intake quality gate

Before a sourced item enters the draft queue, Claude scores it for predicted engagement using a rubric. This determines whether the item is worth editorial attention and helps prioritize the queue. The rubric was originally developed for the feed engine and now serves as the sourcing pipeline's quality filter.

### Scoring rubric (1–5 scale)

Start at 1 (base), then add or subtract:

| Modifier | Points | Rationale |
|---|---|---|
| Directly affects household costs (rates, fees, taxes) | +2 | People engage when it hits their wallet |
| Involves high-salience topic (homelessness, police, housing, schools) | +2 | Emotional and community-wide relevance |
| Clear yes/no position possible | +1 | The platform is built on binary votes — items that don't reduce to a clear question don't work |
| Covered by local media | +1 | Media coverage = public awareness = higher engagement |
| Prior similar item had public comment or split vote | +1 | History of controversy predicts engagement |
| Primarily procedural despite being technically substantive | −1 | Zoning variance renewals and budget line transfers don't excite anyone |
| Requires specialist knowledge to form an opinion | −1 | If a voter needs a law degree to understand it, engagement drops |

**Floor: 1 · Cap: 5**
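
The rubric translates directly into a scoring function; the boolean signal names are hypothetical inputs Claude would assess per item, while the weights, floor, and cap match the table:

```typescript
// The rubric above as a function. Signal names are hypothetical; the point
// values, floor, and cap are taken from the table.

interface EngagementSignals {
  affectsHouseholdCosts: boolean;       // +2
  highSalienceTopic: boolean;           // +2
  clearYesNoPosition: boolean;          // +1
  localMediaCoverage: boolean;          // +1
  priorControversy: boolean;            // +1
  primarilyProcedural: boolean;         // -1
  requiresSpecialistKnowledge: boolean; // -1
}

function engagementScore(s: EngagementSignals): number {
  let score = 1; // base
  if (s.affectsHouseholdCosts) score += 2;
  if (s.highSalienceTopic) score += 2;
  if (s.clearYesNoPosition) score += 1;
  if (s.localMediaCoverage) score += 1;
  if (s.priorControversy) score -= 0; // no-op placeholder removed below
  if (s.priorControversy) score += 1;
  if (s.primarilyProcedural) score -= 1;
  if (s.requiresSpecialistKnowledge) score -= 1;
  return Math.min(5, Math.max(1, score)); // floor 1, cap 5
}
```

For example, a utility rate hike with a clear yes/no question and local press coverage scores 1 + 2 + 1 + 1 = 5, while a purely procedural item needing specialist knowledge bottoms out at the floor of 1.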

### How it's used in sourcing

- **Pass 1:** Claude assigns an `engagement_score` to every candidate item during sourcing. This score is stored on the draft vote record.
- **Threshold:** Items scoring 1 are auto-skipped (too procedural or obscure). Items scoring 2 are flagged as borderline — they enter the queue but sort to the bottom. Items scoring 3+ are standard candidates.
- **Queue sort:** Within each tier, the editorial queue sorts by engagement score (descending), then by urgency (items with approaching deadlines surface first).
- **Editor override:** Editors can manually adjust the score if they disagree with Claude's assessment. This is important — local knowledge often reveals that a seemingly boring item is actually a hot-button issue.

### Calibration loop

The rubric must evolve. What predicts engagement today may not predict it in six months. The calibration mechanism:

1. **Track actuals.** Once a vote is published and has been open for 7+ days, compare its actual engagement metrics (total votes, vote velocity in first 48 hours, share rate) against its predicted engagement score.

2. **Score accuracy report.** Monthly, generate a report showing predicted vs. actual engagement by score band:
   - Score 5 items: did they actually get top-tier engagement?
   - Score 2 items: did any of them surprise and outperform?
   - Score 1 items (auto-skipped): sample a few and check if they would have performed — are we filtering out winners?

3. **Rubric refinement.** Based on the accuracy report, adjust modifier weights or add new modifiers. Examples of refinements that might emerge:
   - "Affects renters specifically" might deserve its own +1 if renter engagement consistently outperforms
   - "Involves a named local figure" might be a new +1 if personality-driven items spike
   - The −1 for "specialist knowledge" might need to be −2 if those items consistently underperform

4. **Feedback from editors.** Editors who override scores are providing direct signal. If editors consistently bump a certain type of item up (or down), that pattern should be reflected in the rubric. The admin should surface a quarterly summary: "Editors overrode 12 scores this quarter — 8 were school-related items bumped from 2 to 4."

5. **Source domain correlation.** Track whether items from certain source domains consistently score higher or lower. If items from `leginfo.legislature.ca.gov` consistently score 3+ while items from a particular city planning portal consistently score 1–2, that informs both the rubric and the sourcing priority.
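
The monthly report's core grouping might look like this; the row shape loosely mirrors the `engagement_calibration` table, and the override-wins rule reflects how editor adjustments supersede predictions:

```typescript
// Sketch of the monthly accuracy report: group calibration rows by the score
// that governed the item (editor override wins over prediction), then average
// the 7-day vote totals per band. Row shape loosely mirrors engagement_calibration.

interface CalibrationRow {
  predictedScore: number;
  overrideScore: number | null;
  actualVotes7d: number;
}

function accuracyByBand(rows: CalibrationRow[]): Map<number, number> {
  const sums = new Map<number, { total: number; n: number }>();
  for (const r of rows) {
    const band = r.overrideScore ?? r.predictedScore;
    const acc = sums.get(band) ?? { total: 0, n: 0 };
    acc.total += r.actualVotes7d;
    acc.n += 1;
    sums.set(band, acc);
  }
  const avg = new Map<number, number>();
  for (const [band, { total, n }] of sums) avg.set(band, total / n);
  return avg;
}
```

If average actuals for band 5 are not clearly above band 3, that is the signal to revisit the modifier weights.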

### Schema support

```sql
-- Already on votes table from earlier in this doc:
-- sourcing_status, sourcing_run_id, hold_reason

-- Add engagement scoring:
alter table votes add column engagement_score integer check (engagement_score between 1 and 5);
alter table votes add column engagement_score_override integer check (engagement_score_override between 1 and 5);
alter table votes add column engagement_score_notes text;
-- engagement_score: Claude's prediction at sourcing time
-- engagement_score_override: editor's manual adjustment (null if not overridden)
-- engagement_score_notes: editor's reason for override
```

A separate `engagement_calibration` table tracks predicted vs. actual:

```sql
create table engagement_calibration (
  id                uuid primary key default uuid_generate_v4(),
  vote_id           uuid not null references votes(id),
  predicted_score   integer not null,
  override_score    integer,
  actual_votes_7d   integer,          -- total votes after 7 days
  actual_velocity   numeric,          -- votes per hour in first 48h
  actual_share_rate numeric,          -- shares / total votes
  measured_at       timestamptz not null default now()
);
```

---

## Sourcing targets and vote mix

### Target content mix: 25 / 25 / 50

The platform's content mix target is:

| Tier group | Target % | What it includes |
|---|---|---|
| **Federal** | 25% | Congress, federal agencies, executive orders |
| **State** | 25% | State legislature, state agencies, ballot measures |
| **Local** | 50% | County, municipal, special district, school |

Within the local 50%, the target sub-split is roughly equal thirds:

| Local sub-tier | Target % of total | ~% of local pool | What it includes |
|---|---|---|---|
| **County** | ~17% | ~1/3 | Board of supervisors, county planning, county agencies |
| **Municipal** | ~17% | ~1/3 | City council, city commissions, city agencies |
| **District** | ~17% | ~1/3 | School boards, utility districts (SMUD), transit (SacRT), special districts |

School district items roll up into the district third rather than getting their own slice — schools are a type of special district and the volume is lower. If school items consistently outperform other district items in engagement, this split can be revisited.

This is the ratio the sourcing pipeline aims to maintain across the total pool of open items at any given time. It drives how much sourcing effort goes to each tier and sets the threshold for when a tier needs more content.
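
As a sketch, the "tier needs more content" check could be computed from the open pool like this; the rounding and tier grouping are illustrative:

```typescript
// Deficit check against the 25/25/50 target. Given the current pool of open
// items per tier group, compute how many items each group is short.

const TARGET_SHARE: Record<string, number> = {
  federal: 0.25,
  state: 0.25,
  local: 0.5,
};

function tierDeficits(openCounts: Record<string, number>): Record<string, number> {
  const total = Object.values(openCounts).reduce((a, b) => a + b, 0);
  const deficits: Record<string, number> = {};
  for (const [tier, share] of Object.entries(TARGET_SHARE)) {
    const target = Math.round(total * share);
    deficits[tier] = Math.max(0, target - (openCounts[tier] ?? 0));
  }
  return deficits;
}
```

With the mock pool described later (4 federal, 8 state, 43 local of 55 items), this yields a federal deficit of 10 and a state deficit of 6, which lines up with the gap table under "Current mock data vs. target."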

Why 25/25/50 and not the current 7/15/78 from mock data:

- **Manageability.** A balanced split means editorial workload is predictable. One federal/state editor and one local editor can each own roughly half the queue.
- **Engagement.** Federal and state items generate high engagement — everyone has an opinion on Congress and state policy. Underweighting them leaves value on the table.
- **Differentiation.** The platform's unique value is local, so local gets the largest share. But 50% is plenty to make that case without starving the other tiers.
- **Feed health.** The feed engine has 3 federal, 4 state, and 7 local slots. A 25/25/50 content pool ensures each tier group has enough depth that the scoring algorithm has real choices to make — not just whatever single item happens to exist.

### Volume per jurisdiction

Every jurisdiction produces votes at a fixed monthly rate. Federal and state run at 9/month (higher because each vote reaches more people and justifies more sourcing effort). All local tiers run at 5/month. Special districts are modeled at 8 per county per month (aggregated across the multiple districts overlapping each county, and compensating for municipalities that may not always fill their quota).

| Tier | Jurisdictions (CA) | Votes/Jurisdiction/Mo | Votes/Month | Votes/Year | % of Total |
|---|---|---|---|---|---|
| **Federal** | 1 | 9 | 9 | 108 | 0.3% |
| **State** | 1 | 9 | 9 | 108 | 0.3% |
| **County** | 58 | 5 | 290 | 3,480 | 9.1% |
| **Municipal** | 482 | 5 | 2,410 | 28,920 | 75.7% |
| **Special District** | 58 (by county) | 8 | 464 | 5,568 | 14.6% |
| **Total** | **600** | | **3,182** | **38,184** | **100%** |
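
The table reduces to simple arithmetic; jurisdiction counts and per-jurisdiction rates are the model's stated California assumptions:

```typescript
// The volume table above as arithmetic. Counts and rates are the model's
// stated CA assumptions; special districts are modeled per county.

const tiers = [
  { name: "federal", jurisdictions: 1, votesPerMonth: 9 },
  { name: "state", jurisdictions: 1, votesPerMonth: 9 },
  { name: "county", jurisdictions: 58, votesPerMonth: 5 },
  { name: "municipal", jurisdictions: 482, votesPerMonth: 5 },
  { name: "special_district", jurisdictions: 58, votesPerMonth: 8 },
];

const monthlyTotal = tiers.reduce((sum, t) => sum + t.jurisdictions * t.votesPerMonth, 0);
const yearlyTotal = monthlyTotal * 12;
```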

### Staffing model: California all-up

Editor processing rate: 5 pre-populated votes reviewed per hour. At 8-hour days, a 4-day work week, and 48 weeks/year, that is 5 × 8 × 4 × 48 = 7,680 votes/person/year.

The math says 5.0 FTE to handle 38,184 votes/year. We staff to **8 editors**, which builds in headroom for PTO coverage, proposal reviews (user-submitted votes that need more careful editorial attention), rework, and calibration. At 8 editors the team runs at 62% capacity utilization, with ~2,900 spare votes per person per year.
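
The arithmetic behind those numbers, as a sanity-check sketch (assuming 8-hour review days):

```typescript
// Staffing arithmetic for the California all-up model. Constants are the
// model's stated assumptions; 8-hour days is assumed.

const VOTES_PER_HOUR = 5;
const HOURS_PER_DAY = 8;
const DAYS_PER_WEEK = 4;
const WEEKS_PER_YEAR = 48;
const ANNUAL_VOLUME = 38_184;
const TEAM_SIZE = 8;

const capacityPerEditor =
  VOTES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR; // 7,680
const minimumFte = ANNUAL_VOLUME / capacityPerEditor;              // ~5.0
const utilization = ANNUAL_VOLUME / (TEAM_SIZE * capacityPerEditor); // ~62%
const sparePerEditor = capacityPerEditor - ANNUAL_VOLUME / TEAM_SIZE; // ~2,907
```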

| Role | Headcount | Scope | Votes/Month | Working Days/Month |
|---|---|---|---|---|
| Federal + State Lead | 1 | All federal + California state | 18 | 0.5 |
| County Editor | 1 | All 58 counties | 290 | 7.2 |
| Municipal Editors | 4 | 482 cities (split by region) | 602 each | 15.1 each |
| District Editor | 1 | Special districts (58 counties) | 464 | 11.6 |
| Flex / QA | 1 | Overflow, PTO cover, quality checks | — | — |
| **Total** | **8** | | | |

Municipal is 76% of the volume and gets 4 of the 8 editors. Federal + state combined is less than one working day per month — the lead has massive capacity for additional states or QA duties. The flex editor absorbs proposal reviews, handles PTO coverage, and runs the engagement calibration loop.

### Rollout phases

| Phase | Scope | Counties | Municipalities | Votes/Month | Editors |
|---|---|---|---|---|---|
| **Pilot** | Sacramento County + CA + Federal | 1 | 7 | 66 | 1–2 |
| **Phase 2** | Sacramento Metro (6 counties) | 6 | 23 | 211 | 2 |
| **Phase 3** | Northern CA (Bay + Sac + NorCal) | 22 | 178 | 1,194 | 4 |
| **Phase 4** | All California | 58 | 482 | 3,182 | 8 |

The detailed model with all 58 counties, regional breakdowns, and tunable assumptions is in `docs/california-sourcing-model.xlsx`.

### Note on proposed votes

User-submitted proposals are not included in these volume numbers. Proposals come pre-written but require more careful editorial review (quality varies, may need source verification, question rewording, or jurisdiction reassignment). The 38% capacity headroom in the 8-person team absorbs this additional workload. As proposal volume grows, it supplements the sourcing pipeline — every approved proposal is one fewer item Claude needs to source.

### Sourcing cadence by tier

Different tiers have different natural rhythms:

- **Federal:** Congress moves slowly. Source weekly. Most items have long windows (30+ days). 2–3 items per run is healthy.
- **State:** Legislative sessions drive volume. During session (Jan–Sep in CA), source 2–3× per week. Off-session, weekly is fine.
- **Local:** City councils and county boards meet regularly (weekly or biweekly). Source after each meeting cycle. Districts and school boards are less frequent — source weekly.

### Current mock data vs. target

The existing mock data set was drawn from `Sacramento_Voting_Items_v2.xlsx`:

| Tier group | Current count | Current % | Target % | Gap |
|---|---|---|---|---|
| **Federal** | 4 | 7% | 25% | Needs ~10 more items |
| **State** | 8 | 15% | 25% | Needs ~6 more items |
| **Local — County** | 12 | 22% | ~17% | Slightly over; on track |
| **Local — Municipal** | 12 | 22% | ~17% | Slightly over; on track |
| **Local — District** | 19 | 35% | ~17% | Over-indexed (includes 5 school items) |
| **Total** | 55 | 100% | 100% | |

The mock data was built for a Sacramento-only pilot, which explains the local skew. As we move toward the target mix, the sourcing pipeline's first priority is building out the federal and state pools. Local is already well-stocked.

### Open questions

1. **Geographic personalization** — Does every user see the same federal and state items, or do state items vary by the user's state? Currently Sacramento-only so this doesn't matter, but at multi-state scale it determines whether the state pool is shared or per-state.
2. **Seasonal rebalancing** — During a quiet congressional recess, should federal's 25% temporarily shift to state or local? Or does the pipeline just source lighter and let the pool draw down toward minimum?
3. **School breakout** — School items are currently rolled into the district third. If school engagement consistently outperforms or underperforms other district items, should schools get their own target slice?

---

## Schema additions

The following additions to the existing schema support this workflow:

### New columns on `votes`

```sql
alter table votes add column sourcing_status text default null;
-- Values: 'sourced', 'verified', 'needs_review', null (manual/proposal-originated)

alter table votes add column sourcing_run_id uuid;
-- Links to the sourcing_runs table for traceability

alter table votes add column hold_reason text;
-- Set by editor when placing a vote on hold
```

### New columns on `vote_sources`

```sql
alter table vote_sources add column trust_level text;
-- 'whitelist', 'greylist', 'blacklist', 'neutral' — denormalized from source_domains

alter table vote_sources add column verification_status text default 'pending';
-- 'pending', 'verified', 'failed', 'needs_review'

alter table vote_sources add column verification_note text;
-- Reason for failure or review flag
```

### New table: `source_domains`

```sql
create table source_domains (
  id          uuid primary key default uuid_generate_v4(),
  domain      text not null unique,
  trust_level text not null default 'neutral',
  notes       text,
  added_by    uuid references admin_users(id),
  created_at  timestamptz not null default now(),
  updated_at  timestamptz not null default now()
);
```

### New table: `sourcing_runs`

```sql
create table sourcing_runs (
  id                uuid primary key default uuid_generate_v4(),
  started_at        timestamptz not null default now(),
  completed_at      timestamptz,
  status            text not null default 'running',  -- running, completed, failed
  jurisdiction_ids  uuid[],           -- which jurisdictions were scanned
  items_found       integer default 0,
  items_inserted    integer default 0,
  items_skipped     integer default 0, -- duplicates
  items_verified    integer default 0,
  items_flagged     integer default 0,
  pass              integer not null,  -- 1 = source, 2 = verify
  trigger           text not null,     -- 'manual', 'scheduled', 'editor_request'
  error_log         text,
  created_at        timestamptz not null default now()
);
```

### New table: `jurisdiction_source_feeds`

Configures where Claude looks for each jurisdiction:

```sql
create table jurisdiction_source_feeds (
  id              uuid primary key default uuid_generate_v4(),
  jurisdiction_id uuid not null references jurisdictions(id),
  feed_name       text not null,          -- human label: "CA Legislature", "Sac County Agenda"
  feed_url        text not null,          -- base URL to scan
  feed_type       text not null,          -- 'legislative_tracker', 'agenda', 'news', 'rss'
  is_active       boolean default true,
  scan_frequency  text default 'daily',   -- 'daily', 'weekly', 'on_demand'
  last_scanned_at timestamptz,
  created_at      timestamptz not null default now()
);
```

---

## Admin UI changes

The existing admin app needs the following additions to support the sourcing workflow. These are organized by page.

### Existing stub: EditorialQueue.tsx

The admin app already has an `EditorialQueue.tsx` stub (not yet routed) with the description: "Scraped agenda items needing editorial review." This is exactly the sourcing pipeline's output. Rather than building sourcing features into the existing Votes queue, the Editorial Queue should be resurrected as the primary view for sourced drafts. The Votes queue remains focused on published/active/closed votes and their analytics.

**Decision:** Route `EditorialQueue` at `/:env/editorial` and add it to the sidebar under Queues, between Proposals and Votes.

### 1. Editorial Queue (resurrect EditorialQueue.tsx)

This becomes the primary workspace for the sourcing pipeline:

- **Default view:** All `draft` votes with `sourcing_status` not null, scoped to editor's jurisdictions
- Table columns: poll question, jurisdiction, topic, sourcing status badge, source count, sourcing run date, urgency badge
- Expandable row shows proposal text, sources with trust badges, and all editor action buttons
- Filters: sourcing status, jurisdiction tier, jurisdiction, topic, date range
- Bulk actions: approve selected, reject selected
- Sort: needs_review first, then by urgency, then by date
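The default view and sort order above could be expressed as a single query along these lines. Column names on `votes` (`status`, `sourcing_status`, `urgency`, `created_at`) are assumptions about that table, not confirmed schema:

```sql
-- Sketch of the Editorial Queue default view: draft votes from the
-- sourcing pipeline, scoped to the editor, needs_review first.
select v.*, count(vs.id) as source_count
from votes v
left join vote_sources vs on vs.vote_id = v.id
where v.status = 'draft'
  and v.sourcing_status is not null
  and v.jurisdiction_id = any(:allowed_jurisdiction_ids)
group by v.id
order by (v.sourcing_status = 'needs_review') desc,
         v.urgency desc,
         v.created_at desc;
```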

### 2. Votes queue — sourcing columns and filters

**New columns in the votes table view:**
- **Sourcing status** badge — `sourced` / `verified` / `needs_review` / `on_hold` (color-coded)
- **Sources** count with expandable detail (click to see all attached sources with trust badges)
- **Sourcing run** link — which run produced this item, when

**New filters:**
- Filter by sourcing status (all / needs review / verified / on hold / manual)
- Filter by jurisdiction tier (federal / state / county / municipal / district)
- Filter by jurisdiction (dropdown, scoped to editor's `allowed_jurisdiction_ids`)

**New bulk actions:**
- Approve selected (batch publish)
- Reject selected (batch archive with shared reason)

### 3. Source detail panel

When an editor expands a vote row or clicks into vote detail, they see a **Sources** section:

- Each source shows: favicon, domain, title, description, trust badge (green/yellow/red/grey), verification status
- **Action buttons per source:**
  - Check link (re-verify now)
  - Whitelist domain (one click)
  - Greylist domain (one click)
  - Blacklist domain (one click)
  - Remove source
- **Action buttons for the vote:**
  - Ask Claude for better source (triggers a targeted search, result appears inline)
  - Add source manually (paste URL, auto-fetch metadata)

### 4. New page: Source Domains

A new admin page listing all classified domains:

- Table: domain, trust level (badge), notes, classified by, date
- Filters: whitelist / greylist / blacklist / neutral / all
- Inline editing of trust level and notes
- Search by domain
- Stats row: X whitelisted, Y greylisted, Z blacklisted
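The stats row is a simple aggregate, assuming a `trust_level` column on `source_domains`:

```sql
-- Counts per trust level for the Source Domains stats row.
select trust_level, count(*) as domains
from source_domains
group by trust_level
order by trust_level;
```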

### 5. New page: Sourcing Runs

A log of all automated and manual sourcing runs:

- Table: date/time, pass (1 or 2), jurisdictions scanned, items found/inserted/skipped/flagged, status, trigger
- Click to expand and see the individual items from that run
- Re-run button (triggers a new Pass 1 or Pass 2 for the same jurisdiction set)

### 6. Jurisdiction page — source feeds

The existing Jurisdictions stub page needs:

- List of configured source feeds per jurisdiction
- Add/edit/deactivate feeds
- Last scanned date and item count
- Test scan button (runs Pass 1 on a single feed, shows preview without inserting)

### 7. My Queue — sourcing items

My Queue already spans environments. Add:

- Sourced drafts assigned to the editor's jurisdiction scope appear automatically
- "Needs review" items get a warning badge in My Queue
- Quick actions (approve/reject/hold) directly from My Queue without drilling into the Votes page

---

## Pilot setup checklist

To get the sourcing pipeline running for the Sacramento pilot:

- [ ] Apply schema additions (new tables + columns) to `votd-prototype`
- [ ] Seed `source_domains` with initial whitelist: `leginfo.legislature.ca.gov`, `saccounty.gov`, `cityofsacramento.gov`, `planning.saccounty.gov`, `ww2.arb.ca.gov`, `cde.ca.gov`, `dhcs.ca.gov`, `energy.ca.gov`, `calmatters.org`, `edsource.org`
- [ ] Seed `jurisdiction_source_feeds` for Sacramento County + California state
- [ ] Create a Claude Code scheduled task (or skill) for Pass 1 sourcing
- [ ] Create a Claude Code scheduled task (or skill) for Pass 2 verification
- [ ] Add sourcing status column and filters to VotesMaster.tsx
- [ ] Add source detail panel with trust management buttons
- [ ] Add Source Domains page to admin sidebar
- [ ] Add Sourcing Runs page to admin sidebar
- [ ] Wire jurisdiction source feeds to Jurisdictions page
- [ ] Test end-to-end: trigger Pass 1 → review drafts → verify → approve → appears in mobile feed
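The whitelist seed in the second checklist item can be a single idempotent insert. This assumes `source_domains` has `domain`, `trust_level`, and `notes` columns with a unique constraint on `domain`; adjust to the real schema:

```sql
-- Seed the initial Sacramento pilot whitelist; safe to re-run.
insert into source_domains (domain, trust_level, notes)
select d, 'whitelist', 'Sacramento pilot seed'
from unnest(array[
  'leginfo.legislature.ca.gov', 'saccounty.gov', 'cityofsacramento.gov',
  'planning.saccounty.gov', 'ww2.arb.ca.gov', 'cde.ca.gov',
  'dhcs.ca.gov', 'energy.ca.gov', 'calmatters.org', 'edsource.org'
]) as d
on conflict (domain) do nothing;
```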

---

## Appendix: source feed configuration for Sacramento pilot

### Federal tier
| Feed | URL | Type |
|---|---|---|
| Congress.gov | `https://www.congress.gov/search` | legislative_tracker |

### State tier (California)
| Feed | URL | Type |
|---|---|---|
| CA Legislature | `https://leginfo.legislature.ca.gov` | legislative_tracker |
| CA Air Resources Board | `https://ww2.arb.ca.gov` | agenda |
| CA Energy Commission | `https://www.energy.ca.gov` | agenda |
| CalMatters | `https://calmatters.org` | news |

### County tier (Sacramento)
| Feed | URL | Type |
|---|---|---|
| Sac County Board of Supervisors | `https://saccounty.gov` | agenda |
| Sac County Planning | `https://planning.saccounty.gov` | agenda |

### Municipal tier (Sacramento City)
| Feed | URL | Type |
|---|---|---|
| Sacramento City Council | `https://www.cityofsacramento.gov` | agenda |

### District tier
| Feed | URL | Type |
|---|---|---|
| SCUSD | `https://www.scusd.edu` | agenda |
| SMUD | `https://www.smud.org` | agenda |
| SacRT | `https://www.sacrt.com` | agenda |
