QC Collector — Data Intake & Schema Discovery

Purpose: Concrete, send-this-back checklist of API responses, sample data, and answers we need from the customer's Jira + Zephyr Essential instance. Each item lists what we'll learn from it and how the collector will adapt if it changes later.

Audience: Implementer (you) → customer admin / SRE / QA lead.

Companion to: QC_JIRA_COLLECTOR_PLAN.md (architecture & design). This doc is purely about input data we need to lock down DTOs, normalization rules, and field maps.

Why this doc exists: Jira / Zephyr field IDs, status names, and issue-type names are per-instance. Hard-coding them is the #1 reason connectors break in real deployments. We discover at startup, persist a map, and reconcile on a schedule — but we still need a one-time intake to seed everything correctly.



1. How to send responses back

For each API call below, send:

  1. Exact URL you hit (so we know which Jira host, which params)

  2. HTTP status code

  3. Full response headers — particularly anything starting with X-RateLimit-, Retry-After, X-AREQUESTID, pagination headers, and Content-Type

  4. Full JSON response body (don't trim — even fields you think are irrelevant are useful)

  5. Jira version label if you have multiple instances (e.g., "DC 10.3.5 prod" vs "DC 9.12.5 staging")

File-naming convention that makes review fast:

Tarball them, put them in a private GitHub gist or shared drive — whatever's easiest. Don't paste into chat — large JSON gets clipped and we lose nested fields.

Sanitizing: if any field contains PII / customer data you can't share, replace the value but keep the key and shape (e.g., "summary": "<redacted>", but keep all custom field IDs and structures intact). Field IDs and shapes are the gold; specific summaries / descriptions are not.
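
The sanitizing rule above can be applied mechanically before sending. A minimal sketch, assuming the response has already been parsed into nested Map / List values; the class name ResponseSanitizer and the choice of sensitive keys are hypothetical and not part of the collector:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: blanks sensitive string values while keeping keys and nesting. */
final class ResponseSanitizer {
    static final String PLACEHOLDER = "<redacted>";

    /** Returns a deep copy of node with string values under the given keys replaced. */
    static Object redact(Object node, Set<String> sensitiveKeys) {
        return redact(node, sensitiveKeys, false);
    }

    private static Object redact(Object node, Set<String> sensitiveKeys, boolean underSensitiveKey) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                boolean sensitive = underSensitiveKey || sensitiveKeys.contains(key);
                copy.put(key, redact(e.getValue(), sensitiveKeys, sensitive));
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(redact(item, sensitiveKeys, underSensitiveKey));
            }
            return copy;
        }
        // Only strings are replaced; numbers, booleans and nulls keep their shape as-is.
        if (node instanceof String && underSensitiveKey) {
            return PLACEHOLDER;
        }
        return node;
    }
}
```

Calling redact(issue, Set.of("summary", "description")) leaves every customfield_NNNNN key and structure in place, which is the part we actually need.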


Section A — Connectivity & version

A1. GET /rest/api/2/myself

Why: Confirms PAT works. Tells us which user fields are present (name, key, displayName, emailAddress).

What we'll learn:

  • Whether the user identifier is name, key, or both — drives how we store assignee / reporter / transitionedBy in qc_issues and qc_issue_history.

  • Whether the PAT user has display name + email or just a username (helps the dashboard).

Adaptation if it changes later: user-id field name is wrapped in a single normalizer (JiraNormalizer.resolveUser(JsonNode)); if Atlassian changes the canonical field, we change one method.

A2. GET /rest/api/2/serverInfo

Why: Confirms exact Jira version programmatically. Drives version-conditional code paths.

What we'll learn:

  • version (e.g., "10.3.5") and versionNumbers (e.g., [10, 3, 5])

  • deploymentType ("Server" or "DataCenter")

  • buildNumber, buildDate

Adaptation if it changes later: the collector re-reads serverInfo once a day. If a major version bump introduces new endpoints (e.g., 10.x got /rest/api/2/project/search), the collector picks them up after a restart without a code change; the behaviour is gated on versionNumbers[0] >= 10.


Section B — Project discovery

B1. GET /rest/api/2/project

Why: Drives the auto-discovery job that populates qc_projects.

What we'll learn:

  • Flat array vs. paginated envelope (values, isLast, nextPage)

  • Per-project fields available without expand: id, key, name, projectTypeKey, archived, lead.name, category.name

  • How many projects the PAT can see (silent permission issues surface here)

Adaptation: project-discovery client tries /rest/api/2/project/search (paginated, 10.x+) first; on 404 falls back to /rest/api/2/project (flat). Both paths fill the same qc_projects schema.
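
The search-then-flat fallback described above could look roughly like this. It is a sketch under stated assumptions: HttpResult and the fetch function are hypothetical stand-ins for the real HTTP client, the response is assumed to be already parsed into project keys, and the paging loop over /project/search is omitted for brevity.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch of the search-then-flat fallback; HttpResult and fetch are hypothetical. */
final class ProjectDiscoveryClient {
    static final String PAGED_PATH = "/rest/api/2/project/search?startAt=0&maxResults=50";
    static final String FLAT_PATH = "/rest/api/2/project";

    /** Minimal stand-in for an HTTP response already parsed into project keys. */
    static final class HttpResult {
        final int status;
        final List<String> projectKeys;

        HttpResult(int status, List<String> projectKeys) {
            this.status = status;
            this.projectKeys = projectKeys;
        }
    }

    /** Tries the paginated endpoint first and falls back to the flat one only on 404. */
    static List<String> discover(Function<String, HttpResult> fetch) {
        HttpResult paged = fetch.apply(PAGED_PATH);
        if (paged.status == 404) {
            return requireOk(fetch.apply(FLAT_PATH), FLAT_PATH).projectKeys;
        }
        return requireOk(paged, PAGED_PATH).projectKeys;
    }

    // Any other non-200 status is surfaced instead of being treated as "no projects".
    private static HttpResult requireOk(HttpResult result, String path) {
        if (result.status != 200) {
            throw new IllegalStateException("GET " + path + " returned " + result.status);
        }
        return result;
    }
}
```

Falling back only on 404 keeps a 401 or 403 from being mistaken for "endpoint not available on this version".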

B2. GET /rest/api/2/project/search?startAt=0&maxResults=50 (10.x only)

Why: Confirms paginated variant on 10.3.5. Expected to 404 on 9.12.5 — confirming that is also useful.

What we'll learn:

  • Pagination envelope shape on 10.x

  • Whether nextPage, isLast, total are present and reliable

B3. GET /rest/api/2/project/{KEY}?expand=description,lead,issueTypes,projectKeys,permissions,insight

Why: Rich project shape — the fields that matter when we want to know "which issue types does this project use" (relevant for filtering Zephyr-enabled projects only).

What we'll learn:

  • issueTypes[] per project — tells us which projects actually have Test / Test Execution enabled

  • projectKeys (renamed projects — historical keys still resolve)

  • permissions (which actions the PAT user can perform)

Adaptation: if Atlassian adds new expand options, we re-fetch one project on demand to pick them up.


Section C — Issue search & changelog

C1. GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=2&fields=summary,issuetype,priority,status,assignee,reporter,created,updated,resolutiondate,labels,fixVersions,sprint,issuelinks&expand=changelog,names,schema

Why: This is the exact shape of the call the collector runs every 10 minutes. Two issues are enough: pick one resolved Bug with a real changelog (≥3 transitions) and one open issue.

What we'll learn from this single response — a lot:

  • Search envelope: startAt, maxResults, total, issues[] — the pagination contract

  • priority.name actual values on this instance — drives severity normalization map (see §D2)

  • status.statusCategory.key actual values — drives OPEN / IN_PROGRESS / CLOSED normalization (see §D1)

  • assignee / reporter shape (object vs. null vs. <unassigned> placeholder)

  • fixVersions[] — confirm whether it's an array of {name} objects

  • sprint field — does it appear at top level of fields or only via customfield_NNNNN? Varies by version.

  • issuelinks[] — shape of links (type.name, inwardIssue.key, outwardIssue.key)

  • changelog.histories[] — entry shape, items[] shape, what field=status items look like (from, to, fromString, toString)

  • names map — Rosetta stone for customfield_NNNNN → human-readable name

  • schema map — type info for every field

Adaptation: every field we read goes through a single JsonNode.path("...").asText(null) chain — if a field disappears or moves, the worst case is a null where we had a value, never a crash.
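
To illustrate the "null instead of a crash" behaviour without pulling in Jackson, here is a Map-based stand-in for the JsonNode.path("...").asText(null) chain. SafeJson is a hypothetical name; the collector itself is described as using JsonNode, so treat this only as a picture of the semantics.

```java
import java.util.Map;

/** Map-based stand-in for the JsonNode.path(...).asText(null) pattern; not collector code. */
final class SafeJson {
    /** Walks nested maps by key; returns null if any step is missing or not an object. */
    static Object path(Object root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Returns a scalar as text, or null when the path is absent or holds an object/array. */
    static String text(Object root, String... keys) {
        Object value = path(root, keys);
        if (value instanceof String || value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return null;
    }
}
```

With this shape, text(issue, "fields", "assignee", "name") yields null for an unassigned issue rather than throwing.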

C2. GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=100&fields=summary,updated

Why: Pagination boundary check. Pick a project with >100 open + closed issues so total > maxResults.

What we'll learn:

  • Whether total is exact, an estimate, or capped (some Jira admins limit this)

  • Effective maxResults ceiling — admins can lower the default 1000 to 100 or 50; we need to know the cap so we don't ask for more

  • Whether a request that starts past the end (startAt ≥ total) returns an empty issues[] array or a 400

Adaptation: page size is read from qc.page-size config; we already cap at 100 by default. If admin tightens it further, we'd lower the config — no code change.
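
A paging loop that tolerates a server-side cap lower than the configured page size might look like the following. This is a sketch, not the collector's loop: SearchPage and the fetch function are hypothetical, and the page is assumed to be already parsed into issue keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Sketch of startAt/maxResults paging; SearchPage and fetch are hypothetical. */
final class SearchPager {
    /** One page of a search response: the reported total plus this page's issue keys. */
    static final class SearchPage {
        final int total;
        final List<String> issueKeys;

        SearchPage(int total, List<String> issueKeys) {
            this.total = total;
            this.issueKeys = issueKeys;
        }
    }

    /** Fetches all pages; fetch takes (startAt, maxResults) and returns one page. */
    static List<String> fetchAll(int pageSize, BiFunction<Integer, Integer, SearchPage> fetch) {
        List<String> all = new ArrayList<>();
        int startAt = 0;
        while (true) {
            SearchPage page = fetch.apply(startAt, pageSize);
            all.addAll(page.issueKeys);
            // Advance by what the server returned, not by what was requested.
            startAt += page.issueKeys.size();
            // Stop on an empty page too, in case total is an estimate or capped.
            if (page.issueKeys.isEmpty() || startAt >= page.total) {
                return all;
            }
        }
    }
}
```

Advancing by the returned count means a silently lowered maxResults ceiling costs extra requests but never skips issues.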

C3. GET /rest/api/2/issue/{KEY}?expand=changelog,renderedFields,names,schema

Why: Belt-and-suspenders for changelog shape. Pick an issue with at least 3 status transitions (Open → In Progress → In Review → Done). Useful as a sanity check that single-issue and search responses have the same structure for changelog.histories[].items[].

What we'll learn:

  • Whether expand=changelog on a single-issue endpoint returns the same shape as on search — confirms we can use one parser for both paths

  • Whether renderedFields contains anything useful (if customer descriptions are in wiki markup, this gives HTML for the dashboard later)


Section D — Reference data (statuses, priorities, issue types)

This is the data that drives our normalization tables. Without it, we ship a default map that probably maps half the instance's labels to null.

D1. GET /rest/api/2/status

Why: Lists every status the instance defines, with its statusCategory (new / indeterminate / done).

What we'll learn:

  • Custom statuses the customer added (e.g., "Awaiting QA," "Ready for Release," "Deferred")

  • Which statusCategory each is in — that's our anchor for OPEN / IN_PROGRESS / CLOSED

  • Status IDs (sometimes used in JQL status=10001)

Normalization rule we'll lock down:

Status names don't drive normalization — only categories. This means a customer-renamed "Done" → "Shipped" still maps correctly.

Adaptation if it changes later: if an admin adds a new status with the wrong category, issues in that status are normalized incorrectly (for example, stuck in OPEN). Per-status-name overrides are read from the qc.status-category-overrides config map, so ops can correct the mapping without redeploying.
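
The category-first rule with a per-status-name escape hatch can be sketched as below. The mapping new → OPEN, indeterminate → IN_PROGRESS, done → CLOSED is inferred from the order in which this document lists the two sets of values and should be confirmed against the D1 response; StatusNormalizer and QcStatus are hypothetical names.

```java
import java.util.Map;

/** Sketch of category-driven status normalization with per-status-name overrides. */
final class StatusNormalizer {
    enum QcStatus { OPEN, IN_PROGRESS, CLOSED }

    /**
     * overrides mirrors qc.status-category-overrides (status name -> normalized value);
     * categoryKey is status.statusCategory.key from the issue.
     */
    static QcStatus normalize(String statusName, String categoryKey, Map<String, QcStatus> overrides) {
        QcStatus forced = statusName == null ? null : overrides.get(statusName);
        if (forced != null) {
            return forced;
        }
        if ("done".equals(categoryKey)) {
            return QcStatus.CLOSED;
        }
        if ("indeterminate".equals(categoryKey)) {
            return QcStatus.IN_PROGRESS;
        }
        // "new", plus anything unrecognised (assumption: fall back to OPEN).
        return QcStatus.OPEN;
    }
}
```

Because only the category key is inspected, a status renamed from "Done" to "Shipped" still lands in CLOSED without any override.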

D2. GET /rest/api/2/priority

Why: Lists every priority. Drives the severity mapping.

What we'll learn:

  • Real priority names on this instance — could be Blocker / Critical / Major / Minor / Trivial (classic), Highest / High / Medium / Low / Lowest (newer default), or custom (P0 / P1 / P2 / P3)

Default map (we'll adjust based on response):

Customer can override via qc.priority-severity-map env var:

D3. GET /rest/api/2/issuetype

Why: Confirms exact issue-type names. Critical because we're filtering on them and Zephyr's "Test" / "Test Execution" can be renamed.

What we'll learn:

  • Standard types: Bug, Story, Task, Epic, Sub-task — and any aliases (Defect instead of Bug)

  • Zephyr Essential types: Test, Test Execution — and any rename (Test Case?)

  • Each type's iconUrl and id

Default normalization map (adjust on response):

Override via qc.issuetype-map:


Section E — Custom fields (Jira + Zephyr Essential)

This is the most important section. Custom-field IDs are per-instance — we cannot guess them.

E1. GET /rest/api/2/field

Why: Returns every field (system + custom) with id, name, schema, custom plugin source.

What we'll learn:

  • Every customfield_NNNNN that exists in the instance

  • Each one's name, schema type (option, string, array, datetime, etc.), and schema.custom (the plugin source — e.g., com.smartbear.zephyrforjira for Zephyr Essential fields)

  • Sprint field ID (usually customfield_10020 but varies)

  • Story-points field ID

  • Any custom severity field separate from priority

This response seeds qc_field_map.

Filter logic the field-discovery job applies:

E2. GET /rest/api/2/issue/<sample-key>?expand=names,schema

Why: The Rosetta stone. The names object in the response is {customfield_NNNNN: human-readable name}.

Pick a sample issue that has every field of interest populated:

  • A Bug with Sprint, Story Points, Severity (if custom)

  • OR if you have it, a Test issue (best — exposes Zephyr fields too — see §F3)

What we'll learn:

  • Authoritative name for every customfield on this issue (what the user sees in Jira UI)

  • schema block: per-field type, items (for arrays), custom (plugin source), customId

Why both E1 and E2: E1 lists every field globally; E2 confirms which ones actually carry data on a real issue. Some customs are defined but never populated — we don't want to map those.

E3. Issue createmeta (per-issue-type field metadata)

Why: Per-issue-type field metadata. Tells us which custom fields are required on Test issues vs. Bug issues vs. Test Execution issues.

What we'll learn:

  • Which Zephyr fields are mandatory vs. optional on Test issues — drives our normalizer's "missing field is OK / not OK" decisions

  • Field allowed values (for option-typed customs) — useful for the test-status enum

  • Field default values

If createmeta is restricted by permissions, skip — we can derive most of this from sample responses.


Section F — Zephyr Essential test artifacts

F1. GET /rest/api/2/search?jql=project=<KEY> AND issuetype=Test&maxResults=2&fields=*all&expand=changelog,names,schema

Why: Full shape of a Test issue with every field included (fields=*all). Names map tells us which customfield_NNNNN holds what.

What we'll learn:

  • Which custom fields a Test issue actually has populated (steps, expected, preconditions, automation, etc.)

  • Their schema types (option, string, complex object, array)

  • The relationship between Test Steps custom field structure and what the dashboard would display

F2. GET /rest/api/2/search?jql=project=<KEY> AND issuetype="Test Execution"&maxResults=2&fields=*all&expand=changelog,names,schema

Why: Full shape of a Test Execution. Different field set from Test — has execution status, executed-by, executed-at, linked test cases, cycle.

What we'll learn:

  • Where execution status lives (custom field name + schema)

  • How linked tests are represented (array of issue keys? array of objects with key? IDs only?)

  • How the cycle is referenced (option list? string? linked issue?)

  • Whether lastExecutedAt is a custom-field date or derived from changelog

F3. GET /rest/api/2/issue/<TEST_KEY>?fields=*all&expand=names,schema,renderedFields

Why: Single Test issue, fully expanded. The single most useful response in this whole document for Zephyr field mapping. Same data as F1 but for one issue, easier to read.

F4. GET /rest/api/2/issue/<EXEC_KEY>?fields=*all&expand=names,schema,renderedFields

Why: Same for a Test Execution.

F5. (optional) GET /rest/zephyr/latest/test, /rest/zephyr/latest/execution, /rest/zephyr/latest/cycle

Why: Probe whether Zephyr Essential exposes any non-Jira-issue endpoints. Try a couple of paths and report back what they return (200 with data / 404 / 401).

What we'll learn:

  • Whether there's data accessible only via Zephyr endpoints (e.g., execution-step results not exposed on the issue itself)

  • The auth model (does the Jira PAT work on these too, or do they require a separate Zephyr token?)

If everything 404s, that's also a useful answer — confirms "everything via /rest/api/2/..." and we don't need a Zephyr-specific client.


Section G — Issue links

G1. Sample issue with links

Why: Pick an issue you know has links, ideally a Bug linked to a Test or vice versa. We need to see how the link object is structured.

What we'll learn:

  • Link shape: type.name, type.inward, type.outward, inwardIssue.key vs outwardIssue.key

  • The actual link-type names used on this instance (tests, is tested by, relates to, blocks, is blocked by, causes, caused by, customer-defined types?)

  • Whether issuelinks are returned in full or just IDs that need a separate fetch

This drives qc_issue_links schema and the link-type normalization (e.g., we want to filter to test-related link types: tests, is tested by, defect, is defect of).

G2. GET /rest/api/2/issueLinkType

Why: Lists every issue-link type defined on the instance.

What we'll learn:

  • The full set of available link types (instance-defined)

  • Their inward and outward directional names

  • Which ones are relevant to test ↔ defect relationships (we filter to those during ingestion)


Section H — Rate limits & headers

H1. Capture full headers from any of the above calls

Why: Server-side rate limiting on Jira DC is per-instance configured. The collector's token-bucket limiter needs to know the actual budget.

What to look for in headers (case-insensitive):

What we'll learn:

  • Whether these headers are exposed at all (10.x: usually yes; 9.x: inconsistent)

  • The actual numbers — we'll set jira.rate-limit-rpm to ~70% of the observed budget for safety
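
Once headers are captured, the 70% rule is simple arithmetic. The sketch below assumes the instance exposes X-RateLimit-FillRate (tokens added per interval) and X-RateLimit-Interval-Seconds; those two header names are an assumption to verify against the captured headers, and RateLimitBudget is a hypothetical class.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

/** Sketch: derive a value for jira.rate-limit-rpm from observed headers (names assumed). */
final class RateLimitBudget {
    static final int SAFETY_PERCENT = 70;

    /** Returns about 70% of the sustained per-minute budget, or empty if it cannot be derived. */
    static OptionalInt suggestedRpm(Map<String, String> headers) {
        // HTTP header names are case-insensitive, so use a case-insensitive lookup.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);
        String fillRate = h.get("X-RateLimit-FillRate");
        String interval = h.get("X-RateLimit-Interval-Seconds");
        if (fillRate == null || interval == null) {
            return OptionalInt.empty();
        }
        try {
            long tokens = Long.parseLong(fillRate.trim());
            long seconds = Long.parseLong(interval.trim());
            if (tokens <= 0 || seconds <= 0) {
                return OptionalInt.empty();
            }
            long perMinute = tokens * 60 / seconds;
            return OptionalInt.of((int) (perMinute * SAFETY_PERCENT / 100));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```

An empty result means "fall back to the conservative default" rather than "no limit".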

H2. (optional) Trigger a 429 if you can

Why: See exactly what a rate-limit response looks like on this instance — body shape, Retry-After value, recovery time.

If you can't / shouldn't trigger one, skip — we use a conservative default (200 rpm with token bucket) and adapt at runtime.


Section I — PAT & permissions

Not an API call — questions to confirm:

| Question | Why it matters |
| --- | --- |
| Which Jira user owns the PAT? | If it's a personal account, attrition is a risk; recommend a service account |
| Does the PAT user have Browse Projects permission on all projects we want to ingest? | Auto-discovery silently misses projects the user can't browse |
| Does the PAT user have View Read-Only Workflow permission? | Required for changelog access on some configs |
| Does the PAT user have access to Zephyr Essential fields and issues? | Zephyr Essential has its own permission scheme; a Jira admin isn't automatically a Zephyr admin |
| What's the PAT expiry policy? | DC PATs can be set to expire; we need a renewal procedure |
| Can the PAT be scoped down (read-only)? | We only need read; least-privilege reduces risk |

Adaptation if access changes later: the project-discovery job logs the count of discovered projects every cycle. If it drops by >10% suddenly, we alert — surfaces silent permission revocations within 6 hours.
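
The drop check itself is a one-line comparison between consecutive cycles. A minimal sketch, with DiscoveryDropCheck as a hypothetical name and the 10% threshold taken from the paragraph above:

```java
/** Sketch of the discovered-project drop check between two consecutive cycles. */
final class DiscoveryDropCheck {
    static final int DROP_THRESHOLD_PERCENT = 10;

    /** True when the count fell by more than 10% relative to the previous cycle. */
    static boolean shouldAlert(int previousCount, int currentCount) {
        if (previousCount <= 0 || currentCount >= previousCount) {
            return false;
        }
        long dropped = (long) previousCount - currentCount;
        // Integer form of dropped / previous > 10%, which avoids floating-point rounding.
        return dropped * 100 > (long) previousCount * DROP_THRESHOLD_PERCENT;
    }
}
```

For example, going from 100 to 89 visible projects alerts, while 100 to 90 (exactly 10%) does not.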


Section J — Schema-change handling (what happens later)

This is the part that future-proofs the collector. Concrete answers to "what if X changes after we ship?"

J.1 New custom field added in Jira

Example: customer adds a new "Customer Impact" custom field on Bugs.

What happens:

  • Field-discovery job picks it up within 24 hours (next daily run)

  • Heuristic matchers don't recognize it → stored as source: CUSTOM in qc_field_map

  • Collector continues ingesting — the field is in raw blob on qc_issues but not in any normalized field

  • When the dashboard team wants to use it: add a qc.custom-field-overrides entry mapping it to a logical name, restart collector, done

Manual trigger if customer needs it sooner: admin endpoint POST /admin/jira/discover-fields runs the discovery on demand.

J.2 Custom field renamed

Example: customer renames "Test Status" → "Execution Outcome."

What happens:

  • Field-discovery still finds the field by ID (customfield_12031)

  • Display name in qc_field_map updates

  • Heuristic matcher might un-match it ("Test Status" was a name match)

  • Mitigation: matcher prefers schema.custom source over name when source is unambiguous (com.smartbear.zephyrforjira → it's still the test status field regardless of display name)
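
The "source before name" mitigation can be pictured as a two-stage lookup. In this sketch the lookup tables are passed in; the logical names and the exact schema.custom key used in the usage note are hypothetical until the E1 response tells us the real ones.

```java
import java.util.Locale;
import java.util.Map;

/** Sketch of a source-first field matcher; lookup tables are supplied by the caller. */
final class FieldMatcher {
    /**
     * Resolves a logical name for a field: schema.custom first, display name second.
     * bySource is keyed by schema.custom; byName is keyed by lower-cased display name.
     * Returns "CUSTOM" when neither stage matches.
     */
    static String match(String schemaCustom, String displayName,
                        Map<String, String> bySource, Map<String, String> byName) {
        if (schemaCustom != null) {
            String hit = bySource.get(schemaCustom);
            if (hit != null) {
                return hit; // survives a display-name rename
            }
        }
        if (displayName != null) {
            String hit = byName.get(displayName.trim().toLowerCase(Locale.ROOT));
            if (hit != null) {
                return hit;
            }
        }
        return "CUSTOM";
    }
}
```

With a bySource entry such as "com.smartbear.zephyrforjira:test-status" → "TEST_STATUS" (the suffix is made up for illustration), a field renamed to "Execution Outcome" still resolves to TEST_STATUS.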

J.3 Custom field deleted

What happens:

  • Field-discovery returns no entry for that ID

  • qc_field_map entry is marked deleted=true (soft-delete, preserves historical context)

  • Issue sync logs a warning the first time it tries to read a deleted field

  • New issues stop populating the corresponding test.* sub-field — old data remains intact
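
Renames (J.2) and deletions (J.3) can be handled in one reconciliation pass keyed on field ID. A sketch under assumptions: FieldRow is an in-memory stand-in for a qc_field_map row, and the discovered map (field ID → current display name) stands in for the parsed /rest/api/2/field response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of reconciling a stored field map against a fresh field listing. */
final class FieldMapReconciler {
    /** In-memory stand-in for a qc_field_map row. */
    static final class FieldRow {
        final String fieldId;
        String displayName;
        boolean deleted;

        FieldRow(String fieldId, String displayName) {
            this.fieldId = fieldId;
            this.displayName = displayName;
        }
    }

    /** discovered maps field id -> current display name; stored is updated in place. */
    static void reconcile(Map<String, FieldRow> stored, Map<String, String> discovered) {
        for (FieldRow row : stored.values()) {
            String name = discovered.get(row.fieldId);
            if (name == null) {
                row.deleted = true;      // soft-delete: keep the row for historical context
            } else {
                row.deleted = false;
                row.displayName = name;  // pick up renames; the ID is the stable key
            }
        }
        // Fields seen for the first time are added as new rows.
        for (Map.Entry<String, String> d : discovered.entrySet()) {
            stored.putIfAbsent(d.getKey(), new FieldRow(d.getKey(), d.getValue()));
        }
    }
}
```

Keeping the last known display name on a soft-deleted row is a design choice here, so historical data can still be labelled.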

J.4 New status added to workflow

Example: customer adds "Awaiting Customer Verification" status.

What happens:

  • Status-discovery (called as part of cycle prep) picks it up

  • Mapped to OPEN/IN_PROGRESS/CLOSED based on its statusCategory automatically

  • If the admin assigned the wrong category → ops can correct via qc.status-category-overrides config

J.5 New priority added

What happens: unknown priorities map to MEDIUM by default. Customer can add an override entry. Dashboard never breaks — just shows that priority as MEDIUM until configured.
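
The MEDIUM fallback plus override behaviour could be sketched as follows. SeverityMapper is a hypothetical class, and apart from MEDIUM the severity labels in the usage note are placeholders, since the real default map depends on the D2 response.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of priority -> severity lookup with overrides and the MEDIUM fallback. */
final class SeverityMapper {
    static final String FALLBACK = "MEDIUM";
    private final Map<String, String> map = new HashMap<>();

    /** Defaults are applied first; overrides (as from qc.priority-severity-map) then win. */
    SeverityMapper(Map<String, String> defaults, Map<String, String> overrides) {
        defaults.forEach((priority, severity) -> map.put(norm(priority), severity));
        overrides.forEach((priority, severity) -> map.put(norm(priority), severity));
    }

    /** Unknown or missing priorities resolve to MEDIUM rather than failing. */
    String severityFor(String priorityName) {
        if (priorityName == null) {
            return FALLBACK;
        }
        return map.getOrDefault(norm(priorityName), FALLBACK);
    }

    // Priority names are compared case-insensitively (an assumption of this sketch).
    private static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

For instance, new SeverityMapper(Map.of("Blocker", "CRITICAL"), Map.of("P0", "CRITICAL")).severityFor("Urgent") returns MEDIUM until someone adds an entry for it.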

J.6 Issue type renamed

Example: customer renames "Test" → "Manual Test Case."

What happens:

  • Issuetype-discovery picks up the new name

  • Default normalization no longer matches ("Test" was hardcoded in the map)

  • Mitigation: ship qc.issuetype-map overrides as a config-only fix; ops adds "Manual Test Case": TEST and restarts. No code change.

J.7 Jira version upgrade (9.x → 10.x or 10.x → 11.x)

What happens:

  • serverInfo re-reads on next cycle, logs new version

  • If 10.x added new endpoints (e.g., /rest/api/2/project/search), collector starts using them automatically (already gated on version check)

  • If a future Jira removes /rest/api/2/... entirely (Atlassian has hinted at this for Cloud, not DC) — we'd need a code change to switch to v3 or beyond. Detected at startup; collector logs an error and refuses to run rather than silently mis-collecting.

J.8 Zephyr Essential upgrade

What happens:

  • Custom field IDs are typically stable across SmartBear plugin upgrades

  • If the plugin changes its field schema (new option in test status, new field added), field-discovery picks it up next run

  • If plugin is removed: discovered customs disappear; test.* sub-doc stops being populated. Old data remains. Dashboard's Test Coverage tab gracefully degrades.

J.9 Field map override precedence

The order of resolution at issue-sync time:

This means ops can always force a value without waiting on a code release.

J.10 Detecting schema drift in production

The collector emits Micrometer metrics:

Set alerts on these so we know when drift starts before it bites.


Section K — Open answers we need (no API call required)

Quick questions we'd like answered alongside the API responses:

  1. Jira version inventory — which versions are deployed (10.3.5? 9.12.5? both?), which is the primary target for v1, and is there a planned upgrade window?

  2. Tenant size — rough numbers: how many projects, how many open issues, how many test cases, how many test executions per month? Drives whether we need pre-aggregation from day one.

  3. First-cycle backfill window — accept our default of "all open issues + closed-180d," or specify different? E.g., regulated customers sometimes want full history.

  4. History retention — accept default 30 days for qc_issue_history, or longer? (This is the TTL on status-transition records, not on issues themselves.)

  5. Project scope — do we ingest every project the PAT can see, or scope down via allowlist? If allowlist, give us the list.

  6. Zephyr-enabled projects — are tests / executions only used in some projects, or company-wide? If subset, we can scope the test-related custom-field reads to those projects (small perf win).

  7. Test status labels — does the customer use the standard Pass / Fail / Blocked / WIP / Unexecuted, or have they added custom states (Retest, Skipped, etc.)?

  8. Test cycle model — is "Test Cycle" an option list, a separate issue type, a structured field, or a label convention? (Varies between Zephyr Essential versions — we need to know which one the customer's on.)

  9. Customer-defined link types — beyond tests / is tested by, are there custom link types we should care about (e.g., verifies, validates)?

  10. Severity field — is severity inferred from priority, or does the customer have a separate custom "Severity" field? (Some regulated industries do.)

  11. Sprint usage — does the customer use Jira Software sprints? If yes, sprint custom field ID. If no, we skip that part of the field map.

  12. Network access — is Jira reachable from where the collector runs (VPN? whitelisted IP? mTLS?). Affects deployment, not code.

  13. Time zone — what time zone is Jira configured for? Affects how we interpret created / updated ISO strings (Jira returns instance-tz, not always UTC). The dashboard should show consistent times.

  14. Holidays / business hours for SLA — the dashboard's SLA computation could be calendar-day or business-hour. Spec implies calendar; confirm.
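
For item 13, one way to keep dashboard times consistent is to convert every timestamp to a UTC instant at ingestion. The sketch below assumes timestamps arrive with a numeric offset and millisecond precision, e.g. 2026-04-28T09:15:30.000+0200; that format should be confirmed against the C1 response, and JiraTimestamps is a hypothetical name.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Sketch: convert a Jira timestamp with a numeric offset into a UTC Instant. */
final class JiraTimestamps {
    // Assumed wire format, e.g. 2026-04-28T09:15:30.000+0200 (offset without a colon).
    private static final DateTimeFormatter JIRA_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZ");

    /** Parses the offset-bearing string and normalizes it to an Instant (always UTC). */
    static Instant toInstant(String jiraTimestamp) {
        return OffsetDateTime.parse(jiraTimestamp, JIRA_FORMAT).toInstant();
    }
}
```

Storing the Instant and formatting per viewer in the dashboard avoids baking the instance time zone into qc_issues.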


Minimum viable set

If the customer can only send a subset, prioritize these. With these we can ship the foundation and field map; everything else refines.

| Priority | Item | Why |
| --- | --- | --- |
| P0 | A1 /myself | Confirms PAT works |
| P0 | A2 /serverInfo | Confirms version |
| P0 | B1 /project | Confirms discovery shape |
| P0 | C1 /search core query (with expand=changelog,names,schema) | The single most informative response |
| P0 | E1 /field | Custom field IDs |
| P0 | F3 /issue/<TEST_KEY>?fields=*all&expand=names,schema | Zephyr Rosetta stone |
| P1 | D1 /status | Status normalization |
| P1 | D2 /priority | Severity normalization |
| P1 | D3 /issuetype | Issue-type normalization |
| P1 | F4 /issue/<EXEC_KEY>?fields=*all&expand=names,schema | Test execution shape |
| P1 | G2 /issueLinkType | Link-type catalog |
| P2 | C2 pagination boundary | Confirms total cap |
| P2 | C3 single-issue changelog | Belt-and-suspenders |
| P2 | F1, F2 | Useful but redundant with F3, F4 |
| P2 | F5 /rest/zephyr/... probe | Useful but we can probe ourselves once we have credentials |
| P2 | H1 headers | Useful but observable at runtime |

P0 + P1 = eleven responses (six P0, five P1). That's enough for us to ship steps 1–4 of the implementation plan with high confidence.


Document version: 2026-04-28 — Intake stage. To be retired once responses are received and DTOs / field maps are seeded into the codebase. After that, schema-change handling lives in code (§J) rather than in this doc.
