QC Collector — Data Intake & Schema Discovery
Purpose: Concrete, send-this-back checklist of API responses, sample data, and answers we need from the customer's Jira + Zephyr Essential instance. Each item lists what we'll learn from it and how the collector will adapt if it changes later.
Audience: Implementer (you) → customer admin / SRE / QA lead.
Companion to: QC_JIRA_COLLECTOR_PLAN.md (architecture & design). This doc is purely about the input data we need to lock down DTOs, normalization rules, and field maps.
Why this doc exists: Jira / Zephyr field IDs, status names, and issue-type names are per-instance. Hard-coding them is the #1 reason connectors break in real deployments. We discover at startup, persist a map, and reconcile on a schedule, but we still need a one-time intake to seed everything correctly.
1. How to send responses back
For each API call below, send:
Exact URL you hit (so we know which Jira host, which params)
HTTP status code
Full response headers, particularly anything starting with X-RateLimit-, plus Retry-After, X-AREQUESTID, pagination headers, and Content-Type
Full JSON response body (don't trim; even fields you think are irrelevant are useful)
Jira version label if you have multiple instances (e.g., "DC 10.3.5 prod" vs "DC 9.12.5 staging")
File-naming convention that makes review fast:
Tarball them, put them in a private GitHub gist or shared drive — whatever's easiest. Don't paste into chat — large JSON gets clipped and we lose nested fields.
Sanitizing: if any field contains PII / customer data you can't share, replace the value but keep the key and shape (e.g., "summary": "<redacted>", but keep all custom field IDs and structures intact). Field IDs and shapes are the gold; specific summaries / descriptions are not.
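The sanitizing rule above (replace the value, keep the key and shape) can be sketched roughly as follows. This is only an illustration: it assumes the response has already been parsed into plain Map / List values, and the Redactor class and the choice of sensitive keys are hypothetical, not part of the collector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Redactor {
    private Redactor() {}

    /** Returns a deep copy of a parsed JSON value with scalar values of sensitive keys replaced. */
    static Object redact(Object node, Set<String> sensitiveKeys) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                Object value = e.getValue();
                boolean scalar = !(value instanceof Map) && !(value instanceof List);
                if (scalar && value != null && sensitiveKeys.contains(key)) {
                    out.put(key, "<redacted>"); // value hidden, key kept
                } else {
                    out.put(key, redact(value, sensitiveKeys)); // nested shape kept as-is
                }
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(redact(item, sensitiveKeys));
            }
            return out;
        }
        return node; // scalars outside sensitive keys pass through unchanged
    }
}
```

Only scalar values are replaced, so custom field IDs, nulls, nested objects, and arrays all keep their original structure.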
Section A — Connectivity & version
A1. GET /rest/api/2/myself
GET /rest/api/2/myself
Why: Confirms PAT works. Tells us which user fields are present (name, key, displayName, emailAddress).
What we'll learn:
Whether the user identifier is name, key, or both. This drives how we store assignee / reporter / transitionedBy in qc_issues and qc_issue_history.
Whether the PAT user has display name + email or just a username (helps the dashboard).
Adaptation if it changes later: user-id field name is wrapped in a single normalizer (JiraNormalizer.resolveUser(JsonNode)); if Atlassian changes the canonical field, we change one method.
A2. GET /rest/api/2/serverInfo
GET /rest/api/2/serverInfo
Why: Confirms exact Jira version programmatically. Drives version-conditional code paths.
What we'll learn:
version (e.g., "10.3.5") and versionNumbers (e.g., [10, 3, 5])
deploymentType ("Server" or "DataCenter")
buildNumber, buildDate
Adaptation if it changes later: the collector re-reads serverInfo once a day; if a major version bump introduces new endpoints (e.g., 10.x got /rest/api/2/project/search), the collector picks them up after a restart without a code change. Behaviour is gated on versionNumbers[0] >= 10.
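A minimal sketch of that version gate, assuming versionNumbers has already been parsed from the serverInfo response into an int array. VersionGate and supportsProjectSearch are hypothetical names used only for illustration.

```java
final class VersionGate {
    private VersionGate() {}

    /** True when serverInfo's versionNumbers reports major version 10 or later. */
    static boolean supportsProjectSearch(int[] versionNumbers) {
        // Missing or empty version info: stay on the conservative (pre-10.x) path.
        return versionNumbers != null
                && versionNumbers.length > 0
                && versionNumbers[0] >= 10;
    }
}
```

Treating missing version data as "old" keeps the collector on endpoints that exist on both 9.x and 10.x.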
Section B — Project discovery
B1. GET /rest/api/2/project
GET /rest/api/2/project
Why: Drives the auto-discovery job that populates qc_projects.
What we'll learn:
Flat array vs. paginated envelope (values, isLast, nextPage)
Per-project fields available without expand: id, key, name, projectTypeKey, archived, lead.name, category.name
How many projects the PAT can see (silent permission issues surface here)
Adaptation: project-discovery client tries /rest/api/2/project/search (paginated, 10.x+) first; on 404 falls back to /rest/api/2/project (flat). Both paths fill the same qc_projects schema.
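The try-then-fall-back behaviour could look roughly like the sketch below. The two suppliers stand in for the real HTTP calls, and HttpStatusException is a hypothetical exception type; the actual client may surface a 404 differently.

```java
import java.util.List;
import java.util.function.Supplier;

final class ProjectDiscovery {
    /** Hypothetical exception a client could throw for a non-2xx response. */
    static final class HttpStatusException extends RuntimeException {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    private ProjectDiscovery() {}

    /** Tries the paginated endpoint first; falls back to the flat one only on 404. */
    static List<String> discover(Supplier<List<String>> paginatedSearch,
                                 Supplier<List<String>> flatList) {
        try {
            return paginatedSearch.get();
        } catch (HttpStatusException e) {
            if (e.status != 404) {
                throw e; // 401 / 403 / 5xx are real failures, not "endpoint missing"
            }
            return flatList.get();
        }
    }
}
```

Falling back only on 404 matters here: a 401 or 403 on the paginated endpoint points to a permission problem that the flat endpoint would likely hide rather than fix.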
B2. GET /rest/api/2/project/search?startAt=0&maxResults=50 (10.x only)
GET /rest/api/2/project/search?startAt=0&maxResults=50 (10.x only)
Why: Confirms paginated variant on 10.3.5. Expected to 404 on 9.12.5; confirming that is also useful.
What we'll learn:
Pagination envelope shape on 10.x
Whether nextPage, isLast, total are present and reliable
B3. GET /rest/api/2/project/{KEY}?expand=description,lead,issueTypes,projectKeys,permissions,insight
GET /rest/api/2/project/{KEY}?expand=description,lead,issueTypes,projectKeys,permissions,insight
Why: Rich project shape: the fields that matter when we want to know "which issue types does this project use" (relevant for filtering Zephyr-enabled projects only).
What we'll learn:
issueTypes[] per project: tells us which projects actually have Test / Test Execution enabled
projectKeys (renamed projects; historical keys still resolve)
permissions (which actions the PAT user can perform)
Adaptation: if Atlassian adds new expand options, we re-fetch one project on demand to pick them up.
Section C — Issue search & changelog
C1. GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=2&fields=summary,issuetype,priority,status,assignee,reporter,created,updated,resolutiondate,labels,fixVersions,sprint,issuelinks&expand=changelog,names,schema
GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=2&fields=summary,issuetype,priority,status,assignee,reporter,created,updated,resolutiondate,labels,fixVersions,sprint,issuelinks&expand=changelog,names,schema
Why: This is the exact shape of the call the collector runs every 10 minutes. Two issues is enough: pick one resolved Bug with a real changelog (≥3 transitions), one open issue.
What we'll learn from this single response — a lot:
Search envelope: startAt, maxResults, total, issues[] (the pagination contract)
priority.name actual values on this instance: drives the severity normalization map (see §D2)
status.statusCategory.key actual values: drives OPEN / IN_PROGRESS / CLOSED normalization (see §D1)
assignee / reporter shape (object vs. null vs. <unassigned> placeholder)
fixVersions[]: confirm whether it's an array of {name} objects
sprint field: does it appear at top level of fields or only via customfield_NNNNN? Varies by version.
issuelinks[]: shape of links (type.name, inwardIssue.key, outwardIssue.key)
changelog.histories[]: entry shape, items[] shape, what field=status items look like (from, to, fromString, toString)
names map: Rosetta stone for customfield_NNNNN → human-readable name
schema map: type info for every field
Adaptation: every field we read goes through a single JsonNode.path("...").asText(null) chain — if a field disappears or moves, the worst case is a null where we had a value, never a crash.
C2. GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=100&fields=summary,updated
GET /rest/api/2/search?jql=project=<KEY>&startAt=0&maxResults=100&fields=summary,updated
Why: Pagination boundary check. Pick a project with >100 open + closed issues so total > maxResults.
What we'll learn:
Whether total is exact, an estimate, or capped (some Jira admins limit this)
Effective maxResults ceiling: admins can lower the default 1000 to 100 or 50; we need to know the cap so we don't ask for more
Whether startAt + maxResults > total returns an empty array or a 400
Adaptation: page size is read from qc.page-size config; we already cap at 100 by default. If admin tightens it further, we'd lower the config — no code change.
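A rough sketch of a paging loop that tolerates a server-side cap lower than the configured page size. Page, fetchAll, and the fetchPage callback are hypothetical stand-ins for the real search client, and the loop assumes total is reliable, which is exactly what C2 is meant to confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class SearchPager {
    /** Minimal stand-in for the search envelope: startAt, total, issues[]. */
    static final class Page {
        final int startAt;
        final int total;
        final List<String> issueKeys;

        Page(int startAt, int total, List<String> issueKeys) {
            this.startAt = startAt;
            this.total = total;
            this.issueKeys = issueKeys;
        }
    }

    private SearchPager() {}

    /** Fetches all pages; fetchPage takes (startAt, maxResults) and returns one Page. */
    static List<String> fetchAll(BiFunction<Integer, Integer, Page> fetchPage, int pageSize) {
        List<String> all = new ArrayList<>();
        int startAt = 0;
        while (true) {
            Page page = fetchPage.apply(startAt, pageSize);
            if (page.issueKeys.isEmpty()) {
                break; // empty page: nothing more to read
            }
            all.addAll(page.issueKeys);
            // Advance by what the server actually returned, in case it applied a lower cap.
            startAt += page.issueKeys.size();
            if (startAt >= page.total) {
                break; // reached the reported total; never request past it
            }
        }
        return all;
    }
}
```

Advancing by the number of issues returned, rather than by the requested page size, avoids skipping issues when an admin has lowered the ceiling. Stopping at total avoids the startAt-beyond-total request whose behaviour is still unknown.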
C3. GET /rest/api/2/issue/{KEY}?expand=changelog,renderedFields,names,schema
GET /rest/api/2/issue/{KEY}?expand=changelog,renderedFields,names,schema
Why: Belt-and-suspenders for changelog shape. Pick an issue with at least 3 status transitions (Open → In Progress → In Review → Done). Useful as a sanity check that single-issue and search responses have the same structure for changelog.histories[].items[].
What we'll learn:
Whether expand=changelog on a single-issue endpoint returns the same shape as on search, which confirms we can use one parser for both paths
Whether renderedFields contains anything useful (if customer descriptions are in wiki markup, this gives HTML for the dashboard later)
Section D — Reference data (statuses, priorities, issue types)
This is the data that drives our normalization tables. Without it, we ship a default map that probably maps half the instance's labels to null.
D1. GET /rest/api/2/status
GET /rest/api/2/status
Why: Lists every status the instance defines, with its statusCategory (new / indeterminate / done).
What we'll learn:
Custom statuses the customer added (e.g., "Awaiting QA," "Ready for Release," "Deferred")
Which statusCategory each is in; that's our anchor for OPEN / IN_PROGRESS / CLOSED
Status IDs (sometimes used in JQL, e.g. status=10001)
Normalization rule we'll lock down:
Status names don't drive normalization — only categories. This means a customer-renamed "Done" → "Shipped" still maps correctly.
Adaptation if it changes later: new status added with the wrong category by an admin → we'd see issues stuck in OPEN. The normalization map is read from a config-overridable qc.status-category-overrides map per status name, so ops can correct it without redeploying.
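The category-driven rule with per-name overrides might be sketched like this. It assumes the mapping new → OPEN, indeterminate → IN_PROGRESS, done → CLOSED, and that qc.status-category-overrides has already been loaded into a map keyed by status name. StatusNormalizer is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class StatusNormalizer {
    enum Normalized { OPEN, IN_PROGRESS, CLOSED }

    private final Map<String, Normalized> overridesByStatusName;

    StatusNormalizer(Map<String, Normalized> overridesByStatusName) {
        // Copy into a HashMap so lookups with a null status name are safe.
        this.overridesByStatusName = new HashMap<>(overridesByStatusName);
    }

    Normalized normalize(String statusName, String statusCategoryKey) {
        Normalized override = overridesByStatusName.get(statusName);
        if (override != null) {
            return override; // ops-supplied correction wins over the category
        }
        if ("done".equals(statusCategoryKey)) {
            return Normalized.CLOSED;
        }
        if ("indeterminate".equals(statusCategoryKey)) {
            return Normalized.IN_PROGRESS;
        }
        // "new", plus anything unrecognized, is treated as OPEN in this sketch.
        return Normalized.OPEN;
    }
}
```

Because the status name is consulted only for overrides, a renamed "Done" still normalizes correctly as long as its category stays done.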
D2. GET /rest/api/2/priority
GET /rest/api/2/priority
Why: Lists every priority. Drives the severity mapping.
What we'll learn:
Real priority names on this instance: could be Blocker / Critical / Major / Minor / Trivial (classic), Highest / High / Medium / Low / Lowest (newer default), or custom (P0 / P1 / P2 / P3)
Default map (we'll adjust based on response):
Customer can override via qc.priority-severity-map env var:
D3. GET /rest/api/2/issuetype
GET /rest/api/2/issuetype
Why: Confirms exact issue-type names. Critical because we're filtering on them and Zephyr's "Test" / "Test Execution" can be renamed.
What we'll learn:
Standard types: Bug, Story, Task, Epic, Sub-task, and any aliases (Defect instead of Bug)
Zephyr Essential types: Test, Test Execution, and any rename (Test Case?)
Each type's iconUrl and id
Default normalization map (adjust on response):
Override via qc.issuetype-map:
Section E — Custom fields (Jira + Zephyr Essential)
This is the most important section. Custom-field IDs are per-instance — we cannot guess them.
E1. GET /rest/api/2/field
GET /rest/api/2/field
Why: Returns every field (system + custom) with id, name, schema, custom plugin source.
What we'll learn:
Every customfield_NNNNN that exists in the instance
Each one's name, schema type (option, string, array, datetime, etc.), and schema.custom (the plugin source, e.g., com.smartbear.zephyrforjira for Zephyr Essential fields)
Sprint field ID (usually customfield_10020 but varies)
Story-points field ID
Any custom severity field separate from priority
This response seeds qc_field_map.
Filter logic the field-discovery job applies:
E2. GET /rest/api/2/issue/<sample-key>?expand=names,schema
GET /rest/api/2/issue/<sample-key>?expand=names,schema
Why: The Rosetta stone. The names object in the response is {customfield_NNNNN: human-readable name}.
Pick a sample issue that has every field of interest populated:
A Bug with Sprint, Story Points, Severity (if custom)
OR, if you have it, a Test issue (best, since it exposes Zephyr fields too; see §F3)
What we'll learn:
Authoritative name for every customfield on this issue (what the user sees in Jira UI)
schema block: per-field type, items (for arrays), custom (plugin source), customId
Why both E1 and E2: E1 lists every field globally; E2 confirms which ones actually carry data on a real issue. Some customs are defined but never populated — we don't want to map those.
E3. (optional but recommended) GET /rest/api/2/issue/createmeta?projectKeys=<KEY>&expand=projects.issuetypes.fields
GET /rest/api/2/issue/createmeta?projectKeys=<KEY>&expand=projects.issuetypes.fields
Why: Per-issue-type field metadata. Tells us which custom fields are required on Test issues vs. Bug issues vs. Test Execution issues.
What we'll learn:
Which Zephyr fields are mandatory vs. optional on Test issues — drives our normalizer's "missing field is OK / not OK" decisions
Field allowed values (for option-typed customs) — useful for the test-status enum
Field default values
If createmeta is restricted by permissions, skip — we can derive most of this from sample responses.
Section F — Zephyr Essential test artifacts
F1. GET /rest/api/2/search?jql=project=<KEY> AND issuetype=Test&maxResults=2&fields=*all&expand=changelog,names,schema
GET /rest/api/2/search?jql=project=<KEY> AND issuetype=Test&maxResults=2&fields=*all&expand=changelog,names,schema
Why: Full shape of a Test issue with every field included (fields=*all). Names map tells us which customfield_NNNNN holds what.
What we'll learn:
Which custom fields a Test issue actually has populated (steps, expected, preconditions, automation, etc.)
Their schema types (option, string, complex object, array)
The relationship between Test Steps custom field structure and what the dashboard would display
F2. GET /rest/api/2/search?jql=project=<KEY> AND issuetype="Test Execution"&maxResults=2&fields=*all&expand=changelog,names,schema
GET /rest/api/2/search?jql=project=<KEY> AND issuetype="Test Execution"&maxResults=2&fields=*all&expand=changelog,names,schema
Why: Full shape of a Test Execution. Different field set from Test: has execution status, executed-by, executed-at, linked test cases, cycle.
What we'll learn:
Where execution status lives (custom field name + schema)
How linked tests are represented (array of issue keys? array of objects with key? IDs only?)
How the cycle is referenced (option list? string? linked issue?)
Whether lastExecutedAt is a custom-field date or derived from changelog
F3. GET /rest/api/2/issue/<TEST_KEY>?fields=*all&expand=names,schema,renderedFields
GET /rest/api/2/issue/<TEST_KEY>?fields=*all&expand=names,schema,renderedFields
Why: Single Test issue, fully expanded. The single most useful response in this whole document for Zephyr field mapping. Same data as F1 but for one issue, easier to read.
F4. GET /rest/api/2/issue/<EXEC_KEY>?fields=*all&expand=names,schema,renderedFields
GET /rest/api/2/issue/<EXEC_KEY>?fields=*all&expand=names,schema,renderedFields
Why: Same for a Test Execution.
F5. (optional) GET /rest/zephyr/latest/test, /rest/zephyr/latest/execution, /rest/zephyr/latest/cycle
GET /rest/zephyr/latest/test, /rest/zephyr/latest/execution, /rest/zephyr/latest/cycle
Why: Probe whether Zephyr Essential exposes any non-Jira-issue endpoints. Try a couple of paths and report back what they return (200 with data / 404 / 401).
What we'll learn:
Whether there's data accessible only via Zephyr endpoints (e.g., execution-step results not exposed on the issue itself)
The auth model (does the Jira PAT work on these too, or do they require a separate Zephyr token?)
If everything 404s, that's also a useful answer — confirms "everything via /rest/api/2/..." and we don't need a Zephyr-specific client.
Section G — Issue links (for test ↔ defect graph)
G1. GET /rest/api/2/issue/<KEY>?fields=issuelinks,issuetype,summary&expand=names
GET /rest/api/2/issue/<KEY>?fields=issuelinks,issuetype,summary&expand=names
Why: Pick an issue you know has links, ideally a Bug linked to a Test, or vice versa. We need to see how the link object is structured.
What we'll learn:
Link shape: type.name, type.inward, type.outward, inwardIssue.key vs outwardIssue.key
The actual link-type names used on this instance (tests, is tested by, relates to, blocks, is blocked by, causes, caused by, customer-defined types?)
Whether issuelinks are returned in full or just IDs that need a separate fetch
This drives qc_issue_links schema and the link-type normalization (e.g., we want to filter to test-related link types: tests, is tested by, defect, is defect of).
G2. GET /rest/api/2/issueLinkType
GET /rest/api/2/issueLinkType
Why: Lists every issue-link type defined on the instance.
What we'll learn:
The full set of available link types (instance-defined)
Their inward and outward directional names
Which ones are relevant to test ↔ defect relationships (we filter to those during ingestion)
Section H — Rate limits & headers
H1. Capture full headers from any of the above calls
Why: Server-side rate limiting on Jira DC is per-instance configured. The collector's token-bucket limiter needs to know the actual budget.
What to look for in headers (case-insensitive):
What we'll learn:
Whether these headers are exposed at all (10.x: usually yes; 9.x: inconsistent)
The actual numbers: we'll set jira.rate-limit-rpm to ~70% of the observed budget for safety
H2. (optional) Trigger a 429 if you can
Why: See exactly what a rate-limit response looks like on this instance — body shape, Retry-After value, recovery time.
If you can't / shouldn't trigger one, skip — we use a conservative default (200 rpm with token bucket) and adapt at runtime.
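As a rough illustration of the token-bucket limiter and the ~70% safety margin, here is a minimal sketch. TokenBucket and safeRpm are hypothetical names, and time is passed in explicitly so the behaviour is deterministic; the real limiter may be structured differently.

```java
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int requestsPerMinute, long nowMillis) {
        this.capacity = requestsPerMinute;
        this.refillPerMilli = requestsPerMinute / 60_000.0;
        this.tokens = requestsPerMinute; // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    /** 70% of the budget observed in rate-limit headers, rounded down, never below 1. */
    static int safeRpm(int observedBudgetRpm) {
        return Math.max(1, observedBudgetRpm * 7 / 10);
    }

    /** Takes one token if available; returns false when the caller should wait. */
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With the conservative default, the bucket would be created as new TokenBucket(200, System.currentTimeMillis()); once H1 headers reveal the real budget, safeRpm(observedBudget) gives the value to put in jira.rate-limit-rpm.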
Section I — PAT & permissions
Not an API call — questions to confirm:
Which Jira user owns the PAT?
If it's a personal account, attrition is a risk — recommend a service account
Does the PAT user have Browse Projects permission on all projects we want to ingest?
Auto-discovery silently misses projects the user can't browse
Does the PAT user have View Read-Only Workflow permission?
Required for changelog access on some configs
Does the PAT user have access to Zephyr Essential fields and issues?
Zephyr Essential has its own permission scheme — a Jira admin isn't automatically a Zephyr admin
What's the PAT expiry policy?
DC PATs can be set to expire — we need a renewal procedure
Can the PAT be scoped down (read-only)?
We only need read; least-privilege reduces risk
Adaptation if access changes later: the project-discovery job logs the count of discovered projects every cycle. If it drops by >10% suddenly, we alert — surfaces silent permission revocations within 6 hours.
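The >10% drop check is simple enough to sketch directly. DiscoveryDropCheck is a hypothetical name, and the sketch assumes the previous cycle's count is kept somewhere the job can read it.

```java
final class DiscoveryDropCheck {
    private DiscoveryDropCheck() {}

    /** True when the project count fell by more than 10% relative to the previous cycle. */
    static boolean shouldAlert(int previousCount, int currentCount) {
        if (previousCount <= 0) {
            return false; // no baseline yet, nothing to compare against
        }
        int drop = previousCount - currentCount;
        // drop / previous > 10%, written in integer arithmetic to avoid rounding surprises
        return drop * 10 > previousCount;
    }
}
```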
Section J — Schema-change handling (what happens later)
This is the part that future-proofs the collector. Concrete answers to "what if X changes after we ship?"
J.1 New custom field added in Jira
Example: customer adds a new "Customer Impact" custom field on Bugs.
What happens:
Field-discovery job picks it up within 24 hours (next daily run)
Heuristic matchers don't recognize it → stored as source: CUSTOM in qc_field_map
Collector continues ingesting: the field is in the raw blob on qc_issues but not in any normalized field
When the dashboard team wants to use it: add a qc.custom-field-overrides entry mapping it to a logical name, restart collector, done
Manual trigger if customer needs it sooner: admin endpoint POST /admin/jira/discover-fields runs the discovery on demand.
J.2 Custom field renamed
Example: customer renames "Test Status" → "Execution Outcome."
What happens:
Field-discovery still finds the field by ID (customfield_12031)
Display name in qc_field_map updates
Heuristic matcher might un-match it ("Test Status" was a name match)
Mitigation: matcher prefers schema.custom source over name when source is unambiguous (com.smartbear.zephyrforjira → it's still the test status field regardless of display name)
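A minimal sketch of a matcher that prefers the plugin source over the display name. The CUSTOM label comes from this document; the ZEPHYR and NAME_MATCH labels, the FieldMatcher class, and the specific names it matches are assumptions made for illustration, and the plugin prefix itself should be confirmed against the E1 response.

```java
import java.util.Locale;

final class FieldMatcher {
    enum Source { ZEPHYR, NAME_MATCH, CUSTOM }

    // Plugin prefix as quoted in this document; unconfirmed until E1 comes back.
    static final String ZEPHYR_PLUGIN_PREFIX = "com.smartbear.zephyrforjira";

    private FieldMatcher() {}

    static Source classify(String displayName, String schemaCustom) {
        // 1. Plugin source wins: it survives a rename of the display name.
        if (schemaCustom != null && schemaCustom.startsWith(ZEPHYR_PLUGIN_PREFIX)) {
            return Source.ZEPHYR;
        }
        // 2. Fall back to a name heuristic (illustrative names only).
        String name = displayName == null ? "" : displayName.toLowerCase(Locale.ROOT);
        if (name.equals("test status") || name.equals("sprint") || name.equals("story points")) {
            return Source.NAME_MATCH;
        }
        // 3. Everything else is kept, but only as an unmapped custom field.
        return Source.CUSTOM;
    }
}
```

Checking the source first means a field renamed from "Test Status" to "Execution Outcome" is still classified the same way, as long as its schema.custom value is unchanged.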
J.3 Custom field deleted
What happens:
Field-discovery returns no entry for that ID
qc_field_map entry is marked deleted=true (soft-delete, preserves historical context)
Issue sync logs a warning the first time it tries to read a deleted field
New issues stop populating the corresponding test.* sub-field; old data remains intact
J.4 New status added to workflow
Example: customer adds "Awaiting Customer Verification" status.
What happens:
Status-discovery (called as part of cycle prep) picks it up
Mapped to OPEN / IN_PROGRESS / CLOSED automatically, based on its statusCategory
If the admin assigned the wrong category → ops can correct via qc.status-category-overrides config
J.5 New priority added
What happens: unknown priorities map to MEDIUM by default. Customer can add an override entry. Dashboard never breaks — just shows that priority as MEDIUM until configured.
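The fallback-to-MEDIUM behaviour with customer overrides could be sketched as below. Only the MEDIUM default is stated in this document; the other severity levels and the default name-to-severity pairs are illustrative assumptions to be replaced once the D2 response is in, and SeverityMapper is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SeverityMapper {
    enum Severity { CRITICAL, HIGH, MEDIUM, LOW }

    private final Map<String, Severity> byPriorityName = new HashMap<>();

    SeverityMapper(Map<String, Severity> overrides) {
        // Illustrative defaults only; the real map is fixed after the D2 response.
        put("Blocker", Severity.CRITICAL);
        put("Highest", Severity.CRITICAL);
        put("Critical", Severity.HIGH);
        put("High", Severity.HIGH);
        put("Major", Severity.MEDIUM);
        put("Medium", Severity.MEDIUM);
        put("Minor", Severity.LOW);
        put("Low", Severity.LOW);
        put("Trivial", Severity.LOW);
        put("Lowest", Severity.LOW);
        // Overrides (e.g. from qc.priority-severity-map) are applied last, so they win.
        overrides.forEach(this::put);
    }

    private void put(String priorityName, Severity severity) {
        byPriorityName.put(priorityName.toLowerCase(Locale.ROOT), severity);
    }

    Severity map(String priorityName) {
        if (priorityName == null) {
            return Severity.MEDIUM;
        }
        // Unknown priorities fall back to MEDIUM rather than failing.
        return byPriorityName.getOrDefault(priorityName.toLowerCase(Locale.ROOT), Severity.MEDIUM);
    }
}
```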
J.6 Issue type renamed
Example: customer renames "Test" → "Manual Test Case."
What happens:
Issuetype-discovery picks up the new name
Default normalization no longer matches ("Test" was hardcoded in the map)
Mitigation: ship qc.issuetype-map overrides as a config-only fix; ops adds "Manual Test Case": TEST and restarts. No code change.
J.7 Jira version upgrade (9.x → 10.x or 10.x → 11.x)
What happens:
serverInfo re-reads on next cycle, logs new version
If 10.x added new endpoints (e.g., /rest/api/2/project/search), collector starts using them automatically (already gated on version check)
If a future Jira removes /rest/api/2/... entirely (Atlassian has hinted at this for Cloud, not DC), we'd need a code change to switch to v3 or beyond. Detected at startup; collector logs an error and refuses to run rather than silently mis-collecting.
J.8 Zephyr Essential upgrade
What happens:
Custom field IDs are typically stable across SmartBear plugin upgrades
If the plugin changes its field schema (new option in test status, new field added), field-discovery picks it up next run
If plugin is removed: discovered customs disappear; test.* sub-doc stops being populated. Old data remains. Dashboard's Test Coverage tab gracefully degrades.
J.9 Field map override precedence
The order of resolution at issue-sync time:
This means ops can always force a value without waiting on a code release.
J.10 Detecting schema drift in production
The collector emits Micrometer metrics:
Set alerts on these so we know when drift starts before it bites.
Section K — Open answers we need (no API call required)
Quick questions we'd like answered alongside the API responses:
Jira version inventory — which versions are deployed (10.3.5? 9.12.5? both?), which is the primary target for v1, and is there a planned upgrade window?
Tenant size — rough numbers: how many projects, how many open issues, how many test cases, how many test executions per month? Drives whether we need pre-aggregation from day one.
First-cycle backfill window — accept our default of "all open issues + closed-180d," or specify different? E.g., regulated customers sometimes want full history.
History retention: accept default 30 days for qc_issue_history, or longer? (This is the TTL on status-transition records, not on issues themselves.)
Project scope: do we ingest every project the PAT can see, or scope down via allowlist? If allowlist, give us the list.
Zephyr-enabled projects — are tests / executions only used in some projects, or company-wide? If subset, we can scope the test-related custom-field reads to those projects (small perf win).
Test status labels — does the customer use the standard Pass / Fail / Blocked / WIP / Unexecuted, or have they added custom states (Retest, Skipped, etc.)?
Test cycle model — is "Test Cycle" an option list, a separate issue type, a structured field, or a label convention? (Varies between Zephyr Essential versions — we need to know which one the customer's on.)
Customer-defined link types: beyond tests / is tested by, are there custom link types we should care about (e.g., verifies, validates)?
Severity field: is severity inferred from priority, or does the customer have a separate custom "Severity" field? (Some regulated industries do.)
Sprint usage: does the customer use Jira Software sprints? If yes, sprint custom field ID. If no, we skip that part of the field map.
Network access — is Jira reachable from where the collector runs (VPN? whitelisted IP? mTLS?). Affects deployment, not code.
Time zone: what time zone is Jira configured for? Affects how we interpret created / updated ISO strings (Jira returns instance-tz, not always UTC). The dashboard should show consistent times.
Holidays / business hours for SLA: the dashboard's SLA computation could be calendar-day or business-hour. Spec implies calendar; confirm.
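On the time-zone question, one way to keep dashboard times consistent is to normalize every timestamp to UTC on ingest. The sketch below assumes created / updated arrive as strings with millisecond precision and a numeric offset without a colon (for example 2024-01-15T10:30:00.000+0100); that format should be confirmed against the C1 response. JiraTimestamps is a hypothetical name.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class JiraTimestamps {
    // Matches strings such as 2024-01-15T10:30:00.000+0100 (offset without a colon).
    private static final DateTimeFormatter JIRA_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    private JiraTimestamps() {}

    /** Parses an offset-bearing timestamp and returns the same moment as a UTC Instant. */
    static Instant toUtc(String jiraTimestamp) {
        return OffsetDateTime.parse(jiraTimestamp, JIRA_FORMAT).toInstant();
    }
}
```

Because the offset travels with each string, the instance's configured time zone only changes how the value is written, not which moment it denotes; storing the Instant removes that ambiguity downstream.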
Minimum viable set
If the customer can only send a subset, prioritize these. With these we can ship the foundation and field map; everything else refines.
Priority | Response | Why
P0 | A1 /myself | Confirms PAT works
P0 | A2 /serverInfo | Confirms version
P0 | B1 /project | Confirms discovery shape
P0 | C1 /search core query (with expand=changelog,names,schema) | The single most informative response
P0 | E1 /field | Custom field IDs
P0 | F3 /issue/<TEST_KEY>?fields=*all&expand=names,schema | Zephyr Rosetta stone
P1 | D1 /status | Status normalization
P1 | D2 /priority | Severity normalization
P1 | D3 /issuetype | Issue-type normalization
P1 | F4 /issue/<EXEC_KEY>?fields=*all&expand=names,schema | Test execution shape
P1 | G2 /issueLinkType | Link-type catalog
P2 | C2 pagination boundary | Confirms total cap
P2 | C3 single-issue changelog | Belt-and-suspenders
P2 | F1, F2 | Useful but redundant with F3, F4
P2 | F5 /rest/zephyr/... probe | Useful but we can probe ourselves once we have credentials
P2 | H1 headers | Useful but observable at runtime
P0 + P1 = eleven responses (six P0, five P1). That's enough for us to ship steps 1–4 of the implementation plan with high confidence.
Document version: 2026-04-28 — Intake stage. To be retired once responses are received and DTOs / field maps are seeded into the codebase. After that, schema-change handling lives in code (§J) rather than in this doc.