# debug

### Reading Logs from Podman

The two collectors run as separate containers:

| Container               | Service                                                       |
| ----------------------- | ------------------------------------------------------------- |
| `gitlab-collector`      | GitLab DevInsight collection (`GitLabCollectorService`)       |
| `ci-pipeline-collector` | Pipeline detail enrichment (`GitLabPipelineCollectorService`) |

Verify they're running:

```bash
podman ps --format "{{.Names}}\t{{.Status}}" | grep -E "gitlab-collector|ci-pipeline-collector"
```

> **macOS note:** these commands use `mktime`, `gensub`, and the three-argument form of `match`, which require GNU awk. Install it with `brew install gawk` and replace `awk` with `gawk`.

***

#### A. GitLab Full Collection Cycle: Wall-Clock per Run

```bash
podman logs gitlab-collector 2>&1 | awk '
/Starting GitLab DevInsight data collection cycle/ {
  start_ts = $1 " " $2
  start_epoch = mktime(gensub(/[-:]/, " ", "g", start_ts))
}
/Completed GitLab DevInsight data collection cycle/ && start_epoch {
  end_ts = $1 " " $2
  end_epoch = mktime(gensub(/[-:]/, " ", "g", end_ts))
  printf "%s -> %s  %d sec\n", start_ts, end_ts, end_epoch - start_epoch
  start_epoch = 0
}'
```

#### B. GitLab Per-Job Aggregated Stats (runs / avg / min / max in ms)

```bash
podman logs gitlab-collector 2>&1 | awk -F'[][]' '
/GitLabCollectorService - Job \[/ {
  # $2 is the first bracketed field on the line; if the log pattern puts
  # another bracketed field (such as a thread name) first, use $4 instead.
  job = $2
  match($0, /in ([0-9]+)ms/, m); ms = m[1] + 0
  n[job]++; sum[job] += ms
  if (!(job in min) || ms < min[job]) min[job] = ms
  if (ms > max[job]) max[job] = ms
}
END {
  printf "%-25s %6s %10s %10s %10s\n", "job", "runs", "avg_ms", "min_ms", "max_ms"
  for (j in n) printf "%-25s %6d %10d %10d %10d\n", j, n[j], sum[j]/n[j], min[j], max[j]
}' | sort
```

#### C. Pipeline Collector: Per-Cycle Wall-Clock

```bash
podman logs ci-pipeline-collector 2>&1 | awk '
/GitLabPipelineCollectorService - Refreshing ALL pipeline details/ {
  s_ts = $1 " " $2
  s = mktime(gensub(/[-:]/, " ", "g", s_ts))
}
/GitLabPipelineCollectorService - Stage 2: enriched/ && s {
  e_ts = $1 " " $2
  e = mktime(gensub(/[-:]/, " ", "g", e_ts))
  match($0, /enriched ([0-9]+) pipelines/, m)
  printf "%s -> %s  %4d sec  (%s pipelines)\n", s_ts, e_ts, e - s, m[1]
  s = 0
}'
```

#### D. Pipeline Collector: Summary Stats

```bash
podman logs ci-pipeline-collector 2>&1 | awk '
/GitLabPipelineCollectorService - Refreshing ALL pipeline details/ {
  s = mktime(gensub(/[-:]/, " ", "g", $1 " " $2))
}
/GitLabPipelineCollectorService - Stage 2: enriched/ && s {
  e = mktime(gensub(/[-:]/, " ", "g", $1 " " $2))
  d = e - s; n++; sum += d
  # Seed min from the first cycle so a 0-second cycle is handled correctly.
  if (n == 1 || d < min) min = d
  if (d > max) max = d
  s = 0
}
END {
  if (n) printf "pipeline cycles: %d  avg=%.1fs  min=%ds  max=%ds  total=%ds\n", n, sum/n, min, max, sum
}'
```

***

### Capturing Errors & Warnings

Run these against whichever container you're inspecting. Replace `<container>` with `gitlab-collector` or `ci-pipeline-collector`.

#### E. All ERROR / WARN lines

```bash
# GitLab collector
podman logs gitlab-collector 2>&1 | grep -E " (ERROR|WARN) "

# Pipeline collector
podman logs ci-pipeline-collector 2>&1 | grep -E " (ERROR|WARN) "
```

#### F. Errors with stack-trace context (15 lines after each ERROR)

```bash
podman logs <container> 2>&1 | grep -A 15 " ERROR "
```

#### G. Distinct errors, deduplicated and counted

```bash
podman logs <container> 2>&1 \
  | grep -E " ERROR " \
  | sed -E 's/^[0-9-]+ [0-9:]+ \[[^]]+\] ERROR //' \
  | sort | uniq -c | sort -rn
```

#### H. HTTP failure status codes from the GitLab API

```bash
podman logs <container> 2>&1 \
  | grep -oE "[0-9]{3} (Not Found|Unauthorized|Forbidden|Bad Request|Too Many Requests|Internal Server Error|Bad Gateway|Service Unavailable|Gateway Timeout)" \
  | sort | uniq -c | sort -rn
```

#### I. Per-project soft failures (404, skipped resources)

```bash
podman logs <container> 2>&1 \
  | grep -oE "Could not fetch [a-z_ ]+ for project [0-9]+: [0-9]+ [A-Za-z ]+" \
  | sort | uniq -c | sort -rn
```

#### J. Rate-limiting, timeouts, connection issues

```bash
podman logs <container> 2>&1 \
  | grep -iE "rate.?limit|429|timeout|timed out|connection reset|connection refused|socket" \
  | grep -vE " DEBUG "
```

#### K. Cycle health summary: errors per GitLab cycle

```bash
podman logs gitlab-collector 2>&1 | awk '
/Starting GitLab DevInsight data collection cycle/ { in_cycle = 1; start = $1 " " $2; errs = 0 }
in_cycle && / ERROR / { errs++ }
/Completed GitLab DevInsight data collection cycle/ && in_cycle {
  printf "%s -> %s  errors=%d\n", start, $1 " " $2, errs
  in_cycle = 0
}'
```

***

### Tips for Podman Log Capture

* **Limit time range:** `podman logs --since 1h gitlab-collector 2>&1 | …`, or since a timestamp: `--since 2026-04-24T09:00:00`.
* **Tail recent only:** `podman logs --tail 5000 gitlab-collector 2>&1 | …` for quick spot checks.
* **Live monitor errors:** `podman logs -f gitlab-collector 2>&1 | grep --line-buffered -E " (ERROR|WARN) "`.
* **Save a snapshot for offline analysis:**

  ```bash
  podman logs gitlab-collector > gitlab-collector.log 2>&1
  podman logs ci-pipeline-collector > ci-pipeline-collector.log 2>&1
  ```

  Then run the same awk/grep commands against the files (drop the `podman logs … |` prefix and pass the file as the last argument).
* **Why `2>&1`:** `podman logs` keeps the container's stdout and stderr on separate streams. If the collectors log to stderr, `grep`/`awk` would see nothing without `2>&1`; the redirect is harmless when they log to stdout.
* **Podman timestamps vs app timestamps:** the awk scripts above parse the app's own `YYYY-MM-DD HH:MM:SS` prefix from `$1 $2`. Avoid `--timestamps` (it adds an RFC3339 prefix that shifts the columns and breaks the scripts).

## Pipeline Categorization Rules

Runtime-mutable rules stored in MongoDB (`pipeline_category_rules` collection).
They're evaluated by the collector during pipeline enrichment and by the `/pipelines/recategorize` endpoint when re-stamping stored pipelines.

* **Editable at runtime** via `POST /api/v1/collector/gitlab/classification/rules`: no container rebuild, no redeploy.
* **Applied to stored data** via `POST /api/v1/collector/gitlab/pipelines/recategorize?daysBack=N`: pure Mongo, no GitLab calls.
* **Evaluated in priority order** (ascending). The first matching rule sets `pipelineCategory`. **Every** matching rule contributes labels.

***

### Quick Reference

| Priority | Name         | Match type | Matched against                                                | Sub-label dimensions |
| -------- | ------------ | ---------- | -------------------------------------------------------------- | -------------------- |
| 10       | **NPC2**     | namespace  | `eis-terraform-npc-provisioning` (substring on namespace path) | `env`                |
| 20       | **TOM**      | namespace  | `eis-grafana-tom` (substring)                                  | `subteam`            |
| 30       | **Database** | namespace  | `EIS-DBMW-DBENG` (substring)                                   | none                 |
| 40       | **NCD**      | include    | `.gitlab-ci.yml` `include.project` matches `.*ncd.*pipeline.*` | `subtype`            |

> **Fallthrough:** pipelines matching no rule receive `pipelineCategory = "custom"`. Pipelines with no `.gitlab-ci.yml` and no namespace match receive `pipelineCategory = "none"`.

***

### Rule 1: NPC2 (priority 10)

Namespace-based. Categorizes any project whose namespace path contains `eis-terraform-npc-provisioning`. Works for both lab (`gitlab.com`) and production (`gitlab.nomura.com`) URLs.
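
Because the match looks only at the namespace path, the host part of the URL does not matter. The sketch below illustrates that, together with the priority-ordered evaluation and fallthrough described earlier in this section. It is one reading of those rules, not the collector's code: `namespace_of`, `rule_matches`, `categorize`, the simplified `pattern` key, and the `has_ci_file` flag are all hypothetical, and ignoring case in the top-level match is an assumption. Sub-rules are left out.

```python
import re

# Top-level rules from the quick-reference table (sub-rules omitted).
# Listed out of order on purpose: evaluation sorts by priority.
RULES = [
    {"name": "NCD", "matchType": "include", "pattern": ".*ncd.*pipeline.*", "priority": 40},
    {"name": "Database", "matchType": "namespace", "pattern": "EIS-DBMW-DBENG", "priority": 30},
    {"name": "TOM", "matchType": "namespace", "pattern": "eis-grafana-tom", "priority": 20},
    {"name": "NPC2", "matchType": "namespace", "pattern": "eis-terraform-npc-provisioning", "priority": 10},
]


def namespace_of(repo_url):
    """Return the path after the host, so lab and production URLs compare equal."""
    without_scheme = repo_url.split("://", 1)[-1]  # drop an optional scheme
    _host, _, path = without_scheme.partition("/")
    return path.strip("/")


def rule_matches(rule, repo_url, template_project):
    if rule["matchType"] == "namespace":
        # Substring test on the namespace path; ignoring case is an assumption.
        return rule["pattern"].lower() in namespace_of(repo_url).lower()
    # Include rules test the include.project value taken from .gitlab-ci.yml.
    return re.search(rule["pattern"], template_project or "", re.IGNORECASE) is not None


def categorize(repo_url, template_project=None, has_ci_file=True):
    """Return (pipelineCategory, labels) for one pipeline (hypothetical helper)."""
    category, labels = None, []
    for rule in sorted(RULES, key=lambda r: r["priority"]):  # ascending priority
        if rule_matches(rule, repo_url, template_project):
            if category is None:
                category = rule["name"]  # first match sets the category
            labels.append(rule["name"])  # every match contributes a label
    if category is None:
        category = "custom" if has_ci_file else "none"  # fallthrough
    return category, labels


if __name__ == "__main__":
    print(categorize("gitlab.com/eis-terraform-npc-provisioning-prodtest/foo"))
    print(categorize("https://gitlab.nomura.com/eis-terraform-npc-provisioning-nonprod/bar"))
    print(categorize("gitlab.nomura.com/other/repo"))
```

The first two calls land in `NPC2` regardless of host; the third matches no rule and falls through to `custom`.
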
#### JSON

```json
{
  "name": "NPC2",
  "matchType": "namespace",
  "namespacePattern": "eis-terraform-npc-provisioning",
  "priority": 10,
  "enabled": true,
  "description": "NPC2 — env from namespace suffix on first path segment",
  "subRules": [
    { "field": "namespace", "pattern": "-(nonprodtest|nonprod)(/|$)", "label": "nonprod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-(prodtest|prod)(/|$)", "label": "prod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-qa(/|$)", "label": "qa", "key": "env", "enabled": true }
  ]
}
```

#### Sub-label table

| Namespace suffix | `env` label |
| ---------------- | ----------- |
| `-prodtest`      | `prod`      |
| `-prod`          | `prod`      |
| `-nonprodtest`   | `nonprod`   |
| `-nonprod`       | `nonprod`   |
| `-qa`            | `qa`        |

> Sub-rule order: `nonprod` is listed **before** `prod`, most specific first. As written, the leading hyphen already keeps `-(prodtest|prod)` from matching inside `-nonprodtest`, so the two patterns do not overlap; the ordering only matters if a pattern is loosened (for example to a bare `prod`).

#### Example outputs

| URL                                                            | `pipelineCategory` | `pipelineLabels`  | `pipelineLabelMap`               |
| -------------------------------------------------------------- | ------------------ | ----------------- | -------------------------------- |
| `gitlab.com/eis-terraform-npc-provisioning-prodtest/foo`       | NPC2               | `[NPC2, prod]`    | `{category: NPC2, env: prod}`    |
| `gitlab.nomura.com/eis-terraform-npc-provisioning-nonprod/bar` | NPC2               | `[NPC2, nonprod]` | `{category: NPC2, env: nonprod}` |
| `gitlab.nomura.com/eis-terraform-npc-provisioning-qa/x`        | NPC2               | `[NPC2, qa]`      | `{category: NPC2, env: qa}`      |

***

### Rule 2: TOM (priority 20)

Namespace-based. Matches `eis-grafana-tom` and distinguishes the `rules` and `eng` sub-teams.
#### JSON

```json
{
  "name": "TOM",
  "matchType": "namespace",
  "namespacePattern": "eis-grafana-tom",
  "priority": 20,
  "enabled": true,
  "description": "TOM grafana/observability — sub-team from first-segment suffix (rules / eng)",
  "subRules": [
    { "field": "namespace", "pattern": "-rules(/|$)", "label": "rules", "key": "subteam", "enabled": true },
    { "field": "namespace", "pattern": "-eng(/|$)", "label": "eng", "key": "subteam", "enabled": true }
  ]
}
```

#### Sub-label table

| Namespace suffix | `subteam` label |
| ---------------- | --------------- |
| `-rules`         | `rules`         |
| `-eng`           | `eng`           |

#### Example outputs

| URL                                                               | `pipelineCategory` | `pipelineLabels` | `pipelineLabelMap`                |
| ----------------------------------------------------------------- | ------------------ | ---------------- | --------------------------------- |
| `gitlab.nomura.com/eis-grafana-tom-rules/rules-repo`              | TOM                | `[TOM, rules]`   | `{category: TOM, subteam: rules}` |
| `gitlab.nomura.com/eis-grafana-tom-eng/grafana-ui-automation-job` | TOM                | `[TOM, eng]`     | `{category: TOM, subteam: eng}`   |
| `gitlab.nomura.com/eis-grafana-tom/whatever`                      | TOM                | `[TOM]`          | `{category: TOM}`                 |

***

### Rule 3: Database (priority 30)

Namespace-based. Catches the DBMW database engineering group. No sub-rules.

#### JSON

```json
{
  "name": "Database",
  "matchType": "namespace",
  "namespacePattern": "EIS-DBMW-DBENG",
  "priority": 30,
  "enabled": true,
  "description": "DBMW database engineering pipelines"
}
```

#### Example outputs

| URL                                                  | `pipelineCategory` | `pipelineLabels` | `pipelineLabelMap`     |
| ---------------------------------------------------- | ------------------ | ---------------- | ---------------------- |
| `gitlab.nomura.com/EIS-DBMW-DBENG/dbmw-housekeeping` | Database           | `[Database]`     | `{category: Database}` |

***

### Rule 4: NCD (priority 40)

Include-based. Matches any `.gitlab-ci.yml` whose `include.project` value matches `.*ncd.*pipeline.*` (e.g. `gts-cta-strategy-innersource/ncd/pipeline-templates`).
Sub-rules examine `include.file` to identify Helm CI / Application CI / Dependency CI.

#### JSON

```json
{
  "name": "NCD",
  "matchType": "include",
  "includeProjectPattern": ".*ncd.*pipeline.*",
  "priority": 40,
  "enabled": true,
  "description": "NCD template family — Helm sub-rule listed first so it outranks Application",
  "subRules": [
    { "field": "templateFile", "pattern": "NCD-Build\\.helm\\.local\\.gitlab-ci\\.yml", "label": "Helm CI", "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Dependency\\.local\\.gitlab-ci\\.yml", "label": "Dependency CI", "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Build\\.local\\.gitlab-ci\\.yml", "label": "Application CI", "key": "subtype", "enabled": true }
  ]
}
```

#### Sub-label table

| `include.file`                       | `subtype` label  |
| ------------------------------------ | ---------------- |
| `NCD-Build.helm.local.gitlab-ci.yml` | `Helm CI`        |
| `NCD-Dependency.local.gitlab-ci.yml` | `Dependency CI`  |
| `NCD-Build.local.gitlab-ci.yml`      | `Application CI` |

> Sub-rule order: `Helm CI` is listed **before** `Application CI`, most specific first. As written, the Application pattern requires the literal `NCD-Build.local`, which does not occur in `NCD-Build.helm.local.gitlab-ci.yml`, so the two patterns do not overlap; the ordering only matters if a pattern is loosened.
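
As a quick check of how the three `templateFile` patterns behave, the sketch below runs them against the file names from the sub-label table. Python's `re.search` with `re.IGNORECASE` stands in for a case-insensitive substring match; `apply_sub_rules` is a hypothetical helper written for this illustration, not part of the collector.

```python
import re

# The three NCD sub-rules from the JSON above, in array order.
SUB_RULES = [
    {"pattern": r"NCD-Build\.helm\.local\.gitlab-ci\.yml", "label": "Helm CI", "key": "subtype"},
    {"pattern": r"NCD-Dependency\.local\.gitlab-ci\.yml", "label": "Dependency CI", "key": "subtype"},
    {"pattern": r"NCD-Build\.local\.gitlab-ci\.yml", "label": "Application CI", "key": "subtype"},
]


def apply_sub_rules(template_file, sub_rules):
    """Return (labels, label_map) for one include.file value (hypothetical helper)."""
    labels, label_map = [], {}
    for rule in sub_rules:  # evaluated in array order
        if re.search(rule["pattern"], template_file, re.IGNORECASE):
            labels.append(rule["label"])
            label_map[rule["key"]] = rule["label"]  # a later match overwrites the key
    return labels, label_map


if __name__ == "__main__":
    for name in (
        "NCD-Build.helm.local.gitlab-ci.yml",
        "NCD-Dependency.local.gitlab-ci.yml",
        "NCD-Build.local.gitlab-ci.yml",
    ):
        print(name, apply_sub_rules(name, SUB_RULES))
```

Each file name in the table yields exactly one label with these patterns, which agrees with the example outputs for this rule.
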
#### Example outputs

| `include.project`                                      | `include.file`                       | `pipelineCategory` | `pipelineLabels`        | `pipelineLabelMap`                         |
| ------------------------------------------------------ | ------------------------------------ | ------------------ | ----------------------- | ------------------------------------------ |
| `gts-cta-strategy-innersource/ncd/pipeline-templates`  | `NCD-Build.helm.local.gitlab-ci.yml` | NCD                | `[NCD, Helm CI]`        | `{category: NCD, subtype: Helm CI}`        |
| `gts-cta-strategy-innersource/ncd/pipeline-templates`  | `NCD-Build.local.gitlab-ci.yml`      | NCD                | `[NCD, Application CI]` | `{category: NCD, subtype: Application CI}` |
| `gts-cta-strategy-innersource/ncd/pipeline-dependency` | `NCD-Dependency.local.gitlab-ci.yml` | NCD                | `[NCD, Dependency CI]`  | `{category: NCD, subtype: Dependency CI}`  |

***

### Installation: POST all rules in one go

```bash
COLLECTOR=http://localhost:8088

# NPC2
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "NPC2",
  "matchType": "namespace",
  "namespacePattern": "eis-terraform-npc-provisioning",
  "priority": 10,
  "enabled": true,
  "description": "NPC2 — env from namespace suffix on first path segment",
  "subRules": [
    { "field": "namespace", "pattern": "-(nonprodtest|nonprod)(/|$)", "label": "nonprod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-(prodtest|prod)(/|$)", "label": "prod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-qa(/|$)", "label": "qa", "key": "env", "enabled": true }
  ]
}'

# TOM
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "TOM",
  "matchType": "namespace",
  "namespacePattern": "eis-grafana-tom",
  "priority": 20,
  "enabled": true,
  "description": "TOM grafana/observability — sub-team from first-segment suffix (rules / eng)",
  "subRules": [
    { "field": "namespace", "pattern": "-rules(/|$)", "label": "rules", "key": "subteam", "enabled": true },
    { "field": "namespace", "pattern": "-eng(/|$)", "label": "eng", "key": "subteam", "enabled": true }
  ]
}'

# Database
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "Database",
  "matchType": "namespace",
  "namespacePattern": "EIS-DBMW-DBENG",
  "priority": 30,
  "enabled": true,
  "description": "DBMW database engineering pipelines"
}'

# NCD
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "NCD",
  "matchType": "include",
  "includeProjectPattern": ".*ncd.*pipeline.*",
  "priority": 40,
  "enabled": true,
  "description": "NCD template family — Helm sub-rule listed first so it outranks Application",
  "subRules": [
    { "field": "templateFile", "pattern": "NCD-Build\\.helm\\.local\\.gitlab-ci\\.yml", "label": "Helm CI", "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Dependency\\.local\\.gitlab-ci\\.yml", "label": "Dependency CI", "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Build\\.local\\.gitlab-ci\\.yml", "label": "Application CI", "key": "subtype", "enabled": true }
  ]
}'
```

#### Verify

```bash
curl -s "$COLLECTOR/api/v1/collector/gitlab/classification/rules" | python3 -c "
import json, sys
d = json.load(sys.stdin)['data']['activeRules']
print(f\"source={d['source']} count={d['count']}\")
for r in d['rules']:
    print(f\"  {r['priority']:>3} {r['name']:<10} matchType={r['matchType']} subRules={len(r['subRules'])}\")
"
```

Expected:

```
source=mongo count=4
   10 NPC2       matchType=namespace subRules=3
   20 TOM        matchType=namespace subRules=2
   30 Database   matchType=namespace subRules=0
   40 NCD        matchType=include subRules=3
```

#### Apply to stored data

```bash
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/pipelines/recategorize?daysBack=90"
```

Runs in seconds: pure Mongo + config, no GitLab calls.
***

### Sub-rule field reference

The `field` attribute on a sub-rule decides which raw fact the pattern is matched against:

| `field` value     | Source fact                             | Notes                                                |
| ----------------- | --------------------------------------- | ---------------------------------------------------- |
| `templateProject` | `include.project` from `.gitlab-ci.yml` | The template repo path                               |
| `templateRef`     | `include.ref`                           | The branch / tag of the included template            |
| `templateFile`    | `include.file`                          | The specific file inside the template repo           |
| `namespace`       | parsed from `repoUrl`                   | E.g. `eis-terraform-npc-provisioning-prod/some-repo` |
| `repoUrl`         | full project URL                        | Useful for rare URL-based regexes                    |

#### Sub-rule semantics

* All sub-rule patterns are **case-insensitive** and use `Matcher.find()` (substring match).
* Sub-rules are evaluated in array order. List the **most specific** patterns first when they can overlap (e.g. Helm before Application; nonprod before prod).
* Each matching sub-rule contributes its `label` to `pipelineLabels`. If `key` is set, it also lands in `pipelineLabelMap[key]` (last writer wins on key collisions).
* Set `enabled: false` to keep a sub-rule on record without applying it.

***

### Operational notes

* The classifier caches rules in-process for 60 s. Mutating endpoints invalidate the cache automatically.
* **Namespace rules require `pipeline.repoUrl`** to be populated. That field is stamped during enrichment from the project record, which means projects must already exist in `dev_insight_projects_collection` for namespace rules to fire. Run `/gitlab/collect` once to seed projects, then `/pipelines/refresh` for enrichment.
* For include-based rules, only `pipelineTemplateProject` / `pipelineTemplateRef` / `pipelineTemplateFile` are needed. These are captured directly during pipeline enrichment from the parsed `.gitlab-ci.yml`.
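
The caching behaviour in the first operational note can be pictured as a small time-to-live cache that mutating calls reset. The sketch below is illustrative only: `RuleCache`, `load_rules`, and the injectable `clock` are hypothetical names, and the collector's real implementation may differ.

```python
import time


class RuleCache:
    """Hold the rule list for ttl_seconds, reloading on expiry or after invalidate()."""

    def __init__(self, load_rules, ttl_seconds=60.0, clock=time.monotonic):
        self._load_rules = load_rules  # callable that reads the rules, e.g. from Mongo
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be simulated
        self._rules = None
        self._loaded_at = 0.0

    def get(self):
        now = self._clock()
        if self._rules is None or now - self._loaded_at >= self._ttl:
            self._rules = self._load_rules()
            self._loaded_at = now
        return self._rules

    def invalidate(self):
        """Called after a mutating request so the next get() reloads."""
        self._rules = None


if __name__ == "__main__":
    loads = []
    fake_now = [0.0]

    def load_rules():
        loads.append(fake_now[0])
        return ["NPC2", "TOM", "Database", "NCD"]

    cache = RuleCache(load_rules, ttl_seconds=60.0, clock=lambda: fake_now[0])
    cache.get()
    cache.get()          # served from cache
    fake_now[0] = 61.0
    cache.get()          # TTL elapsed, reloads
    cache.invalidate()
    cache.get()          # reloads again after invalidation
    print(len(loads))
```

Under this model, a rule edited through the API is visible on the next evaluation, while a change written directly to Mongo could take up to the TTL to appear.
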