# Debug

### Reading Logs from Podman

The two collectors run as separate containers:

| Container               | Service                                                       |
| ----------------------- | ------------------------------------------------------------- |
| `gitlab-collector`      | GitLab DevInsight collection (`GitLabCollectorService`)       |
| `ci-pipeline-collector` | Pipeline detail enrichment (`GitLabPipelineCollectorService`) |

Verify they're running:

```bash
podman ps --format "{{.Names}}\t{{.Status}}" | grep -E "gitlab-collector|ci-pipeline-collector"
```

> **macOS note:** these commands use `mktime`, `gensub`, and the three-argument form of `match`, which require GNU awk. Install it with `brew install gawk` and replace `awk` with `gawk`.

***

#### A. GitLab Full Collection Cycle — Wall-Clock per Run

```bash
podman logs gitlab-collector 2>&1 | awk '
  /Starting GitLab DevInsight data collection cycle/ {
    start_ts = $1 " " $2
    start_epoch = mktime(gensub(/[-:]/," ","g",start_ts))
  }
  /Completed GitLab DevInsight data collection cycle/ && start_epoch {
    end_ts = $1 " " $2
    end_epoch = mktime(gensub(/[-:]/," ","g",end_ts))
    printf "%s -> %s   %d sec\n", start_ts, end_ts, end_epoch - start_epoch
    start_epoch = 0
  }'
```

#### B. GitLab Per-Job Aggregated Stats (runs / avg / min / max in ms)

```bash
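podman logs gitlab-collector 2>&1 | awk '/GitLabCollectorService - Job \[/ {
    # Read the job name from the "Job [name]" token. Splitting on brackets (-F)
    # picks up the [thread] field first when the log prefix has one.
    if (!match($0, /Job \[([^]]+)\]/, j)) next
    if (!match($0, /in ([0-9]+)ms/, m)) next
    job = j[1]; ms = m[1] + 0
    n[job]++; sum[job] += ms
    if (!(job in min) || ms < min[job]) min[job] = ms
    if (ms > max[job]) max[job] = ms
}
END {
    printf "%-25s %6s %10s %10s %10s\n","job","runs","avg_ms","min_ms","max_ms"
    PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk: iterate jobs by name so the header stays on top
    for (j in n) printf "%-25s %6d %10d %10d %10d\n", j, n[j], sum[j]/n[j], min[j], max[j]
}'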
```

#### C. Pipeline Collector — Per-Cycle Wall-Clock

```bash
podman logs ci-pipeline-collector 2>&1 | awk '
  /GitLabPipelineCollectorService - Refreshing ALL pipeline details/ {
    s_ts = $1 " " $2
    s = mktime(gensub(/[-:]/," ","g",s_ts))
  }
  /GitLabPipelineCollectorService - Stage 2: enriched/ && s {
    e_ts = $1 " " $2
    e = mktime(gensub(/[-:]/," ","g",e_ts))
    match($0, /enriched ([0-9]+) pipelines/, m)
    printf "%s -> %s   %4d sec   (%s pipelines)\n", s_ts, e_ts, e-s, m[1]
    s = 0
  }'
```

#### D. Pipeline Collector — Summary Stats

```bash
podman logs ci-pipeline-collector 2>&1 | awk '
  /GitLabPipelineCollectorService - Refreshing ALL pipeline details/ {
    s = mktime(gensub(/[-:]/," ","g",$1" "$2))
  }
  /GitLabPipelineCollectorService - Stage 2: enriched/ && s {
    e = mktime(gensub(/[-:]/," ","g",$1" "$2))
    d = e - s; n++; sum += d
    if (n==1 || d<min) min=d
    if (d>max) max=d
    s = 0
  }
  END {
    if (n) printf "pipeline cycles: %d   avg=%.1fs   min=%ds   max=%ds   total=%ds\n",
                  n, sum/n, min, max, sum
  }'
```

***

### Capturing Errors & Warnings

Run these against whichever container you're inspecting. Replace `<container>` with `gitlab-collector` or `ci-pipeline-collector`.

#### E. All ERROR / WARN lines

```bash
# GitLab collector
podman logs gitlab-collector 2>&1 | grep -E " (ERROR|WARN) "

# Pipeline collector
podman logs ci-pipeline-collector 2>&1 | grep -E " (ERROR|WARN) "
```

#### F. Errors with stack-trace context (15 lines after each ERROR)

```bash
podman logs <container> 2>&1 | grep -A 15 " ERROR "
```

#### G. Distinct errors, deduplicated and counted

```bash
podman logs <container> 2>&1 \
  | grep -E " ERROR " \
  | sed -E 's/^[0-9-]+ [0-9:]+ \[[^]]+\] ERROR //' \
  | sort | uniq -c | sort -rn
```

#### H. HTTP failure status codes from the GitLab API

```bash
podman logs <container> 2>&1 \
  | grep -oE "[0-9]{3} (Not Found|Unauthorized|Forbidden|Bad Request|Too Many Requests|Internal Server Error|Bad Gateway|Service Unavailable|Gateway Timeout)" \
  | sort | uniq -c | sort -rn
```

#### I. Per-project soft failures (404, skipped resources)

```bash
podman logs <container> 2>&1 \
  | grep -oE "Could not fetch [a-z_ ]+ for project [0-9]+: [0-9]+ [A-Za-z ]+" \
  | sort | uniq -c | sort -rn
```

#### J. Rate-limiting, timeouts, connection issues

```bash
podman logs <container> 2>&1 \
  | grep -iE "rate.?limit|429|timeout|timed out|connection reset|connection refused|socket" \
  | grep -vE " DEBUG "
```

#### K. Cycle health summary — errors per GitLab cycle

```bash
podman logs gitlab-collector 2>&1 | awk '
  /Starting GitLab DevInsight data collection cycle/ {
    in_cycle = 1; start = $1 " " $2; errs = 0
  }
  in_cycle && / ERROR / { errs++ }
  /Completed GitLab DevInsight data collection cycle/ && in_cycle {
    printf "%s -> %s   errors=%d\n", start, $1 " " $2, errs
    in_cycle = 0
  }'
```

***

### Tips for Podman Log Capture

* **Limit time range:** `podman logs --since 1h gitlab-collector 2>&1 | …`, or since a timestamp: `--since 2026-04-24T09:00:00`.
* **Tail recent only:** `podman logs --tail 5000 gitlab-collector 2>&1 | …` for quick spot checks.
* **Live monitor errors:** `podman logs -f gitlab-collector 2>&1 | grep --line-buffered -E " (ERROR|WARN) "`.
* **Save a snapshot for offline analysis:**

  ```bash
  podman logs gitlab-collector       > gitlab-collector.log 2>&1
  podman logs ci-pipeline-collector  > ci-pipeline-collector.log 2>&1
  ```

  Then run the same awk/grep commands against the files (drop the `podman logs … |` prefix and pass the file as the last argument).
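* **Why `2>&1`:** `podman logs` replays the container's stdout and stderr on separate streams, and a pipe carries only stdout. `2>&1` merges the two so `grep`/`awk` see every line, whichever stream the application logs to.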
* **Podman timestamps vs app timestamps:** the awk scripts above parse the app's own `YYYY-MM-DD HH:MM:SS` prefix from `$1 $2`. Avoid `--timestamps` (it adds an RFC3339 prefix that shifts the columns and breaks the scripts).
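
Where GNU awk is not available, the cycle timing can also be computed from a saved snapshot with a few lines of Python. This is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS` line prefix described above and the same start/complete marker strings the awk scripts match; the sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Marker strings copied from the awk scripts in this page.
START = "Starting GitLab DevInsight data collection cycle"
END = "Completed GitLab DevInsight data collection cycle"
# Assumed line prefix: "YYYY-MM-DD HH:MM:SS" with no fractional seconds.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def cycle_durations(lines):
    """Return a (start, end, seconds) tuple for every completed cycle."""
    cycles, start = [], None
    for line in lines:
        m = TS.match(line)
        if m is None:
            continue  # stack-trace lines and other unprefixed output
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if START in line:
            start = ts
        elif END in line and start is not None:
            cycles.append((start, ts, int((ts - start).total_seconds())))
            start = None
    return cycles


# Made-up sample lines in the assumed format, for illustration only.
sample = [
    "2026-04-24 09:00:00 [worker-1] INFO  Starting GitLab DevInsight data collection cycle",
    "2026-04-24 09:03:30 [worker-1] INFO  Completed GitLab DevInsight data collection cycle",
]
for start, end, secs in cycle_durations(sample):
    print(f"{start} -> {end}   {secs} sec")
```

To run it against a snapshot, pass an open file instead of `sample`, for example `cycle_durations(open("gitlab-collector.log", errors="replace"))`.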

## Pipeline Categorization Rules

Categorization rules are runtime-mutable and stored in MongoDB (`pipeline_category_rules` collection). The collector evaluates them during pipeline enrichment, and the `/pipelines/recategorize` endpoint evaluates them when re-stamping stored pipelines.

* **Editable at runtime** via `POST /api/v1/collector/gitlab/classification/rules` — no container rebuild, no redeploy.
* **Applied to stored data** via `POST /api/v1/collector/gitlab/pipelines/recategorize?daysBack=N` — pure Mongo, no GitLab calls.
* **Evaluated in priority order** (ascending). First matching rule sets `pipelineCategory`. **Every** matching rule contributes labels.
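
The evaluation order can be pictured with a short Python sketch. It is not the collector's code: the function names are hypothetical, namespace matching is assumed to be a plain case-sensitive substring test, and fallthrough handling (`custom` / `none`) and sub-rules are left out. The rule fields are copied from the rules documented below.

```python
import re

# Top-level fields of the four documented rules; sub-rules omitted.
RULES = [
    {"name": "NCD", "matchType": "include",
     "includeProjectPattern": ".*ncd.*pipeline.*", "priority": 40, "enabled": True},
    {"name": "Database", "matchType": "namespace",
     "namespacePattern": "EIS-DBMW-DBENG", "priority": 30, "enabled": True},
    {"name": "TOM", "matchType": "namespace",
     "namespacePattern": "eis-grafana-tom", "priority": 20, "enabled": True},
    {"name": "NPC2", "matchType": "namespace",
     "namespacePattern": "eis-terraform-npc-provisioning", "priority": 10, "enabled": True},
]


def rule_matches(rule, namespace, include_project):
    """Hypothetical matcher: substring for namespace rules, regex for include rules."""
    if rule["matchType"] == "namespace":
        return rule["namespacePattern"] in namespace  # assumption: case-sensitive
    if rule["matchType"] == "include":
        return (include_project is not None
                and re.search(rule["includeProjectPattern"], include_project) is not None)
    return False


def categorize(rules, namespace, include_project=None):
    """Return (category, names of all matching rules) in ascending priority order."""
    category, matched = None, []
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if not rule.get("enabled", True):
            continue
        if rule_matches(rule, namespace, include_project):
            matched.append(rule["name"])   # every matching rule contributes
            if category is None:
                category = rule["name"]    # first match sets the category
    return category, matched


print(categorize(RULES, "eis-terraform-npc-provisioning-prod/foo",
                 "gts-cta-strategy-innersource/ncd/pipeline-templates"))
```

In this sketch a project that matches both NPC2 (priority 10) and NCD (priority 40) is categorized as NPC2, while both rules appear in the matched list.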

***

### Quick Reference

| Priority | Name         | Match type | Matched against                                                | Sub-label dimensions |
| -------- | ------------ | ---------- | -------------------------------------------------------------- | -------------------- |
| 10       | **NPC2**     | namespace  | `eis-terraform-npc-provisioning` (substring on namespace path) | `env`                |
| 20       | **TOM**      | namespace  | `eis-grafana-tom` (substring)                                  | `subteam`            |
| 30       | **Database** | namespace  | `EIS-DBMW-DBENG` (substring)                                   | —                    |
| 40       | **NCD**      | include    | `.gitlab-ci.yml` `include.project` matches `.*ncd.*pipeline.*` | `subtype`            |

> **Fallthrough:** pipelines matching no rule receive `pipelineCategory = "custom"`. Pipelines with no `.gitlab-ci.yml` and no namespace match receive `pipelineCategory = "none"`.

***

### Rule 1 — NPC2 (priority 10)

Namespace-based. Categorizes any project whose namespace path contains `eis-terraform-npc-provisioning`. Works for both lab (`gitlab.com`) and production (`gitlab.nomura.com`) URLs.

#### JSON

```json
{
  "name": "NPC2",
  "matchType": "namespace",
  "namespacePattern": "eis-terraform-npc-provisioning",
  "priority": 10,
  "enabled": true,
  "description": "NPC2 — env from namespace suffix on first path segment",
  "subRules": [
    { "field": "namespace", "pattern": "-(nonprodtest|nonprod)(/|$)", "label": "nonprod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-(prodtest|prod)(/|$)",       "label": "prod",    "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-qa(/|$)",                    "label": "qa",      "key": "env", "enabled": true }
  ]
}
```

#### Sub-label table

| Namespace suffix | `env` label |
| ---------------- | ----------- |
| `-prodtest`      | `prod`      |
| `-prod`          | `prod`      |
| `-nonprodtest`   | `nonprod`   |
| `-nonprod`       | `nonprod`   |
| `-qa`            | `qa`        |

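> Sub-rule order: `nonprod` is listed **before** `prod` as a safeguard. With the patterns as written, the leading `-` already keeps `-(prodtest|prod)` from matching inside `-nonprodtest`; a looser pattern without the hyphen, such as `prod`, would match both.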
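
The three `env` patterns can be checked against concrete namespaces before they are posted. The sketch below uses Python's `re.search` with `re.IGNORECASE` as a stand-in for the classifier's case-insensitive `Matcher.find()`; the patterns use only syntax the two engines share, and `env_labels` is a hypothetical helper name.

```python
import re

# Patterns and labels copied from the NPC2 rule, in array order.
ENV_SUBRULES = [
    (r"-(nonprodtest|nonprod)(/|$)", "nonprod"),
    (r"-(prodtest|prod)(/|$)", "prod"),
    (r"-qa(/|$)", "qa"),
]


def env_labels(namespace):
    """Labels of every env sub-rule whose pattern is found in `namespace`."""
    return [label for pattern, label in ENV_SUBRULES
            if re.search(pattern, namespace, re.IGNORECASE)]


for ns in (
    "eis-terraform-npc-provisioning-prodtest/foo",
    "eis-terraform-npc-provisioning-nonprodtest/bar",
    "eis-terraform-npc-provisioning-qa/x",
    "eis-terraform-npc-provisioning-production/y",  # suffix not covered by any pattern
):
    print(f"{ns:50} {env_labels(ns)}")
```

Each of the first three namespaces yields exactly one label, and `-production` yields none because the patterns require `/` or end-of-string right after the suffix.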

#### Example outputs

| URL                                                            | `pipelineCategory` | `pipelineLabels`  | `pipelineLabelMap`               |
| -------------------------------------------------------------- | ------------------ | ----------------- | -------------------------------- |
| `gitlab.com/eis-terraform-npc-provisioning-prodtest/foo`       | NPC2               | `[NPC2, prod]`    | `{category: NPC2, env: prod}`    |
| `gitlab.nomura.com/eis-terraform-npc-provisioning-nonprod/bar` | NPC2               | `[NPC2, nonprod]` | `{category: NPC2, env: nonprod}` |
| `gitlab.nomura.com/eis-terraform-npc-provisioning-qa/x`        | NPC2               | `[NPC2, qa]`      | `{category: NPC2, env: qa}`      |

***

### Rule 2 — TOM (priority 20)

Namespace-based. Matches `eis-grafana-tom` and distinguishes the `rules` vs `eng` sub-teams.

#### JSON

```json
{
  "name": "TOM",
  "matchType": "namespace",
  "namespacePattern": "eis-grafana-tom",
  "priority": 20,
  "enabled": true,
  "description": "TOM grafana/observability — sub-team from first-segment suffix (rules / eng)",
  "subRules": [
    { "field": "namespace", "pattern": "-rules(/|$)", "label": "rules", "key": "subteam", "enabled": true },
    { "field": "namespace", "pattern": "-eng(/|$)",   "label": "eng",   "key": "subteam", "enabled": true }
  ]
}
```

#### Sub-label table

| Namespace suffix | `subteam` label |
| ---------------- | --------------- |
| `-rules`         | `rules`         |
| `-eng`           | `eng`           |

#### Example outputs

| URL                                                               | `pipelineCategory` | `pipelineLabels` | `pipelineLabelMap`                |
| ----------------------------------------------------------------- | ------------------ | ---------------- | --------------------------------- |
| `gitlab.nomura.com/eis-grafana-tom-rules/rules-repo`              | TOM                | `[TOM, rules]`   | `{category: TOM, subteam: rules}` |
| `gitlab.nomura.com/eis-grafana-tom-eng/grafana-ui-automation-job` | TOM                | `[TOM, eng]`     | `{category: TOM, subteam: eng}`   |
| `gitlab.nomura.com/eis-grafana-tom/whatever`                      | TOM                | `[TOM]`          | `{category: TOM}`                 |

***

### Rule 3 — Database (priority 30)

Namespace-based. Catches the DBMW database engineering group. No sub-rules.

#### JSON

```json
{
  "name": "Database",
  "matchType": "namespace",
  "namespacePattern": "EIS-DBMW-DBENG",
  "priority": 30,
  "enabled": true,
  "description": "DBMW database engineering pipelines"
}
```

#### Example outputs

| URL                                                  | `pipelineCategory` | `pipelineLabels` | `pipelineLabelMap`     |
| ---------------------------------------------------- | ------------------ | ---------------- | ---------------------- |
| `gitlab.nomura.com/EIS-DBMW-DBENG/dbmw-housekeeping` | Database           | `[Database]`     | `{category: Database}` |

***

### Rule 4 — NCD (priority 40)

Include-based. Matches any `.gitlab-ci.yml` whose `include.project` value matches the regex `.*ncd.*pipeline.*` (e.g. `gts-cta-strategy-innersource/ncd/pipeline-templates`). Sub-rules examine `include.file` to identify Helm CI / Application CI / Dependency CI.

#### JSON

```json
{
  "name": "NCD",
  "matchType": "include",
  "includeProjectPattern": ".*ncd.*pipeline.*",
  "priority": 40,
  "enabled": true,
  "description": "NCD template family — Helm sub-rule listed first so it outranks Application",
  "subRules": [
    { "field": "templateFile", "pattern": "NCD-Build\\.helm\\.local\\.gitlab-ci\\.yml", "label": "Helm CI",        "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Dependency\\.local\\.gitlab-ci\\.yml",   "label": "Dependency CI",  "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Build\\.local\\.gitlab-ci\\.yml",        "label": "Application CI", "key": "subtype", "enabled": true }
  ]
}
```

#### Sub-label table

| `include.file`                       | `subtype` label  |
| ------------------------------------ | ---------------- |
| `NCD-Build.helm.local.gitlab-ci.yml` | `Helm CI`        |
| `NCD-Dependency.local.gitlab-ci.yml` | `Dependency CI`  |
| `NCD-Build.local.gitlab-ci.yml`      | `Application CI` |

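> Sub-rule order: `Helm CI` is listed **before** `Application CI` as a safeguard. With the patterns as written, `NCD-Build\.local\.gitlab-ci\.yml` does not match `NCD-Build.helm.local.gitlab-ci.yml` because of the `.helm` segment; a looser pattern such as `NCD-Build` would match both files.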
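
The doubled backslashes in the JSON are JSON string escaping: `\\.` decodes to the regex `\.`, a literal dot. The following sketch decodes the three sub-rule patterns and tries them against the documented file names, again using Python's `re.search` as a stand-in for `Matcher.find()`; `subtype_labels` is a hypothetical helper name.

```python
import json
import re

# Patterns and labels copied from the NCD rule; other sub-rule fields omitted.
SUBRULES_JSON = r'''
[
  {"pattern": "NCD-Build\\.helm\\.local\\.gitlab-ci\\.yml", "label": "Helm CI"},
  {"pattern": "NCD-Dependency\\.local\\.gitlab-ci\\.yml",   "label": "Dependency CI"},
  {"pattern": "NCD-Build\\.local\\.gitlab-ci\\.yml",        "label": "Application CI"}
]
'''
SUBRULES = json.loads(SUBRULES_JSON)


def subtype_labels(template_file):
    """Labels of every sub-rule whose decoded pattern is found in `template_file`."""
    return [s["label"] for s in SUBRULES
            if re.search(s["pattern"], template_file, re.IGNORECASE)]


print(SUBRULES[2]["pattern"])  # the regex as the matcher sees it, single backslashes
for name in (
    "NCD-Build.helm.local.gitlab-ci.yml",
    "NCD-Dependency.local.gitlab-ci.yml",
    "NCD-Build.local.gitlab-ci.yml",
):
    print(f"{name:40} {subtype_labels(name)}")
```

Each file name produces a single label here, which is a quick way to confirm that an edited pattern has not started overlapping with its neighbours.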

#### Example outputs

| `include.project`                                      | `include.file`                       | `pipelineCategory` | `pipelineLabels`        | `pipelineLabelMap`                         |
| ------------------------------------------------------ | ------------------------------------ | ------------------ | ----------------------- | ------------------------------------------ |
| `gts-cta-strategy-innersource/ncd/pipeline-templates`  | `NCD-Build.helm.local.gitlab-ci.yml` | NCD                | `[NCD, Helm CI]`        | `{category: NCD, subtype: Helm CI}`        |
| `gts-cta-strategy-innersource/ncd/pipeline-templates`  | `NCD-Build.local.gitlab-ci.yml`      | NCD                | `[NCD, Application CI]` | `{category: NCD, subtype: Application CI}` |
| `gts-cta-strategy-innersource/ncd/pipeline-dependency` | `NCD-Dependency.local.gitlab-ci.yml` | NCD                | `[NCD, Dependency CI]`  | `{category: NCD, subtype: Dependency CI}`  |

***

### Installation — POST all rules in one go

```bash
COLLECTOR=http://localhost:8088

# NPC2
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "NPC2",
  "matchType": "namespace",
  "namespacePattern": "eis-terraform-npc-provisioning",
  "priority": 10,
  "enabled": true,
  "description": "NPC2 — env from namespace suffix on first path segment",
  "subRules": [
    { "field": "namespace", "pattern": "-(nonprodtest|nonprod)(/|$)", "label": "nonprod", "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-(prodtest|prod)(/|$)",       "label": "prod",    "key": "env", "enabled": true },
    { "field": "namespace", "pattern": "-qa(/|$)",                    "label": "qa",      "key": "env", "enabled": true }
  ]
}'

# TOM
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "TOM",
  "matchType": "namespace",
  "namespacePattern": "eis-grafana-tom",
  "priority": 20,
  "enabled": true,
  "description": "TOM grafana/observability — sub-team from first-segment suffix (rules / eng)",
  "subRules": [
    { "field": "namespace", "pattern": "-rules(/|$)", "label": "rules", "key": "subteam", "enabled": true },
    { "field": "namespace", "pattern": "-eng(/|$)",   "label": "eng",   "key": "subteam", "enabled": true }
  ]
}'

# Database
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "Database",
  "matchType": "namespace",
  "namespacePattern": "EIS-DBMW-DBENG",
  "priority": 30,
  "enabled": true,
  "description": "DBMW database engineering pipelines"
}'

# NCD
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/classification/rules" \
  -H 'Content-Type: application/json' -d '{
  "name": "NCD",
  "matchType": "include",
  "includeProjectPattern": ".*ncd.*pipeline.*",
  "priority": 40,
  "enabled": true,
  "description": "NCD template family — Helm sub-rule listed first so it outranks Application",
  "subRules": [
    { "field": "templateFile", "pattern": "NCD-Build\\.helm\\.local\\.gitlab-ci\\.yml", "label": "Helm CI",        "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Dependency\\.local\\.gitlab-ci\\.yml",   "label": "Dependency CI",  "key": "subtype", "enabled": true },
    { "field": "templateFile", "pattern": "NCD-Build\\.local\\.gitlab-ci\\.yml",        "label": "Application CI", "key": "subtype", "enabled": true }
  ]
}'
```

#### Verify

```bash
curl -s "$COLLECTOR/api/v1/collector/gitlab/classification/rules" | python3 -c "
import json, sys
d = json.load(sys.stdin)['data']['activeRules']
print(f\"source={d['source']}  count={d['count']}\")
for r in d['rules']:
    print(f\"  {r['priority']:>3}  {r['name']:<10}  matchType={r['matchType']}  subRules={len(r.get('subRules') or [])}\")
"
```

Expected:

```
source=mongo  count=4
   10  NPC2        matchType=namespace  subRules=3
   20  TOM         matchType=namespace  subRules=2
   30  Database    matchType=namespace  subRules=0
   40  NCD         matchType=include    subRules=3
```

#### Apply to stored data

```bash
curl -sX POST "$COLLECTOR/api/v1/collector/gitlab/pipelines/recategorize?daysBack=90"
```

Runs in seconds — pure Mongo + config, no GitLab calls.

***

### Sub-rule field reference

The `field` attribute on a sub-rule decides which raw fact the pattern is matched against:

| `field` value     | Source fact                             | Notes                                                |
| ----------------- | --------------------------------------- | ---------------------------------------------------- |
| `templateProject` | `include.project` from `.gitlab-ci.yml` | The template repo path                               |
| `templateRef`     | `include.ref`                           | The branch / tag of the included template            |
| `templateFile`    | `include.file`                          | The specific file inside the template repo           |
| `namespace`       | parsed from `repoUrl`                   | E.g. `eis-terraform-npc-provisioning-prod/some-repo` |
| `repoUrl`         | full project URL                        | Useful for rare URL-based regexes                    |

#### Sub-rule semantics

* All sub-rule patterns are **case-insensitive** and use `Matcher.find()` (substring match).
* Sub-rules are evaluated in array order. List the **most specific** patterns first when they can overlap (e.g. Helm before Application; nonprod before prod).
* Each matching sub-rule contributes its `label` to `pipelineLabels`. If `key` is set, it also lands in `pipelineLabelMap[key]` (last writer wins on key collisions).
* Set `enabled: false` to keep a sub-rule on record without applying it.
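
The semantics listed above can be sketched in Python as follows. This illustrates the documented behavior and is not the classifier's source: `apply_sub_rules` and the `facts` dictionary are hypothetical, and `re.search` with `re.IGNORECASE` stands in for a case-insensitive `Matcher.find()`.

```python
import re


def apply_sub_rules(category, sub_rules, facts):
    """Return (pipelineLabels, pipelineLabelMap) for one matched rule.

    `facts` maps a sub-rule `field` name (e.g. "namespace") to its raw value.
    """
    labels = [category]
    label_map = {"category": category}
    for sub in sub_rules:                      # evaluated in array order
        if not sub.get("enabled", True):
            continue                           # kept on record, not applied
        value = facts.get(sub["field"])
        if value is None:
            continue
        if re.search(sub["pattern"], value, re.IGNORECASE):
            labels.append(sub["label"])        # every match contributes a label
            if sub.get("key"):
                label_map[sub["key"]] = sub["label"]  # last writer wins
    return labels, label_map


# Sub-rules copied from the TOM rule.
TOM_SUBRULES = [
    {"field": "namespace", "pattern": "-rules(/|$)", "label": "rules", "key": "subteam", "enabled": True},
    {"field": "namespace", "pattern": "-eng(/|$)", "label": "eng", "key": "subteam", "enabled": True},
]

print(apply_sub_rules("TOM", TOM_SUBRULES,
                      {"namespace": "eis-grafana-tom-eng/grafana-ui-automation-job"}))
print(apply_sub_rules("TOM", TOM_SUBRULES, {"namespace": "eis-grafana-tom/whatever"}))
```

The two calls reproduce the `[TOM, eng]` and `[TOM]` rows of the TOM example table. When two enabled sub-rules with the same `key` both match, both labels appear in the list and the later one is the value left in the map.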

***

### Operational notes

* The classifier caches rules in-process for 60 s. Mutating endpoints invalidate the cache automatically.
* **Namespace rules require `pipeline.repoUrl`** to be populated. That field is stamped during enrichment from the project record — which means projects must already exist in `dev_insight_projects_collection` for namespace rules to fire. Run `/gitlab/collect` once to seed projects, then `/pipelines/refresh` for enrichment.
* For include-based rules, only `pipelineTemplateProject` / `pipelineTemplateRef` / `pipelineTemplateFile` are needed — these are captured directly during pipeline enrichment from the parsed `.gitlab-ci.yml`.
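
The caching behavior in the first note can be pictured as a time-to-live cache with explicit invalidation. The class below is a hypothetical illustration, not the classifier's implementation; `loader` stands for whatever reads the rules from MongoDB, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time


class RuleCache:
    """Serve cached rules for `ttl_seconds`, reloading after expiry or invalidation."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader        # callable returning the current rule list
        self._ttl = ttl_seconds
        self._clock = clock
        self._rules = None
        self._loaded_at = None       # None means "nothing cached"

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._rules = self._loader()
            self._loaded_at = now
        return self._rules

    def invalidate(self):
        """What a mutating endpoint would call after changing a rule."""
        self._loaded_at = None


cache = RuleCache(loader=lambda: ["NPC2", "TOM", "Database", "NCD"])
print(cache.get())   # first call loads
print(cache.get())   # served from cache until the TTL elapses
```

Under this model, a rule changed through the API is visible on the next read, while a change made by some route that bypasses invalidation would be picked up only once the TTL expires.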


