@zshi-redhat
Contributor

Summary

Adds a comprehensive AI.md guide to help AI assistants (Claude, Cursor, GitHub Copilot, etc.) provide better code assistance for the cluster-network-operator codebase.

Why AI.md?

The filename "AI.md" is intentionally generic to support multiple AI tools:

  • Claude Code/API (Anthropic)
  • Cursor IDE
  • GitHub Copilot
  • Continue.dev, Aider
  • Any future AI coding assistants

Benefits

Without AI.md:

  • AI must explore entire codebase from scratch (5-10 minutes per question)
  • May miss critical constraints (image reference rules, bootstrap taints, upgrade ordering)
  • Suggestions may violate conventions or break production systems

With AI.md:

  • AI instantly understands architecture, conventions, and constraints
  • Provides accurate answers in seconds
  • Avoids critical mistakes
  • Follows team conventions automatically
  • Real Impact: 10x faster and significantly more accurate AI assistance

Real-World Usage Examples Documented

The guide includes 4 detailed real-world examples:

  1. Developer adding new configuration fields - Shows exact workflow, commands, critical constraints
  2. QE testing network features - Provides test commands, verification steps, debugging
  3. Other Teams doing bug triage - Helps determine if bugs belong to CNO vs other components
  4. Code Review - AI can check critical requirements (image refs, tolerations, compatibility)

Maintenance

AI.md includes comprehensive update guidelines:

  • When to update (architecture changes, API changes, workflow changes)
  • How to update (step-by-step process)
  • Who should use it (developers, QE, other teams, AI assistants)

Documentation

The file serves a dual purpose:

  • AI Assistants: Instant context about codebase
  • Human Developers: Quick reference guide for new team members

Full rationale and examples are in the commit message.

@coderabbitai

coderabbitai bot commented Jan 5, 2026

Walkthrough

A new documentation file, AI.md, was added describing the Cluster Network Operator repository: architecture, controllers, configuration model, directory layout, development workflow, testing, debugging, conventions, and platform/operand details for AI tools and developers.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation - AI Assistant Guide: AI.md | Added a comprehensive guide covering repository context, architecture, controller patterns, configuration levels, directory structure, rendering/applying manifests, code generation, unit/integration testing, debugging commands, coding conventions, troubleshooting, operand references, image configuration, platform differences, and quick-reference commands. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci bot commented Jan 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zshi-redhat
Once this PR has been reviewed and has the lgtm label, please assign pliurh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
AI.md (2)

61-61: Add language specifiers to code blocks.

Two fenced code blocks lack explicit language identifiers, which reduces syntax highlighting and clarity:

  • Line 61: Directory structure (use text)
  • Line 351: Commit message template (use text)
🔎 Proposed fixes
Line 61:
-```
+```text
 cluster-network-operator/

Line 351:
-```
+```text
 <type>: <subject>

Also applies to: 351-351
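The MD040 finding above can be approximated locally without markdownlint. The following is a rough sketch, not the linter's actual logic: `unlabeled_fences` is a hypothetical helper name, and it assumes fences start at column 0 and use exactly three backticks.

```shell
# Print the line number of every opening fence that has no language tag.
unlabeled_fences() {
  # Build the three-backtick marker indirectly so this snippet can itself
  # live inside a fenced block.
  fence=$(printf '\140\140\140')
  awk -v fence="$fence" '
    index($0, fence) == 1 {
      if (!in_block) {
        # Opening fence: report it when nothing follows the backticks.
        if ($0 == fence) print NR
        in_block = 1
      } else {
        in_block = 0   # closing fence
      }
    }
  ' "$1"
}
```

Running `unlabeled_fences AI.md` would list candidate lines to tag with `text`.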


404-405: Format bare URLs as markdown links.

Lines 404–405 contain bare URLs that should be formatted as markdown links for consistency and clickability:

🔎 Proposed fixes
-    - **API Reference**: https://github.com/openshift/api (separate repo)
-    - **CVO Operator Guide**: https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md
+    - **API Reference**: [openshift/api](https://github.com/openshift/api) (separate repo)
+    - **CVO Operator Guide**: [CVO Operator Guide](https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/operators.md)
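A quick way to spot bare URLs like the ones flagged here is a single `grep`. This is only an approximation of MD034, under the assumption that a URL counts as wrapped when it directly follows `(` or `<`; `bare_urls` is a hypothetical helper name.

```shell
# Print "line:text" for lines with a URL that is not already wrapped as a
# markdown link target "(...)" or an autolink "<...>".
bare_urls() {
  grep -nE '(^|[^(<])https?://' "$1"
}
```

For example, `bare_urls AI.md` prints each offending line with its line number.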
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9d8ab48 and be75121.

📒 Files selected for processing (1)
  • AI.md
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • AI.md
🪛 LanguageTool
AI.md

[grammar] ~216-~216: Ensure spelling is correct
Context: ...s ## Configuration Examples ### Basic OVNKubernetes Cluster ```yaml apiVersion: operator.o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~430-~430: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...bility - Configuration changes must be backward compatible - Support running new code on old machi...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
AI.md

61-61: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


351-351: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


404-404: Bare URL used

(MD034, no-bare-urls)


405-405: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (1)
AI.md (1)

1-472: Comprehensive and well-structured guide.

This AI.md document provides clear, actionable context for developers and AI tools working with the cluster-network-operator. The structure covers essential topics—architecture, workflow, configuration, debugging, and conventions—with concrete examples and command references.

Key strengths:

  • Detailed controller and directory mapping aids navigation
  • Critical constraints (image references, bootstrap taints, upgrade ordering) are clearly highlighted
  • Configuration examples illustrate real-world usage patterns
  • Troubleshooting section adds practical value
  • Language is direct and avoids unnecessary verbosity

The document appropriately serves dual audiences (AI tools and humans) and aligns well with the PR objectives to reduce time-to-answer and prevent constraint violations.

## Why AI.md?

The filename "AI.md" is intentionally generic to support multiple AI tools:
- Claude Code/API (Anthropic)
- Cursor IDE
- GitHub Copilot
- Continue.dev
- Aider
- Any future AI coding assistants

A tool-specific name like "CLAUDE.md" would limit its usefulness. AI.md
signals to all AI assistants that this file contains codebase context.

## Benefits of Having AI.md

### Without AI.md:
- AI must explore entire codebase from scratch for each question
- Takes 5-10 minutes to understand architecture and conventions
- May miss critical constraints (e.g., image reference rules, bootstrap taints)
- Suggestions may violate project conventions or break production systems
- Cannot provide domain-specific guidance without extensive probing

### With AI.md:
- AI instantly understands architecture, conventions, and constraints
- Provides accurate answers in seconds, not minutes
- Avoids critical mistakes (hardcoded images, missing tolerations, etc.)
- Suggestions follow team conventions (commit format, testing, codegen)
- Can immediately provide debugging commands and troubleshooting steps

**Real Impact**: AI assistance becomes 10x faster and significantly more accurate.

## Real-World Usage Examples

### Example 1: Developer Adding New Configuration Field

**Developer asks**: "Add MTU configuration option for OVN"

**AI with AI.md responds**:
```
I'll follow the documented process from AI.md:
1. Update API type in openshift/api repo (Network.operator.openshift.io)
2. Vendor updated API: go mod vendor
3. Update controller logic in pkg/controller/network/
4. Update rendering in pkg/render/ for OVN templates
5. Add unit tests in pkg/network/ovn_kubernetes_test.go
6. Run: make update-codegen
7. Verify: make check

Critical: Ensure backward compatibility since CNO runs before MCO.
MTU changes should not break existing clusters during upgrade.
```

The AI knows:
- Exact update sequence
- Where to find relevant code (pkg/controller/, pkg/render/)
- Commands to run (update-codegen, check)
- Critical constraint (CNO before MCO upgrade ordering)
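
The command steps from this example could be wrapped in a small script. This is a sketch only: `cno_field_workflow` and the `DRY_RUN` variable are hypothetical names, and the openshift/api change and the Go edits (steps 1 and 3 to 5) still happen by hand.

```shell
# Run (or, with DRY_RUN=1, just print) the command steps of the workflow.
# Only the three commands named in the example are covered.
cno_field_workflow() {
  for step in "go mod vendor" "make update-codegen" "make check"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: $step"
    else
      echo "running: $step"
      $step || return 1   # stop at the first failing step
    fi
  done
}
```

`DRY_RUN=1 cno_field_workflow` prints the plan without touching the working tree.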

### Example 2: QE Testing Network Features

**QE asks**: "How do I test IPsec configuration on a cluster?"

**AI with AI.md responds**:
```
Based on docs/enabling_ns_ipsec.md referenced in AI.md:

1. Enable IPsec in operator config:
   oc patch network.operator cluster --type=merge -p '{
     "spec": {
       "defaultNetwork": {
         "ovnKubernetesConfig": {
           "ipsecConfig": {"mode": "Full"}
         }
       }
     }
   }'

2. Verify operands rolled out:
   oc get pods -n openshift-ovn-kubernetes
   # Wait for ovnkube-node pods to restart

3. Verify IPsec is active on nodes:
   oc debug node/<node-name>
   chroot /host
   ovs-appctl -t ovs-monitor-ipsec tunnels/show

4. Test connectivity between pods across nodes
5. Check network metrics for IPsec indicators

Debug commands if issues:
- Operator logs: oc logs -n openshift-network-operator deployment/network-operator
- OVN logs: oc logs -n openshift-ovn-kubernetes ds/ovnkube-node -c ovn-controller
- Status: oc get clusteroperator network
```

The AI knows:
- IPsec documentation location
- Exact patch command syntax
- Where operands run (namespace, DaemonSet names)
- Debugging commands
- Verification steps
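
The patch body from step 1 can be generated rather than typed by hand, which avoids quoting mistakes. A minimal sketch: `ipsec_patch` is a hypothetical helper, and the list of accepted modes is an assumption, since only `Full` appears in the example.

```shell
# Print the merge-patch body for a given IPsec mode.
# Assumption: Disabled, External and Full are the accepted modes.
ipsec_patch() {
  case "$1" in
    Disabled|External|Full) ;;
    *) echo "unsupported IPsec mode: $1" >&2; return 1 ;;
  esac
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{"mode":"%s"}}}}}\n' "$1"
}

# Usage against a live cluster (not run here):
#   oc patch network.operator cluster --type=merge -p "$(ipsec_patch Full)"
```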

### Example 3: Other Teams - Bug Triage

**Platform team asks**: "Is this networking bug in CNO or MCO?"

```
Bug: "Nodes fail to join cluster after upgrade, network unreachable"
```

**AI with AI.md responds**:
```
This is likely a CNO issue. Analysis based on AI.md context:

CNO Responsibilities (from AI.md):
- Network plugin deployment (OVNKubernetes, Multus)
- Node network configuration during bootstrap
- Removing network-unavailable taints
- Runs at upgrade level 07 (before MCO)

Diagnostic Steps:
1. Check CNO operator status:
   oc get clusteroperator network
   # Look for Degraded=True, Available=False

2. Check node taints:
   oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
   # If node.kubernetes.io/network-unavailable persists, CNO issue

3. Check OVN operand pods:
   oc get pods -n openshift-ovn-kubernetes
   # If ovnkube-node pods failing, CNO operand issue

4. Check CNO logs:
   oc logs -n openshift-network-operator deployment/network-operator

Assign to CNO team if:
- network-unavailable taint persists
- OVN pods failing to start
- CNO ClusterOperator degraded

Assign to MCO team if:
- Machine configs failing to apply
- Node configuration issues after networking is up
- Issues appear after CNO completes (runlevel > 07)
```

The AI understands:
- Component boundaries (CNO vs MCO)
- Upgrade ordering (CNO level 07 before MCO)
- Bootstrap process and node taints
- Diagnostic commands for each component
- Clear criteria for team assignment
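
The taint check in diagnostic step 2 lends itself to a filter over the `oc` output. The sketch below encodes only that one criterion; the function names are hypothetical, and the input is assumed to be the `custom-columns` listing with a header row.

```shell
# Read `oc get nodes -o custom-columns=NAME:...,TAINTS:...` output on stdin
# and print nodes that still carry the network-unavailable taint.
nodes_network_unavailable() {
  awk 'NR > 1 && index($0, "node.kubernetes.io/network-unavailable") { print $1 }'
}

# Suggest an owner based on that single signal; anything else is inconclusive.
triage_hint() {
  if [ -n "$(nodes_network_unavailable)" ]; then
    echo "CNO"
  else
    echo "inconclusive"
  fi
}
```

It would typically be fed by piping the `oc get nodes` command from step 2 into `triage_hint`.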

### Example 4: Developer - Code Review

**Developer**: "Review this PR that adds proxy support to Multus"

**AI with AI.md**:
```
Code review checklist from AI.md:

✅ Verified:
- Image reference uses os.Getenv("MULTUS_IMAGE") [CRITICAL]
- Bootstrap taints included in DaemonSet tolerations
- Unit tests added in pkg/network/multus_test.go
- Update codegen was run (make update-codegen)

⚠️  Issues found:
1. Missing proxy environment variables in bindata/network/multus/multus.yaml
   - Need HTTP_PROXY, HTTPS_PROXY, NO_PROXY from cluster proxy config
   - See pkg/controller/proxyconfig/ for reference

2. Backward compatibility concern:
   - New proxy fields should be optional
   - Clusters without proxy must continue working
   - Consider: what happens on upgrade from 4.20 to 4.21?

3. Platform differences not considered:
   - Check pkg/platform/ - some platforms have different proxy requirements
   - AWS clusters may need VPC endpoint NO_PROXY exceptions

4. Documentation missing:
   - Update docs/operands.md to document proxy behavior
   - Add example configuration to README.md

Recommendation: Request changes for items #1-3, documentation optional.
```

The AI knows:
- Critical requirements (image refs, bootstrap taints)
- Project patterns (proxy config controller)
- Upgrade compatibility requirements
- Platform differences
- Documentation standards
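
Part of the image-reference check could be automated on the manifest side. This sketch assumes templated manifests write image values as Go template expressions containing `{{`; `hardcoded_images` is a hypothetical helper, and it complements rather than replaces a review of the Go code that reads the environment variable.

```shell
# Print manifest lines whose image: value is a literal instead of a
# template placeholder (assumed to contain "{{").
hardcoded_images() {
  grep -nE '^[[:space:]]*(-[[:space:]]*)?image:' "$1" | grep -vF '{{'
}
```

Any output from `hardcoded_images <manifest>` is a candidate for the "hardcoded images" mistake listed earlier.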

## Updating AI.md Over Time

AI.md should be updated when:

### 1. Architecture Changes
- New controllers added to pkg/controller/
- New network plugins added to bindata/network/
- Major operand changes (new DaemonSets, Deployments)

**Update**: Add to "Key Controllers" or "Network Operands" sections

### 2. Configuration API Changes
- New fields in Network.operator.openshift.io
- New CRDs or configuration objects
- Deprecated features

**Update**: Add to "Configuration Examples" with YAML samples

### 3. Development Workflow Changes
- New make targets
- New testing requirements
- CI/CD pipeline changes
- New code generation tools

**Update**: Update "Development Workflow" and "Common Commands" sections

### 4. Critical Constraints Change
- Image reference requirements
- Bootstrap requirements (taints, tolerations)
- Upgrade ordering changes
- Platform support changes

**Update**: Update "Special Considerations" and "Key Concepts" sections

### 5. Common Bugs/Issues Emerge
- Recurring questions from developers
- Common mistakes in PRs
- Frequent debugging scenarios

**Update**: Add to "Troubleshooting Tips" section

### Update Process:

```bash
# 1. Edit AI.md with new information
vi AI.md

# 2. Test with AI assistant
# Ask AI a question that requires the new information
# Verify AI can answer correctly

# 3. Commit with clear message
git add AI.md
git commit -m "Update AI.md: Add XYZ documentation

- Added new controller information
- Updated configuration examples
- Added troubleshooting for common issue #123
"

# 4. Include in feature PRs
# When adding new features, update AI.md in the same PR
```
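
Step 4 could be enforced with a small check over the list of changed files. A sketch under stated assumptions: `ai_md_in_sync` is a hypothetical name, and treating any change under `pkg/` or `bindata/` as requiring an AI.md update is stricter than the guidelines demand.

```shell
# Read changed file paths on stdin (for example from `git diff --name-only`)
# and fail when code under pkg/ or bindata/ changed but AI.md did not.
ai_md_in_sync() {
  awk '
    /^(pkg|bindata)\// { code = 1 }
    $0 == "AI.md"      { doc = 1 }
    END { if (code && !doc) exit 1 }
  '
}

# Usage in a feature branch (not run here):
#   git diff --name-only master... | ai_md_in_sync || echo "consider updating AI.md"
```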

### Maintenance Guidelines:

- **Keep examples concrete**: Real YAML, real commands, real file paths
- **Update with PRs**: When adding features, update AI.md documentation
- **Quarterly review**: Review for outdated information every 3 months
- **New team members**: Ask them to read AI.md and suggest improvements
- **Track common questions**: If developers/QE repeatedly ask something, add it

## Who Should Use AI.md?

**Developers**:
- Getting started with the codebase
- Adding new features
- Understanding architecture
- Debugging issues

**QE/Testing**:
- Understanding how to test features
- Debugging test failures
- Understanding operator behavior
- Verifying configurations

**Other Teams**:
- Bug triage (is this CNO or another component?)
- Understanding CNO's role in cluster lifecycle
- Integration questions

**AI Assistants**:
- All of the above, automatically and instantly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @AI.md:
- Around line 434-445: The "Bootstrap Taints" YAML example is missing the
node.kubernetes.io/unreachable toleration; update the YAML under the "Bootstrap
Taints" section to include a fourth toleration entry with key
"node.kubernetes.io/unreachable" and operator "Exists" alongside the existing
keys (node.kubernetes.io/not-ready, node.kubernetes.io/network-unavailable,
node-role.kubernetes.io/master) so the documented tolerations match the
ovnkube-control-plane manifest.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between be75121 and c408419.

📒 Files selected for processing (1)
  • AI.md
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • AI.md
🔇 Additional comments (1)
AI.md (1)

398-405: All referenced documentation files exist in the repository and are properly maintained. No issues found.

Comment on lines +434 to +445
### Bootstrap Taints

All critical operands must tolerate:
```yaml
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
- key: node.kubernetes.io/network-unavailable
  operator: Exists
- key: node-role.kubernetes.io/master
  operator: Exists
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for toleration definitions in bindata manifests
echo "=== Searching for toleration definitions in bindata ==="
find . -path "*/bindata*" -name "*.yaml" -o -name "*.yml" 2>/dev/null | head -20

Repository: openshift/cluster-network-operator

Length of output: 1131


🏁 Script executed:

#!/bin/bash
# Search for toleration references across the codebase
echo "=== Searching for tolerations in manifest files ==="
rg -l 'tolerations:' . --type yaml 2>/dev/null | head -20

Repository: openshift/cluster-network-operator

Length of output: 1239


🏁 Script executed:

#!/bin/bash
# Look for specific bootstrap-related taints
echo "=== Searching for bootstrap taint specifications ==="
rg 'node\.kubernetes\.io|node-role\.kubernetes\.io' . --type yaml -A 2 -B 2 2>/dev/null | head -80

Repository: openshift/cluster-network-operator

Length of output: 7087


Update bootstrap taint documentation to include node.kubernetes.io/unreachable.

The documented tolerations at lines 434-445 are incomplete. The OVN-Kubernetes control-plane manifest (./bindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yaml) includes a fourth required toleration that is missing from the guide:

- key: "node.kubernetes.io/unreachable"
  operator: "Exists"

Update the YAML example to include this taint alongside the three currently listed tolerations to ensure operands tolerate all bootstrap phase conditions.
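
A check like the following could confirm whether a given manifest or documentation snippet lists all four keys discussed in this comment. It is a plain substring search, not a YAML parser, and `missing_tolerations` is a hypothetical helper name.

```shell
# Print each of the four toleration keys named in this review that does not
# appear anywhere in the given file.
missing_tolerations() {
  for key in node.kubernetes.io/not-ready \
             node.kubernetes.io/network-unavailable \
             node.kubernetes.io/unreachable \
             node-role.kubernetes.io/master; do
    grep -qF "$key" "$1" || echo "$key"
  done
}
```

Empty output means every key was found; otherwise each missing key is printed on its own line.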

🤖 Prompt for AI Agents
In @AI.md around lines 434-445, The "Bootstrap Taints" YAML example is missing
the node.kubernetes.io/unreachable toleration; update the YAML under the
"Bootstrap Taints" section to include a fourth toleration entry with key
"node.kubernetes.io/unreachable" and operator "Exists" alongside the existing
keys (node.kubernetes.io/not-ready, node.kubernetes.io/network-unavailable,
node-role.kubernetes.io/master) so the documented tolerations match the
ovnkube-control-plane manifest.

@openshift-ci
Contributor

openshift-ci bot commented Jan 5, 2026

@zshi-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-images | c408419 | link | true | /test okd-scos-images |
| ci/prow/e2e-aws-ovn-hypershift-conformance | c408419 | link | true | /test e2e-aws-ovn-hypershift-conformance |
| ci/prow/e2e-ovn-ipsec-step-registry | c408419 | link | true | /test e2e-ovn-ipsec-step-registry |
| ci/prow/security | c408419 | link | false | /test security |
| ci/prow/e2e-gcp-ovn-upgrade | c408419 | link | true | /test e2e-gcp-ovn-upgrade |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
