Commit d86c508

committed
LABDCN-1038
1 parent 9ddb003 commit d86c508

File tree

8 files changed

+250
-6
lines changed

docs/LABDCN-1038/README.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
# LABDCN-1038: Open Source Monitoring for Cisco ACI
2+
3+
This section contains specific instructions on how to run the *LABDCN-1038* Walk In Lab for Cisco Live San Diego 2025.
4+
5+
## Task 1 - Getting Familiar with the ACI Monitoring Stack
6+
7+
If this is your first time learning about the ACI monitoring stack, start with the [Overview](overview.md), which describes the Stack Architecture.
8+
You do not need to dive deep into the details unless you want to, but it is good to have a general understanding of the components used in the Stack.
9+
10+
Next, head over to the [Demo Environment](../demo-environment.md) documentation. As you read it, explore the dashboards that are available in the Demo Environment.
11+
12+
## Task 2 - Create a Dashboard
13+
14+
[Lab1](../labs/lab1.md): In this lab we are going to rebuild the ACI Fault Dashboard.
15+
16+
## Task 3 - Explore The Logs
17+
18+
[Lab2](../labs/lab2.md): In this lab we are going to use `Explore` to visualize the logs received from our ACI fabrics.
19+
20+
## Task 4 - Explore the ACI Configs
21+
22+
The ACI Monitoring Stack introduced a new feature in its latest release that automatically generates a Config Snapshot every 15 minutes (by default) and loads it into a Graph Database. This allows the user to query the ACI config directly from Grafana.
23+
24+
[Lab3](../labs/lab3.md)

docs/LABDCN-1038/overview.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
1+
aci-monitoring-stack - Open Source Monitoring for Cisco ACI
2+
------------
3+
4+
# Overview
5+
6+
Harness the power of open source to efficiently monitor your Cisco ACI environment with the ACI-Monitoring-Stack. This lightweight, yet robust, monitoring solution combines top-tier open source tools, each contributing unique capabilities to ensure comprehensive visibility into your ACI infrastructure.
7+
8+
The ACI-Monitoring-Stack integrates the following key components:
9+
10+
- [Grafana](https://grafana.com/oss/grafana/): The leading open-source analytics and visualization platform. Grafana allows you to create dynamic dashboards that provide real-time insights into your network's performance, health, and metrics. With its user-friendly interface, you can easily visualize and correlate data across your ACI fabric, enabling quicker diagnostics and informed decision-making.
11+
12+
- [Prometheus](https://prometheus.io/): A powerful open-source monitoring and alerting toolkit. Prometheus excels in collecting and storing metrics in a time-series database, allowing for flexible queries and real-time alerting. Its seamless integration with Grafana ensures that your monitoring stack provides a detailed and up-to-date view of your ACI environment.
13+
14+
- [Loki](https://grafana.com/oss/loki/): Designed for efficiently aggregating and querying logs from your entire ACI ecosystem. Loki complements Prometheus by focusing on log aggregation, providing a unified stack for metrics and logs. Its integration with Grafana enables you to correlate log data with metrics and create a holistic monitoring experience.
15+
16+
- [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/): The agent responsible for gathering log files and shipping them to the Loki server.
17+
18+
- [Syslog-ng](https://github.com/syslog-ng/syslog-ng): An open-source implementation of the Syslog protocol. Its role in this stack is to translate syslog messages from RFC 3164 to RFC 5424. This is needed because Promtail only supports Syslog RFC 5424 over TCP, and ACI can only send that format in version 6.1 and above.
19+
20+
- [aci-exporter](https://github.com/opsdis/aci-exporter): A Prometheus exporter that serves as the bridge between your Cisco ACI environment and the Prometheus monitoring ecosystem. The aci-exporter translates ACI-specific metrics into a format that Prometheus can ingest, ensuring that all crucial data points are captured and monitored effectively.
21+
22+
- [backup2graph](apps/backup2graph/README.md): Converts an ACI Backup into a Graph Database.
23+
24+
- [Memgraph](https://github.com/memgraph/memgraph): An open source graph database implemented in C/C++ that leverages an in-memory first architecture. The ACI-Monitoring-Stack uses it to explore the ACI configurations imported by backup2graph.
25+
26+
- Pre-configured ACI data collections queries, alerts, and dashboards (Work In Progress): The ACI-Monitoring-Stack provides a solid foundation for monitoring an ACI fabric with its pre-defined queries, dashboards, and alerts. While these tools are crafted based on best practices to offer immediate insights into network performance, they are not exhaustive. The strength of the ACI-Monitoring-Stack lies in its community-driven approach. Users are invited to contribute their expertise by providing feedback, sharing custom solutions, and helping enhance the stack. Your input helps to refine and expand the stack's capabilities, ensuring it remains a relevant and powerful tool for network monitoring.
27+
28+
# Your Stack
29+
30+
To gain a comprehensive understanding of the ACI Monitoring Stack and its components it is helpful to break down the stack into separate functions. Each function focuses on a different aspect of monitoring the Cisco Application Centric Infrastructure (ACI) environment.
31+
32+
## Fabric Discovery:
33+
34+
The ACI monitoring stack uses Prometheus Service Discovery (HTTP SD) to dynamically discover and scrape targets by periodically querying a specified HTTP endpoint for a list of target configurations in JSON format.
35+
36+
The ACI Monitoring Stack needs only the IP addresses of the APICs; the switches are discovered automatically. If switches are added to or removed from the fabric, no action is required from the end user.
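
To make the HTTP SD exchange more concrete, here is a minimal Python sketch of the kind of JSON document a discovery endpoint returns to Prometheus: a list of target groups, each with `targets` and optional `labels`. The label names (`fabric`, `role`) and the IP addresses are assumptions for illustration only and are not taken from the aci-exporter.

```python
import json


def build_http_sd_payload(fabric, apics, switches):
    """Return a Prometheus HTTP SD document: a list of target groups.

    The label names used here ("fabric", "role") are hypothetical.
    """
    return [
        {"targets": sorted(apics), "labels": {"fabric": fabric, "role": "apic"}},
        {"targets": sorted(switches), "labels": {"fabric": fabric, "role": "switch"}},
    ]


# Hypothetical fabric with one APIC and two discovered switches.
payload = build_http_sd_payload(
    "site1", ["10.0.0.1"], ["10.0.1.102", "10.0.1.101"]
)
print(json.dumps(payload, indent=2))
```

Because Prometheus polls the endpoint periodically, a switch added to the fabric simply shows up in the next response without any change to the Prometheus configuration.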
37+
38+
```mermaid
flowchart-elk RL
    P[("Prometheus")]
    A["aci-exporter"]
    APIC["APIC"]

    APIC -- "API Query" --> A
    A -- "HTTP SD" --> P
```
47+
48+
## ACI Object Scraping:
49+
50+
`Prometheus` scraping is the process by which `Prometheus` periodically collects metrics data by sending HTTP requests to predefined endpoints on monitored targets. The `aci-exporter` translates ACI-specific metrics into a format that `Prometheus` can ingest, ensuring that all crucial data points are captured and monitored effectively.
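
As a rough illustration of what "a format that Prometheus can ingest" means, the sketch below renders one sample in the Prometheus text exposition format (`name{label="value"} number`). The metric name `aci_faults` and its labels are hypothetical, and label-value escaping is left out for brevity; the real aci-exporter defines its own metric names.

```python
def render_metric(name, labels, value):
    """Render a single sample in the Prometheus text exposition format.

    Label values are not escaped here; a real exporter must escape
    backslashes, double quotes, and newlines.
    """
    label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


# Hypothetical metric: number of critical faults in a fabric.
line = render_metric("aci_faults", {"severity": "critical", "fabric": "site1"}, 3)
print(line)
```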
51+
52+
```mermaid
flowchart-elk RL
    P[("Prometheus")]
    A["aci-exporter"]
    subgraph ACI
        S["Switches"]
        APIC["APIC"]
    end
    A--"Scraping"-->P
    S--"API Queries"-->A
    APIC--"API Queries"-->A
```
64+
## Syslog Ingestion:
65+
66+
The syslog config is composed of 3 components: `promtail`, `loki` and `syslog-ng`.
67+
Prior to ACI 6.1 `syslog-ng` is required between `ACI` and `Promtail` to convert from RFC 3164 to 5424 syslog message format.
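
To show what the conversion involves, here is a small Python sketch that rewrites an RFC 3164 style line into the RFC 5424 layout. It is not how `syslog-ng` is implemented; it only contrasts the two formats. The sample message is invented, the year is passed in by the caller because RFC 3164 timestamps do not carry one, and the timestamp is assumed to be UTC.

```python
import re
from datetime import datetime

# <PRI>Mmm dd hh:mm:ss HOST TAG[PID]: MSG  (RFC 3164 style)
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<msg>.*)$"
)


def rfc3164_to_rfc5424(line, year):
    """Rewrite an RFC 3164 line as <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG."""
    m = RFC3164.match(line)
    if m is None:
        raise ValueError("not an RFC 3164 message")
    # RFC 3164 has no year and no time zone: both are assumptions here.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    procid = m["pid"] or "-"
    # MSGID and STRUCTURED-DATA are unknown, so both are set to the nil value "-".
    return (
        f"<{m['pri']}>1 {ts.isoformat()}Z {m['host']} "
        f"{m['tag']} {procid} - - {m['msg']}"
    )


converted = rfc3164_to_rfc5424(
    "<134>Jun 10 10:15:00 leaf101 myapp[123]: link up", 2025
)
print(converted)
```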
68+
69+
```mermaid
flowchart-elk LR
    L["Loki"]
    PT["Promtail"]
    SL["Syslog-ng"]
    PT-->L
    SL-->PT
    subgraph ACI
        S["Switches"]
        APIC["APIC"]
    end
    V{Ver >= 6.1}
    S--"Syslog"-->V
    APIC--"Syslog"-->V
    V -->|Yes| PT
    V -->|No| SL
```
86+
87+
## Config Explorer:
88+
89+
ACI-Monitoring-Stack generates a Config Snapshot every 15 minutes (by default) and automatically loads it into Memgraph.
90+
Backup2Graph uses ACI API calls to:
91+
- Create a new snapshot policy
92+
- Trigger a snapshot
93+
- Delete the snapshot policy and snapshot (once transferred out of the APIC)
94+
95+
It then uses `scp` to copy the snapshot over for processing. Once the snapshot is copied, the snapshot and its policy are removed from the APIC.
96+
97+
```mermaid
flowchart-elk RL
    U["User"]
    G["Grafana"]
    A["APIC"]
    B2G["Backup2Graph"]
    MG["Memgraph"]
    A--"Backup"-->B2G
    B2G--"Push"-->MG
    MG--"Cypher Queries"-->G
    G-->U
```
109+
110+
## Data Visualization
111+
112+
The Data Visualization is handled by `Grafana`, an open-source analytics and monitoring platform that allows users to visualize, query, and analyze data from various sources through customizable and interactive dashboards. It supports a wide range of data sources, including `Prometheus` and `Loki`, enabling users to create real-time visualizations, alerts, and reports to monitor system performance and gain actionable insights.
113+
114+
```mermaid
flowchart-elk RL
    G["Grafana"]
    L["Loki"]
    P[("Prometheus")]
    U["User"]

    P--"PromQL"-->G
    L--"LogQL"-->G
    G-->U
```
125+
## Alerting
126+
127+
`Alertmanager` is a component of the `Prometheus` ecosystem designed to handle alerts generated by `Prometheus`. It manages the entire lifecycle of alerts, including deduplication, grouping, silencing, and routing notifications to various communication channels like email, `Webex`, `Slack`, and others, ensuring that alerts are delivered to the right people in a timely and organized manner.
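
The deduplication and grouping steps can be pictured with a toy Python sketch. This is a simplified model, not Alertmanager's actual implementation: alerts are plain label dictionaries, identical label sets are treated as duplicates, and the remaining alerts are bucketed by a hypothetical `group_by` list. All alert names and labels below are invented.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop duplicate alerts, then bucket the rest by the group_by labels."""
    groups = defaultdict(list)
    seen = set()
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:  # identical label set: treat as a duplicate
            continue
        seen.add(fingerprint)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts coming from Prometheus and Loki rules.
alerts = [
    {"alertname": "FaultCritical", "fabric": "site1", "node": "101"},
    {"alertname": "FaultCritical", "fabric": "site1", "node": "101"},
    {"alertname": "FaultCritical", "fabric": "site1", "node": "102"},
    {"alertname": "FaultCritical", "fabric": "site2", "node": "201"},
]
groups = group_alerts(alerts, ["alertname", "fabric"])
for key, members in groups.items():
    print(key, len(members))
```

Grouping this way means one notification per fabric instead of one per node, which is the main reason to route alerts through `Alertmanager` rather than notifying directly.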
128+
129+
In the ACI Monitoring Stack both `Prometheus` and `Loki` are configured with alerting rules.
130+
```mermaid
flowchart-elk LR
    L["Loki"]
    P["Prometheus"]
    AM["Alertmanager"]
    N["Notifications (Mail/Webex etc...)"]
    L --> AM
    P --> AM
    AM --> N
```
140+
141+
[Back](README.md)

docs/demo-environment.md

Lines changed: 35 additions & 6 deletions
@@ -31,10 +31,13 @@ The stack is pre-provisioned with the following Dashboards. Feel free to explore
3131
- [Nodes Interfaces](#nodes-interfaces)
3232
- [Power Usage](#power-usage)
3333
- [Routing Protocols](#routing-protocols)
34-
- [Vlans](#vlans)
3534
- [Loki backed Dashboards](#loki-backed-dashboards)
3635
- [Contract Drops Logs](#contract-drops-logs)
37-
36+
- [Config Export Dashboards](#config-export-dashboards)
37+
- [Contract Explorer](#contract-explorer)
38+
- [Fabric Policies - Port Group](#fabric-policies-port-group)
39+
- [Missing Targets](#missing-targets)
40+
- [Vlans](#vlans)
3841

3942
## Prometheus backed Dashboards
4043

@@ -116,10 +119,6 @@ This dashboard contains the following graphs:
116119
- BGP Advertised/Received Paths: For every BGP peering we display the number of paths received/advertised
117120
- BGP Accepted Paths: Time series graph of **received** BGP prefixes
118121

119-
### Vlans
120-
121-
Display the APIC config for VLAN Pools and VMM Custom Trunk Ports in filterable tables.
122-
123122
## Loki backed Dashboards
124123

125124
These dashboards are using `Loki` as data source meaning the data we are visualizing came from an ACI Syslog Message
@@ -128,4 +127,34 @@ These dashboards are using `Loki` as data source meaning the data we are visuali
128127

129128
This dashboard parses the logs received from the switches and extracts information about the Contract Drop Logs. This requires a specific [config](syslog.md) on ACI and is limited to 500 messages/s per switch.
130129

130+
## Config Export Dashboards
131+
These dashboards are based on data extracted from an ACI Config Snapshot and converted into a Graph Database.
132+
133+
### Contract Explorer
134+
135+
This dashboard lets the user select a contract and displays how it is deployed and which EPGs/ESGs provide or consume it.
136+
137+
<img src=images/contract-explorer.png width="1200">
138+
139+
### Fabric Policies - Port Group
140+
141+
This dashboard displays detailed information about a port group allowing the user to understand the mappings of:
142+
143+
- VLANs
144+
- Domains
145+
- AAEP
146+
- Leaves and ports
147+
148+
<img src=images/port-group.png width="1200">
149+
150+
### Missing Targets
151+
152+
Detects and shows missing targets. This dashboard is still a work in progress and will be improved.
153+
![Missing Targets dashboard](images/missing-targets.png)
154+
155+
### Vlans
156+
157+
Display the APIC config for VLAN Pools and VMM Custom Trunk Ports in filterable tables.
158+
![Vlans dashboard](images/vlans.png)
159+
131160
[Next - Lab1](labs/lab1.md)

docs/images/contract-explorer.png

352 KB

docs/images/missing-targets.png

175 KB

docs/images/port-group.png

428 KB

docs/images/vlans.png

556 KB

docs/labs/lab3.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
# Overview
2+
3+
All the dashboards that are tagged as `cisco-aci-config` are generated by creating a backup of the ACI config and importing it into a graph database.
4+
5+
A graph database is a type of database designed to represent and store data as a network of interconnected nodes (entities) and edges (relationships). Unlike traditional relational databases that use tables, graph databases use graph structures to model relationships between data, making them highly efficient for querying and analyzing complex, interconnected data. Each node represents an entity (e.g., a person, product, or location), while edges define relationships (e.g., "friend of," "purchased," or "located at").
6+
7+
This works great to represent the ACI Configuration and allows us to create custom dashboards by using the `Cypher` language.
8+
9+
Cypher is a query language specifically designed for working with graph databases. It is declarative, meaning you describe what you want to retrieve or manipulate in the graph, and the database engine determines the best way to execute the query. Cypher is similar in concept to SQL for relational databases but is optimized for graph structures, enabling intuitive and powerful querying of nodes (entities), relationships (edges), and their properties.
10+
11+
Cypher uses a pattern-matching syntax that resembles ASCII art, making it easy to visualize and query graphs. For example, `(a)-[r]->(b)` represents a node `a` connected to node `b` by a relationship `r`. You can use Cypher to perform a variety of graph operations, such as finding shortest paths, traversing relationships, filtering based on properties, and creating or modifying nodes and edges.
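
If the node/edge model is new to you, the following Python sketch may help. It builds a tiny in-memory graph and matches the undirected pattern `(a:Label)-[r]-(b:Label)` by hand, which is roughly what a graph database does for you when you write a Cypher `MATCH`. The object names (`prod`, `vrf1`, `bd1`) and the relationship name `contains` are made up for the example; only the class names `fvTenant`, `fvCtx`, and `fvBD` are real ACI classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # ACI class, for example "fvTenant"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node
    rel: str  # relationship type (hypothetical names in this sketch)
    target: Node


def match(edges, left_label, right_label):
    """Yield (left, rel, right) for the undirected pattern (a:left)-[r]-(b:right)."""
    for edge in edges:
        if edge.source.label == left_label and edge.target.label == right_label:
            yield edge.source, edge.rel, edge.target
        elif edge.target.label == left_label and edge.source.label == right_label:
            yield edge.target, edge.rel, edge.source


tenant = Node("fvTenant", "prod")
vrf = Node("fvCtx", "vrf1")
bd = Node("fvBD", "bd1")
edges = [Edge(tenant, "contains", vrf), Edge(tenant, "contains", bd)]

results = list(match(edges, "fvTenant", "fvCtx"))
for left, rel, right in results:
    print(f"({left.name})-[{rel}]-({right.name})")
```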
12+
13+
Feel free to explore the pre-existing dashboards. Once you are done, if you want to experiment, head over to `Explore`.
14+
15+
You can find `Explore` in the left panel of the Grafana UI. From the drop-down, select `memgraph`.
16+
![Explore view in Grafana](images/lab2/explore.png)
17+
18+
Now let's try writing a simple query:
19+
20+
```sql
MATCH (t:fvTenant)-[r]-(vrf:fvCtx)
WHERE t.fabric="site3"
RETURN *
```
25+
26+
This returns a mapping of Tenants to VRFs. Now take a look at the `Contract Explorer` dashboard and edit it. You will see that the Cypher query is a bit more complex:
27+
28+
```sql
MATCH (provider)-[r1:fvRsProv|vzRsAnyToProv]-(contract:vzBrCP)-[r2:fvRsCons|vzRsAnyToCons]-(consumer)
WHERE contract.dn="uni/tn-$tenant/brc-$contract" and contract.fabric='$fabric'

RETURN provider.dn as ProviderDN, consumer.dn as ConsumerDN
```
34+
35+
1. MATCH: The query is looking for a specific structure (or pattern) in the graph.
36+
2. Nodes:
37+
- (provider): A node representing the provider. It carries no label, which is useful here because the provider can be an EPG/ESG or a VRF.
38+
- (contract:vzBrCP): A node representing a contract, specifically an object of the ACI class vzBrCP.
39+
- (consumer): A node representing the consumer. It carries no label either, because the consumer can also be an EPG/ESG or a VRF.
40+
3. Relationships:
41+
- [r1:fvRsProv|vzRsAnyToProv]: There is a relationship between the `provider` and the `contract`, which can be of type fvRsProv or vzRsAnyToProv.
42+
- [r2:fvRsCons|vzRsAnyToCons]: There is a relationship between the `contract` and the `consumer`, which can be of type fvRsCons or vzRsAnyToCons.
43+
4. Pattern:
44+
- The query is looking for a provider node that is connected to a contract node via one of the specified relationships (fvRsProv or vzRsAnyToProv).
45+
- Then, it looks for a consumer node that is also connected to the same contract node via one of the specified relationships (fvRsCons or vzRsAnyToCons).
46+
47+
5. WHERE: Filters the results by the fabric, tenant, and contract that you select in the Grafana dashboard.
48+
6. RETURN: Instead of returning everything, we return only the Distinguished Names (DNs) of the ACI objects.
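
The `$tenant`, `$contract`, and `$fabric` placeholders are dashboard variables that Grafana fills in before the query is sent to Memgraph. The Python sketch below approximates that substitution with `string.Template` so you can see the final query text. It is only an approximation, since Grafana's own interpolation has more features (for example multi-value variables), and the values `prod` and `web` are invented for the example.

```python
from string import Template

# Same query as the Contract Explorer dashboard, with Grafana-style variables.
QUERY = Template(
    "MATCH (provider)-[r1:fvRsProv|vzRsAnyToProv]-(contract:vzBrCP)"
    "-[r2:fvRsCons|vzRsAnyToCons]-(consumer)\n"
    'WHERE contract.dn="uni/tn-$tenant/brc-$contract" '
    "and contract.fabric='$fabric'\n"
    "RETURN provider.dn as ProviderDN, consumer.dn as ConsumerDN"
)

# Hypothetical dashboard selections.
rendered = QUERY.substitute(tenant="prod", contract="web", fabric="site3")
print(rendered)
```

Pasting the rendered query into `Explore` is a quick way to debug a dashboard panel outside of the dashboard itself.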
49+
50+
If you want to challenge yourself you can take a look at the `Fabric Policies - Port Group` dashboard query.
