diff --git a/skills/README.md b/skills/README.md
new file mode 100644
index 000000000..2cec9e760
--- /dev/null
+++ b/skills/README.md
@@ -0,0 +1,98 @@
+# MAGE Algorithm Skills for GitHub Copilot
+
+This directory contains [Agent Skills](https://agentskills.io/) that provide GitHub Copilot with expertise in Memgraph's MAGE (Memgraph Advanced Graph Extensions) algorithms.
+
+## What are Agent Skills?
+
+Agent Skills are folders of instructions that GitHub Copilot can load when relevant to perform specialized tasks. Each skill teaches Copilot how to use specific graph algorithms with Memgraph.
+
+## Using These Skills
+
+These skills work automatically with:
+- **GitHub Copilot coding agent** (in VS Code and other editors)
+- **GitHub Copilot CLI**
+- **VS Code Insiders agent mode**
+
+When you ask Copilot about graph algorithms, it will automatically discover and use these skills to help you write correct Cypher queries and understand algorithm usage.
+
+## Available Skills
+
+### Graph Analytics & Centrality
+- **pagerank** - Measure node influence based on connections
+- **betweenness-centrality** - Find bridge nodes and connectors
+- **degree-centrality** - Count node connections
+- **katz-centrality** - Influence based on all paths
+
+### Community & Clustering
+- **community-detection** - Louvain algorithm for finding communities
+- **leiden-community-detection** - Improved community detection
+- **weakly-connected-components** - Find connected groups
+
+### Path Finding & Traversal
+- **shortest-path** - Find optimal paths between nodes
+- **bfs** - Breadth-first search traversal
+- **tsp** - Traveling Salesman Problem solver
+
+### Graph Structure
+- **bridges** - Detect critical edges
+- **cycles** - Find cycles in graphs
+- **graph-analyzer** - Graph statistics and profiling
+
+### Similarity & Machine Learning
+- **node-similarity** - Calculate node similarity
+- **node2vec** - Generate node embeddings
+
+### Optimization
+- **max-flow** - Maximum flow in networks
+
+### Utilities
+- **collections** - List manipulation utilities
+- **json-util** - JSON import/export
+
+## Example Usage
+
+Simply ask Copilot natural language questions like:
+
+```
+"Find the most influential nodes in my graph using PageRank"
+"Detect communities in the social network"
+"What's the shortest path between node A and node B?"
+"Calculate betweenness centrality to find bridge nodes"
+```
+
+Copilot will:
+1. Recognize the request matches a skill
+2. Load the skill instructions
+3. Generate appropriate Cypher queries
+4. Help you understand and use the results
+
+## Skill Structure
+
+Each skill follows the [Agent Skills specification](https://agentskills.io/specification):
+
+```
+skill-name/
+└── SKILL.md          # Skill definition with frontmatter + instructions
+```
+
+The `SKILL.md` file contains:
+- **YAML frontmatter** with name, description, and metadata
+- **Detailed instructions** on when and how to use the algorithm
+- **Cypher query examples** with Memgraph MAGE procedures
+- **Common edge cases** and tips
+
+## Requirements
+
+- GitHub Copilot Pro, Pro+, Business, or Enterprise
+- Memgraph database with MAGE library installed
+- These skills are specifically designed for Memgraph's Cypher dialect and MAGE procedures
+
+## Learn More
+
+- [Agent Skills Specification](https://agentskills.io/)
+- [GitHub Copilot Skills Documentation](https://docs.github.com/en/copilot/concepts/agents/about-agent-skills)
+- [Memgraph MAGE Documentation](https://memgraph.com/docs/advanced-algorithms/available-algorithms)
+
+## License
+
+Apache-2.0
diff --git a/skills/betweenness-centrality/SKILL.md b/skills/betweenness-centrality/SKILL.md
new file mode 100644
index 000000000..062c2a1d5
--- /dev/null
+++ b/skills/betweenness-centrality/SKILL.md
@@ -0,0 +1,115 @@
+---
+name: betweenness-centrality
+description: Calculate betweenness centrality to find nodes that act as bridges or connectors in the graph by measuring how often they appear on shortest paths between other nodes. Use when finding bridge nodes, connectors, bottlenecks, or nodes that control information flow.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: centrality
+  complexity: "O(V * E)"
+---
+
+# Betweenness Centrality
+
+Betweenness centrality measures the extent to which a node lies on paths between other nodes. Nodes with high betweenness centrality act as bridges or gatekeepers in the network.
+
+## When to Use This Skill
+
+Use betweenness centrality when:
+- Finding bridge nodes that connect different parts of a network
+- Identifying bottlenecks or critical points
+- Detecting nodes that control information flow
+- Finding connectors between communities
+- Analyzing network resilience (removing high-betweenness nodes)
+
+## Basic Usage
+
+### Calculate Betweenness Centrality
+
+```cypher
+CALL betweenness_centrality.get()
+YIELD node, betweenness_centrality
+RETURN node, betweenness_centrality
+ORDER BY betweenness_centrality DESC
+LIMIT 10;
+```
+
+### Normalized Betweenness Centrality
+
+```cypher
+CALL betweenness_centrality.get(True, True)
+YIELD node, betweenness_centrality
+RETURN node, betweenness_centrality
+ORDER BY betweenness_centrality DESC;
+```
+
+**Parameters:**
+- `directed` (default: True) - Whether to treat the graph as directed
+- `normalized` (default: True) - Normalize scores to [0, 1] range
+
+## Advanced Usage
+
+### Betweenness on Subgraph
+
+```cypher
+MATCH (n:Person)-[r:WORKS_WITH]->(m:Person)
+WITH collect(n) + collect(m) AS nodes, collect(r) AS rels
+CALL betweenness_centrality.get_subgraph(nodes, rels)
+YIELD node, betweenness_centrality
+RETURN node.name AS name, betweenness_centrality
+ORDER BY betweenness_centrality DESC;
+```
+
+### Filter by Label
+
+```cypher
+CALL betweenness_centrality.get()
+YIELD node, betweenness_centrality
+WHERE node:Router
+RETURN node.name AS router, betweenness_centrality
+ORDER BY betweenness_centrality DESC;
+```
+
+### Compare with Other Centralities
+
+```cypher
+CALL betweenness_centrality.get() YIELD node, betweenness_centrality
+WITH node, betweenness_centrality
+CALL pagerank.get() YIELD node AS pr_node, rank
+WHERE node = pr_node
+RETURN node.name, betweenness_centrality, rank AS pagerank
+ORDER BY betweenness_centrality DESC;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| node | Node | The graph node |
+| betweenness_centrality | Float | Centrality score (higher = more important as bridge) |
+
+## Example Results
+
+```
+╔════════════════════╦═══════════════════════════╗
+║ name               ║ betweenness_centrality    ║
+╠════════════════════╬═══════════════════════════╣
+║ "Gateway_Router"   ║ 0.892                     ║
+║ "Central_Hub"      ║ 0.734                     ║
+║ "Bridge_Node"      ║ 0.651                     ║
+╚════════════════════╩═══════════════════════════╝
+```
+
+## Common Edge Cases
+
+1. **Disconnected graphs**: Nodes in separate components have 0 betweenness relative to each other
+2. **Star topology**: Central node has maximum betweenness
+3. **Complete graphs**: All nodes have equal betweenness (zero, since every pair of nodes is directly connected)
+4. **Linear chains**: Middle nodes have highest betweenness
+
+## Interpretation Tips
+
+- High betweenness = critical for connectivity
+- Removing high-betweenness nodes may fragment the network
+- Compare normalized scores across different graphs
+- Often complements PageRank (different aspects of importance)
diff --git a/skills/bfs/SKILL.md b/skills/bfs/SKILL.md
new file mode 100644
index 000000000..152e37d79
--- /dev/null
+++ b/skills/bfs/SKILL.md
@@ -0,0 +1,126 @@
+---
+name: bfs
+description: Perform breadth-first search traversal starting from a node, visiting neighbors level by level. Use when exploring graphs layer by layer, finding shortest unweighted paths, or discovering all nodes within a certain distance.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: traversal
+  complexity: "O(V + E)"
+---
+
+# Breadth-First Search (BFS)
+
+Traverse a graph starting from a source node, exploring all neighbors at the current depth before moving to nodes at the next depth level. Fundamental for shortest paths in unweighted graphs.
+
+## When to Use This Skill
+
+Use BFS when:
+- Finding the shortest path in an unweighted graph
+- Exploring nodes level by level
+- Finding all nodes within N hops
+- Discovering connected components
+- Web crawling or social network exploration
+
+## Basic Usage
+
+### BFS Shortest Path
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS]-(b:Person {name: "Bob"})
+RETURN p, length(p) AS distance;
+```
+
+### BFS with Depth Limit
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS..3]-(b)
+RETURN b.name AS reachable_node, length(p) AS distance
+ORDER BY distance;
+```
+
+### BFS with Relationship Filter
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[:KNOWS|WORKS_WITH *BFS]-(b)
+RETURN b, length(p) AS distance;
+```
+
+## Advanced Usage
+
+### Find All Nodes at Specific Distance
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS]-(b)
+WHERE length(p) = 2
+RETURN b.name AS two_hops_away;
+```
+
+### BFS with Filter Lambda
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS (e, n | n.active = true)]-(b)
+RETURN b;
+```
+
+### Count Nodes by Distance
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS..5]-(b)
+WHERE a <> b
+RETURN length(p) AS distance, count(b) AS nodes_at_distance
+ORDER BY distance;
+```
+
+### BFS for Finding Shortest Path Length
+
+```cypher
+MATCH p = (a:Person {name: "Alice"})-[*BFS]-(b:Person {name: "Bob"})
+RETURN length(p) AS shortest_distance;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| path | Path | The BFS path from source |
+| node | Node | Discovered node |
+| distance | Integer | Number of hops from source |
+
+## BFS vs DFS
+
+| Aspect | BFS | DFS |
+|--------|-----|-----|
+| Strategy | Level by level | Deep first |
+| Shortest path | Yes (unweighted) | No guarantee |
+| Memory | O(width) | O(depth) |
+| Use case | Shortest paths, nearby nodes | Full exploration, cycles |
+
+## Example Results
+
+```
+╔════════════════════╦═══════════════╗
+║ reachable_node     ║ distance      ║
+╠════════════════════╬═══════════════╣
+║ "Bob"              ║ 1             ║
+║ "Charlie"          ║ 1             ║
+║ "David"            ║ 2             ║
+║ "Eve"              ║ 2             ║
+║ "Frank"            ║ 3             ║
+╚════════════════════╩═══════════════╝
+```
+
+## Common Edge Cases
+
+1. **No path exists**: Empty result
+2. **Source equals target**: Returns empty path (distance 0)
+3. **Cycles**: BFS naturally handles cycles (visits each node once)
+4. **Disconnected graph**: Only reaches nodes in same component
+
+## Tips
+
+- Use `*BFS` for guaranteed shortest unweighted path
+- Add depth limit (`..N`) to avoid exploring entire graph
+- Filter relationship types to constrain traversal
+- Use filter lambda for property-based traversal rules
+- Consider indexes on starting node properties
diff --git a/skills/bridges/SKILL.md b/skills/bridges/SKILL.md
new file mode 100644
index 000000000..41cb8d473
--- /dev/null
+++ b/skills/bridges/SKILL.md
@@ -0,0 +1,124 @@
+---
+name: bridges
+description: Find bridge edges in a graph - edges whose removal would disconnect the graph or increase the number of connected components. Use when identifying critical connections, finding vulnerabilities, or analyzing network resilience.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: structure
+  complexity: "O(V + E)"
+---
+
+# Bridge Detection
+
+Find bridge edges - edges that are critical for maintaining graph connectivity. Removing a bridge edge increases the number of connected components.
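To make the bridge definition concrete, here is a minimal Python sketch of bridge detection on an undirected graph using the classic depth-first low-link technique. It is an illustration only, not MAGE's implementation: the function name `find_bridges` and the edge-list input are assumptions made for this example.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return the bridge edges of an undirected graph given as (u, v) pairs.

    Hypothetical helper for illustration; it is not part of MAGE.
    """
    adjacency = defaultdict(list)
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))

    discovery = {}  # DFS discovery time of each node
    low = {}        # earliest discovery time reachable from the node's subtree
    bridges = []
    counter = 0

    def dfs(node, parent_edge):
        nonlocal counter
        discovery[node] = low[node] = counter
        counter += 1
        for neighbor, edge_index in adjacency[node]:
            if edge_index == parent_edge:
                continue  # do not walk back along the edge we arrived by
            if neighbor in discovery:
                # Back edge: the neighbor was already visited.
                low[node] = min(low[node], discovery[neighbor])
            else:
                dfs(neighbor, edge_index)
                low[node] = min(low[node], low[neighbor])
                # No back edge from the subtree climbs above `node`,
                # so removing this edge disconnects the subtree.
                if low[neighbor] > discovery[node]:
                    bridges.append(edges[edge_index])

    for start in list(adjacency):
        if start not in discovery:
            dfs(start, None)
    return bridges


# A triangle A-B-C with a tail C-D: only the tail is a bridge.
example_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(find_bridges(example_edges))  # [('C', 'D')]
```

The recursive form keeps the sketch short; for very deep graphs an iterative DFS would be needed to stay within Python's recursion limit.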
+
+## When to Use This Skill
+
+Use bridge detection when:
+- Identifying critical infrastructure connections
+- Finding single points of failure
+- Analyzing network resilience
+- Understanding graph vulnerabilities
+- Planning redundancy in networks
+
+## Basic Usage
+
+### Find All Bridges
+
+```cypher
+CALL bridges.get()
+YIELD node_from, node_to
+RETURN node_from, node_to;
+```
+
+### Count Bridges
+
+```cypher
+CALL bridges.get()
+YIELD node_from, node_to
+RETURN count(*) AS bridge_count;
+```
+
+## Advanced Usage
+
+### Bridges with Full Edge Information
+
+```cypher
+CALL bridges.get()
+YIELD node_from, node_to
+MATCH (n1)-[r]-(n2)
+WHERE id(n1) = id(node_from) AND id(n2) = id(node_to)
+RETURN n1, r, n2;
+```
+
+### Find Nodes Connected Only by Bridges
+
+```cypher
+CALL bridges.get()
+YIELD node_from, node_to
+WITH collect(node_from) + collect(node_to) AS bridge_nodes
+UNWIND bridge_nodes AS node
+RETURN DISTINCT node.name AS vulnerable_node;
+```
+
+### Check if Specific Edge is a Bridge
+
+```cypher
+MATCH (a:Router {name: "R1"})-[r]-(b:Router {name: "R2"})
+CALL bridges.get()
+YIELD node_from, node_to
+WITH a, b, collect([id(node_from), id(node_to)]) AS bridges
+RETURN [id(a), id(b)] IN bridges OR [id(b), id(a)] IN bridges AS is_bridge;
+```
+
+### Bridges by Relationship Type
+
+```cypher
+CALL bridges.get()
+YIELD node_from, node_to
+MATCH (node_from)-[r]-(node_to)
+RETURN type(r) AS relationship_type, count(*) AS bridge_count;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| node_from | Node | First endpoint of bridge edge |
+| node_to | Node | Second endpoint of bridge edge |
+
+## Example Results
+
+```
+╔═══════════════════╦═══════════════════╗
+║ node_from         ║ node_to           ║
+╠═══════════════════╬═══════════════════╣
+║ Router_A          ║ Router_B          ║
+║ Switch_1          ║ Switch_2          ║
+╚═══════════════════╩═══════════════════╝
+```
+
+## Common Edge Cases
+
+1. **Tree structures**: All edges are bridges
+2. **Complete graphs**: No bridges (highly redundant)
+3. **Cycles**: Edges in cycles are not bridges
+4. **Single edge graphs**: The edge is a bridge
+
+## Interpretation
+
+| Bridge Count | Graph Characteristic |
+|--------------|---------------------|
+| 0 | Highly resilient, no single points of failure |
+| Low | Good redundancy |
+| High | Vulnerable to disconnection |
+| All edges | Tree-like structure (fragile) |
+
+## Tips
+
+- Bridges indicate potential vulnerabilities
+- Add redundant connections to eliminate bridges
+- In network design, minimize bridges for reliability
+- Bridges often connect different communities
+- Combine with biconnected components analysis
diff --git a/skills/collections/SKILL.md b/skills/collections/SKILL.md
new file mode 100644
index 000000000..436c22f18
--- /dev/null
+++ b/skills/collections/SKILL.md
@@ -0,0 +1,186 @@
+---
+name: collections
+description: Utility functions for manipulating lists and collections in Cypher queries, including filtering, sorting, union, intersection, and other list operations. Use when you need to work with arrays, combine lists, or perform set operations on collections.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: utility
+  complexity: "O(n)"
+---
+
+# Collections Utilities
+
+A comprehensive set of functions for manipulating lists and collections in Cypher queries. Essential for data transformation and list operations.
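As a rough guide to the semantics of the list operations this skill covers, the following Python sketch mirrors three of them: an order-preserving union, element removal, and fixed-size partitioning. The function names are chosen for this example, the sketch assumes hashable list elements, and the actual MAGE functions may differ in ordering or edge-case behavior.

```python
def union(first, second):
    """Combine two lists, keeping only the first occurrence of each element."""
    seen = set()
    result = []
    for item in first + second:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def remove_all(original, to_remove):
    """Return `original` without any element that appears in `to_remove`."""
    unwanted = set(to_remove)
    return [item for item in original if item not in unwanted]


def partition(items, size):
    """Split `items` into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(union([1, 2, 3], [2, 3, 4, 5]))        # [1, 2, 3, 4, 5]
print(remove_all([1, 2, 3, 4, 5], [2, 4]))   # [1, 3, 5]
print(partition([1, 2, 3, 4, 5, 6, 7], 3))   # [[1, 2, 3], [4, 5, 6], [7]]
```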
+
+## When to Use This Skill
+
+Use collections utilities when:
+- Combining or intersecting lists
+- Removing duplicates from lists
+- Filtering or transforming collections
+- Performing set operations (union, difference)
+- Partitioning or chunking data
+
+## Basic Functions
+
+### Union (Remove Duplicates)
+
+```cypher
+WITH [1, 2, 3] AS list1, [2, 3, 4, 5] AS list2
+CALL collections.union(list1, list2)
+YIELD result
+RETURN result;
+// Result: [1, 2, 3, 4, 5]
+```
+
+### Union All (Keep Duplicates)
+
+```cypher
+WITH [1, 2, 3] AS list1, [2, 3, 4] AS list2
+CALL collections.union_all(list1, list2)
+YIELD result
+RETURN result;
+// Result: [1, 2, 3, 2, 3, 4]
+```
+
+### Remove All
+
+```cypher
+WITH [1, 2, 3, 4, 5] AS original, [2, 4] AS to_remove
+CALL collections.remove_all(original, to_remove)
+YIELD result
+RETURN result;
+// Result: [1, 3, 5]
+```
+
+### Contains
+
+```cypher
+WITH [1, 2, 3, 4, 5] AS list
+CALL collections.contains(list, 3)
+YIELD result
+RETURN result;
+// Result: true
+```
+
+## Advanced Functions
+
+### Flatten Nested Lists
+
+```cypher
+WITH [[1, 2], [3, 4], [5]] AS nested
+CALL collections.flatten(nested)
+YIELD result
+RETURN result;
+// Result: [1, 2, 3, 4, 5]
+```
+
+### Frequency Map
+
+```cypher
+WITH ["a", "b", "a", "c", "b", "a"] AS items
+CALL collections.frequencies_as_map(items)
+YIELD result
+RETURN result;
+// Result: {a: 3, b: 2, c: 1}
+```
+
+### Create Pairs
+
+```cypher
+WITH [1, 2, 3, 4] AS list
+CALL collections.pairs(list)
+YIELD result
+RETURN result;
+// Result: [[1,2], [2,3], [3,4]]
+```
+
+### To Set (Remove Duplicates)
+
+```cypher
+WITH [1, 2, 2, 3, 3, 3] AS list
+CALL collections.to_set(list)
+YIELD result
+RETURN result;
+// Result: [1, 2, 3]
+```
+
+### Sum
+
+```cypher
+WITH [1, 2, 3, 4, 5] AS numbers
+CALL collections.sum(numbers)
+YIELD result
+RETURN result;
+// Result: 15
+```
+
+### Partition (Chunk)
+
+```cypher
+WITH [1, 2, 3, 4, 5, 6, 7] AS list
+CALL collections.partition(list, 3)
+YIELD result
+RETURN result;
+// Result: [[1,2,3], [4,5,6], [7]]
+```
+
+## Practical Examples
+
+### Combine Node Properties from Multiple Queries
+
+```cypher
+MATCH (p:Person)-[:KNOWS]->(friend)
+WITH p, collect(friend.name) AS friends
+MATCH (p)-[:WORKS_WITH]->(colleague)
+WITH p, friends, collect(colleague.name) AS colleagues
+CALL collections.union(friends, colleagues)
+YIELD result AS all_connections
+RETURN p.name, all_connections;
+```
+
+### Find Common Interests
+
+```cypher
+MATCH (a:Person {name: "Alice"})-[:LIKES]->(item)
+WITH collect(item.name) AS alice_likes
+MATCH (b:Person {name: "Bob"})-[:LIKES]->(item)
+WITH alice_likes, collect(item.name) AS bob_likes
+WITH [x IN alice_likes WHERE x IN bob_likes] AS common
+RETURN common AS shared_interests;
+```
+
+### Batch Processing with Partition
+
+```cypher
+MATCH (n:Node)
+WITH collect(n) AS all_nodes
+CALL collections.partition(all_nodes, 100)
+YIELD result AS batch
+// Process each batch
+RETURN size(batch) AS batch_size;
+```
+
+## Output Format
+
+| Function | Return Type | Description |
+|----------|-------------|-------------|
+| union | List | Combined list without duplicates |
+| union_all | List | Combined list with duplicates |
+| remove_all | List | List with elements removed |
+| contains | Boolean | Whether element exists |
+| flatten | List | Flattened nested list |
+| frequencies_as_map | Map | Count of each element |
+| pairs | List of Lists | Adjacent pairs |
+| to_set | List | Deduplicated list |
+| sum | Number | Sum of numeric elements |
+| partition | List of Lists | Chunked list |
+
+## Tips
+
+- Use `to_set` before comparisons for better performance
+- `partition` is useful for batch processing large lists
+- `frequencies_as_map` is useful for distribution analysis
+- Combine functions for complex transformations
+- Most functions preserve order where meaningful
diff --git a/skills/community-detection/SKILL.md b/skills/community-detection/SKILL.md
new file mode 100644
index 000000000..d3a843d07
--- /dev/null
+++ b/skills/community-detection/SKILL.md
@@ -0,0 +1,115 @@
+---
+name: community-detection
+description: Detect communities in a graph using the Louvain algorithm to find groups of densely connected nodes. Use when the user wants to find clusters, groups, communities, detect graph partitions, identify related entities, or segment a network.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: community
+  complexity: "O(n log n)"
+---
+
+# Community Detection (Louvain Algorithm)
+
+The Louvain method is a greedy algorithm for finding communities with maximum modularity in a graph. It identifies groups of nodes that are more densely connected internally than with the rest of the graph.
+
+## When to Use This Skill
+
+Use community detection when:
+- Finding natural groupings or clusters in a network
+- Identifying related entities that form communities
+- Segmenting social networks into friend groups
+- Discovering topic clusters in document networks
+- Partitioning large graphs for analysis
+
+## Basic Usage
+
+### Detect Communities
+
+```cypher
+CALL community_detection.get()
+YIELD node, community_id
+RETURN node, community_id
+ORDER BY community_id;
+```
+
+### Get Community Statistics
+
+```cypher
+CALL community_detection.get()
+YIELD node, community_id
+RETURN community_id, count(node) AS size
+ORDER BY size DESC;
+```
+
+## Advanced Usage
+
+### Community Detection with Weight
+
+```cypher
+CALL community_detection.get(True, True, "weight", 1.0)
+YIELD node, community_id
+RETURN node, community_id;
+```
+
+**Parameters:**
+- `directed` (default: False) - Treat graph as directed
+- `weighted` (default: False) - Use edge weights
+- `weight_property` (default: "weight") - Property name for weights
+- `resolution` (default: 1.0) - Resolution parameter (higher = smaller communities)
+
+### Community Detection on Subgraph
+
+```cypher
+MATCH (n:Person)-[r:KNOWS]->(m:Person)
+WITH collect(n) + collect(m) AS nodes, collect(r) AS rels
+CALL community_detection.get_subgraph(nodes, rels)
+YIELD node, community_id
+RETURN node.name AS name, community_id
+ORDER BY community_id;
+```
+
+### Find Community Members
+
+```cypher
+CALL community_detection.get()
+YIELD node, community_id
+WITH community_id, collect(node.name) AS members
+RETURN community_id, members, size(members) AS member_count
+ORDER BY member_count DESC;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| node | Node | The graph node |
+| community_id | Integer | Community identifier (nodes with same ID are in same community) |
+
+## Example Results
+
+```
+╔════════════════════╦═══════════════╗
+║ name               ║ community_id  ║
+╠════════════════════╬═══════════════╣
+║ "Alice"            ║ 0             ║
+║ "Bob"              ║ 0             ║
+║ "Charlie"          ║ 1             ║
+║ "David"            ║ 1             ║
+╚════════════════════╩═══════════════╝
+```
+
+## Common Edge Cases
+
+1. **Single node communities**: Isolated nodes form their own communities
+2. **Highly connected graphs**: May result in one large community
+3. **Disconnected components**: Each component analyzed separately
+4. **Resolution parameter**: Adjust to control community granularity
+
+## Tips for Better Results
+
+- Start with default parameters, then adjust resolution
+- Higher resolution (>1.0) finds smaller, more granular communities
+- Lower resolution (<1.0) finds larger, coarser communities
+- For weighted graphs, ensure weight property exists on edges
+- Compare results at different resolution values
diff --git a/skills/cycles/SKILL.md b/skills/cycles/SKILL.md
new file mode 100644
index 000000000..ca0671227
--- /dev/null
+++ b/skills/cycles/SKILL.md
@@ -0,0 +1,129 @@
+---
+name: cycles
+description: Detect cycles in a graph - paths that start and end at the same node. Use when checking for circular dependencies, finding loops, validating acyclic structures, or detecting circular references.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: structure
+  complexity: "O(V + E)"
+---
+
+# Cycle Detection
+
+Detect cycles (circular paths) in a graph. A cycle is a path where you can start from a node and return to the same node by following edges.
+
+## When to Use This Skill
+
+Use cycle detection when:
+- Checking for circular dependencies
+- Validating DAG (directed acyclic graph) property
+- Finding loops in workflows or processes
+- Detecting circular references in data
+- Analyzing feedback loops in systems
+
+## Basic Usage
+
+### Find All Cycles
+
+```cypher
+CALL cycles.get()
+YIELD cycle
+RETURN cycle;
+```
+
+### Check if Graph Has Cycles
+
+```cypher
+CALL cycles.get()
+YIELD cycle
+RETURN count(cycle) > 0 AS has_cycles;
+```
+
+## Advanced Usage
+
+### Cycles Starting from Specific Node
+
+```cypher
+MATCH (start:Task {name: "Task A"})
+CALL cycles.get(start)
+YIELD cycle
+RETURN cycle;
+```
+
+### Find Cycle Lengths
+
+```cypher
+CALL cycles.get()
+YIELD cycle
+RETURN size(cycle) AS cycle_length, count(*) AS count
+ORDER BY cycle_length;
+```
+
+### Manual Cycle Detection for Small Graphs
+
+```cypher
+MATCH path = (n)-[*]->(n)
+RETURN path, length(path) AS cycle_length
+ORDER BY cycle_length
+LIMIT 10;
+```
+
+### Detect Self-Loops
+
+```cypher
+MATCH (n)-[r]->(n)
+RETURN n, r;
+```
+
+### Find Shortest Cycle Through a Node
+
+```cypher
+MATCH (target:Node {id: 1})
+MATCH path = (target)-[*]->(target)
+RETURN path, length(path) AS cycle_length
+ORDER BY cycle_length
+LIMIT 1;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| cycle | List | List of nodes forming the cycle |
+
+## Example Results
+
+```
+╔═══════════════════════════════════════════════════╗
+║ cycle                                             ║
+╠═══════════════════════════════════════════════════╣
+║ [Task_A, Task_B, Task_C, Task_A]                  ║
+║ [Process_1, Process_2, Process_1]                 ║
+╚═══════════════════════════════════════════════════╝
+```
+
+## Common Edge Cases
+
+1. **Self-loops**: Cycles of length 1 (node to itself)
+2. **Acyclic graph**: No cycles returned
+3. **Multiple cycles sharing nodes**: Each cycle reported separately
+4. **Undirected graphs**: Consider both directions
+
+## Use Cases by Domain
+
+| Domain | Cycle Meaning |
+|--------|---------------|
+| Dependencies | Circular dependency (error) |
+| Workflows | Infinite loop (usually error) |
+| Social networks | Mutual connections |
+| Financial | Round-tripping transactions |
+| Biology | Feedback mechanisms |
+
+## Tips
+
+- Self-loops are the simplest cycles
+- Cycles in dependency graphs usually indicate errors
+- Consider if you need directed or undirected cycle detection
+- For very large graphs, limit cycle length in search
+- Visualize cycles to understand their structure
diff --git a/skills/degree-centrality/SKILL.md b/skills/degree-centrality/SKILL.md
new file mode 100644
index 000000000..c6c3c5e88
--- /dev/null
+++ b/skills/degree-centrality/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: degree-centrality
+description: Calculate degree centrality which measures the number of connections a node has. Use when finding the most connected nodes, identifying hubs, or analyzing basic node importance based on direct connections.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: centrality
+  complexity: "O(E)"
+---
+
+# Degree Centrality
+
+Degree centrality is the simplest centrality measure - it counts the number of edges connected to a node. Nodes with high degree centrality are directly connected to many other nodes.
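The counting described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MAGE procedure: the function name `degree_centrality`, the edge-list input, and the choice to normalize by `n - 1` are all choices made for this example, and nodes that appear in no edge are simply absent from the result.

```python
from collections import Counter


def degree_centrality(edges, normalized=False):
    """Count the edges touching each node; optionally divide by (n - 1).

    Hypothetical helper for illustration; isolated nodes are not
    represented because the input is an edge list.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop (u == v) therefore counts twice
    if not normalized:
        return dict(degree)
    node_count = len(degree)
    if node_count < 2:
        return {node: 0.0 for node in degree}
    return {node: count / (node_count - 1) for node, count in degree.items()}


# A star: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(degree_centrality(star))                   # {'hub': 3, 'a': 1, 'b': 1, 'c': 1}
print(degree_centrality(star, normalized=True))
```

In the normalized form the hub of a star scores 1.0, because it is connected to every other node.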
+
+## When to Use This Skill
+
+Use degree centrality when:
+- Finding the most connected nodes (hubs)
+- Quick assessment of node importance
+- Identifying popular entities
+- Analyzing network structure basics
+- Pre-filtering before expensive algorithms
+
+## Basic Usage
+
+### Calculate Degree Centrality
+
+```cypher
+CALL degree_centrality.get()
+YIELD node, degree
+RETURN node, degree
+ORDER BY degree DESC
+LIMIT 10;
+```
+
+### Normalized Degree Centrality
+
+```cypher
+CALL degree_centrality.get(True)
+YIELD node, degree
+RETURN node, degree
+ORDER BY degree DESC;
+```
+
+**Parameters:**
+- `normalized` (default: False) - Normalize by maximum possible degree (n-1)
+
+## Advanced Usage
+
+### In-Degree and Out-Degree (Manual)
+
+```cypher
+MATCH (n)
+OPTIONAL MATCH (n)<-[in_rel]-()
+OPTIONAL MATCH (n)-[out_rel]->()
+WITH n, count(DISTINCT in_rel) AS in_degree, count(DISTINCT out_rel) AS out_degree
+RETURN n.name, in_degree, out_degree, in_degree + out_degree AS total_degree
+ORDER BY total_degree DESC;
+```
+
+### Degree by Relationship Type
+
+```cypher
+MATCH (n:Person)
+OPTIONAL MATCH (n)-[r:KNOWS]-()
+WITH n, count(r) AS knows_count
+OPTIONAL MATCH (n)-[r:WORKS_WITH]-()
+WITH n, knows_count, count(r) AS works_with_count
+RETURN n.name, knows_count, works_with_count
+ORDER BY knows_count + works_with_count DESC;
+```
+
+### Filter by Label
+
+```cypher
+CALL degree_centrality.get()
+YIELD node, degree
+WHERE node:Person
+RETURN node.name AS name, degree
+ORDER BY degree DESC
+LIMIT 20;
+```
+
+### Degree Distribution
+
+```cypher
+MATCH (n)
+WITH size((n)--()) AS degree
+RETURN degree, count(*) AS frequency
+ORDER BY degree;
+```
+
+### Find Hub Nodes (High Degree)
+
+```cypher
+CALL degree_centrality.get()
+YIELD node, degree
+WITH avg(degree) AS avg_degree, collect({node: node, degree: degree}) AS all_nodes
+UNWIND all_nodes AS n
+WHERE n.degree > avg_degree * 2
+RETURN n.node.name AS hub, n.degree AS degree
+ORDER BY degree DESC;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| node | Node | The graph node |
+| degree | Integer/Float | Number of connections (or normalized score) |
+
+## Example Results
+
+```
+╔════════════════════╦═══════════════╗
+║ name               ║ degree        ║
+╠════════════════════╬═══════════════╣
+║ "Central_Hub"      ║ 150           ║
+║ "Popular_User"     ║ 89            ║
+║ "Connector"        ║ 67            ║
+╚════════════════════╩═══════════════╝
+```
+
+## Common Edge Cases
+
+1. **Isolated nodes**: Degree = 0
+2. **Self-loops**: May count as 1 or 2 depending on implementation
+3. **Multiple edges**: Each edge counted separately
+4. **Directed graphs**: Consider in-degree vs out-degree
+
+## Tips
+
+- Fastest centrality measure - good for initial analysis
+- Use as a filter before expensive algorithms
+- High degree ≠ most important (consider betweenness/PageRank)
+- Compare in-degree vs out-degree for directed graphs
+- Watch for degree distribution (power-law vs normal)
diff --git a/skills/graph-analyzer/SKILL.md b/skills/graph-analyzer/SKILL.md
new file mode 100644
index 000000000..91182e1c1
--- /dev/null
+++ b/skills/graph-analyzer/SKILL.md
@@ -0,0 +1,130 @@
+---
+name: graph-analyzer
+description: Analyze graph structure and calculate statistics like node count, edge count, density, average degree, and other metrics. Use when the user wants to understand graph properties, get statistics, profile the database, or explore the graph structure.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: utility
+  complexity: "O(V + E)"
+---
+
+# Graph Analyzer
+
+Analyze the structure of a graph and compute various statistics. Essential for understanding data characteristics and profiling graph databases.
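For readers who want to see how such statistics are typically derived, here is a small Python sketch that computes a few of them for a directed graph. The function name `analyze`, the input shape, and the exact formulas (density as edges over `n * (n - 1)`, average degree as `2 * edges / nodes`) are assumptions made for this example; MAGE's own definitions may differ.

```python
def analyze(nodes, edges):
    """Compute basic statistics for a directed graph.

    Hypothetical helper for illustration; not the MAGE implementation.
    `nodes` is a list of node identifiers, `edges` a list of
    (source, target) pairs.
    """
    node_count = len(nodes)
    edge_count = len(edges)
    # A directed graph without self-loops has n * (n - 1) possible edges.
    possible_edges = node_count * (node_count - 1)
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return {
        "node_count": node_count,
        "edge_count": edge_count,
        "density": edge_count / possible_edges if possible_edges else 0.0,
        # Each edge contributes to the degree of both of its endpoints.
        "avg_degree": 2 * edge_count / node_count if node_count else 0.0,
        "max_in_degree": max(in_degree.values(), default=0),
        "max_out_degree": max(out_degree.values(), default=0),
        "self_loops": sum(1 for source, target in edges if source == target),
    }


stats = analyze(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(stats)
```

Under these formulas, a graph with 1000 nodes and 5000 edges has a density of roughly 0.005 and an average degree of 10.0.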
+ +## When to Use This Skill + +Use graph analyzer when: +- Understanding the overall structure of a graph +- Getting basic statistics (counts, density, degrees) +- Profiling a database before running algorithms +- Comparing graph characteristics over time +- Identifying potential data quality issues + +## Basic Usage + +### Full Graph Analysis + +```cypher +CALL graph_analyzer.analyze() +YIELD name, value +RETURN name, value; +``` + +### Get Specific Metrics + +```cypher +CALL graph_analyzer.analyze() +YIELD name, value +WHERE name IN ["node_count", "edge_count", "density"] +RETURN name, value; +``` + +## Available Metrics + +| Metric | Description | +|--------|-------------| +| `node_count` | Total number of nodes | +| `edge_count` | Total number of edges | +| `density` | Graph density (edges / possible edges) | +| `avg_degree` | Average node degree | +| `max_in_degree` | Maximum incoming degree | +| `max_out_degree` | Maximum outgoing degree | +| `self_loops` | Number of self-loop edges | +| `is_directed` | Whether the graph is directed | +| `is_dag` | Whether the graph is a directed acyclic graph | + +## Advanced Usage + +### Analyze Subgraph by Label + +```cypher +MATCH (n:Person)-[r:KNOWS]->(m:Person) +WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes, collect(r) AS rels +CALL graph_analyzer.analyze_subgraph(nodes, rels) +YIELD name, value +RETURN name, value; +``` + +### Compare Label Statistics + +```cypher +MATCH (n) +WITH labels(n)[0] AS label, count(*) AS count +RETURN label, count +ORDER BY count DESC; +``` + +### Degree Distribution + +```cypher +MATCH (n) +WITH n, size((n)--()) AS degree +RETURN degree, count(*) AS frequency +ORDER BY degree; +``` + +### Relationship Type Distribution + +```cypher +MATCH ()-[r]->() +RETURN type(r) AS relationship_type, count(*) AS count +ORDER BY count DESC; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| name | String | Metric name | +| value | Any | Metric value 
(number, boolean, or string) | + +## Example Results + +``` +╔════════════════════╦═══════════════╗ +║ name ║ value ║ +╠════════════════════╬═══════════════╣ +║ "node_count" ║ 1000 ║ +║ "edge_count" ║ 5000 ║ +║ "density" ║ 0.005 ║ +║ "avg_degree" ║ 10.0 ║ +║ "max_in_degree" ║ 150 ║ +║ "max_out_degree" ║ 89 ║ +╚════════════════════╩═══════════════╝ +``` + +## Common Edge Cases + +1. **Empty graph**: Returns zeros for counts +2. **No edges**: Edge count = 0, density = 0 +3. **Single node**: Degree = 0 + +## Tips + +- Run analysis before choosing algorithms +- Low density graphs may need different approaches than dense ones +- High max degree nodes may be worth investigating +- Use as a sanity check after data import +- Compare metrics before and after data modifications diff --git a/skills/json-util/SKILL.md b/skills/json-util/SKILL.md new file mode 100644 index 000000000..feb849284 --- /dev/null +++ b/skills/json-util/SKILL.md @@ -0,0 +1,159 @@ +--- +name: json-util +description: Import and export JSON data to and from the graph database. Use when loading JSON files, converting graph data to JSON, or integrating with JSON-based APIs and systems. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: utility + complexity: "O(n)" +--- + +# JSON Utilities + +Functions for importing JSON data into the graph and exporting graph data as JSON. Essential for data integration and external system connectivity. 
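As a rough, database-independent illustration of what "parsing" and "exporting" JSON mean here, the sketch below round-trips a value with Python's standard `json` module. It does not call Memgraph or MAGE; the helper names `parse_json` and `export_json` are hypothetical and exist only for this example.

```python
import json


def parse_json(text):
    """Turn a JSON string into plain Python dicts and lists."""
    return json.loads(text)


def export_json(value):
    """Serialize a Python value to a compact JSON string."""
    # separators=(",", ":") drops the default spaces after commas and colons.
    return json.dumps(value, separators=(",", ":"))


record = parse_json('{"name": "Alice", "age": 30}')
exported = export_json({"name": "Alice", "friends": ["Bob", "Charlie"]})
```

Invalid input raises `json.JSONDecodeError` in this sketch, which is the kind of parsing error the edge-case notes for this skill refer to.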
+ +## When to Use This Skill + +Use JSON utilities when: +- Loading data from JSON files +- Exporting query results as JSON +- Integrating with REST APIs +- Converting between formats +- Parsing JSON strings in queries + +## Basic Usage + +### Parse JSON String + +```cypher +WITH '{"name": "Alice", "age": 30}' AS json_string +CALL json_util.from_json(json_string) +YIELD result +RETURN result; +``` + +### Convert to JSON + +```cypher +MATCH (p:Person) +WITH collect({name: p.name, age: p.age}) AS people +CALL json_util.to_json(people) +YIELD result +RETURN result; +``` + +### Load JSON from URL + +```cypher +CALL json_util.load_from_url("https://api.example.com/data.json") +YIELD data +UNWIND data AS item +CREATE (n:Item) +SET n = item +RETURN count(n); +``` + +## Advanced Usage + +### Load JSON File and Create Nodes + +```cypher +CALL json_util.load_from_path("/path/to/data.json") +YIELD data +UNWIND data.users AS user +CREATE (p:Person { + name: user.name, + email: user.email, + age: user.age +}) +RETURN count(p) AS created; +``` + +### Parse JSON Array + +```cypher +WITH '[{"id": 1}, {"id": 2}, {"id": 3}]' AS json_array +CALL json_util.from_json_list(json_array) +YIELD result +UNWIND result AS item +RETURN item.id; +``` + +### Export Graph to JSON + +```cypher +MATCH (p:Person)-[r:KNOWS]->(friend:Person) +WITH { + person: p.name, + friends: collect(friend.name) +} AS data +CALL json_util.to_json(data) +YIELD result +RETURN result; +``` + +### Nested JSON Processing + +```cypher +WITH '{"user": {"name": "Alice", "address": {"city": "NYC"}}}' AS nested_json +CALL json_util.from_json(nested_json) +YIELD result +RETURN result.user.name AS name, result.user.address.city AS city; +``` + +### Create Graph from JSON with Relationships + +```cypher +CALL json_util.load_from_path("/data/social.json") +YIELD data +UNWIND data.users AS user +MERGE (p:Person {id: user.id}) +SET p.name = user.name +WITH p, user +UNWIND user.friends AS friend_id +MATCH (friend:Person {id: 
friend_id}) +MERGE (p)-[:KNOWS]->(friend) +RETURN count(*); +``` + +## Output Format + +| Function | Input | Output | +|----------|-------|--------| +| from_json | JSON string | Map/List | +| from_json_list | JSON array string | List | +| to_json | Any value | JSON string | +| load_from_url | URL | Parsed JSON | +| load_from_path | File path | Parsed JSON | + +## Example Results + +### Parsing +``` +Input: '{"name": "Alice", "age": 30}' +Output: {name: "Alice", age: 30} +``` + +### Exporting +``` +Input: {name: "Alice", friends: ["Bob", "Charlie"]} +Output: '{"name":"Alice","friends":["Bob","Charlie"]}' +``` + +## Common Edge Cases + +1. **Invalid JSON**: Throws parsing error +2. **Empty JSON**: Returns empty map/list +3. **Large files**: May impact memory +4. **Network errors**: URL loading may fail +5. **Encoding**: UTF-8 assumed + +## Tips + +- Validate JSON before loading into production +- Use UNWIND for array processing +- Create indexes before bulk imports +- Use transactions for large imports +- Consider streaming for very large files +- Handle null values explicitly in mappings diff --git a/skills/katz-centrality/SKILL.md b/skills/katz-centrality/SKILL.md new file mode 100644 index 000000000..df2ade01c --- /dev/null +++ b/skills/katz-centrality/SKILL.md @@ -0,0 +1,124 @@ +--- +name: katz-centrality +description: Calculate Katz centrality which measures node influence based on the total number of walks between nodes with exponentially decaying weights. Use when measuring influence in networks where indirect connections matter, or when PageRank doesn't account for all paths. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: centrality + complexity: "O(V * E)" +--- + +# Katz Centrality + +Katz centrality measures influence based on the total number of walks (paths of any length) between a node and all other nodes, with longer walks weighted less through an attenuation factor. 
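To make the role of the attenuation factor concrete, here is a minimal pure-Python sketch of one common Katz formulation, `score = beta + alpha * (sum of scores of in-neighbors)`, iterated to a fixed point. The function name, the `beta` bias term, and the stopping rule are assumptions made for illustration; MAGE's own implementation may differ.

```python
def katz_centrality(nodes, edges, alpha=0.1, beta=1.0, max_iter=1000, tol=1e-9):
    """Iterate x = beta + alpha * A^T x until the scores stop changing."""
    # For every node, remember which nodes point at it.
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        incoming[dst].append(src)

    score = {n: 0.0 for n in nodes}
    for _ in range(max_iter):
        new = {n: beta + alpha * sum(score[m] for m in incoming[n]) for n in nodes}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    # Typically means alpha is too large for this graph.
    raise RuntimeError("did not converge; try a smaller alpha")


# Directed chain a -> b -> c: c is reached by one walk of length 1 and one of length 2.
scores = katz_centrality(["a", "b", "c"], [("a", "b"), ("b", "c")], alpha=0.1)
```

In the chain, `c` ends up with `1 + 0.1 + 0.01`: the walk of length 2 from `a` contributes ten times less than the walk of length 1 from `b`, which is the attenuation described above.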
+ +## When to Use This Skill + +Use Katz centrality when: +- Measuring influence considering all paths (not just shortest) +- Networks where indirect connections matter +- Analyzing information spread potential +- Finding nodes with high reachability +- Alternative to PageRank for certain networks + +## Basic Usage + +### Calculate Katz Centrality + +```cypher +CALL katz_centrality.get() +YIELD node, score +RETURN node, score +ORDER BY score DESC +LIMIT 10; +``` + +### With Custom Alpha (Attenuation Factor) + +```cypher +CALL katz_centrality.get(0.1) +YIELD node, score +RETURN node, score +ORDER BY score DESC; +``` + +**Parameters:** +- `alpha` (default: 0.2) - Attenuation factor (0 < alpha < 1/λ where λ is largest eigenvalue) + +## Advanced Usage + +### Compare with PageRank + +```cypher +CALL katz_centrality.get() YIELD node AS knode, score AS katz +CALL pagerank.get() YIELD node AS pnode, rank AS pagerank +WHERE id(knode) = id(pnode) +RETURN knode.name AS name, katz, pagerank +ORDER BY katz DESC +LIMIT 10; +``` + +### Katz on Subgraph + +```cypher +MATCH (n:Person)-[r:INFLUENCES]->(m:Person) +WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes, collect(r) AS rels +CALL katz_centrality.get_subgraph(nodes, rels, 0.1) +YIELD node, score +RETURN node.name AS name, score +ORDER BY score DESC; +``` + +### Filter by Label + +```cypher +CALL katz_centrality.get() +YIELD node, score +WHERE node:Influencer +RETURN node.name AS name, score +ORDER BY score DESC; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| node | Node | The graph node | +| score | Float | Katz centrality score | + +## Katz vs PageRank + +| Aspect | Katz | PageRank | +|--------|------|----------| +| Walks considered | All walks | Random walks | +| Path weighting | Exponential decay | Damping factor | +| Sink handling | Natural | Requires adjustment | +| Use case | General influence | Link importance | + +## Example Results + +``` 
+╔════════════════════╦═══════════════╗ +║ name ║ score ║ +╠════════════════════╬═══════════════╣ +║ "Influencer_A" ║ 0.892 ║ +║ "Hub_Node" ║ 0.756 ║ +║ "Connector_B" ║ 0.634 ║ +╚════════════════════╩═══════════════╝ +``` + +## Common Edge Cases + +1. **Alpha too high**: May not converge (should be < 1/λ) +2. **Disconnected nodes**: Score based only on reachable nodes +3. **Dense graphs**: Many paths increase scores +4. **Self-loops**: Contribute to own score + +## Tips + +- Alpha controls how much distant connections matter +- Lower alpha = more focus on direct connections +- Higher alpha = more weight on indirect paths +- If convergence fails, try lower alpha +- Good for networks where "any path" matters (not just shortest) diff --git a/skills/leiden-community-detection/SKILL.md b/skills/leiden-community-detection/SKILL.md new file mode 100644 index 000000000..0f9b64cbd --- /dev/null +++ b/skills/leiden-community-detection/SKILL.md @@ -0,0 +1,139 @@ +--- +name: leiden-community-detection +description: Detect communities using the Leiden algorithm, an improved version of Louvain that guarantees well-connected communities. Use when finding communities, clusters, or groups in large graphs with better quality guarantees than basic community detection. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: community + complexity: "O(L * E)" +--- + +# Leiden Community Detection + +The Leiden algorithm is an improvement over the Louvain method for community detection. It guarantees that all communities are well-connected and avoids the problem of poorly connected or disconnected communities that can occur with Louvain. 
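The connectivity problem mentioned above can be checked directly. The sketch below is a hypothetical helper, not part of MAGE: it takes an edge list and a node-to-community mapping and reports communities whose members cannot all reach each other using only edges inside the community. Plain connectivity is weaker than the "well-connected" property Leiden guarantees, so treat this as a sanity check only.

```python
from collections import defaultdict


def disconnected_communities(edges, community_of):
    """Return sorted ids of communities that are not internally connected."""
    members = defaultdict(set)
    for node, cid in community_of.items():
        members[cid].add(node)

    # Keep only edges whose endpoints share a community, treated as undirected.
    neighbors = defaultdict(set)
    for u, v in edges:
        if community_of[u] == community_of[v]:
            neighbors[u].add(v)
            neighbors[v].add(u)

    bad = []
    for cid, nodes in members.items():
        start = next(iter(nodes))
        seen = {start}
        stack = [start]
        while stack:  # depth-first search restricted to the community
            cur = stack.pop()
            for nxt in neighbors[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != nodes:
            bad.append(cid)
    return sorted(bad)


# "a" and "d" share community 0 but are only linked through community 1.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
assignment = {"a": 0, "b": 1, "c": 1, "d": 0}
broken = disconnected_communities(edges, assignment)
```

Running a check like this on the output of any community detection procedure is one way to see whether fragmented communities are actually present in your data.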
+ +## When to Use This Skill + +Use Leiden community detection when: +- Quality of community structure is important +- Working with large graphs +- Louvain produces fragmented communities +- Need guaranteed well-connected communities +- Analyzing complex network structures + +## Basic Usage + +### Detect Communities + +```cypher +CALL leiden_community_detection.get() +YIELD node, community_id +RETURN node, community_id +ORDER BY community_id; +``` + +### Community Statistics + +```cypher +CALL leiden_community_detection.get() +YIELD node, community_id +RETURN community_id, count(node) AS size +ORDER BY size DESC; +``` + +## Advanced Usage + +### With Custom Parameters + +```cypher +CALL leiden_community_detection.get(1.0, 0.01, True) +YIELD node, community_id +RETURN node, community_id; +``` + +**Parameters:** +- `resolution` (default: 1.0) - Higher values produce more, smaller communities +- `beta` (default: 0.01) - Randomness parameter for refinement phase +- `weighted` (default: False) - Use edge weights + +### Leiden on Subgraph + +```cypher +MATCH (n:User)-[r:FOLLOWS]->(m:User) +WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes, collect(r) AS rels +CALL leiden_community_detection.get_subgraph(nodes, rels) +YIELD node, community_id +RETURN node.name AS name, community_id +ORDER BY community_id; +``` + +### Community Members with Labels + +```cypher +CALL leiden_community_detection.get() +YIELD node, community_id +WITH community_id, collect(node.name) AS members +RETURN community_id, members, size(members) AS size +ORDER BY size DESC +LIMIT 10; +``` + +### Resolution Parameter Comparison + +```cypher +// Higher resolution - more communities +CALL leiden_community_detection.get(2.0) +YIELD node, community_id +RETURN "resolution=2.0" AS config, count(DISTINCT community_id) AS num_communities +UNION +// Lower resolution - fewer communities +CALL leiden_community_detection.get(0.5) +YIELD node, community_id +RETURN "resolution=0.5" AS config, count(DISTINCT 
community_id) AS num_communities; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| node | Node | The graph node | +| community_id | Integer | Community identifier | + +## Leiden vs Louvain + +| Aspect | Louvain | Leiden | +|--------|---------|--------| +| Speed | Faster | Slightly slower | +| Quality | Good | Better (guaranteed connectivity) | +| Disconnected communities | Possible | Never | +| Use case | Quick analysis | Quality-critical analysis | + +## Example Results + +``` +╔════════════════════╦═══════════════╗ +║ name ║ community_id ║ +╠════════════════════╬═══════════════╣ +║ "Alice" ║ 0 ║ +║ "Bob" ║ 0 ║ +║ "Charlie" ║ 1 ║ +║ "David" ║ 1 ║ +║ "Eve" ║ 2 ║ +╚════════════════════╩═══════════════╝ +``` + +## Common Edge Cases + +1. **Single node**: Forms its own community +2. **Disconnected graph**: Each component analyzed separately +3. **Complete graph**: May form one large community +4. **Very sparse graph**: Many small communities + +## Tips + +- Start with default parameters, then adjust resolution +- Use Leiden over Louvain when community quality matters +- For very large graphs, consider sampling first +- Compare results at multiple resolution values +- Visualize communities to validate results diff --git a/skills/max-flow/SKILL.md b/skills/max-flow/SKILL.md new file mode 100644 index 000000000..17247cd52 --- /dev/null +++ b/skills/max-flow/SKILL.md @@ -0,0 +1,145 @@ +--- +name: max-flow +description: Calculate the maximum flow between two nodes in a network, determining the maximum amount that can flow from source to sink. Use for network capacity analysis, bandwidth optimization, or resource allocation problems. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: optimization + complexity: "O(V * E^2)" +--- + +# Maximum Flow + +Find the maximum flow from a source node to a sink node in a flow network, where edges have capacity constraints. 
Based on the Ford-Fulkerson algorithm. + +## When to Use This Skill + +Use max flow when: +- Analyzing network bandwidth capacity +- Resource allocation optimization +- Supply chain capacity planning +- Traffic flow analysis +- Bipartite matching problems + +## Basic Usage + +### Calculate Max Flow + +```cypher +MATCH (source:Node {name: "Source"}), (sink:Node {name: "Sink"}) +CALL max_flow.get(source, sink, "capacity") +YIELD max_flow +RETURN max_flow; +``` + +### Max Flow with Default Capacity + +```cypher +MATCH (s:Server {name: "Origin"}), (t:Server {name: "Destination"}) +CALL max_flow.get(s, t) +YIELD max_flow +RETURN max_flow; +``` + +## Advanced Usage + +### Max Flow with Flow Details + +```cypher +MATCH (source:Node {id: 1}), (sink:Node {id: 10}) +CALL max_flow.get_flow(source, sink, "capacity") +YIELD edge, flow +RETURN startNode(edge).name AS from, + endNode(edge).name AS to, + flow, + edge.capacity AS capacity; +``` + +### Find Bottleneck Edges + +```cypher +MATCH (source:Node {name: "S"}), (sink:Node {name: "T"}) +CALL max_flow.get_flow(source, sink, "capacity") +YIELD edge, flow +WHERE flow = edge.capacity +RETURN startNode(edge).name AS from, + endNode(edge).name AS to, + flow AS saturated_flow; +``` + +### Max Flow on Subgraph + +```cypher +MATCH (n:NetworkNode)-[r:LINK]->(m:NetworkNode) +WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes, collect(r) AS edges +MATCH (source:NetworkNode {type: "source"}), (sink:NetworkNode {type: "sink"}) +CALL max_flow.get_subgraph(nodes, edges, source, sink, "bandwidth") +YIELD max_flow +RETURN max_flow; +``` + +### Multi-Source Multi-Sink (Using Super Nodes) + +```cypher +// Create super source connected to all sources +// Create super sink connected from all sinks +// Then run max flow between super nodes +MATCH (source:Source), (sink:Sink) +WITH collect(source) AS sources, collect(sink) AS sinks +// Add super source/sink logic here +CALL max_flow.get(super_source, super_sink, "capacity") +YIELD max_flow 
+RETURN max_flow; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| max_flow | Float | Maximum flow value achievable | +| edge | Relationship | Edge in the flow network (for detailed version) | +| flow | Float | Flow through the edge | + +## Example Results + +``` +╔══════════════════╗ +║ max_flow ║ +╠══════════════════╣ +║ 23.5 ║ +╚══════════════════╝ + +╔════════════╦════════════╦═══════╦══════════╗ +║ from ║ to ║ flow ║ capacity ║ +╠════════════╬════════════╬═══════╬══════════╣ +║ "A" ║ "B" ║ 10.0 ║ 10.0 ║ +║ "A" ║ "C" ║ 13.5 ║ 15.0 ║ +║ "B" ║ "D" ║ 10.0 ║ 12.0 ║ +╚════════════╩════════════╩═══════╩══════════╝ +``` + +## Common Edge Cases + +1. **No path exists**: Max flow = 0 +2. **Source = Sink**: Max flow = 0 or infinity +3. **Negative capacities**: Not supported +4. **Zero capacity edges**: Ignored in flow +5. **Disconnected network**: Max flow = 0 + +## Applications + +| Domain | Source | Sink | Capacity | +|--------|--------|------|----------| +| Network | Server | Client | Bandwidth | +| Supply chain | Warehouse | Store | Transport capacity | +| Traffic | Origin | Destination | Road capacity | +| Scheduling | Start | End | Time slots | + +## Tips + +- Ensure all edges have valid capacity property +- Saturated edges (flow = capacity) are bottlenecks +- Increase bottleneck capacities to increase max flow +- Use for min-cut problems (max-flow = min-cut) +- Consider directed vs undirected capacity constraints diff --git a/skills/node-similarity/SKILL.md b/skills/node-similarity/SKILL.md new file mode 100644 index 000000000..d3dc18f7a --- /dev/null +++ b/skills/node-similarity/SKILL.md @@ -0,0 +1,133 @@ +--- +name: node-similarity +description: Calculate similarity between nodes based on their connections, properties, or embeddings. Use when the user wants to find similar nodes, recommendations, related entities, or measure node closeness. 
+license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: similarity + complexity: "O(V^2) or O(E)" +--- + +# Node Similarity + +Calculate how similar two nodes are based on their neighborhood structure, shared connections, or properties. Useful for recommendations, duplicate detection, and finding related entities. + +## When to Use This Skill + +Use node similarity when: +- Finding similar nodes for recommendations +- Detecting potential duplicates +- Discovering related entities +- Building recommendation systems +- Comparing node neighborhoods + +## Basic Usage + +### Jaccard Similarity + +```cypher +MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) +CALL node_similarity.jaccard(a, b) +YIELD similarity +RETURN similarity; +``` + +### Overlap Coefficient + +```cypher +MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) +CALL node_similarity.overlap(a, b) +YIELD similarity +RETURN similarity; +``` + +### Cosine Similarity + +```cypher +MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) +CALL node_similarity.cosine(a, b) +YIELD similarity +RETURN similarity; +``` + +## Advanced Usage + +### Find Most Similar Nodes + +```cypher +MATCH (target:Person {name: "Alice"}) +MATCH (other:Person) +WHERE other <> target +CALL node_similarity.jaccard(target, other) +YIELD similarity +RETURN other.name AS person, similarity +ORDER BY similarity DESC +LIMIT 5; +``` + +### Similarity Based on Specific Relationship + +```cypher +MATCH (a:Person {name: "Alice"})-[:LIKES]->(item1) +MATCH (b:Person {name: "Bob"})-[:LIKES]->(item2) +WITH a, b, collect(DISTINCT item1) AS items_a, collect(DISTINCT item2) AS items_b +WITH a, b, + [x IN items_a WHERE x IN items_b] AS intersection, + items_a + [x IN items_b WHERE NOT x IN items_a] AS union_set +RETURN a.name, b.name, + toFloat(size(intersection)) / size(union_set) AS jaccard_similarity; +``` + +### Pairwise Similarity for All Nodes + +```cypher +MATCH (a:Product), (b:Product) +WHERE 
id(a) < id(b) +CALL node_similarity.jaccard(a, b) +YIELD similarity +WHERE similarity > 0.5 +RETURN a.name, b.name, similarity +ORDER BY similarity DESC; +``` + +## Similarity Measures Explained + +| Measure | Formula | Best For | +|---------|---------|----------| +| Jaccard | \|A ∩ B\| / \|A ∪ B\| | General similarity | +| Overlap | \|A ∩ B\| / min(\|A\|, \|B\|) | Subset detection | +| Cosine | A · B / (\|A\| * \|B\|) | Normalized comparison | + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| similarity | Float | Similarity score between 0 and 1 | + +## Example Results + +``` +╔════════════════════╦═══════════════╗ +║ person ║ similarity ║ +╠════════════════════╬═══════════════╣ +║ "Charlie" ║ 0.857 ║ +║ "David" ║ 0.714 ║ +║ "Eve" ║ 0.625 ║ +╚════════════════════╩═══════════════╝ +``` + +## Common Edge Cases + +1. **No common neighbors**: Similarity = 0 +2. **Identical neighborhoods**: Similarity = 1 +3. **Isolated nodes**: Similarity = 0 (no neighbors to compare) +4. **Self-similarity**: Always = 1 + +## Tips + +- Jaccard is most commonly used for general similarity +- Use Overlap when one set may be a subset of another +- For large graphs, filter candidates first before computing similarity +- Consider the relationship types that define "neighbors" +- Combine with property-based similarity for better results diff --git a/skills/node2vec/SKILL.md b/skills/node2vec/SKILL.md new file mode 100644 index 000000000..79d1e88d4 --- /dev/null +++ b/skills/node2vec/SKILL.md @@ -0,0 +1,141 @@ +--- +name: node2vec +description: Generate node embeddings using the Node2Vec algorithm, which learns vector representations of nodes based on random walks. Use when you need node features for machine learning, similarity calculations, or visualization. 
+license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: ml + complexity: "O(V * E)" +--- + +# Node2Vec - Node Embeddings + +Node2Vec learns continuous feature representations (embeddings) for nodes in a graph using random walks. These embeddings capture structural information and can be used for machine learning tasks. + +## When to Use This Skill + +Use Node2Vec when: +- Creating features for node classification +- Generating embeddings for similarity search +- Preparing data for machine learning +- Visualizing graph structure in lower dimensions +- Link prediction tasks + +## Basic Usage + +### Generate Node Embeddings + +```cypher +CALL node2vec.get() +YIELD node, embedding +RETURN node, embedding +LIMIT 10; +``` + +### Node2Vec with Parameters + +```cypher +CALL node2vec.get(128, 10, 80, 1.0, 1.0) +YIELD node, embedding +RETURN node, embedding; +``` + +**Parameters:** +- `dimensions` (default: 128) - Embedding vector size +- `walk_length` (default: 10) - Length of random walks +- `num_walks` (default: 80) - Number of walks per node +- `p` (default: 1.0) - Return parameter (controls revisiting) +- `q` (default: 1.0) - In-out parameter (BFS vs DFS tendency) + +## Advanced Usage + +### Store Embeddings on Nodes + +```cypher +CALL node2vec.get() +YIELD node, embedding +SET node.embedding = embedding +RETURN count(*) AS nodes_updated; +``` + +### Find Similar Nodes Using Embeddings + +```cypher +// First generate embeddings +CALL node2vec.get() YIELD node, embedding +WITH node, embedding +SET node.embedding = embedding; + +// Then find similar nodes +MATCH (target:Person {name: "Alice"}) +MATCH (other:Person) +WHERE other <> target +WITH target, other, + gds.similarity.cosine(target.embedding, other.embedding) AS similarity +RETURN other.name, similarity +ORDER BY similarity DESC +LIMIT 5; +``` + +### Node2Vec on Subgraph + +```cypher +MATCH (n:User)-[r:FOLLOWS]->(m:User) +WITH collect(DISTINCT n) + collect(DISTINCT m) AS nodes, 
collect(r) AS rels +CALL node2vec.get_subgraph(nodes, rels, 64) +YIELD node, embedding +RETURN node.name AS user, embedding; +``` + +### Visualize Embeddings (Export for t-SNE/UMAP) + +```cypher +CALL node2vec.get(2) +YIELD node, embedding +RETURN node.name AS name, embedding[0] AS x, embedding[1] AS y; +``` + +## Parameter Tuning + +| Parameter | Low Value | High Value | +|-----------|-----------|------------| +| p (return) | Walk backtracks often and stays local | Walk rarely backtracks and moves on | +| q (in-out) | Explores outward (DFS-like) | Stays near the start node (BFS-like) | +| dimensions | Faster, less expressive | Slower, more expressive | +| walk_length | Local context | Global context | +| num_walks | Noisier embeddings | Smoother embeddings | + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| node | Node | The graph node | +| embedding | List[Float] | Vector embedding of specified dimension | + +## Example Results + +``` +╔════════════════════╦══════════════════════════════════════════╗ +║ node ║ embedding ║ +╠════════════════════╬══════════════════════════════════════════╣ +║ Person(Alice) ║ [0.123, -0.456, 0.789, ...] ║ +║ Person(Bob) ║ [0.234, -0.567, 0.891, ...] ║ +╚════════════════════╩══════════════════════════════════════════╝ +``` + +## Common Edge Cases + +1. **Isolated nodes**: Will have random/default embeddings +2. **Small graphs**: May not need high dimensions +3. **Dense graphs**: Lower q values work better +4. 
**Sparse graphs**: Higher p values work better + +## Tips + +- Start with default parameters, then tune +- Use p=1, q=1 for neutral exploration (like DeepWalk) +- q<1 (outward, DFS-like walks) tends to capture homophily (community membership) +- q>1 (local, BFS-like walks) tends to capture structural equivalence (similar roles) +- Store embeddings as node properties for reuse +- Retrain when graph structure changes significantly diff --git a/skills/pagerank/SKILL.md b/skills/pagerank/SKILL.md new file mode 100644 index 000000000..cb4eebd1c --- /dev/null +++ b/skills/pagerank/SKILL.md @@ -0,0 +1,105 @@ +--- +name: pagerank +description: Calculate PageRank scores for nodes in a graph to measure their influence based on the number and quality of incoming connections. Use when the user wants to find important, influential, or central nodes, rank nodes by importance, identify key entities, or analyze node influence in networks. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: centrality + complexity: "O(V + E)" +--- + +# PageRank Algorithm + +PageRank calculates the influence of nodes based on recursive information about connected nodes' influence. Originally developed for ranking web pages, it's useful for any network where influence propagates through connections.
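The "recursive" definition can be read as a simple power iteration. The following pure-Python sketch is an illustration under stated assumptions, not MAGE's implementation: ranks start uniform, each node splits its rank evenly across its outgoing edges, and rank held by sink nodes is spread uniformly (one common convention among several).

```python
def pagerank(nodes, edges, damping=0.85, max_iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed edge list."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iterations):
        # Rank sitting on nodes with no outgoing edges is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new[dst] += damping * rank[src] / len(out_links[src])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Two pages link to "c"; "c" links nowhere (a sink).
ranks = pagerank(["a", "b", "c"], [("a", "c"), ("b", "c")])
```

With this convention the ranks always sum to 1, and `c` outranks `a` and `b` because it receives both of their links.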
+ +## When to Use This Skill + +Use PageRank when: +- Finding the most important/influential nodes in a network +- Ranking entities by their connectivity importance +- Identifying key players in social networks +- Finding authoritative sources in citation networks +- Detecting influential accounts in social media graphs + +## Basic Usage + +### Run PageRank on Entire Graph + +```cypher +CALL pagerank.get() +YIELD node, rank +RETURN node, rank +ORDER BY rank DESC +LIMIT 10; +``` + +### Run PageRank with Custom Parameters + +```cypher +CALL pagerank.get(100, 0.85) +YIELD node, rank +RETURN node, rank +ORDER BY rank DESC; +``` + +**Parameters:** +- `max_iterations` (default: 100) - Maximum number of iterations +- `damping_factor` (default: 0.85) - Probability of following a link (0-1) + +## Advanced Usage + +### PageRank on Subgraph + +```cypher +MATCH (n:Person)-[r:KNOWS]->(m:Person) +WITH collect(n) + collect(m) AS nodes, collect(r) AS relationships +CALL pagerank.get_subgraph(nodes, relationships, 100, 0.85) +YIELD node, rank +RETURN node.name AS name, rank +ORDER BY rank DESC; +``` + +### Filter Results by Label + +```cypher +CALL pagerank.get() +YIELD node, rank +WHERE node:Person +RETURN node.name AS name, rank +ORDER BY rank DESC +LIMIT 20; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| node | Node | The graph node | +| rank | Float | PageRank score (higher = more influential) | + +## Example Results + +``` +╔════════════════════╦═══════════════╗ +║ name ║ rank ║ +╠════════════════════╬═══════════════╣ +║ "Alice" ║ 0.287 ║ +║ "Bob" ║ 0.234 ║ +║ "Charlie" ║ 0.198 ║ +╚════════════════════╩═══════════════╝ +``` + +## Common Edge Cases + +1. **Disconnected graphs**: Nodes in disconnected components will have independent PageRank calculations +2. **Sink nodes** (no outgoing edges): The damping factor handles these by redistributing probability +3. **Self-loops**: Handled normally but may inflate a node's rank +4. 
**Empty graph**: Returns empty results + +## Tips for Interpretation + +- Scores are relative, not absolute - compare nodes within the same graph +- Higher damping factor (closer to 1) = more weight on link structure +- Lower damping factor = more uniform distribution +- Total PageRank sums to approximately 1.0 (or number of nodes depending on implementation) diff --git a/skills/shortest-path/SKILL.md b/skills/shortest-path/SKILL.md new file mode 100644 index 000000000..c039c026f --- /dev/null +++ b/skills/shortest-path/SKILL.md @@ -0,0 +1,118 @@ +--- +name: shortest-path +description: Find the shortest path between two nodes in a graph, optionally considering edge weights. Use when the user wants to find paths, routes, connections, distances, or navigation between nodes. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: pathfinding + complexity: "O(V + E log V)" +--- + +# Weighted Shortest Path + +Find the path between two nodes that minimizes the total weight (or hop count if unweighted). Essential for navigation, routing, and finding optimal connections. 
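For readers who want to see what "minimizes the total weight" means operationally, here is a small, self-contained Dijkstra sketch in Python. The adjacency-dict input format and the function name are assumptions for this example; it is independent of Memgraph's built-in traversal syntax and requires non-negative weights.

```python
import heapq


def dijkstra(graph, source, target):
    """graph maps node -> list of (neighbor, weight). Returns (total, path) or (None, [])."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return None, []


graph = {
    "Alice": [("Charlie", 1.0), ("Bob", 4.0)],
    "Charlie": [("Bob", 1.5)],
    "Bob": [],
}
total, path = dijkstra(graph, "Alice", "Bob")
```

Here the two-hop route through Charlie (weight 2.5) beats the direct edge (weight 4.0), which is exactly the case where hop count and total weight disagree.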
+ +## When to Use This Skill + +Use shortest path when: +- Finding the most efficient route between two points +- Calculating distances or costs between nodes +- Navigation and routing problems +- Finding how entities are connected +- Determining degrees of separation + +## Basic Usage + +### Unweighted Shortest Path (BFS) + +```cypher +MATCH p = (a:Person {name: "Alice"})-[*BFS]-(b:Person {name: "Bob"}) +RETURN p; +``` + +### Weighted Shortest Path (Dijkstra) + +```cypher +MATCH p = (a:City {name: "New York"})-[*WSHORTEST (r, n | r.distance)]-(b:City {name: "Los Angeles"}) +RETURN p, reduce(total = 0, r IN relationships(p) | total + r.distance) AS total_distance; +``` + +### Using algo Module + +```cypher +MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"}) +CALL algo.dijkstra(a, b, "distance") +YIELD path, weight +RETURN path, weight; +``` + +## Advanced Usage + +### All Shortest Paths (Same Length) + +```cypher +MATCH p = allShortestPaths((a:Person {name: "Alice"})-[*]-(b:Person {name: "Bob"})) +RETURN p; +``` + +### K Shortest Paths + +```cypher +MATCH (a:City {name: "NYC"}), (b:City {name: "LA"}) +CALL algo.k_shortest_paths(a, b, 5, "distance") +YIELD path, weight +RETURN path, weight +ORDER BY weight ASC; +``` + +### Shortest Path with Relationship Type Filter + +```cypher +MATCH p = (a:Person {name: "Alice"})-[:KNOWS|WORKS_WITH *BFS]-(b:Person {name: "Bob"}) +RETURN p, length(p) AS path_length; +``` + +### Shortest Path with Max Depth + +```cypher +MATCH p = (a:Person {name: "Alice"})-[*BFS..5]-(b:Person {name: "Bob"}) +RETURN p; +``` + +## Output Format + +| Column | Type | Description | +|--------|------|-------------| +| path | Path | The shortest path as a sequence of nodes and relationships | +| weight | Float | Total path weight (for weighted queries) | + +## Example Results + +``` +Path: (Alice)-[:KNOWS]->(Charlie)-[:KNOWS]->(Bob) +Total Weight: 2.5 +``` + +## Weighted vs Unweighted + +| Type | Use When | Syntax | +|------|----------|--------| 
+| Unweighted | All edges equal | `*BFS` | +| Weighted | Edges have costs | `*WSHORTEST (r, n \| r.cost)` | + +## Common Edge Cases + +1. **No path exists**: Returns NULL/empty result +2. **Multiple shortest paths**: Use `allShortestPaths` to get all +3. **Negative weights**: Not supported (use appropriate algorithm) +4. **Self-loops**: Source equals target returns empty path +5. **Disconnected nodes**: No path returned + +## Tips + +- Use `*BFS` for unweighted graphs (faster) +- Use `*WSHORTEST` when edge weights matter +- Specify relationship types to constrain the search +- Add max depth to limit search scope +- Consider using indexes on starting/ending node properties diff --git a/skills/tsp/SKILL.md b/skills/tsp/SKILL.md new file mode 100644 index 000000000..5e575ca6c --- /dev/null +++ b/skills/tsp/SKILL.md @@ -0,0 +1,108 @@ +--- +name: tsp +description: Solve the Traveling Salesman Problem to find the shortest route that visits each node exactly once and returns to the starting point. Use when optimizing routes, planning visits, or finding optimal tours through locations. +license: Apache-2.0 +metadata: + author: memgraph + version: "1.0" + algorithm_type: optimization + complexity: "O(n!)" +--- + +# Traveling Salesman Problem (TSP) + +Find the shortest possible route that visits each node exactly once and returns to the origin. A classic optimization problem useful for route planning and logistics. 
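The factorial complexity is easiest to appreciate from an exhaustive solver. This Python sketch is a brute-force illustration only, assuming a complete distance table given as nested dicts; it makes no claim about how any MAGE procedure solves the problem, and it is only practical for a handful of nodes.

```python
from itertools import permutations


def brute_force_tsp(distance, start):
    """Try every visiting order and keep the shortest closed tour."""
    others = [city for city in distance if city != start]
    best_route, best_length = None, float("inf")
    for order in permutations(others):
        route = (start, *order, start)
        length = sum(distance[a][b] for a, b in zip(route, route[1:]))
        if length < best_length:
            best_route, best_length = route, length
    return list(best_route), best_length


# Four cities on a unit square: sides cost 1, diagonals cost 2.
distance = {
    "A": {"B": 1, "C": 2, "D": 1},
    "B": {"A": 1, "C": 1, "D": 2},
    "C": {"A": 2, "B": 1, "D": 1},
    "D": {"A": 1, "B": 2, "C": 1},
}
route, length = brute_force_tsp(distance, "A")
```

With `n` cities the loop runs `(n - 1)!` times, so each added city multiplies the work; that growth is why heuristics or approximations are used beyond small inputs.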
+
+## When to Use This Skill
+
+Use TSP when:
+- Planning delivery or visit routes
+- Optimizing travel itineraries
+- Sequencing circuit board drilling
+- Optimizing DNA sequencing
+- Solving any shortest-tour problem
+
+## Basic Usage
+
+### Solve TSP
+
+```cypher
+MATCH (n:City)
+WITH collect(n) AS cities
+CALL tsp.solve(cities)
+YIELD path, total_weight
+RETURN path, total_weight;
+```
+
+### TSP with Specific Start
+
+```cypher
+MATCH (start:City {name: "New York"})
+MATCH (other:City)
+WHERE other <> start
+WITH start, collect(other) AS cities
+CALL tsp.solve(cities, start)
+YIELD path, total_weight
+RETURN path, total_weight;
+```
+
+## Advanced Usage
+
+### TSP with Custom Distance Property
+
+```cypher
+MATCH (n:Location)
+WITH collect(n) AS locations
+CALL tsp.solve(locations, "distance")
+YIELD path, total_weight
+RETURN path, total_weight;
+```
+
+### TSP on Subgraph
+
+```cypher
+MATCH (n:Warehouse)-[r:CONNECTED_TO]-(m:Warehouse)
+WITH collect(DISTINCT n) AS warehouses
+CALL tsp.solve(warehouses)
+YIELD path, total_weight
+RETURN [node IN nodes(path) | node.name] AS route, total_weight AS total_distance;
+```
+
+### Visualize TSP Route
+
+```cypher
+// Collect the input nodes first so the list passed to tsp.solve is defined
+MATCH (n:City)
+WITH collect(n) AS cities
+CALL tsp.solve(cities)
+YIELD path
+// Pair each node on the path with its successor
+UNWIND range(0, size(nodes(path))-2) AS i
+WITH nodes(path)[i] AS from, nodes(path)[i+1] AS to
+RETURN from.name AS from_city, to.name AS to_city;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| path | Path | The optimal tour as a path |
+| total_weight | Float | Total distance/weight of the tour |
+
+## Example Results
+
+```
+Route: New York → Boston → Philadelphia → Washington → New York
+Total Distance: 892.5 km
+```
+
+## Common Edge Cases
+
+1. **Fewer than 3 nodes**: Returns a trivial solution
+2. **Disconnected nodes**: May not find a valid tour
+3. **Large graphs**: Exponential complexity - use approximations
+4. **Missing distances**: Uses default or Euclidean distance
+
+## Tips
+
+- TSP is NP-hard - exact solutions are slow for large graphs
+- For >15-20 nodes, consider approximation algorithms
+- Ensure all nodes are connected (directly or indirectly)
+- Pre-compute distances for better performance
+- For real-world routing, consider using VRP instead
diff --git a/skills/weakly-connected-components/SKILL.md b/skills/weakly-connected-components/SKILL.md
new file mode 100644
index 000000000..e721c69b2
--- /dev/null
+++ b/skills/weakly-connected-components/SKILL.md
@@ -0,0 +1,132 @@
+---
+name: weakly-connected-components
+description: Find weakly connected components in a graph - groups of nodes where each node can reach every other node in the group, ignoring edge direction. Use when the user wants to find connected groups, isolated subgraphs, or check graph connectivity.
+license: Apache-2.0
+metadata:
+  author: memgraph
+  version: "1.0"
+  algorithm_type: connectivity
+  complexity: "O(V + E)"
+---
+
+# Weakly Connected Components
+
+Find groups of nodes where there exists a path between any two nodes in the group (ignoring edge direction). Essential for understanding graph structure and finding isolated subgraphs.
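The idea can be sketched in a few lines of plain Python, assuming the graph is given as a node list and a list of directed `(source, target)` pairs. This is an illustration of the concept with hypothetical sample data, not the MAGE implementation: every edge is stored in both directions, and a breadth-first search gives each reachable node the same ID.

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, edges):
    """Label each node with a component ID, treating directed edges as undirected."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # store the reverse direction too: direction is ignored

    component_of = {}
    next_id = 0
    for node in nodes:
        if node in component_of:
            continue  # already reached from an earlier start node
        component_of[node] = next_id
        queue = deque([node])
        while queue:  # breadth-first search over the undirected view
            current = queue.popleft()
            for other in neighbors[current]:
                if other not in component_of:
                    component_of[other] = next_id
                    queue.append(other)
        next_id += 1
    return component_of


# Hypothetical sample graph: Charlie only has an outgoing edge, yet joins Alice and Bob.
nodes = ["Alice", "Bob", "Charlie", "David", "Eve"]
edges = [("Alice", "Bob"), ("Charlie", "Bob"), ("David", "Eve")]
labels = weakly_connected_components(nodes, edges)
print(labels)
```

Each node and each edge is visited a constant number of times, which matches the `O(V + E)` complexity stated in the frontmatter.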
+
+## When to Use This Skill
+
+Use weakly connected components when:
+- Finding separate groups or clusters in a graph
+- Checking if a graph is fully connected
+- Identifying isolated nodes or subgraphs
+- Preprocessing before other algorithms
+- Analyzing network fragmentation
+
+## Basic Usage
+
+### Find All Components
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+RETURN node, component_id
+ORDER BY component_id;
+```
+
+### Count Components
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+RETURN count(DISTINCT component_id) AS num_components;
+```
+
+### Get Component Sizes
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+RETURN component_id, count(node) AS size
+ORDER BY size DESC;
+```
+
+## Advanced Usage
+
+### Find Largest Component
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+WITH component_id, collect(node) AS nodes, count(*) AS size
+ORDER BY size DESC
+LIMIT 1
+UNWIND nodes AS node
+RETURN node, component_id;
+```
+
+### Find Isolated Nodes (Components of Size 1)
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+WITH component_id, collect(node) AS nodes
+WHERE size(nodes) = 1
+UNWIND nodes AS isolated_node
+RETURN isolated_node;
+```
+
+### Components on Subgraph
+
+```cypher
+MATCH (n:Person)-[r:KNOWS]->(m:Person)
+WITH collect(n) + collect(m) AS nodes, collect(r) AS rels
+CALL weakly_connected_components.get_subgraph(nodes, rels)
+YIELD node, component_id
+RETURN node.name AS name, component_id
+ORDER BY component_id;
+```
+
+### Check If Graph Is Connected
+
+```cypher
+CALL weakly_connected_components.get()
+YIELD node, component_id
+WITH count(DISTINCT component_id) AS num_components
+RETURN CASE WHEN num_components = 1 THEN "Connected" ELSE "Disconnected" END AS status;
+```
+
+## Output Format
+
+| Column | Type | Description |
+|--------|------|-------------|
+| node | Node | The graph node |
+| component_id | Integer | Component identifier (same ID = same component) |
+
+## Example Results
+
+```
+╔════════════════════╦═══════════════╗
+║ name               ║ component_id  ║
+╠════════════════════╬═══════════════╣
+║ "Alice"            ║ 0             ║
+║ "Bob"              ║ 0             ║
+║ "Charlie"          ║ 0             ║
+║ "David"            ║ 1             ║
+║ "Eve"              ║ 1             ║
+╚════════════════════╩═══════════════╝
+```
+
+## Common Edge Cases
+
+1. **Empty graph**: No components returned
+2. **Single node**: Forms its own component
+3. **Fully connected**: One component containing all nodes
+4. **Directed edges**: Direction is ignored (weakly connected)
+
+## Tips
+
+- Use before other algorithms to analyze components separately
+- Component IDs are arbitrary - only equality matters
+- For directed connectivity, use strongly connected components
+- Useful for data quality checks (unexpected disconnected parts)
+- Consider visualizing small components to understand fragmentation
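Because only equality of IDs matters, two result sets are best compared by the groups they induce rather than by the IDs themselves. The Python sketch below assumes the rows have already been fetched as `(name, component_id)` pairs. `run_b` is a hypothetical second run with different IDs, and `as_partition` is an illustrative helper, not part of MAGE.

```python
from collections import defaultdict


def as_partition(rows):
    """Group (name, component_id) rows into a set of frozensets, discarding the IDs."""
    groups = defaultdict(set)
    for name, component_id in rows:
        groups[component_id].add(name)
    return {frozenset(members) for members in groups.values()}


# Rows shaped like the example results, plus a hypothetical run with other IDs.
run_a = [("Alice", 0), ("Bob", 0), ("Charlie", 0), ("David", 1), ("Eve", 1)]
run_b = [("Alice", 7), ("Bob", 7), ("Charlie", 7), ("David", 3), ("Eve", 3)]

# The IDs differ, but both runs describe the same grouping of nodes.
same_components = as_partition(run_a) == as_partition(run_b)

# Component sizes, largest first, which is handy for spotting unexpected small fragments.
sizes = sorted((len(group) for group in as_partition(run_a)), reverse=True)
print(same_components, sizes)
```

Comparing partitions this way avoids false alarms in data quality checks when the IDs change between runs but the grouping does not.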