Neo4j visualization & Python driver fail to fetch data synced by ClickGraph #13

@peterhunter99001-cyber

Description

1. docker-compose.yaml Content

version: '3.8'

services:
  clickhouse-service:
    image: clickhouse/clickhouse-server:25.8.11
    container_name: clickhouse
    environment:
      CLICKHOUSE_DB: "zeek"
      CLICKHOUSE_USER: "test_user"
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
      CLICKHOUSE_PASSWORD: "test_pass"
    ports:
      - "9000:9000"
      - "8123:8123"
    healthcheck:
      test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    volumes:
      - clickhouse_data:/var/lib/clickhouse  # Named volume - Windows MergeTree fix

  neo4j:
    image: neo4j:5.15-community
    container_name: neo4j
    environment:
      NEO4J_AUTH: neo4j/test_password
      NEO4J_PLUGINS: '["apoc"]'
      NEO4J_apoc_export_file_enabled: "true"
      NEO4J_apoc_import_file_enabled: "true"
      NEO4J_apoc_import_file_use__neo4j__config: "true"
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs
    healthcheck:
      test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "test_password", "RETURN 1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  clickgraph:
    image: genezhang/clickgraph:latest
    # To build locally instead, uncomment:
    # build:
    #   context: .
    #   dockerfile: Dockerfile
    container_name: clickgraph
    depends_on:
      clickhouse-service:
        condition: service_healthy
    environment:
      CLICKHOUSE_URL: "http://clickhouse-service:8123"
      CLICKHOUSE_USER: "test_user"
      CLICKHOUSE_PASSWORD: "test_pass"
      CLICKHOUSE_DATABASE: "zeek"
      GRAPH_CONFIG_PATH: "/app/zeek_conn_log.yaml"
    ports:
      - "8080:8080"
    volumes:
      - ./zeek_conn_log.yaml:/app/zeek_conn_log.yaml:ro

volumes:
  clickhouse_data:
  neo4j_data:
  neo4j_logs:
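
To confirm which of the published host ports are actually reachable, a small check like the following may help. It is a minimal sketch: the port list is copied from the compose file above, while the names `PUBLISHED_PORTS`, `port_open`, and `report` are hypothetical helpers and not part of any of these tools.

```python
import socket

# Host ports published in the compose file above, keyed by the service that publishes them.
PUBLISHED_PORTS = {
    "clickhouse-service": [9000, 8123],
    "neo4j": [7474, 7687],
    "clickgraph": [8080],
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "localhost") -> None:
    """Print one line per published port, grouped by service."""
    for service, ports in PUBLISHED_PORTS.items():
        for port in ports:
            state = "open" if port_open(host, port) else "closed"
            print(f"{service:20s} {port:5d} {state}")
```

Calling `report()` on the Docker host prints the state of each port. Note that in this compose file only the neo4j service publishes 7687, so a Bolt client pointed at localhost:7687 reaches the Neo4j container; the clickgraph service publishes only 8080.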

2. zeek_conn_log.yaml Content

name: zeek_conn_log
version: "1.0"

graph_schema:
  nodes:
    # IP is a virtual node - all data comes from the conn_log table
    - label: IP
      database: zeek
      table: conn_log
      id_column: ip  # Logical property name for the IP address
      property_mappings: {}  # Empty - node is purely denormalized
      
      # Properties when IP appears as source (from_node)
      from_node_properties:
        ip: "id_orig_h"      # IP.ip → conn_log."id.orig_h"
      
      # Properties when IP appears as destination (to_node)
      to_node_properties:
        ip: "id_resp_h"      # IP.ip → conn_log."id.resp_h"

  relationships:
    # ACCESSED relationship - "192.168.4.76 accessed 192.168.4.1"
    - type: ACCESSED
      database: zeek
      table: conn_log
      from_id: "id_orig_h"
      to_id: "id_resp_h"
      from_node: IP
      to_node: IP
      
      # Edge ID is the (from_id, to_id) pair; unlike a Zeek uid, this is not unique per connection
      edge_id: [from_id, to_id]
      property_mappings: {}  # Empty - edge is purely denormalized
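
As a reading aid for the schema above, the sketch below shows how a single conn_log-style table can yield both IP nodes and ACCESSED edges. It only illustrates the mapping described in the comments and is not ClickGraph's implementation; the sample rows and the `derive_graph` helper are hypothetical.

```python
from typing import Iterable

# Hypothetical sample rows; only the two columns named in the schema are used.
ROWS = [
    {"id_orig_h": "192.168.4.76", "id_resp_h": "192.168.4.1"},
    {"id_orig_h": "192.168.4.76", "id_resp_h": "192.168.4.1"},
    {"id_orig_h": "192.168.4.1", "id_resp_h": "10.0.0.2"},
]


def derive_graph(rows: Iterable[dict]) -> tuple[set[str], list[tuple[str, str]]]:
    """Derive distinct IP values and one (source, destination) edge per row."""
    nodes: set[str] = set()
    edges: list[tuple[str, str]] = []
    for row in rows:
        src, dst = row["id_orig_h"], row["id_resp_h"]
        nodes.update((src, dst))   # IP.ip is read from either column
        edges.append((src, dst))   # ACCESSED runs from id_orig_h to id_resp_h
    return nodes, edges
```

Keeping one edge per row is a choice made for this sketch; how the server treats rows that share the same (from_id, to_id) pair is not stated in this report.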

3. Problem Description

I can retrieve data successfully with the following command:

curl -X POST http://localhost:8080/query -H "Content-Type: application/json" -d '{
"query": "MATCH (head:IP)-[:ACCESSED]->(tail:IP) WHERE head.ip = \"192.168.5.115\" RETURN tail.ip"
}'

Response:

{"results":[{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.7"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.9"},{"tail.ip":"10.0.0.3"},{"tail.ip":"10.0.0.9"},{"tail.ip":"10.0.0.4"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.3"},{"tail.ip":"10.0.0.10"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.10"},{"tail.ip":"10.0.0.10"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.12"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.6"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.12"},{"tail.ip":"10.0.0.12"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.6"},{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.1"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.1"},{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.5"},{"tail.ip":"10.0.0.1"},{"tail.ip":"10.0.0.10"},{"tail.ip":"10.0.0.7"},{"tail.ip":"10.0.0.1"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.5"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.4"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.9"},{"tail.ip":"10.0.0.5"},{"tail.ip":"10.0.0.3"},{"tail.ip":"10.0.0.6"},{"tail.ip":"10.0.0.7"},{"tail.ip":"10.0.0.7"},{"tail.ip":"10.0.0.8"},{"tail.ip":"10.0.0.5"},{"tail.ip":"10.0.0.4"},{"tail.ip":"10.0.0.3"},{"tail.ip":"10.0.0.5"},{"tail.ip":"10.0.0.6"},{"tail.ip":"10.0.0.10"},{"tail.ip":"10.0.0.2"},{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.9"},{"tail.ip":"10.0.0.4"},{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.7"},{"tail.ip":"10.0.0.11"},{"tail.ip":"10.0.0.1"}]}
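
For comparison with the Bolt-based test further down, the same HTTP request can be issued from Python. This is a minimal sketch using only the standard library; the URL and the "results" key are taken from the curl call and response above, while `run_cypher` and `count_values` are hypothetical helper names.

```python
import json
import urllib.request
from collections import Counter

CLICKGRAPH_URL = "http://localhost:8080/query"  # endpoint used by the curl command above


def run_cypher(query: str, url: str = CLICKGRAPH_URL, timeout: float = 10.0) -> list[dict]:
    """POST a Cypher query as JSON and return the rows under the "results" key."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]


def count_values(rows: list[dict], column: str) -> Counter:
    """Count how often each value of `column` occurs in the result rows."""
    return Counter(row[column] for row in rows)
```

Passing the query from the curl command to `run_cypher` and the result to `count_values(rows, "tail.ip")` summarizes the repeated tail.ip values seen in the response.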

However, the connection fails when I open the Neo4j visualization interface (port 7474) started by docker-compose.yaml, set the database to zeek_conn_log, and connect to port 7687.

The following Python code also returns no data:

import traceback

from neo4j import GraphDatabase


def test_graph_query():
    """Test 3: Query graph data (assumes demo data is loaded)"""
    print("\n" + "=" * 60)
    print("TEST 3: Graph Query (User nodes)")
    print("=" * 60)

    driver = None  # defined up front so the finally block never sees an unbound name
    try:
        driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "test_password"))
        with driver.session() as session:
            # Query for IP nodes
            result = session.run("MATCH (u:IP) RETURN u.ip AS name LIMIT 5")
            records = list(result)

            print(f"✅ Found {len(records)} IP:")
            for record in records:
                print(f"   - {record['name']}")

            return len(records) > 0

    except Exception as e:
        print(f"❌ Graph query failed: {e}")
        traceback.print_exc()
        return False
    finally:
        # Close the driver only if it was actually created
        if driver is not None:
            driver.close()

test_graph_query()

Command execution result:

============================================================
TEST 3: Graph Query (User nodes)
============================================================
Received notification from DBMS server: <GqlStatusObject gql_status='01N42', status_description="One of the labels in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing label name is: IP)", position=<SummaryInputPosition line=1, column=10, offset=9>, raw_classification='UNRECOGNIZED', classification=<NotificationClassification.UNRECOGNIZED: 'UNRECOGNIZED'>, raw_severity='WARNING', severity=<NotificationSeverity.WARNING: 'WARNING'>, diagnostic_record={'OPERATION': '', 'OPERATION_CODE': '0', 'CURRENT_SCHEMA': '/', '_classification': 'UNRECOGNIZED', '_severity': 'WARNING', '_position': {'column': 10, 'offset': 9, 'line': 1}}> for query: 'MATCH (u:IP) RETURN u.ip AS name LIMIT 5'
Received notification from DBMS server: <GqlStatusObject gql_status='01N42', status_description="One of the property names in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing property name is: ip)", position=<SummaryInputPosition line=1, column=23, offset=22>, raw_classification='UNRECOGNIZED', classification=<NotificationClassification.UNRECOGNIZED: 'UNRECOGNIZED'>, raw_severity='WARNING', severity=<NotificationSeverity.WARNING: 'WARNING'>, diagnostic_record={'OPERATION': '', 'OPERATION_CODE': '0', 'CURRENT_SCHEMA': '/', '_classification': 'UNRECOGNIZED', '_severity': 'WARNING', '_position': {'column': 23, 'offset': 22, 'line': 1}}> for query: 'MATCH (u:IP) RETURN u.ip AS name LIMIT 5'
✅ Found 0 IP:
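
The two 01N42 warnings say that the server answering on port 7687 knows neither the label IP nor the property ip. To see what that endpoint itself reports, something like the sketch below could be run. It assumes a 5.x Neo4j Python driver and a server that supports the built-in `db.labels()` procedure; `describe_bolt_endpoint` and `explain` are hypothetical helper names.

```python
def describe_bolt_endpoint(uri: str, user: str, password: str) -> tuple[str, list[str]]:
    """Return the server agent string and the node labels the endpoint reports."""
    from neo4j import GraphDatabase  # imported lazily so explain() has no dependency

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        agent = driver.get_server_info().agent
        with driver.session() as session:
            labels = [record["label"] for record in session.run("CALL db.labels()")]
    return agent, labels


def explain(agent: str, labels: list[str], expected_label: str) -> str:
    """Turn the endpoint description into a one-line diagnosis."""
    if expected_label in labels:
        return f"{agent}: label {expected_label!r} is present"
    return f"{agent}: label {expected_label!r} not found; labels seen: {labels or 'none'}"
```

For the setup in this report, `explain(*describe_bolt_endpoint("bolt://localhost:7687", "neo4j", "test_password"), "IP")` would show which server is answering and whether it holds any IP nodes.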
