Organizational data service library with indexed queries, GCS support, and live updates

openshift-eng/cyborg-data

Organizational Data Core Package

This package provides the core functionality for accessing and querying organizational data in a performant, indexed manner.

Overview

The orgdatacore package is designed to be a reusable component that can be consumed by multiple services including:

  • Slack bots (ci-chat-bot)
  • REST APIs
  • CLI tools
  • Other organizational data consumers

Features

  • Fast Data Access: Pre-computed indexes enable O(1) lookups for common queries
  • Thread-Safe: Concurrent access with read-write mutex protection
  • Hot Reload: Support for dynamic data updates without service restart
  • Pluggable Data Sources: Load from files, GCS, or implement custom sources
  • Comprehensive Queries: Employee, team, and organization lookups with membership validation
  • Cross-Cluster Ready: Designed for distributed deployments with remote data sources

Usage

Basic Setup with Files

package main

import (
    "context"
    "log"

    orgdatacore "github.com/openshift-eng/cyborg-data"
)

func main() {
    // Create a new service
    service := orgdatacore.NewService()
    
    // Load data using FileDataSource
    fileSource := orgdatacore.NewFileDataSource("comprehensive_index_dump.json")
    err := service.LoadFromDataSource(context.Background(), fileSource)
    if err != nil {
        log.Fatal(err)
    }
}

Using DataSource Interface

package main

import (
    "context"
    "log"

    orgdatacore "github.com/openshift-eng/cyborg-data"
)

func main() {
    service := orgdatacore.NewService()
    
    // Load from files using DataSource interface
    fileSource := orgdatacore.NewFileDataSource("comprehensive_index_dump.json")
    err := service.LoadFromDataSource(context.Background(), fileSource)
    if err != nil {
        log.Fatal(err)
    }
    
    // Start watching for file changes
    service.StartDataSourceWatcher(context.Background(), fileSource)
}

Google Cloud Storage Support

For GCS support, first add the GCS SDK dependency and build with the gcs tag:

go get cloud.google.com/go/storage
go build -tags gcs

package main

import (
    "context"
    "log"
    "time"

    orgdatacore "github.com/openshift-eng/cyborg-data"
)

func main() {
    service := orgdatacore.NewService()
    
    // Configure GCS
    config := orgdatacore.GCSConfig{
        Bucket:        "orgdata-sensitive",
        ObjectPath:    "orgdata/comprehensive_index_dump.json",
        ProjectID:     "your-project-id",
        CheckInterval: 5 * time.Minute,
        // Optional: provide service account credentials directly
        // CredentialsJSON: `{"type":"service_account",...}`,
    }
    
    // Load from GCS using the SDK implementation
    gcsSource, err := orgdatacore.NewGCSDataSourceWithSDK(context.Background(), config)
    if err != nil {
        log.Fatal(err)
    }
    
    err = service.LoadFromDataSource(context.Background(), gcsSource)
    if err != nil {
        log.Fatal(err)
    }
    
    // Start watching for GCS changes
    service.StartDataSourceWatcher(context.Background(), gcsSource)
}

Data Structure

The package expects data in the comprehensive_index_dump.json format generated by the Python orglib indexing system from the cyborg project.

Service Architecture

Query Performance

All queries use pre-computed indexes for O(1) performance:

  • Employee lookups: Direct map access via UID or Slack ID
  • Team membership: Pre-computed membership index eliminates tree traversal
  • Organization hierarchy: Flattened relationship index for instant ancestry queries
  • Slack mappings: Dedicated index for Slack ID → UID resolution
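
The idea behind these indexes can be illustrated with a small, self-contained sketch. The types and function names below (Employee, Team, Index, BuildIndex, EmployeeBySlackID, InTeam) are hypothetical and simplified; they are not the package's actual types. The sketch only shows how building maps once up front turns each query into one or two map reads.

```go
package main

import "fmt"

// Employee and Team are simplified, hypothetical records for illustration only.
type Employee struct {
	UID      string
	SlackUID string
	FullName string
}

type Team struct {
	Name    string
	Members []string // employee UIDs
}

// Index holds maps that are built once so queries never scan the raw data.
type Index struct {
	employees  map[string]Employee            // UID -> employee
	slackToUID map[string]string              // Slack ID -> UID
	membership map[string]map[string]struct{} // UID -> set of team names
}

// BuildIndex does all the traversal up front.
func BuildIndex(employees []Employee, teams []Team) *Index {
	ix := &Index{
		employees:  make(map[string]Employee, len(employees)),
		slackToUID: make(map[string]string, len(employees)),
		membership: make(map[string]map[string]struct{}),
	}
	for _, e := range employees {
		ix.employees[e.UID] = e
		if e.SlackUID != "" {
			ix.slackToUID[e.SlackUID] = e.UID
		}
	}
	for _, t := range teams {
		for _, uid := range t.Members {
			if ix.membership[uid] == nil {
				ix.membership[uid] = make(map[string]struct{})
			}
			ix.membership[uid][t.Name] = struct{}{}
		}
	}
	return ix
}

// EmployeeBySlackID resolves Slack ID -> UID -> employee with two map reads.
func (ix *Index) EmployeeBySlackID(slackID string) (Employee, bool) {
	uid, ok := ix.slackToUID[slackID]
	if !ok {
		return Employee{}, false
	}
	e, ok := ix.employees[uid]
	return e, ok
}

// InTeam is a set-membership check; reading a missing key is safe in Go.
func (ix *Index) InTeam(uid, team string) bool {
	_, ok := ix.membership[uid][team]
	return ok
}

func main() {
	ix := BuildIndex(
		[]Employee{{UID: "jsmith", SlackUID: "U123ABC456", FullName: "Jane Smith"}},
		[]Team{{Name: "Platform SRE", Members: []string{"jsmith"}}},
	)
	e, ok := ix.EmployeeBySlackID("U123ABC456")
	fmt.Println(e.UID, ok, ix.InTeam("jsmith", "Platform SRE"))
}
```

Go map lookups are constant time on average, which is the sense in which the O(1) figures here should be read.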

Thread Safety

The service uses read-write mutex protection:

  • Read operations (queries): Multiple concurrent readers supported
  • Write operations (data loading): Exclusive access during updates
  • Hot reload: Atomic data replacement without query interruption
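
A minimal sketch of this locking pattern follows. It is not the package's implementation; Store, snapshot, Replace, and Employee are hypothetical names. It assumes the new data set is fully built before the write lock is taken, so the exclusive section is only a pointer swap and readers always see either the old or the new snapshot in full.

```go
package main

import (
	"fmt"
	"sync"
)

// Employee is a simplified, hypothetical record.
type Employee struct {
	UID      string
	FullName string
}

// snapshot is an immutable view of the data; it is never modified after creation.
type snapshot struct {
	employees map[string]Employee
}

// Store guards the current snapshot with a read-write mutex.
type Store struct {
	mu   sync.RWMutex
	data *snapshot
}

// Replace builds the new snapshot outside the lock, then swaps the pointer
// while holding the write lock.
func (s *Store) Replace(employees map[string]Employee) {
	next := &snapshot{employees: employees}
	s.mu.Lock()
	s.data = next
	s.mu.Unlock()
}

// Employee takes only the read lock, so many queries can run concurrently.
func (s *Store) Employee(uid string) (Employee, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.data == nil {
		return Employee{}, false
	}
	e, ok := s.data.employees[uid]
	return e, ok
}

func main() {
	store := &Store{}
	store.Replace(map[string]Employee{"jsmith": {UID: "jsmith", FullName: "Jane Smith"}})

	// Readers run while a reload happens in the background.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			store.Employee("jsmith")
		}()
	}
	store.Replace(map[string]Employee{"jsmith": {UID: "jsmith", FullName: "Jane Q. Smith"}})
	wg.Wait()

	e, ok := store.Employee("jsmith")
	fmt.Println(e.FullName, ok)
}
```

The caller must not mutate the map after handing it to Replace; treating each snapshot as immutable is what makes the swap safe for concurrent readers.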

Data Structure Optimization

// Optimized for fast lookups
type Data struct {
    Lookups  Lookups  // Direct object access: O(1)
    Indexes  Indexes  // Pre-computed relationships: O(1) 
}

// Example: Employee lookup
employee := data.Lookups.Employees[uid]  // Direct map access

// Example: Team membership 
memberships := data.Indexes.Membership.MembershipIndex[uid]  // Pre-computed list

Service Methods

Employee Queries

// Primary employee lookup by UID
employee := service.GetEmployeeByUID("jsmith")

// Slack integration - lookup by Slack user ID
employee = service.GetEmployeeBySlackID("U123ABC456")

// Returns *Employee with: UID, FullName, Email, JobTitle, SlackUID

Team Operations

// Get team details
team := service.GetTeamByName("Platform SRE")

// Get all teams for an employee
teams := service.GetTeamsForUID("jsmith")

// Check team membership
isMember := service.IsEmployeeInTeam("jsmith", "Platform SRE")
isSlackMember := service.IsSlackUserInTeam("U123ABC456", "Platform SRE")

// Get all team members
members := service.GetTeamMembers("Platform SRE")

Organization Queries

// Get organization details
org := service.GetOrgByName("Engineering")

// Check organization membership (includes inherited via teams)
isMember := service.IsEmployeeInOrg("jsmith", "Engineering")
isSlackMember := service.IsSlackUserInOrg("U123ABC456", "Engineering")

// Get complete organizational context
orgs := service.GetUserOrganizations("U123ABC456")
// Returns: teams, orgs, pillars, team_groups user belongs to

Performance Characteristics

| Operation | Complexity | Index Used |
|---|---|---|
| GetEmployeeByUID | O(1) | lookups.employees |
| GetEmployeeBySlackID | O(1) | indexes.slack_id_mappings |
| GetTeamsForUID | O(1) | indexes.membership.membership_index |
| IsEmployeeInTeam | O(1) | Pre-computed membership |
| GetUserOrganizations | O(1) | Flattened hierarchy index |

No expensive tree traversals - all organizational relationships are pre-computed during indexing.
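
To make the idea of a flattened hierarchy concrete, here is a small sketch of the pre-computation step. In this project the indexing happens upstream, before the Go package loads the data, so the code below is only an illustration of the technique. FlattenAncestors and the "Company" node are hypothetical.

```go
package main

import "fmt"

// FlattenAncestors expands a child -> parent map into a map from each child
// to all of its ancestors, nearest first. After this runs once, an ancestry
// query is a single map read instead of a walk up the tree.
func FlattenAncestors(parent map[string]string) map[string][]string {
	flat := make(map[string][]string, len(parent))
	for child := range parent {
		var chain []string
		cur := child
		// The step bound guards against cycles in malformed input.
		for steps := 0; steps < len(parent); steps++ {
			p, ok := parent[cur]
			if !ok {
				break
			}
			chain = append(chain, p)
			cur = p
		}
		flat[child] = chain
	}
	return flat
}

func main() {
	parent := map[string]string{
		"Platform SRE": "Engineering",
		"Engineering":  "Company",
	}
	flat := FlattenAncestors(parent)
	fmt.Println(flat["Platform SRE"])
}
```

The trade-off is extra memory and indexing time in exchange for constant-time queries, which suits data that is read far more often than it changes.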

Data Sources

The package supports pluggable data sources through the DataSource interface:

Built-in Data Sources

  1. FileDataSource - Local JSON files

    • No additional dependencies
    • Supports file watching with polling
    • Ideal for development and file-based deployments
  2. GCSDataSource - Google Cloud Storage

    • Requires GCS SDK: go get cloud.google.com/go/storage
    • Build with -tags gcs for full functionality
    • Supports hot reload with configurable polling interval
    • Uses Application Default Credentials (ADC) or service account JSON
    • Ideal for production cross-cluster deployments in GCP

Custom Data Sources

Implement the DataSource interface to create custom sources:

type DataSource interface {
    Load(ctx context.Context) (io.ReadCloser, error)
    Watch(ctx context.Context, callback func() error) error
    String() string
}

Examples of custom sources you could implement:

  • HTTP/HTTPS endpoints
  • AWS S3 or other S3-compatible storage
  • Git repositories
  • Database queries
  • Redis/Memcached for caching layers

Logging

The package uses structured logging via the logr interface, making it compatible with OpenShift and Kubernetes logging standards.

Default: Uses stdr (standard library logger wrapper)

OpenShift Integration:

import (
    orgdatacore "github.com/openshift-eng/cyborg-data"
    "k8s.io/klog/v2/klogr"
)

func init() {
    orgdatacore.SetLogger(klogr.New())
}

Log events include data source changes, reload operations, and error conditions with structured key-value context.

Dependencies

  • Go 1.19+
  • File sources need no dependencies beyond the standard library and the logging libraries (logr, stdr)
  • Optional: GCS SDK for Google Cloud Storage support (cloud.google.com/go/storage)
