Merged
43 changes: 43 additions & 0 deletions .github/workflows/deploy-docs.yaml
@@ -0,0 +1,43 @@
name: Deploy Docusaurus to GitHub Pages

on:
  push:
    branches:
      - main
    paths:
      - 'docsite/**' # Only run if docsite/ changes
  workflow_dispatch: # Allow manual triggering if needed

jobs:
  build-deploy:
    runs-on: ubuntu-latest

    steps:
      # Checkout repo
      - name: Checkout repository
        uses: actions/checkout@v4

      # Setup Node.js
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 18 # match your local version
          cache: 'yarn'

      # Install dependencies
      - name: Install dependencies
        working-directory: docsite
        run: yarn install --frozen-lockfile

      # Build site
      - name: Build website
        working-directory: docsite
        run: yarn build

      # Deploy to gh-pages
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docsite/build
          publish_branch: gh-pages
36 changes: 36 additions & 0 deletions .github/workflows/vale-linter.yaml
@@ -0,0 +1,36 @@
name: Vale Lint Checker

on:
  push:
    branches:
      - main
    paths:
      - 'docsite/**' # Only run when files in docsite/ change
  pull_request:
    branches:
      - '*'
    paths:
      - 'docsite/**' # Only run lint on PRs that touch docsite/
  workflow_dispatch:

jobs:
  prose:
    runs-on: ubuntu-latest
    steps:
      # Step 1: Check out the repository code
      - name: Checkout Code
        uses: actions/checkout@v3

      # Step 2: Set up Node.js
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: 16

      # Step 3: Run Vale lint checks
      - name: Vale Lint
        uses: errata-ai/vale-action@reviewdog
        with:
          files: docsite/ # only lint inside docsite/
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
20 changes: 20 additions & 0 deletions docsite/.gitignore
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
7 changes: 7 additions & 0 deletions docsite/.vale.ini
@@ -0,0 +1,7 @@
StylesPath = styles

MinAlertLevel = suggestion

[*.md]

BasedOnStyles = Vale, Microsoft
41 changes: 41 additions & 0 deletions docsite/README.md
@@ -0,0 +1,41 @@
# Website

This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.

## Installation

```bash
yarn
```

## Local Development

```bash
yarn start
```

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

## Build

```bash
yarn build
```

This command generates static content into the `build` directory, which can be served using any static content hosting service.

## Deployment

Using SSH:

```bash
USE_SSH=true yarn deploy
```

Not using SSH:

```bash
GIT_USER=<Your GitHub username> yarn deploy
```

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
17 changes: 17 additions & 0 deletions docsite/docs/core-concepts.md
@@ -0,0 +1,17 @@
---
sidebar_position: 3
---

# Core Concepts

## Knowledge Builder

The `KnowledgeBuilder` is the main entry point for building the semantic layer. It takes a dictionary of datasets as input and orchestrates the entire process of profiling, link prediction, and business glossary generation.

## Data Product Builder

The `DataProductBuilder` is used to generate data products from the semantic layer. It takes an ETL model as input and generates a unified data product that can be used for analysis and exploration.

## Semantic Search

The semantic search feature allows you to search for columns in your datasets using natural language. It uses a hybrid search approach that combines dense and sparse vectors for more accurate results.
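
As a rough illustration of the hybrid idea (not Intugle's actual implementation), the sketch below blends a dense cosine similarity with a sparse bag-of-words similarity using a weight `alpha`. The function names, the toy two-dimensional embeddings, and the `alpha` parameter are all assumptions made for this example; a real system would typically use a trained embedding model and a proper sparse scoring scheme.

```python
import math
from collections import Counter


def sparse_vector(text):
    """Turn text into a sparse term-count vector (token -> count)."""
    return Counter(text.lower().split())


def cosine_sparse(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cosine_dense(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_columns(query_text, query_dense, columns, alpha=0.5):
    """Rank (name, description, dense_vector) tuples by a weighted blend.

    alpha weights the dense score; (1 - alpha) weights the sparse score.
    """
    query_sparse = sparse_vector(query_text)
    scored = []
    for name, description, dense in columns:
        dense_sim = cosine_dense(query_dense, dense)
        sparse_sim = cosine_sparse(query_sparse, sparse_vector(description))
        scored.append((name, alpha * dense_sim + (1 - alpha) * sparse_sim))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: hand-written 2-d "embeddings" stand in for a real embedding model.
columns = [
    ("order_total", "total amount of the order", [0.1, 0.9]),
    ("customer_email", "email address of the customer", [0.9, 0.1]),
]
ranking = rank_columns("customer email address", [1.0, 0.0], columns)
print(ranking[0][0])  # prints "customer_email"
```

Blending the two scores lets exact keyword overlap (sparse) and semantic closeness (dense) both contribute, so a column can rank well even when only one of the two signals is strong.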
52 changes: 52 additions & 0 deletions docsite/docs/getting-started.md
@@ -0,0 +1,52 @@
---
sidebar_position: 2
---

# Getting Started

## Installation

For Windows and Linux, you can follow these steps. For macOS, please see the additional steps in the macOS section below.

Before installing, it is recommended to create a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows, run .venv\Scripts\activate instead
```

Then, install the package:

```bash
pip install intugle
```

### macOS

For macOS users, you may need to install the `libomp` library:

```bash
brew install libomp
```

If you installed Python using the official installer from python.org, you may also need to install SSL certificates by running the following command in your terminal. Please replace `3.XX` with your specific Python version. This step is not necessary if you installed Python using Homebrew.

```bash
/Applications/Python\ 3.XX/Install\ Certificates.command
```

## Configuration

Before running the project, you need to configure an LLM. It is used for tasks such as generating business glossaries and predicting links between tables.

You can configure the LLM by setting the following environment variables:

* `LLM_PROVIDER`: The LLM provider and model to use (e.g., `openai:gpt-3.5-turbo`), following LangChain's [conventions](https://python.langchain.com/docs/integrations/chat/).
* `API_KEY`: Your API key for the LLM provider. The exact variable name varies by provider (for example, `OPENAI_API_KEY` for OpenAI).

Here's an example of how to set these variables in your environment:

```bash
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```
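
To catch a missing or malformed setting early, a small pre-flight check can help. The sketch below is only an illustration: `check_llm_config` and `API_KEY_VARS` are hypothetical names that are not part of the `intugle` package, and the provider-to-variable mapping contains only the OpenAI entry from the example above.

```python
import os

# Hypothetical mapping from provider prefix to its API-key variable.
# Only the OpenAI entry is taken from the example above; extend as needed.
API_KEY_VARS = {"openai": "OPENAI_API_KEY"}


def check_llm_config(env=None):
    """Validate the LLM settings and return (provider, model).

    Reads from os.environ by default; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    provider, sep, model = env.get("LLM_PROVIDER", "").partition(":")
    if not sep or not provider or not model:
        raise ValueError(
            "LLM_PROVIDER must look like 'provider:model', "
            "e.g. 'openai:gpt-3.5-turbo'"
        )
    key_var = API_KEY_VARS.get(provider)
    # Providers missing from the mapping are not checked for a key.
    if key_var and not env.get(key_var):
        raise ValueError(f"{key_var} is not set")
    return provider, model


provider, model = check_llm_config(
    {"LLM_PROVIDER": "openai:gpt-3.5-turbo", "OPENAI_API_KEY": "your-openai-api-key"}
)
print(provider, model)  # prints "openai gpt-3.5-turbo"
```

Calling `check_llm_config()` with no arguments checks the current process environment instead of an explicit dictionary.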
7 changes: 7 additions & 0 deletions docsite/docs/intro.md
@@ -0,0 +1,7 @@
---
sidebar_position: 1
---

# Introduction

Intugle’s GenAI-powered open-source Python library builds an intelligent semantic layer over your existing data systems. At its core, it discovers meaningful links and relationships across data assets and enriches them with profiles, classifications, and business glossaries. With this connected knowledge layer, you can enable semantic search and auto-generate queries to create unified data products, making data integration and exploration faster, more accurate, and far less manual.
134 changes: 134 additions & 0 deletions docsite/docusaurus.config.ts
@@ -0,0 +1,134 @@
import {themes as prismThemes} from 'prism-react-renderer';
import type {Config} from '@docusaurus/types';
import type * as Preset from '@docusaurus/preset-classic';

// This runs in Node.js - Don't use client-side code here (browser APIs, JSX...)

const config: Config = {
  title: 'Intugle Data Tools',
  tagline: 'The GenAI-powered toolkit for automated data intelligence.',
  favicon: 'img/intugle-logo.png',

  // Future flags, see https://docusaurus.io/docs/api/docusaurus-config#future
  future: {
    v4: true, // Improve compatibility with the upcoming Docusaurus v4
  },

  // Set the production url of your site here
  url: 'https://intugle.github.io/',
  // Set the /<baseUrl>/ pathname under which your site is served
  // For GitHub pages deployment, it is often '/<projectName>/'
  baseUrl: '/data-tools/',

  // GitHub pages deployment config.
  // If you aren't using GitHub pages, you don't need these.
  organizationName: 'Intugle', // Usually your GitHub org/user name.
  projectName: 'data-tools', // Usually your repo name.

  onBrokenLinks: 'throw',
  onBrokenMarkdownLinks: 'warn',

  // Even if you don't use internationalization, you can use this field to set
  // useful metadata like html lang. For example, if your site is Chinese, you
  // may want to replace "en" with "zh-Hans".
  i18n: {
    defaultLocale: 'en',
    locales: ['en'],
  },

  presets: [
    [
      'classic',
      {
        docs: {
          sidebarPath: './sidebars.ts',
          // Please change this to your repo.
          // Remove this to remove the "edit this page" links.
          editUrl:
            'https://github.com/Intugle/data-tools/tree/main/docsite/',
        },
        blog: false,
        theme: {
          customCss: './src/css/custom.css',
        },
      } satisfies Preset.Options,
    ],
  ],

  themeConfig: {
    // Replace with your project's social card
    image: 'img/docusaurus-social-card.jpg',
    navbar: {
      title: 'Data Tools',
      logo: {
        alt: 'Intugle Data Tools Logo',
        src: 'img/intugle-logo.png',
      },
      items: [
        {
          type: 'docSidebar',
          sidebarId: 'docsSidebar',
          position: 'left',
          label: 'Guide',
        },
        {
          to: '/docs/intro',
          label: 'Examples',
          position: 'left'
        },
        {
          href: 'https://github.com/Intugle/data-tools',
          label: 'GitHub',
          position: 'right',
        },
      ],
    },
    footer: {
      style: 'dark',
      links: [
        {
          title: 'Docs',
          items: [
            {
              label: 'Tutorial',
              to: '/docs/intro',
            },
          ],
        },
        {
          title: 'Community',
          items: [
            {
              label: 'Stack Overflow',
              href: 'https://stackoverflow.com/questions/tagged/docusaurus',
            },
            {
              label: 'Discord',
              href: 'https://discordapp.com/invite/docusaurus',
            },
            {
              label: 'X',
              href: 'https://x.com/docusaurus',
            },
          ],
        },
        {
          title: 'More',
          items: [
            {
              label: 'GitHub',
              href: 'https://github.com/Intugle/data-tools',
            },
          ],
        },
      ],
      copyright: `Copyright © ${new Date().getFullYear()} Intugle, Inc. Built with Docusaurus.`,
    },
    prism: {
      theme: prismThemes.github,
      darkTheme: prismThemes.dracula,
    },
  } satisfies Preset.ThemeConfig,
};

export default config;