Open Data Hub Workshop Setup Instructions

Prerequisites

You'll need:

  • An OpenShift 4.8 cluster with admin rights. You can create one by following the instructions here, or via RHPDS (Red Hat staff only).
  • The OpenShift command line interface, oc, available here

Workshop Structure

There are two versions of this workshop you can choose to use:

  • an FSI use case
  • a Telco use case

Both are functionally identical but use different product data examples, applicable to the chosen use case. At various parts of the workshop, you use different files appropriate to your chosen use case.

REVISIT: This only has the FSI data files.

Download the Workshop Files

If you are running this as a workshop, it is recommended that you fork this repo, as there are changes you can make to your instance of the repo that will simplify the experience for the students. See the section Updating Tool URLs below.

Using the example below:

  1. Clone (or fork) this repo.
  2. Change directory into the root directory, ml-workshop.
  3. Create a variable, REPO_HOME, for this directory.

REVISIT: Change to a non-personal repo, and clone based on a tag/branch:
git clone -b tag --single-branch https://github.com/bryonbaker/ml-workshop

git clone https://github.com/bryonbaker/ml-workshop
cd ml-workshop
export REPO_HOME=$(pwd)

Install the Open Data Hub Operator

  1. Log on to OpenShift as a Cluster Administrator. (For RHPDS this is opentlc-mgr.)
  2. Select the Administrator perspective.
  3. Install the Open Data Hub operator. Click Operators > OperatorHub.
    OpenShift displays the operator catalogue.
  4. Click the Filter by keyword text box and type open data hub.
    OpenShift displays the Open Data Hub Operator tile.
  5. Click the tile.
    OpenShift displays a Community Operator warning dialog box.
  6. Click Continue.
    OpenShift displays the operator details.
  7. Click Install.
    OpenShift prompts for the operator configuration details.
  8. Accept all defaults and click Install.
    OpenShift installs the operator and displays a dialog box once complete.
  9. Click View Operator.
    OpenShift displays the operator details.

The Open Data Hub Operator is now installed. Proceed to create the workshop project and install Open Data Hub.

Project Creation & ODH Installation Steps

We will now create the workshop's project and install Open Data Hub into the project.
Before we do this we need to copy the Open Data Hub KfDef file that will instruct the operator which tools to install and how to configure them.

Later in these steps you will also need to:
a. Edit the KfDef file you create in OpenShift with the URL of your cluster. Pay careful attention to those steps or Airflow will not run.
b. Update the certificate for Airflow.

Prerequisite Step:

Before installing Open Data Hub you need to copy the KfDef file from a public git repository.
** TODO: Change from Faisal's personal repo.**

  1. Open the KfDef file from the GitHub repository: https://github.com/masoodfaisal/odh-manifests/blob/master/kfdef/ml-workshop-limited.yaml
  2. Click the Copy Raw Contents button to copy the file contents to your clipboard.

Keep this in the clipboard; you will use it shortly.

Create the Workshop's Project and Install ODH

  1. Create the ml-workshop project:
    1.1 Click Home > Projects
    1.2 Click the Create Project button on the top right of the screen
    1.3 Click the Name text box and type ml-workshop
    1.4 Click Create
    OpenShift creates the project.

  2. Delete the Limit Range for the project:
    2.1 Click Administration > LimitRanges
    2.2 Click the hamburger button for ml-workshop-core-resource-limits.
    2.3 Click Delete LimitRange
    OpenShift removes the LimitRange for the project.
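If you prefer a terminal, steps 1 and 2 can also be performed with oc. A sketch, assuming you are logged in with cluster-admin rights:

```shell
# Create the workshop project (equivalent to Home > Projects > Create Project)
oc new-project ml-workshop

# Remove the project's default resource limits
# (equivalent to Administration > LimitRanges > Delete LimitRange)
oc delete limitrange ml-workshop-core-resource-limits -n ml-workshop
```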

  3. Install Open Data Hub:
    3.1 Click Operators > Installed Operators
    OpenShift displays all the operators currently installed.

    Note that the ml-workshop project is unselected and All Projects is selected. You must make ml-workshop the active project.

    3.2 Click the Projects drop-down list and click ml-workshop
    3.3 Click Open Data Hub Operator.
    OpenShift displays the operator's details.
    3.4 Click Open Data Hub in the operator toolbar.
    OpenShift displays the operand details - of which there are none.
    3.5 Click the Create KfDef button.
    3.6 Click the YAML View radio button.
    OpenShift displays the KfDef YAML editor.
    3.7 Replace the entire YAML file with the KfDef YAML you copied to your clipboard in the Prerequisite step above.
    This KfDef file tells OpenShift how to install and configure ODH.
    Before you save the KfDef you must edit one line of code.
    3.8 Locate the airflow2 overlay in the code.
    Around line 57 you will see a value field that contains part of the URL to your OpenShift cluster.
    3.9 Replace the value with the URI of your cluster, from the .apps through to the .com, as follows:

       - kustomizeConfig:
           overlays:
             - custom-image
           parameters:
             - name: OCP_APPS_URI
               # TODO: Change this uri before applying the KfDef
               value: .apps.cluster-9482.9482.sandbox744.opentlc.com
           repoRef:
             name: manifests
             path: ml-workshop-airflow2
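If you are unsure of the value to use, one way to derive it is to strip your OpenShift console URL down to everything from .apps onward. A sketch; the console URL below is a placeholder, so substitute your own:

```shell
# Placeholder console URL - substitute the URL of your own OpenShift console
CONSOLE_URL="https://console-openshift-console.apps.cluster-9482.9482.sandbox744.opentlc.com"

# Keep everything from ".apps" onward - this is the value the KfDef expects
OCP_APPS_URI=".apps${CONSOLE_URL#*.apps}"
echo "$OCP_APPS_URI"   # -> .apps.cluster-9482.9482.sandbox744.opentlc.com
```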

3.10 Click Create
OpenShift creates the KfDef and proceeds to deploy ODH.
3.11 Click Workloads > Pods to observe the deployment progress.
Be aware this may take several minutes to complete.
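You can also watch the rollout from a terminal, assuming the ml-workshop project created earlier:

```shell
# Watch the ODH pods come up; press Ctrl+C to stop watching
oc get pods -n ml-workshop -w
```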

Installation Complete

The installation phase of Open Data Hub is now complete. Next you will configure the workshop environment.


Workshop Configuration

Adding users to the workshop

If you are running ODH for a workshop, you need to configure the users. If you are using the environment as a demo, you can jump forward to the Configure Tools section.

  1. In a terminal window, type the following commands:
cd $REPO_HOME/scripts
./setup-users.sh

Note: User configuration will invalidate any other logins like opentlc-mgr.
For cluster-admin access you should now use user29.

If you need to create users with different credentials, consult this blog, on which these instructions are based.

The password for all users is openshift.
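As a quick check that the users were created, you can log in as the new cluster-admin user. A sketch; user29 and the password come from the setup script above:

```shell
# Log in as the new cluster-admin user and confirm the identity
oc login -u user29 -p openshift
oc whoami   # should print: user29
```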


Configure the S3 Storage

Upload Files to the rawdata Bucket

In this section we will upload the files that will be used for feature engineering. The files are located in the data-files directory in the ml-workshop git project you cloned earlier.

  1. Open the OpenShift console in your browser.

  2. Click: Networking > Routes

  3. Scroll down to find minio-ml-workshop-ui.

  4. Click the Minio URL under the Location heading.
    OpenShift opens a new browser tab, launches the Minio console, and displays the login screen.
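Alternatively, you can fetch the route's hostname from the CLI, assuming the route name shown above:

```shell
# Print the Minio console hostname without opening the web console
oc get route minio-ml-workshop-ui -n ml-workshop -o jsonpath='{.spec.host}'
```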

  5. Enter the following credentials:

  • Username: minio
  • Password: minio123
  6. Click Login.
    Minio displays the main console and all of the existing S3 buckets.

  7. Scroll down to find the rawdata bucket.

  8. Click Browse.
    Minio displays the bucket contents.

You will now upload two folders (customers and products) to the rawdata bucket.

Upload the customers data

  1. Click: Upload Files > Upload Folder


Minio prompts for the folder to upload.

  2. Navigate to the data files directory within the git repository:
$REPO_HOME/data-files
  3. Click the customers folder.

  4. Click Upload.
    Minio uploads the folder and all file contents to the rawdata S3 bucket.

  5. Click the Clean Complete Objects button to reveal the hidden upload controls.
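If you prefer to upload from a terminal, the MinIO client (mc) can do the same job. A sketch; it assumes mc is installed, and the route hostname is a placeholder you must fill in with the Minio route found earlier (the credentials are those from the login step above):

```shell
# Register the workshop Minio endpoint under an alias
mc alias set workshop https://<your-minio-route> minio minio123

# Upload both data folders into the rawdata bucket
mc cp --recursive "$REPO_HOME/data-files/customers" workshop/rawdata/
mc cp --recursive "$REPO_HOME/data-files/products" workshop/rawdata/
```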


Configure Superset

Now you need to set up Superset to talk to our S3 and Kafka raw data via Trino - exposing the data via SQL.

  1. Open the OpenShift console in your browser tab.
    openshift-routes.png

  2. Click the URL for Superset.
    OpenShift opens a new browser tab and displays the Superset login page.
    superset-1.png

  3. Enter the following credentials:

  • Username: admin
  • Password: admin
  4. Click SIGN IN.
    Superset displays the main console.
    superset-2.png

  5. Click: Data > Databases
    Superset displays a list of configured databases.
    superset-4.png

  6. Click the "+ DATABASE" button.
    Superset prompts for the database connection details superset-4.png

  7. Click the Supported Databases drop-down list.

  8. Scroll down to the entry Trino and click it.

  9. Copy and paste the following text into the SQL Alchemy URI text box:

trino://admin@trino-service:8080
  10. Click Test Connection.
    If all steps have been performed correctly, Superset displays the message Connection looks good!.

superset-5.png
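If Test Connection fails, one way to confirm that Trino itself is reachable is to query its info endpoint over a local port-forward. A sketch, assuming the ml-workshop project and the trino-service name from the SQLAlchemy URI above:

```shell
# Forward the Trino service to localhost and query its info endpoint
oc port-forward -n ml-workshop svc/trino-service 8080:8080 &
PF_PID=$!
sleep 2
curl -s http://localhost:8080/v1/info   # JSON including the Trino version
kill $PF_PID
```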

  11. Click the Advanced tab in the Edit Database form.
    Superset prompts for the advanced database configuration.

superset-6.png

  12. Click SQL Lab.
  13. Complete the form as illustrated in the following figure:

superset-7.png

  14. Click CONNECT (or FINISH if you have done this step previously).
  15. Click SQL Lab Settings > Saved Queries in the main toolbar.

superset-8.png

  16. Click the + QUERY button.

NOTE: DO NOT SAVE THE QUERY. We don't save this as it only needs to be run once per workshop.

  17. Copy and paste the following query into the query editor:

    CREATE TABLE hive.default.customers (
    customerId varchar,
    gender varchar,
    seniorCitizen varchar,
    partner varchar,
    dependents varchar,
    tenure varchar
    )
    WITH (format = 'CSV',
    skip_header_line_count = 1,
    external_location = 's3a://rawdata/customers'
    )
    
  18. Click Run.
    Superset displays Result - true as shown.

superset-9.png

  19. Replace the SQL command with:
    SELECT customers.gender, customers.seniorcitizen, customers.partner, customers.dependents, customers.tenure, products.*  
    from hive.default.customers customers,
    customerchurn.default.data products
    where cast(customers.customerId as VARCHAR) = cast(products.customerId as VARCHAR)
    

Run the query as shown. You should see a result set spanning personal and product-consumption customer data.
superset-10.png

  20. Click the SAVE AS button superset-11.png.
    Superset displays the Save As dialog box.
  21. Click the Name text box. Replace the text with: Kafka-CSV-Join
  22. Click the SAVE button.
    Superset saves the query.

Setup Complete

You are now done with setup!