Survey Assist Themes

This code uses the i.AI ThemeFinder Python package to identify common themes, sentiment and evidence-rich detail in free-text feedback from survey respondents.

Prerequisites

It is assumed you have installed:

Poetry 2.1.3
Google Cloud SDK
PyEnv

To run the code locally, you need access to a GCP project with the Vertex AI API enabled.

You should be able to authenticate with the project using Application Default Credentials:

gcloud auth application-default login

Input Data

The input data should be in CSV format using a pipe (|) delimiter.

The header row must be "user"|"feedback_comments"; CSV parsing will fail otherwise.

Each "user" value is expected to be in the format STPxxxxx or STPxxxx-xxxx (where x is a digit); the code will fail to parse the CSV otherwise.

There should be exactly two columns: the user column, which is converted to an integer that uniquely identifies a respondent, and the feedback column, a string containing the user's feedback for analysis.

Example input data:

user|feedback_comments
STP00821-01|
STP00561-01|No 
STP00017-01|All great
STP12303-01|none
STP01847-01|
STP91885-01|Very easy to navigate
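The input rules above can be illustrated with a minimal Python sketch using the standard csv module. It is not the project's actual parser: read_feedback and USER_ID_PATTERN are hypothetical names, and the ID check is a permissive assumption (STP followed by digits, with an optional hyphenated numeric suffix) chosen so that the example rows above pass.

```python
import csv
import io
import re

# Hypothetical, permissive pattern: "STP" + digits, optionally "-" + digits.
# The real package may enforce a stricter format.
USER_ID_PATTERN = re.compile(r"^STP\d+(?:-\d+)?$")
EXPECTED_HEADER = ["user", "feedback_comments"]


def read_feedback(text: str) -> list[tuple[str, str]]:
    """Parse pipe-delimited feedback, validating the header and user IDs."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader, None)
    # csv strips the optional double quotes, so "user"|"feedback_comments" also matches.
    if header != EXPECTED_HEADER:
        raise ValueError(f"Unexpected header: {header!r}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip completely blank lines
        if len(row) != 2:
            raise ValueError(f"Line {line_no}: expected 2 columns, got {len(row)}")
        user, feedback = row
        if not USER_ID_PATTERN.match(user):
            raise ValueError(f"Line {line_no}: unexpected user ID {user!r}")
        rows.append((user, feedback))  # empty feedback is kept as ""
    return rows


sample = "user|feedback_comments\nSTP00821-01|\nSTP00017-01|All great\n"
for user, feedback in read_feedback(sample):
    print(user, repr(feedback))
```

Rows with empty feedback are kept rather than dropped, mirroring the example data, where such rows are present.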

Question to Evaluate

The code defaults to the stock evaluation question "Do you have any other feedback about this survey?"

This can be changed by setting the QUESTION environment variable.

Environment Variables

The following environment variables are supported. It is recommended to set them in a .env file in the root directory.

export INPUT_BUCKET=<INPUT_BUCKET_NAME>
export INPUT_FILE=<INPUT_FOLDER>/<INPUT_FILENAME.CSV>
export OUTPUT_BUCKET=<OUTPUT_BUCKET_NAME>
export QUESTION=<Question String>
export GENERATE_THEMES_CSV=<TRUE/FALSE>

Install

Clone the repository, set the local Python version with pyenv, then create and activate a virtual environment:

pyenv local 3.12.4
python3 -m venv .venv
source .venv/bin/activate

Install the project:

poetry install

Run the application

Ensure gcloud is set to the relevant GCP project and that you are logged in with ADC (see above).

Check that the environment variables are set appropriately.

Start the application:

poetry run python -m survey_assist_themes.themefinder_vertexai

Output

Two files are saved to the bucket specified by the OUTPUT_BUCKET environment variable. If GENERATE_THEMES_CSV is set to TRUE, one CSV file per theme is also generated (see Theme CSV Outputs).

ThemeFinder Output

The first file is the JSON output from ThemeFinder, structured as follows:

{
  "question": "Do you have any other feedback about this survey?",
  "sentiment": [
    {
      "response_id": 4521,
      "response": "No ",
      "position": "UNCLEAR"
    },
    {
      "response_id": 417,
      "response": "All great",
      "position": "AGREEMENT"
    },
    {
      "response_id": 2303,
      "response": "none",
      "position": "UNCLEAR"
    },
    {
      "response_id": 1885,
      "response": "Very easy to navigate",
      "position": "AGREEMENT"
    },
    ...
  ],
  "themes": [
    {
      "topic": "Survey design is effective: The survey is easy to navigate, complete, and understand, featuring clear, concise, and well-designed questions, and suitable automated follow-up questions.",
      "source_topic_count": 8,
      "topic_id": "A"
    },
    ...
  ],
  "mapping": [
    {
      "response_id": 4521,
      "response": "No ",
      "labels": [
        "G"
      ]
    },
    {
      "response_id": 417,
      "response": "All great",
      "labels": [
        "A"
      ]
    },
    {
      "response_id": 2303,
      "response": "none",
      "labels": [
        "G"
      ]
    },
    {
      "response_id": 1885,
      "response": "Very easy to navigate",
      "labels": [
        "A",
        "B"
      ]
    },
    ...
  ],
  "detailed_responses": [
    {
      "response_id": 4521,
      "response": "No ",
      "evidence_rich": "NO"
    },
    {
      "response_id": 417,
      "response": "All great",
      "evidence_rich": "NO"
    },
    ...
  ],
  "unprocessables": [
    {
      "response_id": 5323,
      "response": "I have to think a bit but what can you do. "
    }
  ]
}
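For a quick overview of a results file, the sentiment and mapping lists can be tallied. This sketch relies only on the field names shown in the example above; summarise_output is a hypothetical helper, not part of ThemeFinder.

```python
import json
from collections import Counter


def summarise_output(output: dict) -> dict:
    """Count sentiment positions and theme labels in ThemeFinder-style output."""
    sentiment_counts = Counter(item["position"] for item in output.get("sentiment", []))
    # A response can carry several labels, so each label is counted separately.
    label_counts = Counter(
        label for item in output.get("mapping", []) for label in item["labels"]
    )
    topics = {theme["topic_id"]: theme["topic"] for theme in output.get("themes", [])}
    return {
        "sentiment": dict(sentiment_counts),
        "themes": {
            topic_id: {"topic": topics.get(topic_id, ""), "responses": count}
            for topic_id, count in label_counts.most_common()
        },
    }


example = {
    "sentiment": [{"response_id": 417, "response": "All great", "position": "AGREEMENT"}],
    "themes": [{"topic": "Survey design is effective: ...", "source_topic_count": 8, "topic_id": "A"}],
    "mapping": [{"response_id": 417, "response": "All great", "labels": ["A"]}],
}
print(json.dumps(summarise_output(example), indent=2))
```

For a downloaded results file, load it first with json.load, e.g. inside with open(path, encoding="utf-8") as f: summarise_output(json.load(f)).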

Response ID Mapping File

The second file is a JSON file that records the mapping between each response ID and its original source ID.

Field	Description
response_id	Sequential integer assigned to each input row, starting from 1.
original_id	Original source ID from the input "user" column (e.g. STP00001).
participant_key	Sequential integer assigned per unique original_id. Rows with the same original_id share the same participant_key.

The file name matches that of the ThemeFinder output, with the suffix:

_id_mapping.json

The file is structured as follows:

[
  {
    "response_id": 1,
    "participant_key": 1,
    "original_id": "STP00001"
  },
  {
    "response_id": 2,
    "participant_key": 1,
    "original_id": "STP00001"
  },
  {
    "response_id": 3,
    "participant_key": 2,
    "original_id": "STP00002"
  },
  {
    "response_id": 4,
    "participant_key": 3,
    "original_id": "STP00003"
  }
]
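The numbering rules in the table can be expressed in a few lines of Python. This is an illustrative sketch, not the package's implementation; build_id_mapping is a hypothetical name. Run on the IDs from the example, it produces the same structure as shown above.

```python
import json


def build_id_mapping(original_ids: list[str]) -> list[dict]:
    """Assign sequential response_ids and a shared participant_key per unique ID."""
    participant_keys: dict[str, int] = {}
    mapping = []
    for response_id, original_id in enumerate(original_ids, start=1):
        # A new ID gets the next integer; a repeated ID reuses its existing key.
        key = participant_keys.setdefault(original_id, len(participant_keys) + 1)
        mapping.append(
            {"response_id": response_id, "participant_key": key, "original_id": original_id}
        )
    return mapping


print(json.dumps(build_id_mapping(["STP00001", "STP00001", "STP00002", "STP00003"]), indent=2))
```

The dictionary records the first-seen order of IDs, so participant keys follow the order of the input rows.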

Theme CSV Outputs

The ThemeFinder JSON lists the themes found under the themes key. Each theme has a topic_id (e.g. A, B, C). Optionally, one CSV file per theme can be generated.

To enable CSV generation, set:

GENERATE_THEMES_CSV=TRUE

The file name matches that of the ThemeFinder output, with the suffix:

_theme_<TOPIC_ID>.csv

Example theme CSV output

response_id	original_id	response	theme_description
1	STP00001	Impossible to get seen	Inadequate Appointment System
2	STP00002	Phones always engaged	Inadequate Appointment System
3	STP00003	Doctors were helpful	Inadequate Appointment System
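This sketch shows how a per-theme CSV could be assembled from the ThemeFinder output and the ID mapping file. The helper names are hypothetical. Several details are also assumptions: the comma delimiter, theme_description being the text before the first colon in the topic, and the suffix replacing the .json extension of the output file name.

```python
import csv
import io

FIELDS = ["response_id", "original_id", "response", "theme_description"]


def theme_rows(output: dict, id_mapping: list[dict], topic_id: str) -> list[dict]:
    """Collect rows for one theme (hypothetical helper, not the package's API)."""
    original_ids = {row["response_id"]: row["original_id"] for row in id_mapping}
    topic = next(t["topic"] for t in output["themes"] if t["topic_id"] == topic_id)
    # Assumption: theme_description is the short label before the first colon.
    description = topic.split(":", 1)[0].strip()
    return [
        {
            "response_id": item["response_id"],
            "original_id": original_ids.get(item["response_id"], ""),
            "response": item["response"],
            "theme_description": description,
        }
        for item in output["mapping"]
        if topic_id in item["labels"]
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialise rows with the csv module (comma-delimited, assumed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def theme_csv_name(output_name: str, topic_id: str) -> str:
    # Assumption: the suffix replaces the ".json" extension of the output file.
    return f"{output_name.removesuffix('.json')}_theme_{topic_id}.csv"
```

Looking up original IDs in a dictionary built once avoids rescanning the mapping file for every response.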
