Create FeatureBuilder Log File #364

@xehu

Objective

One challenge with using the Team Communication Toolkit (TCT) is that it contains many features, some of which have similar names or even similar purposes. Our philosophy (as the makers of the toolkit) is that we should give researchers the option to explore many different measures for their own usage; we aren't taking a stance on which of the measures is 'correct,' which is why we adopt an inclusive approach with these similar features. However, we should provide users with tools to understand the outputs of the toolkit and to be informed about their results.

This issue describes a log file, written each time the TCT runs, that contains information about the TCT's outputs.

Basic log file information

  • The log file should be saved upon completion of the TCT featurization pipeline.
  • It should be saved to outputs/logs (a new folder).
  • It should group features by their Semantic Grouping (a field that already exists in the Feature Dictionary as semantic_grouping), and under each grouping's header, provide information about the features in that grouping at each level of analysis (chat, user, and conv).
  • It should also log how long it took to run each feature-generation function, both with and without caching (with caching, the FeatureBuilder should run much faster).
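The timing and file-saving requirements above could be sketched as follows. This is a minimal illustration, not the FeatureBuilder's actual implementation: `RunTimer`, `track`, `format_duration`, and the file name `featurebuilder.log` are hypothetical names introduced here; only the `outputs/logs` folder comes from this issue.

```python
import time
from contextlib import contextmanager
from pathlib import Path


def format_duration(seconds):
    """Render a duration in the style of the example log: '45s' or '1m 5s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


class RunTimer:
    """Hypothetical helper that collects (label, seconds) for each timed step."""

    def __init__(self):
        self.records = []

    @contextmanager
    def track(self, label):
        # Time whatever runs inside the `with` block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.records.append((label, time.perf_counter() - start))

    def write(self, log_dir="outputs/logs", filename="featurebuilder.log"):
        # Create the log folder on first use, then write one line per step.
        path = Path(log_dir)
        path.mkdir(parents=True, exist_ok=True)
        log_path = path / filename
        lines = [f"{label} {format_duration(secs)}" for label, secs in self.records]
        log_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return log_path
```

Each feature-generation call would be wrapped in `with timer.track("feature name"):`, and `timer.write()` would be called once the pipeline finishes. Timing the cached and uncached runs separately would then only require two `RunTimer` instances or a label prefix.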

Expected Statistics

For each Semantic Grouping and each level of analysis, collect all the relevant features. Then provide the following summary statistics:

  • Basic descriptive stats: mean, median, max, min, std.
  • List of correlated feature pairs, reported at each threshold (r > 0.99, 0.9, 0.8, 0.7, 0.6, 0.5)
  • List of features with low variation (all 0’s, all 1’s)
  • List of features with NA’s (and list how many NA's there are)
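One way to compute the statistics listed above with pandas is sketched below. It assumes the features for one Semantic Grouping and one level of analysis have already been collected into a DataFrame with one column per feature; `summarize_features` and the returned dictionary keys are hypothetical names. "Low variation" is interpreted here as a column with a single distinct value, which covers the all-0 and all-1 cases.

```python
from itertools import combinations

import pandas as pd

THRESHOLDS = (0.99, 0.9, 0.8, 0.7, 0.6, 0.5)


def summarize_features(df):
    """Summarize one group of feature columns (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")

    # One row per feature, one column per descriptive statistic.
    descriptives = numeric.agg(["mean", "median", "max", "min", "std"]).T

    # Pairwise Pearson correlations; each unordered pair is visited once.
    corr = numeric.corr()
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    # NaN correlations (e.g. from constant columns) never pass `r > t`.
    correlated = {t: [(a, b) for a, b, r in pairs if r > t] for t in THRESHOLDS}

    # Features with a single distinct non-NA value (all 0's, all 1's, ...).
    constant = [c for c in numeric.columns if numeric[c].nunique(dropna=True) <= 1]

    # Number of NA values per feature, keeping only features that have any.
    na_counts = {c: int(n) for c, n in df.isna().sum().items() if n > 0}

    return {
        "descriptives": descriptives,
        "correlated": correlated,
        "constant": constant,
        "na_counts": na_counts,
    }
```

Because the thresholds are nested, a pair listed at r > 0.99 also appears at every lower threshold; the log could instead report each pair only at its highest threshold if that reads better. The sketch follows the issue's wording (r greater than the threshold), so strongly negative correlations are not flagged.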

In addition, log the following:

  • Statistics about the data (how many chats, speakers, conversations?)
  • The time required to run the features.
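The data statistics could be gathered with a few pandas calls. In this sketch, `describe_data` is a hypothetical helper, and the column names `conversation_num` and `speaker_nickname` are assumptions; in practice they should be whatever conversation and speaker columns the user passed to the FeatureBuilder.

```python
import pandas as pd


def describe_data(chats, conversation_col="conversation_num",
                  speaker_col="speaker_nickname"):
    """Return the data-summary line of the log (hypothetical helper)."""
    n_chats = len(chats)                           # one row per chat
    n_speakers = chats[speaker_col].nunique()      # distinct speaker IDs
    n_convs = chats[conversation_col].nunique()    # distinct conversation IDs
    return (f"Data file has {n_chats} lines (chats), "
            f"{n_speakers} unique speakers, "
            f"{n_convs} unique conversations.")
```

Note that this counts speaker IDs globally; if the same ID is reused by different people in different conversations, counting unique (conversation, speaker) pairs may be more accurate.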

Example Log File

Team Communication Toolkit FeatureBuilder Run initiated at Tuesday, December 2, 2025 5:10:01 PM GMT-05:00

Data file has 100 lines (chats), 5 unique speakers, 4 unique conversations.

FeatureBuilder was run WITHOUT a cache. Running cache...
Cache generation completed in 45s.

Running main features...
[name of feature] 10s
[name of feature] 32s
[name of feature] 1m 5s

...

====FEATURE OUTPUT SUMMARY====

FEATURE CATEGORY: QUANTITY

1. Chat/Turn Level
[present the statistics here in a table format]

2. Speaker Level

...

3. Conversation Level

...
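The summary layout in the example above could be generated along these lines. This is a sketch under assumptions: `render_summary` is a hypothetical function, `feature_info` is assumed to map each feature name to a dict with `semantic_grouping` and `level` keys (the level values `chat`, `user`, and `conv` follow the names used in this issue), and `render_table` stands in for whatever produces the statistics table.

```python
from collections import defaultdict

# Level keys and the section titles used in the example log.
LEVEL_TITLES = [
    ("chat", "Chat/Turn Level"),
    ("user", "Speaker Level"),
    ("conv", "Conversation Level"),
]


def render_summary(feature_info, render_table):
    """Build the FEATURE OUTPUT SUMMARY section (hypothetical helper)."""
    # grouping -> level -> list of feature names
    grouped = defaultdict(lambda: defaultdict(list))
    for name, info in feature_info.items():
        grouped[info["semantic_grouping"]][info["level"]].append(name)

    lines = ["====FEATURE OUTPUT SUMMARY====", ""]
    for grouping in sorted(grouped):
        lines += [f"FEATURE CATEGORY: {grouping.upper()}", ""]
        for number, (level, title) in enumerate(LEVEL_TITLES, start=1):
            lines.append(f"{number}. {title}")
            features = sorted(grouped[grouping].get(level, []))
            # render_table receives the feature names and returns a string.
            lines.append(render_table(features) if features
                         else "(no features at this level)")
            lines.append("")
    return "\n".join(lines)
```

Passing the table renderer in as a callable keeps the layout separate from the statistics, so the table format can change without touching the grouping logic.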


Labels: priority 1 (More important tasks)
