Commit b025b67

Bump version. Gradio upgraded. Minor formatting improvements
1 parent cd53b50 commit b025b67

8 files changed, 19 additions and 13 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ emoji: 📚
 colorFrom: purple
 colorTo: yellow
 sdk: gradio
-sdk_version: 6.0.2
+sdk_version: 6.2.0
 app_file: app.py
 pinned: true
 license: agpl-3.0
@@ -13,9 +13,9 @@ short_description: Create thematic summaries for open text data with LLMs

 # Large language model topic modelling

-Version: 0.7.0
+Version: 0.8.0

-Extract topics and summarise outputs using Large Language Models (LLMs, Gemma 3 4b/GPT-OSS 20b if local (see tools/config.py to modify), Gemini, Azure, or AWS Bedrock models (e.g. Claude, Nova models). The app will query the LLM with batches of responses to produce summary tables, which are then compared iteratively to output a table with the general topics, subtopics, topic sentiment, and a topic summary. Instructions on use can be found in the README.md file. You can try out examples by clicking on one of the example datasets on the main app page, which will show you example outputs from a local model run. API keys for AWS, Azure, and Gemini services can be entered on the settings page (note that Gemini has a free public API).
+Extract topics and summarise outputs using Large Language Models (LLMs), either local, Gemini, Azure, or AWS Bedrock models (e.g. Claude, Nova models). The app will query the LLM with batches of responses to produce summary tables, which are then compared iteratively to output a table with the general topics, subtopics, topic sentiment, and a topic summary. Instructions on use can be found in the README.md file. You can try out examples by clicking on one of the example datasets on the main app page, which will show you example outputs from a local model run. API keys for AWS, Azure, and Gemini services can be entered on the settings page (note that Gemini has a free public API).

 NOTE: Large language models are not 100% accurate and may produce biased or harmful outputs. All outputs from this app **absolutely need to be checked by a human** to check for harmful outputs, hallucinations, and accuracy.

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "llm_topic_modelling"
-version = "0.7.0"
+version = "0.8.0"
 description = "Generate thematic summaries from open text in tabular data files with a large language model."
 requires-python = ">=3.10"
 readme = "README.md"
@@ -51,7 +51,7 @@ classifiers = [
 ]

 dependencies = [
-    "gradio==6.0.2",
+    "gradio==6.2.0",
     "transformers==4.57.2",
     "spaces==0.42.1",
     "boto3==1.42.1",

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Note that this requirements file is optimised for Hugging Face spaces / Python 3.10. Please use requirements_no_local.txt for installation without local model inference (simplest approach to get going). Please use requirements_cpu.txt for CPU instances and requirements_gpu.txt for GPU instances using Python 3.11
-gradio==6.0.2
+gradio==6.2.0
 transformers==4.57.2
 spaces==0.42.1
 boto3>=1.42.1

requirements_cpu.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-gradio==6.0.2
+gradio==6.2.0
 transformers==4.57.2
 spaces==0.42.1
 pandas>=2.3.3

requirements_gpu.txt

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
-
-gradio==6.0.2
+gradio==6.2.0
 transformers==4.57.2
 spaces==0.42.1
 boto3>=1.42.1

requirements_lightweight.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # This requirements file is optimised for AWS ECS using Python 3.11 alongside the Dockerfile, without local torch and llama-cpp-python. For AWS ECS, torch and llama-cpp-python are optionally installed in the main Dockerfile
-gradio==6.0.2
+gradio==6.2.0
 transformers==4.57.2
 spaces==0.42.1
 boto3>=1.42.1

tools/dedup_summaries.py

Lines changed: 9 additions & 0 deletions
@@ -3086,6 +3086,14 @@ def overall_summary(
 summarised_output = ""
 summarised_output_for_df = ""

+# Remove multiple consecutive line breaks (2 or more) and replace with single line break
+if summarised_output_for_df:
+    summarised_output_for_df = re.sub(
+        r"\n{2,}", "\n", summarised_output_for_df
+    )
+if summarised_output:
+    summarised_output = re.sub(r"\n{2,}", "\n", summarised_output)
+
 summarised_outputs_for_df.append(summarised_output_for_df)
 summarised_outputs.append(summarised_output)
 txt_summarised_outputs.append(
@@ -3155,6 +3163,7 @@ def overall_summary(
 summarised_outputs_df_for_display["Summary"]
     .apply(lambda x: markdown.markdown(x) if isinstance(x, str) else x)
     .str.replace(r"\n", "<br>", regex=False)
+    .str.replace(r"(<br>\s*){2,}", "<br>", regex=True)
 )
 html_output_table = summarised_outputs_df_for_display.to_html(
     index=False, escape=False
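
The two whitespace clean-ups added in this file can be tried in isolation. The following is a minimal sketch, not the app's code: the helper names `collapse_newlines` and `collapse_br_tags` are hypothetical, and it applies the same two regular expressions from the diff to plain strings with the standard `re` module rather than to a pandas column.

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace any run of two or more newlines with a single newline.

    Hypothetical helper; mirrors the re.sub(r"\n{2,}", "\n", ...) call in the diff.
    Empty strings are returned unchanged, matching the truthiness guard there.
    """
    if not text:
        return text
    return re.sub(r"\n{2,}", "\n", text)


def collapse_br_tags(html: str) -> str:
    """Replace any run of two or more <br> tags with a single <br>.

    Hypothetical helper; uses the same pattern as the added .str.replace call.
    Whitespace following each <br> in the run is consumed along with the tags.
    """
    return re.sub(r"(<br>\s*){2,}", "<br>", html)


if __name__ == "__main__":
    print(repr(collapse_newlines("Topic A\n\n\nTopic B\n\nTopic C")))
    print(collapse_br_tags("Topic A<br><br> <br>Topic B"))
```

A single line break or a single `<br>` is left alone by both patterns, so only repeated breaks are affected.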

tools/llm_api_call.py

Lines changed: 1 addition & 3 deletions
@@ -5396,9 +5396,7 @@ def all_in_one_pipeline(
 total_number_of_calls += dedup_number_of_calls
 total_time_taken += dedup_estimated_time_taken
 out_message.append(
-    f"LLM deduplication completed: {dedup_input_tokens} input tokens, "
-    f"{dedup_output_tokens} output tokens, {dedup_number_of_calls} calls, "
-    f"{dedup_estimated_time_taken:.2f}s"
+    f"LLM deduplication completed. Total time: {dedup_estimated_time_taken:.2f}s"
 )

 # 3) Summarisation
