
Commit 08399bc

feat: add automatic message splitting for Discord's 2000 char limit
- Add smart message splitting utility that splits at natural boundaries (paragraphs, sentences, words)
- Update moderation service to automatically split long messages into multiple Discord messages
- Add comprehensive test suite with 16 test cases covering various scenarios
- Add pytest to dev dependencies
- Update CI/CD workflow to run tests before releases
- Add documentation for message splitting feature

Closes #[issue-number] (if applicable)
1 parent 9ef6fcb commit 08399bc


8 files changed, +589 -35 lines changed


.github/workflows/release-please.yml

Lines changed: 18 additions & 19 deletions
```diff
@@ -5,31 +5,30 @@ on:
     branches: [main]

 jobs:
-  # test:
-  #   runs-on: ubuntu-latest
-  #   steps:
-  #     - name: Checkout
-  #       uses: actions/checkout@v4
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4

-  #     - name: Setup pnpm
-  #       uses: pnpm/action-setup@v4
-  #       with:
-  #         version: 9
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true

-  #     - name: Setup Node.js
-  #       uses: actions/setup-node@v4
-  #       with:
-  #         node-version: 20
-  #         cache: "pnpm"
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"

-  #     - name: Install dependencies
-  #       run: pnpm install
+      - name: Install dependencies
+        run: uv sync --extra dev

-  #     - name: Run tests
-  #       run: pnpm test
+      - name: Run tests
+        run: uv run pytest -v

   release-please:
-    # needs: test
+    needs: test
     runs-on: ubuntu-latest
     permissions:
       contents: write
```

docs/Message Splitting.md

Lines changed: 142 additions & 0 deletions
# Message Splitting

## Overview

Discord enforces a maximum message length of 2,000 characters per message. When the bot generates responses that exceed this limit, the message splitting feature automatically divides the content into multiple messages while maintaining readability.

## Architecture Decision

**Decision**: Implement automatic message splitting at the message sending layer

**Rationale**:

- **User Experience**: Long responses should be delivered completely rather than truncated
- **Transparency**: Multiple messages maintain the bot's complete thought process
- **Natural Boundaries**: Smart splitting at paragraphs, sentences, or words maintains context
- **Automatic**: No LLM intervention required; splitting is handled at the infrastructure level

**Date**: 2025-10-24
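
## How It Works

### Split Algorithm

The message splitting utility (`sentinel/utils/discord.py`) uses a hierarchical approach to find natural split points. A boundary is only accepted if it falls in the second half of the allowed length (past character 1,000 with the default limit), so no chunk ends up very short:

1. **First Priority - Paragraph Boundaries**: Splits after the last newline (`\n`) in the window
2. **Second Priority - Sentence Boundaries**: Splits after a period followed by a space or newline, if no suitable newline exists
3. **Third Priority - Word Boundaries**: Splits at the last space to avoid breaking words
4. **Last Resort**: Hard splits at the character limit when no boundary qualifies (for example, very long words or URLs)

In the current implementation the sentence check is an `elif` branch that is taken whenever the window contains a period, so the word-boundary check only runs for windows with no period at all. A window that contains a period but no qualifying sentence boundary is hard-split at the limit.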
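The priority order above can be sketched as a standalone function that returns the index at which to cut the text. This is a minimal illustration, not the committed code: `find_split_point` is a hypothetical name, and unlike the committed `elif` chain it tries every level in turn, so a window that contains a period but no qualifying sentence boundary still falls back to a word boundary.

```python
def find_split_point(text: str, max_length: int = 2000) -> int:
    """Return the index at which to cut ``text`` so the first chunk fits.

    Hypothetical helper illustrating the priority order: paragraph,
    sentence, word, then a hard cut. A boundary is only accepted past the
    halfway point so that chunks do not become too short.
    """
    if len(text) <= max_length:
        return len(text)  # everything fits; no split needed
    half = max_length // 2

    # 1. Paragraph boundary: last newline inside the window.
    newline_pos = text.rfind("\n", 0, max_length)
    if newline_pos > half:
        return newline_pos + 1  # keep the newline with the first chunk

    # 2. Sentence boundary: a period followed by a space or newline.
    #    text[i + 1] is always valid because len(text) > max_length.
    for i in range(max_length - 1, half, -1):
        if text[i] == "." and text[i + 1] in " \n":
            return i + 1

    # 3. Word boundary: last space inside the window.
    space_pos = text.rfind(" ", 0, max_length)
    if space_pos > half:
        return space_pos + 1

    # 4. Last resort: hard cut at the limit.
    return max_length
```

For example, `find_split_point("Hi. " + "w" * 1200 + " " + "z" * 1000)` returns 1205 (just after the space), whereas a hard cut would land at 2,000 in the middle of the `z` run.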
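
### Message Sending Behavior

When `send_message` is called with content exceeding 2,000 characters:

1. Content is split into chunks using the smart splitting algorithm
2. Each chunk is sent as a separate message, in order
3. Only the first message uses the reply reference (if one was provided)
4. Every sent message is recorded for channel-activity analytics (in guild channels)
5. The last message is recorded as the bot's response in the conversation, making it the conversation continuation point

If a chunk fails to send (`discord.Forbidden` or `discord.HTTPException`), the error is logged and the handler returns immediately: the remaining chunks are not sent, and the chunks already delivered are not recorded.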
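The sending sequence above can be sketched with a fake channel so that it runs without Discord. `FakeChannel`, `FakeMessage`, and `send_chunks` are hypothetical; the real handler calls `send_channel.send(chunk, reference=...)` and catches `discord.Forbidden` and `discord.HTTPException`, for which `RuntimeError` stands in here. One deliberate difference: instead of returning on a failed chunk, this sketch stops sending and returns the messages already delivered so the caller can still record them.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeMessage:
    content: str
    reference: Optional[str]


@dataclass
class FakeChannel:
    """Hypothetical stand-in for a Discord channel that records what was sent."""

    sent: List[FakeMessage] = field(default_factory=list)
    fail_on: Optional[int] = None  # index of the send call that should fail

    async def send(self, content: str, reference: Optional[str] = None) -> FakeMessage:
        if self.fail_on is not None and len(self.sent) == self.fail_on:
            raise RuntimeError("simulated send failure")
        message = FakeMessage(content, reference)
        self.sent.append(message)
        return message


async def send_chunks(
    channel: FakeChannel, chunks: List[str], reference: Optional[str]
) -> List[FakeMessage]:
    """Send chunks in order; only the first one carries the reply reference."""
    sent: List[FakeMessage] = []
    for i, chunk in enumerate(chunks):
        try:
            sent.append(await channel.send(chunk, reference=reference if i == 0 else None))
        except RuntimeError:
            break  # stop sending, but keep what was already delivered
    return sent
```

After `sent = asyncio.run(send_chunks(FakeChannel(), ["part 1", "part 2"], reference="msg"))`, `sent[-1]` is the message a caller would hand to conversation tracking, and every element of `sent` would be recorded for analytics.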
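
### Example

```python
from sentinel.utils.discord import split_message

# Input: 3,500-character message made of 70 lines of 50 characters each
long_message = ("x" * 49 + "\n") * 70
assert len(long_message) == 3500

chunks = split_message(long_message)

# Output: 2 messages
# Message 1: 1,999 characters (split after a newline; the trailing newline is stripped)
# Message 2: 1,500 characters (remaining content)
assert [len(chunk) for chunk in chunks] == [1999, 1500]
```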

## Implementation Details

### Key Components

1. **`sentinel/utils/discord.py:split_message()`**

   - Core splitting logic
   - Configurable max length (defaults to 2,000)
   - Returns list of message chunks

2. **`sentinel/services/moderation.py:_tool_send_message()`**

   - Calls `split_message()` before sending
   - Sends each chunk sequentially
   - Handles reply references and threading for split messages
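
### Edge Cases Handled

- **Empty chunks**: Whitespace at each split point is stripped and empty chunks are skipped, so splitting never produces an empty message (content that already fits is returned unchanged)
- **Single word longer than limit**: Falls back to a hard split at the character limit (rare)
- **Code blocks**: May be split mid-block (future enhancement: preserve code blocks)
- **Mentions/formatting**: Mentions and formatting contained within a single chunk render normally; Markdown that spans a split point (such as bold text or a code block) is not carried over, because Discord renders each message independently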
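To check whether a split has broken a code block, one can scan the chunks for a fence that is opened but not closed before the chunk ends. A minimal sketch, assuming fences are lines that begin with three backticks (the splitter itself does not parse Markdown); `chunks_with_open_fence` is a hypothetical helper:

```python
from typing import List

FENCE = "`" * 3  # built this way so the snippet can live inside a Markdown fence


def chunks_with_open_fence(chunks: List[str]) -> List[int]:
    """Return indices of chunks that end inside an unclosed code fence.

    Fence state is tracked across the whole message, so a chunk that opens
    a block which is only closed in a later chunk is reported.
    """
    open_fence = False
    flagged: List[int] = []
    for index, chunk in enumerate(chunks):
        for line in chunk.splitlines():
            if line.lstrip().startswith(FENCE):
                open_fence = not open_fence  # each fence line opens or closes a block
        if open_fence:
            flagged.append(index)
    return flagged
```

A non-empty result means at least one message will render with a dangling code block.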

## Configuration

No configuration is required: message splitting is automatic and always enabled.

### Constants

- `DISCORD_MAX_MESSAGE_LENGTH = 2000` (defined in `sentinel/utils/discord.py`)
- Can be adjusted if Discord changes its limits; `split_message()` also accepts a `max_length` argument per call

## Testing

The splitting algorithm can be tested independently:

```python
from sentinel.utils.discord import split_message

# Test short message (no split needed)
assert split_message("Hello") == ["Hello"]

# Test long message with no boundaries (hard split)
long_msg = "a" * 3000
chunks = split_message(long_msg)
assert len(chunks) == 2
assert all(len(chunk) <= 2000 for chunk in chunks)

# Test natural boundaries: 200 lines of 13 characters (2,600 in total)
paragraph_msg = ("Paragraph 1.\n" * 100) + ("Paragraph 2.\n" * 100)
chunks = split_message(paragraph_msg)
assert len(chunks) == 2
# The split falls on a newline, so no line is cut in half
assert all(
    line in ("Paragraph 1.", "Paragraph 2.")
    for chunk in chunks
    for line in chunk.splitlines()
)
```
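Beyond fixed cases, a property-style check can confirm two invariants for any input: every chunk fits the limit, and only whitespace is lost, since the splitter strips whitespace at each split point. A minimal sketch; `check_split_invariants` is a hypothetical helper that takes the chunks as an argument rather than importing `split_message`:

```python
from typing import List


def _without_whitespace(text: str) -> str:
    """Remove all whitespace so only the visible content is compared."""
    return "".join(text.split())


def check_split_invariants(original: str, chunks: List[str], max_length: int = 2000) -> None:
    """Raise AssertionError if ``chunks`` is not a valid split of ``original``."""
    assert chunks, "at least one chunk is expected"
    assert all(len(chunk) <= max_length for chunk in chunks), "a chunk exceeds the limit"
    # Splitting may drop whitespace at each cut, but nothing else, and order is kept.
    assert _without_whitespace("".join(chunks)) == _without_whitespace(original), (
        "content was lost or reordered"
    )
```

In the test suite it could be applied to many generated inputs, for example `check_split_invariants(text, split_message(text))`.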

## Future Enhancements

1. **Code Block Preservation**: Detect triple-backtick code blocks and avoid splitting inside them
2. **Embed Support**: Handle embeds differently (embeds have different length limits)
3. **Continuation Indicators**: Add "..." or "(continued)" markers between split messages
4. **Smart Numbering**: For list-based responses, ensure lists aren't split awkwardly
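For the first enhancement, one possible approach is to move a candidate split point back to just before an unclosed fence, so the code block starts in the next message instead of being cut. A sketch under stated assumptions: fences are lines beginning with three backticks, and `avoid_splitting_fence` is a hypothetical helper that would run after the existing split-point search.

```python
FENCE = "`" * 3  # avoids a literal fence inside this Markdown code block


def avoid_splitting_fence(text: str, split_point: int) -> int:
    """Move ``split_point`` back to the start of an unclosed code fence.

    Returns the original split point when the fences before it are balanced,
    or when the open fence starts the text (the block is then too large for
    one message and has to be split anyway).
    """
    prefix = text[:split_point]
    fence_starts = [
        i
        for i in range(len(prefix))
        if prefix.startswith(FENCE, i) and (i == 0 or prefix[i - 1] == "\n")
    ]
    if len(fence_starts) % 2 == 0:
        return split_point  # every fence before the split is closed
    last_open = fence_starts[-1]
    return last_open if last_open > 0 else split_point
```

Moving the split back can leave a short first chunk; an alternative design would close the fence at the end of one chunk and reopen it at the start of the next.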

## Related Documentation

- [Architecture Overview](./Architecture%20Overview.md) - Overall bot architecture
- [Heuristics System](./Heuristics%20System.md) - How bot learns patterns
- Discord API Limits: https://discord.com/developers/docs/resources/channel#create-message
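
## Troubleshooting

### Messages are being cut off mid-sentence

A split can legitimately land mid-sentence: when no sentence boundary exists in the second half of the window, the splitter falls back to a word boundary. If words or messages are cut in unexpected places:

1. Check if the message contains very long paragraphs (>1,000 characters) with no natural boundaries
2. Verify the split algorithm is finding sentence/word boundaries correctly (a window that contains a period but no sentence boundary in its second half currently skips the word-boundary check and is hard-split at the limit)
3. Add logging to see where splits are occurring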
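For step 3, a small helper can log the length of each chunk and the text on either side of every split. A sketch using the standard `logging` module; the logger name `sentinel.split` and the function `log_split_points` are hypothetical:

```python
import logging
from typing import List

logger = logging.getLogger("sentinel.split")  # hypothetical logger name


def log_split_points(chunks: List[str], context: int = 20) -> List[str]:
    """Log and return a one-line description of each split boundary."""
    descriptions: List[str] = []
    for index in range(len(chunks) - 1):
        before = chunks[index][-context:]  # tail of the chunk before the split
        after = chunks[index + 1][:context]  # head of the chunk after the split
        description = (
            f"split {index + 1}: chunk of {len(chunks[index])} chars ends {before!r}, "
            f"next starts {after!r}"
        )
        logger.debug("%s", description)
        descriptions.append(description)
    return descriptions
```

Calling it right after splitting, with `logging.basicConfig(level=logging.DEBUG)` enabled, shows exactly where each cut landed.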

### Bot is sending too many messages

If the bot is generating excessively long responses (multiple splits):

1. Consider adjusting the system prompt to encourage more concise responses
2. Review the LLM's output to understand why it's generating such long content

The splitting itself is working as intended in this case; the alternative would be truncation.

### Split messages lose context

The split algorithm drops only the whitespace at each split point; no other content is truncated. If context appears lost:

1. Verify the conversation manager is tracking the last message correctly
2. Check that reply references are working for the first chunk
3. Ensure threading behavior is correct for multi-message responses
4. Check the logs for send failures: if a chunk fails to send, the remaining chunks are not sent

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -28,4 +28,4 @@ select = ["E", "F", "B", "I"]
 ignore = ["E203", "E501"]

 [project.optional-dependencies]
-dev = ["black>=23.10.0", "ruff>=0.1.5"]
+dev = ["black>=23.10.0", "ruff>=0.1.5", "pytest>=7.4.0"]
```

sentinel/services/moderation.py

Lines changed: 28 additions & 14 deletions
```diff
@@ -1072,27 +1072,41 @@ async def _tool_send_message(
         except AttributeError:
             reference = None

-        try:
-            sent_message = await send_channel.send(message_content, reference=reference)
-        except discord.Forbidden:
-            logger.warning("Insufficient permissions to send message in %s", send_channel)
-            return
-        except discord.HTTPException:
-            logger.exception("Failed to send message in %s", send_channel)
-            return
+        # Split message if it exceeds Discord's character limit
+        from ..utils.discord import split_message
+
+        message_chunks = split_message(message_content)
+
+        # Send all message chunks
+        sent_messages: List[discord.Message] = []
+        for i, chunk in enumerate(message_chunks):
+            # Only use reference for the first message
+            chunk_reference = reference if i == 0 else None
+
+            try:
+                sent_message = await send_channel.send(chunk, reference=chunk_reference)
+                sent_messages.append(sent_message)
+            except discord.Forbidden:
+                logger.warning("Insufficient permissions to send message in %s", send_channel)
+                return
+            except discord.HTTPException:
+                logger.exception("Failed to send message in %s", send_channel)
+                return

-        # Record bot's response in conversation if we have one
-        if conversation_id and self._conversations:
-            await self._conversations.record_bot_response(conversation_id, sent_message)
+        # Record bot's response in conversation if we have one (use the last message)
+        if conversation_id and self._conversations and sent_messages:
+            await self._conversations.record_bot_response(conversation_id, sent_messages[-1])

         # Record channel activity for analytics (not logged as an action since this is just a reply)
+        # Record activity for each sent message
         channel_for_log = (
             send_channel if isinstance(send_channel, discord.abc.GuildChannel) else resolved_channel
         )
         if isinstance(channel_for_log, discord.abc.GuildChannel):
-            await self._record_bot_channel_activity(
-                guild, channel_for_log, sent_message, context_tag
-            )
+            for sent_msg in sent_messages:
+                await self._record_bot_channel_activity(
+                    guild, channel_for_log, sent_msg, context_tag
+                )

     async def _tool_escalate(self, args: Dict[str, Any], context: EventContext) -> None:
         summary = args.get("summary")
```

sentinel/utils/discord.py

Lines changed: 76 additions & 0 deletions
```python
"""Discord-specific utility functions."""

from __future__ import annotations

from typing import List

# Discord's maximum message length
DISCORD_MAX_MESSAGE_LENGTH = 2000


def split_message(content: str, max_length: int = DISCORD_MAX_MESSAGE_LENGTH) -> List[str]:
    """Split a message into chunks that fit within Discord's character limit.

    This function intelligently splits messages at natural boundaries (newlines,
    sentences, words) to maintain readability while respecting Discord's 2000
    character limit per message.

    Args:
        content: The message content to split
        max_length: Maximum length per chunk (default: 2000 for Discord)

    Returns:
        List of message chunks, each under max_length characters

    Examples:
        >>> split_message("Short message")
        ["Short message"]

        >>> long_msg = "a" * 3000
        >>> chunks = split_message(long_msg)
        >>> len(chunks)
        2
        >>> all(len(chunk) <= 2000 for chunk in chunks)
        True
    """
    if len(content) <= max_length:
        return [content]

    chunks: List[str] = []
    remaining = content

    while remaining:
        if len(remaining) <= max_length:
            chunks.append(remaining)
            break

        # Find the best split point within max_length
        split_point = max_length

        # Try to split at a newline (paragraph boundary)
        newline_pos = remaining.rfind("\n", 0, max_length)
        if newline_pos > max_length * 0.5:  # Only use if it's past halfway point
            split_point = newline_pos + 1  # Include the newline in current chunk

        # If no good newline, try to split at a sentence boundary
        elif "." in remaining[:max_length]:
            # Look for sentence endings: period followed by space or end
            for i in range(max_length - 1, int(max_length * 0.5), -1):
                if remaining[i] == "." and (i + 1 >= len(remaining) or remaining[i + 1] in " \n"):
                    split_point = i + 1
                    break

        # If no sentence boundary, try to split at a word boundary
        else:
            space_pos = remaining.rfind(" ", 0, max_length)
            if space_pos > max_length * 0.5:  # Only use if it's past halfway point
                split_point = space_pos + 1  # Include the space

        # Extract the chunk and update remaining
        chunk = remaining[:split_point].rstrip()
        if chunk:  # Only add non-empty chunks
            chunks.append(chunk)

        remaining = remaining[split_point:].lstrip()

    return chunks
```
