Skip to content

feat: OneNote Image to Telegram Photo Upload#27

Merged
komplexb merged 5 commits intomasterfrom
feat/image-upload-2025
Sep 1, 2025
Merged

feat: OneNote Image to Telegram Photo Upload#27
komplexb merged 5 commits intomasterfrom
feat/image-upload-2025

Conversation

@komplexb
Copy link
Owner

@komplexb komplexb commented Jun 7, 2025

Summary

• Add OneNote image extraction and Telegram photo upload functionality
• Implement memory-efficient image processing with size limits for Lambda constraints
• Enable automatic image detection and upload from OneNote pages to Telegram channels

Key Features

Image Detection & Processing

  • Extract first image from OneNote HTML content using extractFirstImage()
  • Parse image metadata including alt text, dimensions, and high-resolution URLs
  • Prefer high-resolution image URLs (data-fullres-src) when available

Memory Management for Lambda

  • 3MB default size limit to stay within 512MB Lambda memory
  • Pre-download size checking via HEAD requests (gracefully handles 405 errors)
  • Streaming download with progressive size monitoring
  • Configurable timeouts (30s response, 45s deadline)

Telegram Integration

  • Send images as photo attachments with formatted captions
  • Caption format: [prefix] [title] [alt text or preview text]
  • Images sent first, followed by note body text as separate message
  • Both messages include same subject for reader context

Error Handling & Fallback

  • Graceful degradation: note text always sends even if image fails
  • Comprehensive error logging for debugging
  • Size limit enforcement with clear error messages
  • Timeout protection for large file downloads

Technical Implementation

New Functions Added

  • extractFirstImage() - Parse OneNote HTML for image elements
  • getImageSize() - Check file size before download
  • downloadImage() - Stream download with size limits
  • sendPhotoToTelegram() - Upload images to Telegram via multipart form

Enhanced Functions

  • withTelegram() - Now handles image detection and upload workflow
  • Better error handling with optional chaining for response objects

Changelog

Added

  • OneNote image extraction from HTML content
  • Telegram photo upload with buffer attachment method
  • Image size validation and memory management
  • Comprehensive error handling and logging
  • Support for both standard and high-resolution image URLs

Enhanced

  • Message flow now sends images first, then note text
  • Error handling uses optional chaining for safer access
  • Added hasImage flag to response objects for tracking

Technical Details

  • Uses Microsoft Graph API endpoints for image access
  • Implements proper OAuth token authentication for image downloads
  • Telegram file upload via multipart/form-data
  • Compatible with existing Lambda memory (512MB) and timeout (60s) constraints

Test Results

Successfully tested with:

  • ✅ Image detection in OneNote pages
  • ✅ Image download (205KB test image)
  • ✅ Telegram photo upload with caption
  • ✅ Note text message delivery
  • ✅ Error handling for invalid HEAD requests (405 errors)
  • ✅ Memory efficiency within Lambda constraints

🤖 Generated with Claude Code

komplexb and others added 3 commits June 7, 2025 22:08
Replace manual refresh token handling with recommended MSAL pattern to resolve authentication persistence issues across Lambda invocations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract first image from OneNote HTML content
- Download images with 3MB size limit and streaming validation
- Send images to Telegram with formatted captions (subject + alt text)
- Graceful fallback when images fail (note text still sends)
- Memory-efficient implementation for Lambda constraints
- Added timeout controls and proper error handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
• Add comprehensive cache plugin error handling and logging
• Implement manual cache persistence as backup mechanism
• Enhance token validation with proper extended expiry checks
• Add detailed logging for authentication flow debugging
• Ensure cache is persisted to DynamoDB after all successful operations
• Improve device login fallback with cache persistence guarantees

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
(cherry picked from commit b049277)
@komplexb
Copy link
Owner Author

@claude code review

@claude
Copy link
Contributor

claude bot commented Aug 31, 2025

Claude finished @komplexb's task —— View job


Code Review for OneNote Image to Telegram Photo Upload

Todo List:

  • Read and analyze core changed files
  • Review image processing and memory management implementation
  • Analyze error handling and fallback mechanisms
  • Check security considerations for image downloads
  • Review Telegram API integration
  • Assess performance and Lambda constraint compliance
  • Provide comprehensive feedback and recommendations

Overall Assessment

This is a well-implemented feature that adds OneNote image extraction and Telegram photo upload functionality. The implementation shows careful consideration for Lambda constraints and includes proper error handling. The code follows the existing patterns and integrates cleanly with the current architecture.

✅ Strengths

Memory Management & Lambda Optimization

  • Smart size limits: 3MB default limit with configurable parameters helps stay within 512MB Lambda memory
  • Progressive download monitoring: lib/onenote.js:396-401 implements chunk-by-chunk size checking during download
  • Pre-download size validation: HEAD request checks (lib/onenote.js:343-364) prevent large file downloads
  • Proper timeout configuration: 30s response, 45s deadline timeouts are appropriate for Lambda constraints

Error Handling & Resilience

  • Graceful degradation: lib/notify.js:174-180 ensures note text always sends even if image processing fails
  • HEAD request error tolerance: lib/onenote.js:382-384 gracefully handles 405 errors from servers that don't support HEAD
  • Comprehensive logging: Good error context throughout the image processing pipeline
  • Optional chaining: lib/notify.js:205,209 safely accesses response objects

Integration Quality

  • Clean module boundaries: New functions are properly exported and follow existing patterns
  • Consistent with existing auth: Reuses the same OAuth tokens for image downloads
  • Proper Telegram integration: Uses multipart/form-data for photo uploads as required by Telegram API
  • Caption formatting: Maintains consistent MarkdownV2 formatting with existing message style

⚠️ Areas for Improvement

1. Recursive Function Issue

Critical: lib/onenote.js:308-310 has a potential infinite recursion bug:

if (recentNotes.includes(note.id)) {
  getRandomNote(notes)  // Missing return statement!
}

Fix: Add return statement:

if (recentNotes.includes(note.id)) {
  return getRandomNote(notes, sectionHandle)  // Pass sectionHandle parameter too
}

2. Error Handling Inconsistencies

  • Inconsistent promise patterns: Mix of Promise constructor and async/await could be standardized
  • Error context: Some error logs lack sufficient context for debugging (e.g., lib/onenote.js:417)
  • Timeout configurations: lib/onenote.js:5-8 defines timeouts but image functions use different hardcoded values

3. Security & Input Validation

  • Image URL validation: No validation that imageUrl is a legitimate Microsoft Graph endpoint
  • HTML parsing safety: extractFirstImage() trusts HTML content without additional sanitization
  • Buffer size validation: Consider additional validation beyond just size limits

4. Code Organization

  • Parameter consistency: downloadImage(imageUrl, maxSizeBytes = 3 * 1024 * 1024) hardcodes default - could use env var
  • Magic numbers: Several hardcoded timeouts and sizes could be constants or environment variables
  • Function complexity: lib/notify.js:140-213 withTelegram() is quite long and handles multiple concerns

5. Performance Considerations

  • Sequential operations: Image download and upload happen sequentially - could potentially be optimized
  • Memory usage: Image buffer is held in memory during entire upload process
  • Error recovery: No retry logic for transient network failures during image processing

📋 Specific Code Issues

High Priority

  1. lib/onenote.js:308: Fix recursive function bug (critical)
  2. lib/onenote.js:392-408: Consider adding Content-Type validation for downloaded images
  3. lib/notify.js:175: Add error context - which image URL failed?

Medium Priority

  1. lib/onenote.js:372: Consider making max size configurable via environment variable
  2. lib/notify.js:266: Hardcoded filename 'image.jpg' - could extract from URL or use generic name
  3. Error messages: Standardize error message format across the new functions

Low Priority

  1. Add JSDoc documentation for the new functions to match existing code style
  2. Consider extracting image processing logic to separate module if this grows
  3. Add type hints in comments for better IDE support

🧪 Testing Recommendations

Since no test files exist, consider adding:

  1. Unit tests for extractFirstImage() with various HTML inputs
  2. Mock tests for downloadImage() with size limit scenarios
  3. Integration tests for the complete image + text message flow
  4. Error case testing (network failures, oversized images, malformed HTML)

🚀 Deployment Considerations

  • ✅ Lambda memory limit (512MB) is well respected
  • ✅ Timeout constraints (60s) are handled appropriately
  • ✅ Environment variables are properly used
  • ⚠️ Consider adding CloudWatch metrics for image processing success/failure rates
  • ⚠️ Monitor actual memory usage in production to validate assumptions

Summary

This is a solid implementation that adds valuable functionality while respecting system constraints. The main concern is the recursive function bug which should be fixed before deployment. The error handling and memory management show good engineering practices. Once the critical issue is resolved, this feature should work reliably in production.

Recommendation: ✅ Approve with required fixes - specifically address the recursive function issue before merging.


Job Run Link | Branch: feat/image-upload-2025

* master:
  auth: code review cleanups
  auth: remove troubleshooting logs
  Add Claude Code GitHub Workflow (#30)
  auth: add better logging
  auth: appears to have resolved login issue
  refactor: enhance cache handling and logging in auth and onenote modules

# Conflicts:
#	handler.js
#	lib/auth.js
@claude
Copy link
Contributor

claude bot commented Aug 31, 2025

Code Review: OneNote Image to Telegram Photo Upload

Overall Assessment ✅

This is a well-implemented feature that adds image upload functionality to the existing OneNote-to-Telegram notification system. The implementation demonstrates good practices with proper error handling, memory management, and graceful fallbacks.

Code Quality & Best Practices ✅

Strengths:

  • Modular design: New functions are well-separated (extractFirstImage, downloadImage, getImageSize)
  • Comprehensive JSDoc: All new functions have proper documentation with parameter types and return values
  • Consistent coding style: Follows existing patterns in the codebase
  • Proper module exports: New functions are correctly exported from their respective modules

Minor Style Issues:

  • Mixed spacing around operators in some places (e.g., lib/notify.js:175 uses || inconsistently)
  • Consider extracting the 3MB size limit to a constant or environment variable for easier configuration

Potential Bugs & Issues ⚠️

1. Missing Return Statement (lib/onenote.js:316)

if (recentNotes.includes(note.id)) {
  getRandomNote(notes)  // Missing return!
}

This could cause infinite recursion or unexpected behavior. Should be:

if (recentNotes.includes(note.id)) {
  return getRandomNote(notes, sectionHandle)
}

2. Error Handling Race Condition (lib/notify.js:152-180)
The image processing happens inside a try-catch but failures could leave the function in an inconsistent state. Consider making the image processing completely independent of the text message flow.

3. HEAD Request Error Handling (lib/onenote.js:366-370)
The catch block creates a new Error wrapping the original error, which could lose important context:

.catch(err => {
  console.error('getImageSize', err)
  reject(new Error(err))  // Could lose original error details
})

Performance Considerations ✅

Excellent Memory Management:

  • ✅ 3MB size limit prevents Lambda memory exhaustion
  • ✅ Streaming download with progressive size checking
  • ✅ Pre-download size validation via HEAD requests
  • ✅ Proper timeout configuration (30s/45s)

Potential Optimizations:

  • Consider adding image format validation (JPEG, PNG, GIF) before download
  • The duplicate getNoteContents call in withTelegram could be optimized by reusing the content from formatNoteBody

Security Concerns 🔒

Good Security Practices:

  • ✅ Proper OAuth token usage for Microsoft Graph API calls
  • ✅ Size limits prevent resource exhaustion attacks
  • ✅ Timeout controls prevent hanging requests
  • ✅ Input validation for image attributes

Potential Concerns:

  • Image URLs from OneNote are trusted without additional validation
  • No MIME type validation - relies on file extension .jpg hardcoded in attachment
  • Consider adding Content-Type header validation for downloaded images

Test Coverage ❌

Major Gap: No tests found for the new image functionality. The codebase appears to lack a comprehensive test suite.

Recommended Test Cases:

  • Unit tests for extractFirstImage() with various HTML inputs
  • Integration tests for downloadImage() with size limits
  • Mock tests for Telegram photo upload
  • Error handling tests for various failure scenarios
  • End-to-end tests for the image workflow

Additional Recommendations

1. Configuration Management:
Consider moving magic numbers to configuration:

const IMAGE_CONFIG = {
  MAX_SIZE_BYTES: process.env.MAX_IMAGE_SIZE || 3 * 1024 * 1024,
  TIMEOUT_RESPONSE: 30000,
  TIMEOUT_DEADLINE: 45000
}

2. Logging Improvements:
Add structured logging for better debugging:

console.log('Image processing', { 
  hasImage: !!imageInfo, 
  size: imageSize, 
  url: imageInfo?.imageUrl 
})

3. Consider Circuit Breaker Pattern:
For production resilience, consider adding a circuit breaker for image downloads to prevent cascading failures.

Verdict: APPROVE with Minor Fixes ✅

This is a solid implementation that adds valuable functionality while maintaining the existing system's reliability. The critical bug in getRandomNote() should be fixed before merging, but overall the code demonstrates good engineering practices.

Required before merge:

  • Fix the missing return statement in getRandomNote() function
  • Consider adding basic unit tests for the new functions

Nice to have:

  • Extract magic numbers to constants
  • Add image format validation
  • Improve error message preservation

@claude
Copy link
Contributor

claude bot commented Aug 31, 2025

Code Review: OneNote Image to Telegram Photo Upload

Strengths

Well-structured feature implementation:

  • Clear separation of concerns between image extraction (onenote.js) and Telegram delivery (notify.js)
  • Comprehensive error handling with graceful fallback to text-only messages
  • Good documentation with detailed JSDoc comments

Memory management for Lambda:

  • Smart approach using 3MB size limits for 512MB Lambda environment
  • Progressive size monitoring during streaming downloads
  • Pre-download size checking with HEAD requests

Robust error handling:

  • Optional chaining for safer property access (err.response?.body?.description)
  • Graceful degradation when image processing fails
  • Comprehensive logging for debugging

⚠️ Issues & Concerns

Security & Best Practices:

  1. Potential memory exhaustion (onenote.js:403-409):

    res.on('data', chunk => {
      downloadedBytes += chunk.length
      if (downloadedBytes > maxSizeBytes) {
        return callback(new Error(`Image download exceeded size limit: ${downloadedBytes} bytes`))
      }
      chunks.push(chunk)
    })

    The chunks array continues to grow even after size limit is exceeded. Should abort the request immediately.

  2. Inconsistent timeout handling (onenote.js:356, notify.js:269):

    • getImageSize(): 10s response, 15s deadline
    • downloadImage(): 30s response, 45s deadline
    • sendPhotoToTelegram(): 30s response, 45s deadline

    Consider standardizing timeout values or making them configurable.

  3. Missing input validation (onenote.js:327-343):
    extractFirstImage() doesn't validate that htmlContent is a string or handle null/undefined inputs.

Code Quality:

  1. Recursive call potential (onenote.js:316):

    if (recentNotes.includes(note.id)) {
      return getRandomNote(notes, sectionHandle)
    }

    Could cause stack overflow if all notes are recent. Consider iterative approach or max retry limit.

  2. Hardcoded filename (notify.js:268):

    .attach('photo', imageBuffer, 'image.jpg')

    Should derive filename from image URL or content-type header.

  3. Unclear error propagation (notify.js:180-183):
    Image errors are logged as warnings but might mask important issues. Consider differentiating between expected vs unexpected errors.

🚀 Performance Considerations

Positive:

  • Efficient streaming download with size monitoring
  • Avoids loading entire image into memory before size check

Improvements needed:

  • Consider adding image caching to avoid re-downloading same images
  • HEAD request fallback adds latency - consider making size check optional

🧪 Test Coverage

Critical gap: No automated tests found for new functionality. Recommend adding tests for:

  • extractFirstImage() with various HTML inputs
  • downloadImage() size limit enforcement
  • sendPhotoToTelegram() with mock Telegram API
  • Error handling scenarios in withTelegram()

📝 Minor Issues

  1. Configuration change (eventData.json:5):
    Switching isSequential to true seems unrelated to image functionality.

  2. Unused npm script (package.json:10):
    remove-logs script added but not documented in CLAUDE.md.

🔧 Recommendations

  1. Fix memory leak in downloadImage():

    res.on('data', chunk => {
      downloadedBytes += chunk.length
      if (downloadedBytes > maxSizeBytes) {
        res.destroy() // Abort the request
        return callback(new Error(`Image download exceeded size limit`))
      }
      chunks.push(chunk)
    })
  2. Add input validation:

    function extractFirstImage(htmlContent) {
      if (!htmlContent || typeof htmlContent !== 'string') {
        return null
      }
      // ... rest of function
    }
  3. Add tests - Create test files for the new image functionality

  4. Consider rate limiting - Add delays between image downloads to respect API limits

Overall Assessment

This is a solid feature implementation that adds valuable image support while maintaining system reliability. The memory management approach is appropriate for Lambda constraints, and the error handling ensures robustness.

Recommendation: Approve with minor fixes - Address the memory leak and add input validation before merging.

🤖 Generated with Claude Code

@komplexb komplexb merged commit 6edc471 into master Sep 1, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant