Skip to content

fix: resource leak in Doc2PDF.parseJSP2PDF - HttpURLConnection never closed #2244

@github-actions

Description

@github-actions

Summary

The parseJSP2PDF method in Doc2PDF.java has a resource leak where it opens an HttpURLConnection via GetInputFromURI() but never closes it. This can lead to connection pool exhaustion and file descriptor leaks under load.

Current Impact: LOW - The method is currently not called anywhere in the active codebase, making this a dormant issue.


Affected Code

Location

File: src/main/java/ca/openosp/openo/util/Doc2PDF.java
Method: parseJSP2PDF(HttpServletRequest, HttpServletResponse, String, String) at line 90
Helper Method: GetInputFromURI(String, String) at line 264

The Problem

// Line 97 in parseJSP2PDF
BufferedInputStream in = GetInputFromURI(jsessionid, uri);

// Parse directly from InputStream with UTF-8 encoding and base URI
org.jsoup.nodes.Document doc = Jsoup.parse(in, StandardCharsets.UTF_8.name(), uri);
// ... rest of method
// ❌ 'in' is NEVER closed

The GetInputFromURI method (lines 264-281) creates an HttpURLConnection and returns a BufferedInputStream:

public static BufferedInputStream GetInputFromURI(String jsessionid, String uri) {
    BufferedInputStream in = null;
    try {
        URL url = new URI(uri + ";jsessionid=" + jsessionid).toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        in = new BufferedInputStream(conn.getInputStream());
        // ❌ Connection never closed
    } catch (Exception e) {
        logger.error("Unexpected error", e);
    }
    return in;
}

When the BufferedInputStream is not closed, the underlying HttpURLConnection remains open indefinitely.


Resources Leaked

  1. HTTP connection pool entries - Prevents connection reuse
  2. File descriptors - Can exhaust OS limits under load
  3. Socket resources - Network layer resources remain allocated
  4. Memory buffers - Connection buffers not released

Current Usage Analysis

✅ Currently Unused (Dormant Leak)

I searched the entire codebase and found that parseJSP2PDF is NOT called anywhere:

# No active callers found
git grep -n "parseJSP2PDF(" -- "*.java" "*.jsp"
# Only returns the method definition itself

Related Methods (No Leak)

The three active messenger PDF actions use different methods that do NOT have this leak:

  1. MsgDoc2PDF2Action (lines 113, 122):

    • Calls Doc2PDF.parseString2PDF()
    • Calls Doc2PDF.parseString2Bin()
  2. MsgAttachPDF2Action (lines 134, 154):

    • Calls Doc2PDF.parseString2PDF()
    • Calls Doc2PDF.parseString2Bin()
  3. MsgViewPDF2Action (line 142):

    • Calls Doc2PDF.PrintPDFFromBin()

None of these methods use GetInputFromURI or open HttpURLConnections.


Correction Options

Option 1: Fix with Try-With-Resources (Recommended if keeping method)

public static void parseJSP2PDF(HttpServletRequest request, HttpServletResponse response, 
                                String uri, String jsessionid) {
    try {
        BufferedInputStream in = GetInputFromURI(jsessionid, uri);
        if (in == null) {
            logger.error("Failed to retrieve content from URI: {}", uri);
            return;
        }
        
        try (in) {  // Auto-close the stream
            org.jsoup.nodes.Document doc = Jsoup.parse(in, StandardCharsets.UTF_8.name(), uri);
            configureJsoupForXhtml(doc);
            
            String cleanHtml = doc.html();
            MiscUtils.getLogger().debug("Parsed HTML content, length: {} chars", cleanHtml.length());
            String documentTxt = AddAbsoluteTag(request, cleanHtml, uri);
            
            PrintPDFFromHTMLString(response, documentTxt);
        }
    } catch (Exception e) {
        logger.error("", e);
    }
}

Option 2: Remove Dead Code (Recommended)

Since parseJSP2PDF is not called anywhere and the entire Doc2PDF class is already marked @Deprecated, consider removing this unused method entirely:

  1. Remove parseJSP2PDF method (lines 90-113)
  2. Remove GetInputFromURI method (lines 264-281) if no other callers exist
  3. Update class JavaDoc to note the removal

Benefits:

  • Eliminates the leak entirely
  • Reduces attack surface (security best practice)
  • Simplifies codebase maintenance
  • Aligns with OpenO's active code cleanup philosophy

Testing

If fixing (Option 1), manual testing would require:

  1. Find or create a workflow that calls parseJSP2PDF (currently none exist)
  2. Monitor connection pool usage with netstat or JMX
  3. Verify connections are released after PDF generation
  4. Check file descriptor count doesn't grow: lsof -p <pid> | wc -l

If removing (Option 2), verify:

# Confirm no callers exist
git grep -n "parseJSP2PDF(" -- "*.java" "*.jsp"
git grep -n "GetInputFromURI(" -- "*.java" "*.jsp"

# Run full test suite
make install --run-tests

Related Issues

  • This issue was identified during code review of PR fix: migrate Doc2PDF from jtidy to Jsoup #2238 (Doc2PDF jtidy → Jsoup migration)
  • The Doc2PDF class is already marked @Deprecated with warning: "unsafe with potential memory leaks"
  • The broader issue of replacing Doc2PDF entirely is tracked separately

Labels

  • type: bug - Resource leak is a defect
  • priority: medium - Dormant but should be addressed
  • good first issue - Clear problem, well-defined solutions, low risk

Recommendation

Given that:

  1. The method is unused
  2. The class is deprecated
  3. OpenO actively removes unused code
  4. The leak has existed for years without impact

Recommend Option 2 (removal) unless there's a known future need for this functionality.

Metadata

Metadata

Assignees

No one assigned

    Labels

    good first issueGood for newcomerstype: bugSomething isn't working as expected

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions