Conversation

@ravikumark815

@ravikumark815 ravikumark815 commented Jan 3, 2026

Summary

This PR fixes a critical crash in isisd when processing malformed Router Capability TLVs (Issue #20329).

Problem

The unpack_tlv_router_cap function in isisd/isis_tlvs.c called stream_getc() without validating that sufficient bytes were available in the stream. This triggered an assertion failure in lib/stream.c:353, causing the daemon to abort and creating a Remote Denial of Service (DoS) vulnerability.

Solution

Added four stream boundary checks using STREAM_READABLE() before all stream_getc() calls in the Router Capability TLV parsing logic:

  1. Initial header validation (router ID + flags)
  2. Sub-TLV loop header check
  3. FAD sub-TLV length validation
  4. FAD sub-sub-TLV loop header check

Also added a check to prevent memory leaks when duplicate FAD entries are received.
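The pattern behind these checks can be sketched in isolation. The following is a minimal illustration, not FRR code: `struct mini_stream`, `mini_readable()`, `mini_getc()` and `read_subtlv_header()` are hypothetical stand-ins for the stream API mentioned above, and the only behaviour assumed is what the Problem section states, namely that the read call asserts when too few bytes remain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FRR's stream type; not the real structure. */
struct mini_stream {
	const uint8_t *data;
	size_t getp; /* current read offset */
	size_t endp; /* one past the last valid byte */
};

/* Bytes still available to read (plays the role of STREAM_READABLE()). */
static size_t mini_readable(const struct mini_stream *s)
{
	return s->endp - s->getp;
}

/* Asserts instead of returning an error, like the call described above. */
static uint8_t mini_getc(struct mini_stream *s)
{
	assert(mini_readable(s) >= 1);
	return s->data[s->getp++];
}

/*
 * Read a two-byte sub-TLV header (type, length).
 * Returns 0 on success, or -1 if fewer than two bytes remain, so the
 * caller can reject the TLV instead of hitting the assertion.
 */
static int read_subtlv_header(struct mini_stream *s, uint8_t *type,
			      uint8_t *len)
{
	if (mini_readable(s) < 2)
		return -1;
	*type = mini_getc(s);
	*len = mini_getc(s);
	return 0;
}
```

With a three-byte buffer, the first call consumes two bytes and succeeds; a second call returns -1 and leaves the read offset untouched rather than aborting.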

Testing

  • Compiled FRR with the fix
  • Sent malformed LSP packets with truncated Router Capability TLVs
  • Verified the daemon no longer crashes and handles the malformed packets gracefully
  • Confirmed normal ISIS operation continues after receiving malformed packets

Checklist

  • Code follows FRR coding standards
  • Commit message follows FRR guidelines
  • Signed-off-by present
  • Fix tested and verified
  • No new compiler warnings introduced

Fixes #20329

@vjardin
Contributor

vjardin commented Jan 3, 2026

typo: do you mean issue #20329? Update the commit log.

@vjardin
Contributor

vjardin commented Jan 3, 2026

The added code to handle duplicate FAD entries has a memory leak.

  if (rcap->fads[fad->fad.algorithm])
      XFREE(MTYPE_ISIS_TLV, rcap->fads[fad->fad.algorithm]);

The cleanup in isis_tlvs.c shows the correct approach:

  if (rcap->fads[fad->fad.algorithm]) {
      struct isis_router_cap_fad *old_fad = rcap->fads[fad->fad.algorithm];
      admin_group_term(&old_fad->fad.admin_group_exclude_any);
      admin_group_term(&old_fad->fad.admin_group_include_any);
      admin_group_term(&old_fad->fad.admin_group_include_all);
      XFREE(MTYPE_ISIS_TLV, old_fad);
  }

The four STREAM_READABLE() checks look correct and address the crash scenarios described in #20329. The FAD length validation (length < 4) is also appropriate.

Please address the two items above and this should be in better shape. Consider running with ASan, valgrind, or FRR's memory leak detection to verify that no leaks occur when processing duplicate FAD sub-TLVs. Do you have a means of adding a test for it?
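The leak described in this comment comes from freeing a container while its members still own heap memory. The sketch below illustrates the idea with hypothetical types (`struct mini_group`, `struct mini_fad`) and a crude allocation counter. None of these names come from FRR, and the real admin_group structure is assumed, not known, to own heap memory in a similar way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real FRR types are more complex. */
struct mini_group {
	unsigned int *words; /* heap-allocated bitmap */
};

struct mini_fad {
	struct mini_group exclude_any;
	struct mini_group include_any;
	struct mini_group include_all;
};

static int live_allocs; /* crude leak counter, for the demo only */

static void *demo_alloc(size_t n)
{
	void *p = calloc(1, n);

	assert(p != NULL);
	live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* Release the memory owned by one group (the admin_group_term() role). */
static void mini_group_term(struct mini_group *g)
{
	demo_free(g->words);
	g->words = NULL;
}

static struct mini_fad *mini_fad_new(void)
{
	struct mini_fad *f = demo_alloc(sizeof(*f));

	f->exclude_any.words = demo_alloc(sizeof(unsigned int));
	f->include_any.words = demo_alloc(sizeof(unsigned int));
	f->include_all.words = demo_alloc(sizeof(unsigned int));
	return f;
}

/* Install new_fad in *slot, fully releasing any previous occupant. */
static void mini_fad_replace(struct mini_fad **slot, struct mini_fad *new_fad)
{
	struct mini_fad *old = *slot;

	if (old) {
		/* Members first: freeing only 'old' would leak three bitmaps. */
		mini_group_term(&old->exclude_any);
		mini_group_term(&old->include_any);
		mini_group_term(&old->include_all);
		demo_free(old);
	}
	*slot = new_fad;
}
```

Replacing an occupied slot keeps `live_allocs` at four (one container plus three bitmaps), whereas freeing only the container would leave three allocations stranded on every duplicate.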

@ravikumark815
Author

ravikumark815 commented Jan 4, 2026

Thank you for the review! I've addressed both issues:

  1. Fixed issue number: Changed the reference in the commit message from #20337 ("Duplicate routes from local route protocol") to #20329 ("isisd: crash in unpack_tlv_router_cap via stream_getc assertion failure").
  2. Fixed memory leak: Updated the duplicate FAD handling to clean up the admin_group structures before freeing memory, following the pattern used in the cleanup code at lines 5105-5107. The updated code now calls admin_group_term() for all three admin_group fields before XFREE().

Regarding testing: I verified the fix using Valgrind with a custom test script that injects packets with duplicate FAD sub-TLVs. The "definitely lost: 0 bytes" line in the output confirms that replacing duplicate FAD entries no longer leaks memory:

  ==191058== LEAK SUMMARY:
  ==191058== definitely lost: 0 bytes in 0 blocks
  ==191058== indirectly lost: 0 bytes in 0 blocks


  sbuf_push(log, indent, "Unpacking Router Capability TLV...\n");
- if (tlv_len < ISIS_ROUTER_CAP_SIZE) {
+ if (tlv_len < ISIS_ROUTER_CAP_SIZE ||
Contributor

@mjstapp mjstapp Jan 5, 2026

several of these changes seem to be oriented around "we can't trust the tlv_len value" that was passed in. it would be clearer to fix that: to ensure that tlv_len was valid before calling this function

Author

@ravikumark815 ravikumark815 Jan 6, 2026

I completely agree that validating tlv_len against the stream bounds before dispatching to the specific unpack function would be a better design for the entire TLV parsing.

However, unpack_tlv and the dispatch logic currently trust the tlv_len from the packet header. Changing that behavior would require refactoring how tlv_len is handled for all TLV types to ensure no regressions.

Given that this PR targets a specific crash/vulnerability, I'd prefer to keep the fix localized to unpack_tlv_router_cap to ensure it's safe and minimal.

Contributor

well, but ... that's just wrong isn't it? "trust the value from the packet header" I mean - if that's true, that's something that has caused trouble in this specific path, and will continue to cause trouble until it's fixed.

Author

I've refactored the code as suggested to ensure tlv_len validity before calling the handler.
Changes:

  1. Added size_t min_len to struct tlv_ops.
  2. Updated unpack_tlv() to check if (tlv_len < ops->min_len) before dispatching.
  3. Defined TLV_OPS_MIN_LEN macro and used it for Router Capability (Type 242) with .min_len = 5.
  4. Removed the manual length check from unpack_tlv_router_cap as it is now redundant.

This provides a central mechanism for enforcing minimum TLV lengths.
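A per-type minimum length enforced at dispatch time might look roughly like the sketch below. It is illustrative only: `struct mini_tlv_ops`, `mini_unpack_tlv()` and the return codes are assumptions and do not reproduce FRR's `struct tlv_ops` or `unpack_tlv()`. Only the type number 242 and the minimum length of 5 come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ops table entry. */
struct mini_tlv_ops {
	uint8_t type;
	size_t min_len; /* smallest legal value length for this type */
	int (*unpack)(const uint8_t *val, size_t len);
};

static int unpack_router_cap(const uint8_t *val, size_t len)
{
	/* The dispatcher guarantees at least router ID (4) + flags (1). */
	assert(len >= 5);
	(void)val;
	return 0;
}

static const struct mini_tlv_ops ops_table[] = {
	{ .type = 242, .min_len = 5, .unpack = unpack_router_cap },
};

/*
 * Dispatch one TLV value to its handler.
 * Returns 0 on success, -1 if the value is shorter than the type's
 * minimum, and 1 if the type is unknown and therefore skipped.
 */
static int mini_unpack_tlv(uint8_t type, const uint8_t *val, size_t len)
{
	size_t n = sizeof(ops_table) / sizeof(ops_table[0]);

	for (size_t i = 0; i < n; i++) {
		const struct mini_tlv_ops *ops = &ops_table[i];

		if (ops->type != type)
			continue;
		if (len < ops->min_len)
			return -1; /* rejected before the handler runs */
		return ops->unpack(val, len);
	}
	return 1;
}
```

The trade-off is that every handler's minimum lives in one table, at the cost of touching the shared dispatch path for all TLV types.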

Member

@riw777 riw777 left a comment

looks good ... just waiting on an answer to @mjstapp 's comments

@ravikumark815 ravikumark815 requested a review from mjstapp January 6, 2026 18:21
Add validation to ensure sufficient stream data is available before
calling stream_getc() in unpack_tlv_router_cap(). This prevents
assertion failures when processing malformed Router Capability TLVs.

The fix adds four boundary checks:

1. Verify STREAM_READABLE before reading the 5-byte Router Capability
   header (router ID + flags).

2. Check stream availability before reading sub-TLV type and length
   in the main sub-TLV processing loop.

3. Validate FAD sub-TLV length is at least 4 bytes before reading
   mandatory fields (algorithm, metric_type, calc_type, priority).

4. Verify stream has 2 bytes available before reading sub-sub-TLV
   headers in the FAD processing loop.

Edit2: Additionally, add a check to prevent memory leak when duplicate FAD
entries are received for the same algorithm.

Edit3: Set `min_len` to 5 for Router Capability TLV (Type 242) to prevent
processing malformed TLVs that are shorter than the header size,
which previously caused assertion failures.

Fixes: FRRouting#20329
Signed-off-by: ravikumark815 <[email protected]>
@github-actions github-actions bot added the "rebase" label (PR needs rebase) Jan 7, 2026
Contributor

@mjstapp mjstapp left a comment

this has sort of wandered a bit.
the issue says that it's possible to get a stream with a readable size less than five through unpack_tlv_router_cap() - but I don't see how that's possible: the ROUTER_CAP_SIZE check appears to prevent that.
the issue also says that in general the tlv lengths aren't checked against the stream, but I do see checks in the incoming callers of isis_unpack_tlvs(), and in unpack_tlvs() and unpack_tlv(). can you show where it's possible to get past those existing checks?

the issue with the subtlvs is a different one, and does appear to be real - so it would be better just to focus on that?

@ravikumark815
Author

I've reverted the tlv_ops refactoring to keep the PR focused on the specific crash and memory leak in the Router Capability TLV, as suggested.
Regarding your question about the existing checks:

"can you show where it's possible to get past those existing checks?"

Yes, the crash occurs when a TLV advertises a length that is valid according to the stream (avail_len >= tlv_len + 2), but smaller than the fixed header size expected by the handler.

Scenario:

  1. Incoming Packet: [Type=242] [Len=2] [Byte1] [Byte2]
  2. unpack_tlv() at line 7280 checks: avail_len < tlv_len + 2.
    • 4 < 2 + 2 is FALSE. The check passes.
  3. unpack_tlv_router_cap() is called.
  4. It calls stream_get_ipv4(s) which attempts to read 4 bytes.
  5. CRASH: The stream only has 2 bytes available.

That is why the check if (tlv_len < ISIS_ROUTER_CAP_SIZE) inside the handler is necessary—unpack_tlv validates the container size, but not the content requirements.
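The distinction between the two checks in this scenario can be made concrete with a small sketch. It assumes only what the scenario states: a two-byte TLV header, a container check of the form `avail_len < tlv_len + 2`, and a five-byte minimum for the Router Capability value. The function name, the enum, and `MINI_ROUTER_CAP_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINI_ROUTER_CAP_SIZE 5 /* 4-byte router ID + 1 flags byte */

enum tlv_verdict { TLV_OK, TLV_TRUNCATED, TLV_TOO_SHORT_FOR_TYPE };

/* Apply the container check, then the content check, to one TLV. */
static enum tlv_verdict check_router_cap(const uint8_t *pkt, size_t avail_len)
{
	uint8_t tlv_len;

	if (avail_len < 2)
		return TLV_TRUNCATED; /* no room for type + length */
	tlv_len = pkt[1];

	/* Container check: does the advertised value fit in the stream? */
	if (avail_len < (size_t)tlv_len + 2)
		return TLV_TRUNCATED;

	/* Content check: is the value long enough for this TLV type? */
	if (tlv_len < MINI_ROUTER_CAP_SIZE)
		return TLV_TOO_SHORT_FOR_TYPE;

	return TLV_OK;
}
```

For the packet in the scenario, `[242] [2] [Byte1] [Byte2]`, the container check passes and only the content check rejects the TLV.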

@ravikumark815 ravikumark815 requested a review from mjstapp January 7, 2026 22:52
@mjstapp
Contributor

mjstapp commented Jan 8, 2026

sorry: the point was that there are existing checks that appear to prevent several of the faults that the open issue complains about. I think that issue is largely bogus. you've now gone back and forth several times, without ever showing exactly what problem you were trying to solve - you were just pointing at the bogus open issue.

you had removed the content check that was already present, and was preventing the read of 5 octets from a too-small stream. I see you've now put that check back in place. so again, I'll ask this: please describe whether any of the complaints in the issue are actually valid, and then explain how to fix them. I think the complaint about sub-tlvs may be valid, for example.

  sbuf_push(log, indent, "Unpacking Router Capability TLV...\n");
- if (tlv_len < ISIS_ROUTER_CAP_SIZE) {
+ if (tlv_len < ISIS_ROUTER_CAP_SIZE ||
+     STREAM_READABLE(s) < ISIS_ROUTER_CAP_SIZE) {
Contributor

I don't think this can be meaningful - please show how an invalid length value could be passed here.

#endif /* ifndef FABRICD */
uint8_t msd_type;

if (STREAM_READABLE(s) < 2) {
Contributor

again, please show how this could be needed, how the check above could be incorrect.

if (STREAM_READABLE(s) < 2) {
        sbuf_push(log, indent,
                  "WARNING: Unexpected end of stream\n");
        return 0;
Contributor

that's not the correct error return handling, is it?
won't matter if this block is removed, of course.

uint32_t v;
int n_ag, i;

if (STREAM_READABLE(s) < 2) {
Contributor

and again: if the subtlv length was valid, then the subsubtlvs_len check that's already present should be valid. how could this occur?
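The reasoning in this comment, that an inner length check bounded by an already-validated outer length makes a separate stream check unnecessary, can be illustrated with a generic nested-TLV walker. This is a sketch under assumptions, not the FRR parser: `walk_subsubtlvs()` is hypothetical, and `remaining` stands for an outer length that has already been validated against the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a run of nested (type, length, value) entries that must fit in
 * 'remaining' bytes, where 'remaining' is an outer length the caller has
 * already validated against the stream.
 * Returns the number of entries, or -1 if an entry overruns the budget.
 */
static int walk_subsubtlvs(const uint8_t *p, size_t remaining)
{
	int count = 0;

	while (remaining > 0) {
		uint8_t len;

		if (remaining < 2)
			return -1; /* no room for a type + length header */
		len = p[1];
		if ((size_t)len + 2 > remaining)
			return -1; /* value would run past the outer length */

		p += 2 + (size_t)len;
		remaining -= 2 + (size_t)len;
		count++;
	}
	return count;
}
```

Because every read is bounded by `remaining`, no read can go past the stream as long as the outer length itself was checked against the stream.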

@github-actions

github-actions bot commented Jan 8, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Development

Successfully merging this pull request may close these issues.

isisd: crash in unpack_tlv_router_cap via stream_getc assertion failure

5 participants