
Path to 0.3... #1320

@d-e-s-o


Notes and ideas on how to improve the library, based in large part on lessons learned from 0.2*, likely in a backwards-incompatible manner:

  • 1. unify Inspector & Symbolizer types

    • because both may cache similar data, users may end up with increased memory usage and additional work being performed when symbolizing and inspecting from the same symbol source
  • 2. have first class differentiation between container and non-container formats

    • the unification of symbolization using container formats (kernel, process, APK) with that of single sources (ELF, Gsym, ...) is not the best idea
    • for process symbolization, for example, it would be nice to report the binary that an address falls into, even if ultimately an address could not be symbolized, but this data makes little sense in other contexts
    • similarly, we may want to report more detailed "module" information for these container formats (see #1183, "Report special module strings for BPF and vDSO symbols")
  • 3. consider keeping copies of cached data internally

    • right now we mmap symbol sources and effectively use zero-copy parsing and then hand out mmap'ed data
    • this works well and is performant, but it is troublesome if users modify symbol source data behind our backs
    • but because we tie everything we report to the Symbolizer instance anyway, it may be beneficial to just have a bump allocator inside the Symbolizer instance and hand out data allocated there
    • this could improve locality, would allow us to release the mappings, and would be safer in the case of modified data
  • 4. rework programmable dispatch (this is related to point 2., as it affects container formats)

    • we likely need a way to decide whether to invoke the "default" dispatch path before or after the user-provided one, as both can make sense in different contexts
    • perhaps we may want to work with data from the file system unconditionally, to integrate with the FileCache
    • right now, because of the API design, we support arbitrary "resolvers" that don't expose any file system paths to the core library (the upside is that things could conceivably be kept in memory, but use of that is probably rare)
  • 5. The on-demand created KernelResolver stuff is...weird. Perhaps it would be better to set relevant kernel data once for the Symbolizer object and not on a per-request basis. That would open the door to caching KernelResolver objects, which would allow us to move more logic in there.
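
To make point 3 a bit more concrete, here is a minimal sketch of the "copy into storage owned by the Symbolizer" idea. Everything in it is hypothetical: `StrArena` and `StrHandle` are illustrative names, not existing library types. It also hands out handles rather than references so that it stays within safe, std-only Rust, whereas a real bump allocator would likely hand out references directly.

```rust
/// Hypothetical handle to a string that was copied into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StrHandle {
    start: usize,
    len: usize,
}

/// Hypothetical append-only string arena that a Symbolizer could own.
/// Not part of the library; it only illustrates the idea.
#[derive(Default)]
pub struct StrArena {
    buf: String,
}

impl StrArena {
    /// Copy `s` to the end of the buffer (the "bump") and return a handle.
    /// After this call the arena no longer depends on the memory `s` came
    /// from, so an mmap'ed source could be unmapped or modified.
    pub fn alloc(&mut self, s: &str) -> StrHandle {
        let start = self.buf.len();
        self.buf.push_str(s);
        StrHandle { start, len: s.len() }
    }

    /// Look up a previously copied string.
    pub fn get(&self, handle: StrHandle) -> &str {
        // Offsets always fall on boundaries of whole strings we appended.
        &self.buf[handle.start..handle.start + handle.len]
    }
}
```

Because names reported to the user would live in one contiguous buffer, lookups stay local, and changes to the original symbol source after the copy cannot affect data that was already handed out.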
