11# log-surgeon: A performant log parsing library
2- Project Link: [ Homepage] [ home-page ]
32
4- Video Demo Link: [ Video Demo] [ video-demo ]
5-
6- ---
7-
8- ## Team Members
9- - Student 1: Siwei (Louis) He, 1004220960, [email protected]
10- - Student 2: Zhihao Lin, 1005071299, [email protected]
11-
12- ---
3+ [ ![ Build status] [ badge-build-status ]] [ project-gh-action ]
4+ ![ Apache Licensed] [ badge-apache ]
135
146## Introduction
157
16- ` log-surgeon ` is a library for high-performance parsing of unstructured text
17- logs implemented using Rust.
8+ ` log-surgeon ` is a library for high-performance parsing of unstructured text logs implemented using
9+ Rust. This project originated as the course project for
10+ [ ECE1724F1 Performant Software Systems with Rust] [ ece1724 ] , offered in 2024 at the University of
11+ Toronto.
1812
19- ---
13+ - Project Link: [ Homepage] [ home-page ]
14+ - Video Demo Link: [ Video Demo] [ video-demo ]
15+ - Team Members
16+ - Student 1: [ Siwei (Louis) He] [ github-siwei ] , 1004220960, [email protected]
17+ - Student 2: [ Zhihao Lin] [ github-zhihao ] , 1005071299, [email protected]
2018
2119## Motivation
2220Today's large technology companies generate logs on the order of petabytes per day as a critical
@@ -83,8 +81,6 @@ Our project, [log-surgeon-rust][home-page], is designed to improve CLP's parsing
8381safe and high-performance regular expression engine specialized for unstructured logs, allowing users
8482to extract named variables from raw text log messages efficiently according to user-defined schema.
8583
86- ---
87-
8884## Objective
8985The objective of this project is to fill the gap in the current Rust ecosystem described in the
9086motivation above. We shall deliver a high-performance and memory-safe log parsing library using Rust.
@@ -107,8 +103,6 @@ The log parsing interface will provide user programmatic APIs to:
107103- Feed an input log stream to the log parser
108104- Retrieve outputs (parsed log events structured according to the user schema) from the parser
109105
110- ---
111-
112106## Features
113107As a log parsing library, log-surgeon provides the following features that differ from general text
114108parsers:
@@ -133,13 +127,9 @@ feature provides APIs for:
133127- Merging multiple NFAs into a single DFA.
134128- Simulating a DFA with character streams or strings.
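
To illustrate what simulating a DFA over a character stream involves, here is a minimal sketch of a table-driven DFA for the pattern `[0-9]+`. The names `Dfa`, `accepts`, and `digits_dfa` are hypothetical and chosen only for this example; they are not log-surgeon's actual API, and the real library builds its DFA from NFAs rather than by hand.

```rust
use std::collections::HashMap;

/// A minimal table-driven DFA. Hypothetical type for illustration only;
/// this is not log-surgeon's actual DFA representation.
struct Dfa {
    start: usize,
    /// accepting[s] is true when state `s` is an accepting state.
    accepting: Vec<bool>,
    /// Maps (current state, input character) to the next state.
    transitions: HashMap<(usize, char), usize>,
}

impl Dfa {
    /// Runs the DFA over any character stream and reports whether the
    /// whole input is accepted. A missing transition rejects immediately.
    fn accepts<I: IntoIterator<Item = char>>(&self, input: I) -> bool {
        let mut state = self.start;
        for c in input {
            match self.transitions.get(&(state, c)) {
                Some(&next) => state = next,
                None => return false,
            }
        }
        self.accepting[state]
    }
}

/// Builds a two-state DFA for `[0-9]+` by hand:
/// state 0 is the start state, state 1 is accepting.
fn digits_dfa() -> Dfa {
    let mut transitions = HashMap::new();
    for c in '0'..='9' {
        transitions.insert((0, c), 1);
        transitions.insert((1, c), 1);
    }
    Dfa {
        start: 0,
        accepting: vec![false, true],
        transitions,
    }
}
```

Because `accepts` takes any `IntoIterator<Item = char>`, the same routine works for an in-memory string (`"12345".chars()`) and for characters decoded from a stream.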
135129
136- ---
137-
138130## Architecture Overview
139131![ log-surgeon-arch-overview] ( docs/src/overall-arch-diagram.png )
140132
141- ---
142-
143133## User's Guide
144134log-surgeon is a Rust library for high-performance parsing of unstructured text logs. It is
145135shipped as a Rust crate and can be included in your Rust project by adding the following line to
@@ -184,8 +174,6 @@ The example uses the repository relative path to include the dependency. If you
184174library in your project, you can follow the user's guide above where you should specify the git URL
185175to obtain the latest version of the library.
186176
187- ---
188-
189177## Contributions by each team member
1901781 . ** [ Louis] [ github-siwei ] **
191179- Implemented the draft version of the AST-to-NFA conversion.
@@ -202,8 +190,6 @@ to obtain the latest version of the library.
202190Both members contributed to the overall architecture, unit testing, integration testing, and library
203191finalization. Both members reviewed the other's implementation through GitHub pull requests.
204192
205- ---
206-
207193## Lessons learned and concluding remarks
208194This project provided us with an excellent opportunity to learn about the Rust programming language.
209195We gained hands-on experience with Rust's borrowing system, which helped us write safe and reliable
@@ -226,17 +212,20 @@ The future work:
226212- Implement [ tagged-DFA] [ wiki-tagged-dfa ] to support more powerful variable extraction.
227213- Optimize the lexer to emit tokens based on buffer views, reducing internal string copying.
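
As a rough sketch of the buffer-view idea in the last item, a token can store a byte range into the input buffer instead of an owned `String`, so that no text is copied until a caller asks for it. `TokenView` and `tokenize` are hypothetical names for this example and do not reflect log-surgeon's lexer; the whitespace splitting below stands in for real schema-driven lexing.

```rust
use std::ops::Range;

/// A token that records where it sits in the input buffer rather than
/// owning a copy of the text. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct TokenView {
    span: Range<usize>,
}

impl TokenView {
    /// Borrows the token text from the buffer the token was lexed from.
    fn text<'a>(&self, buf: &'a str) -> &'a str {
        &buf[self.span.clone()]
    }
}

/// Splits `buf` on ASCII whitespace and returns byte ranges into `buf`.
/// No token text is copied; only offsets are stored.
fn tokenize(buf: &str) -> Vec<TokenView> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (i, b) in buf.bytes().enumerate() {
        if b.is_ascii_whitespace() {
            // A whitespace byte closes the token in progress, if any.
            if let Some(s) = start.take() {
                tokens.push(TokenView { span: s..i });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush a token that runs to the end of the buffer.
    if let Some(s) = start {
        tokens.push(TokenView { span: s..buf.len() });
    }
    tokens
}
```

The trade-off is that a view is only meaningful alongside the buffer it came from, so the buffer must stay alive and unmodified for as long as its tokens are in use.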
228214
229-
215+ [ badge-apache ] : https://img.shields.io/badge/license-APACHE-blue.svg
216+ [ badge-build-status ] : https://github.com/Toplogic-Inc/log-surgeon-rust/workflows/CI/badge.svg
230217[ clp-paper ] : https://www.usenix.org/system/files/osdi21-rodrigues.pdf
231218[ clp-s-paper ] : https://www.usenix.org/system/files/osdi24-wang-rui.pdf
219+ [ ece1724 ] : https://www.eecg.toronto.edu/~bli/ece1724
232220[ github-clp ] : https://github.com/y-scope/clp
233221[ github-siwei ] : https://github.com/Louis-He
234222[ github-zhihao ] : https://github.com/LinZhihao-723
235223[ hadoop-logs ] : https://zenodo.org/records/7114847
236224[ home-page ] : https://github.com/Toplogic-Inc/log-surgeon-rust
237225[ mongodb-logs ] : https://zenodo.org/records/11075361
226+ [ project-gh-action ] : https://github.com/Toplogic-Inc/log-surgeon-rust/actions
238227[ regex-syntax-ast-Ast ] : https://docs.rs/regex-syntax/latest/regex_syntax/ast/enum.Ast.html
239228[ wiki-dfa ] : https://en.wikipedia.org/wiki/Deterministic_finite_automaton
240229[ wiki-nfa ] : https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton
241230[ wiki-tagged-dfa ] : https://en.wikipedia.org/wiki/Tagged_Deterministic_Finite_Automaton
242- [ video-demo ] : TODO
231+ [ video-demo ] : https://www.youtube.com/watch?v=0mJwwBKXU7A&ab_channel=SiweiHe