Commit ce5758e

Engineering handbook #2 (#2288)
* chore: new engineering docs * chore: new engineering docs * chore: new engineering docs * add security advisory * add security advisory * fixes * fixes * Update pages/handbook/product-engineering/how-we-work/code-review.mdx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update pages/handbook/product-engineering/how-we-work/code-review.mdx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update pages/handbook/product-engineering/how-we-work/code-review.mdx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * fixes * fixes * fixes * fixes * fixes * fixes * Update pages/handbook/product-engineering/how-we-work/code-review.mdx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * fixes * fixes * fixes * fixes * add tech stack * add tech stack * improve engineering docs * improve engineering docs * improve engineering docs * improve engineering docs --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
1 parent 5f37221 commit ce5758e

6 files changed

+136
-50
lines changed

pages/handbook/product-engineering/architecture.mdx

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,13 @@ flowchart LR
3939
- **Redis**: Stores event queue (BullMQ) and caching layer (API keys, prompts).
4040
- **S3**: Stores raw ingestion events and multi-modal attachments (images, audio).
4141

42+
43+
### Why do we need an OLAP database (Clickhouse) for observability data?
44+
- We initially built Langfuse on Postgres and eventually migrated to Clickhouse. We always knew that Postgres would not be the best fit for our observability data.
45+
- OLAP databases use a columnar layout, so the database only scans the columns required to produce results for analytical queries (e.g. LLM cost over time).
46+
- We needed a multi-node database to scale our data inserts.
47+
- As we are an open source product, we required a database that is available under an open source license.
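
The columnar-layout point above can be illustrated with a toy in-memory sketch. This is only an assumption-laden illustration, not how Clickhouse stores or scans data: the `ObservationRow` and `ObservationColumns` types, the `costCents` unit, and the sample values are all hypothetical. It shows that a "cost over time" aggregation only needs to read two columns, while the large `input` and `output` payloads are never touched.

```typescript
// Toy in-memory model for illustration only; this is not how Clickhouse stores data.

// Row-oriented layout: all fields of one observation are stored together.
interface ObservationRow {
  day: string;       // e.g. "2025-01-01"
  input: string;     // potentially large prompt payload
  output: string;    // potentially large completion payload
  costCents: number; // cost in integer cents (hypothetical unit)
}

// Column-oriented layout: each field is stored as its own array.
interface ObservationColumns {
  day: string[];
  input: string[];
  output: string[];
  costCents: number[];
}

function toColumns(rows: ObservationRow[]): ObservationColumns {
  return {
    day: rows.map((r) => r.day),
    input: rows.map((r) => r.input),
    output: rows.map((r) => r.output),
    costCents: rows.map((r) => r.costCents),
  };
}

// "Cost over time" only needs two columns; input and output are never read.
function costPerDay(
  columns: Pick<ObservationColumns, "day" | "costCents">
): Map<string, number> {
  const totals = new Map<string, number>();
  for (let i = 0; i < columns.day.length; i++) {
    const day = columns.day[i];
    totals.set(day, (totals.get(day) ?? 0) + columns.costCents[i]);
  }
  return totals;
}

// Hypothetical sample data.
const columns = toColumns([
  { day: "2025-01-01", input: "long prompt A", output: "long answer A", costCents: 2 },
  { day: "2025-01-01", input: "long prompt B", output: "long answer B", costCents: 3 },
  { day: "2025-01-02", input: "long prompt C", output: "long answer C", costCents: 5 },
]);

const totals = costPerDay({ day: columns.day, costCents: columns.costCents });
console.log(Array.from(totals.entries()));
```

In a row-oriented store, the same aggregation would have to read every record in full, including the payload fields it does not need.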
48+
4249
---
4350

4451

pages/handbook/product-engineering/how-we-work/_meta.tsx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
export default {
22
"roadmapping": "Roadmapping",
33
"workflow": "Workflow",
4+
"onboarding": "Onboarding",
45
"*": {
56
layout: "default",
67
},

pages/handbook/product-engineering/how-we-work/code-review.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
- Low risk (2 way door): go ahead and merge. Engineer is responsible for monitoring Datadog in case something goes wrong. Notify on-call engineer for roll-backs if required.
55
- Product expert review: sometimes we build a larger change in someone else’s area of responsibility. Feel free to ask for feedback from [DRI](https://docs.google.com/spreadsheets/d/1gOvWf_uSAtcXxWkR_gMuWX8-OYmSJNIPXTmPGfZ2DGY/edit?gid=0#gid=0).
66
- Risk (1 way door): Assign Max as reviewer (Max sees open PR reviews in Linear, SLA 24h, ping if more urgent). **Non exhaustive list of PRs which require a review**: database migrations, changes in the public API, changes in SDK signatures, auth changes, major changes in the ingestion pipeline, larger infra changes. If you are unsure, ask Max for a review.
7-
- New joiners should get all their PRs reviewed for the first 1 month at least. This helps a lot with knowledge sharing and making sure we use abstractions within our code base the right way. Afterwards, we move towards no-review merges.
7+
- New joiners should get all their PRs reviewed for at least their first month. This helps them learn the code base and how our system works.
88

99

1010
## Responsibility of the PR author
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
# Onboarding at Langfuse
2+
3+
Welcome to Langfuse!
4+
5+
This document outlines how onboarding works for every new team member — what to expect on your first day, first week, first month, and beyond. Our goal is to help you quickly gain context, feel confident contributing, and become an owner of your area of the product.
6+
7+
At Langfuse, we lead with context, not control. We share the "why" and "what's important right now," so you can make the best decisions for our users. You'll get support, feedback, and guidance along the way — but you'll also have the space to take ownership early.
8+
9+
10+
## Timeline Overview
11+
12+
| Milestone | Focus | Key Outcome |
13+
|-----------|-------|-------------|
14+
| **Day 1** | Getting oriented | Understand company priorities and get set up |
15+
| **Week 1** | Making your first contribution | Deploy your first change to production |
16+
| **Months 1-3** | Taking ownership | Complete your first independent project end-to-end |
17+
| **Month 6** | Becoming a go-to person | Own a product area and be a trusted domain expert |
18+
19+
---
20+
21+
## Day 1 — Getting Oriented
22+
23+
**Goal:** Understand company priorities and get set up for work.
24+
25+
- **Welcome meeting with Max (CTO)** — Context on company status, top priorities, challenges, and decision-making
26+
- **Setup and access** — GitHub, Datadog, AWS, Slack, and local development environment
27+
- **First task** — Small starter task to explore codebase and workflows. Goal: merge something within the first few days
28+
29+
30+
31+
## Week 1 — Making Your First Contribution
32+
33+
**Goal:** Deploy your first change to production and understand our systems.
34+
35+
By the end of Week 1:
36+
- Fully working development setup
37+
- Merged and deployed your first change
38+
- Understand high-level architecture, CI/CD process, and monitoring
39+
- Read and acknowledge policies (security, compliance, data handling)
40+
- Engage in team communication
41+
42+
_We don't expect speed — we expect curiosity. Take the time to explore and ask "why."_
43+
44+
45+
46+
## Months 1-3 — Taking Ownership
47+
48+
**Goal:** Deliver your first independent project end-to-end.
49+
50+
By the end of Month 3, you should have:
51+
- **Shipped an independent project** — Led a small feature or improvement from start to finish (technical planning, product thinking, implementation, documentation, user research)
52+
- **Joined support rotation** — Started handling support tickets, seeing product through users' eyes, and maintaining your improvement backlog
53+
- **Learned through reviews** — Received feedback on every PR from the team during your first ~2 months
54+
- **Balanced priorities** — Demonstrated ability to align your backlog (support, fixes, improvements) with company priorities
55+
- **Contributed to planning** — Participated in planning discussions
56+
57+
58+
## Month 6 — Becoming a Go-To Person
59+
60+
**Goal:** Own a product area and be a trusted domain expert.
61+
62+
By six months:
63+
- **Domain expertise** — Be the go-to person for a specific area (e.g., evals, datasets, integrations, infrastructure)
64+
- **Shipped projects** — Multiple projects delivered from idea to production
65+
- **Mentorship** — Able to mentor new joiners and review PRs in your area
66+
- **Self-sufficient** — Fully independent in planning, building, and deploying
67+
- **Impact-driven** — Clear sense of how to prioritize for maximum impact
68+
69+
_You're not just contributing — you're shaping how we build and make product decisions._
70+
71+
72+
73+
## Ongoing Support
74+
75+
Throughout your onboarding and beyond, you'll have support from:
76+
77+
- **Max and the team**: Available for prioritization help and technical guidance
78+
- **Pull request reviews**: Continuous feedback on your code
79+
- **Support rotation**: Learn from real user problems and needs
80+
- **Context sharing**: Regular updates on company priorities and strategic direction
81+
82+
83+
Welcome to the team!

pages/handbook/product-engineering/how-we-work/roadmapping.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ title: Roadmapping
1414
## Process
1515

1616
<Callout type="info">
17-
All process stages are run in the “Roadmap” Figjam (internal)
17+
All process stages are run in the “Roadmap” Figjam (internal). We run this process once a quarter.
1818
</Callout>
1919

2020
1. Exploration

pages/handbook/product-engineering/how-we-work/workflow.mdx

Lines changed: 43 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -1,61 +1,55 @@
11
# Engineering Workflow
22

3-
## Specification
3+
## Prioritization
4+
5+
Our goal is to build a company where we do not spend hours each week triaging and prioritizing tickets. Therefore, everyone has to keep track of their own priorities and escalate to Max if the workload is getting too high. This only works if we all have a shared understanding of priorities and SLAs.
6+
7+
In case of uncertainty: Tag Max (24h SLA on Linear inbox). Everyone should have a clear view on what priorities are. The best way to achieve this is by having one Linear project or issue for the current main task. For bugs and smaller improvements, use the [bug view](https://linear.app/langfuse/view/bugs-confirmedopen-97856f9f745c) or “My issues”. Tickets in Linear should always have a prioritization:
8+
9+
### Issue States
10+
11+
| State | Description |
12+
|-------|-------------|
13+
| **Triage** | - Unassigned, no labels or prios<br/>- Created via GitHub Issue integration or in Linear by non-engineering team members<br/>- Marc/Max subscribe, triage and assign, and add labels, and refine title<br/>- After assignment, Engineer dedupes with existing tickets and handles communications with users |
14+
| **Backlog** | - Everyone manages their backlogs in Linear<br/>- Issues always have a label. Use Linear views to look at all tickets of a label. We only create projects for work that has a clear end (no endless bucket of tickets)<br/>- Add user feedback to tickets or projects: If via Plain, use the "Link thread" feature. Link Linear issues to Plain threads straight from Plain. Snooze issues in Plain that we want to review again, e.g. next week<br/>- Only backlog what you plan/hope to do over the next 1-2 quarters; the rest goes to GitHub Discussions<br/>- Add labels to product areas<br/>- Issue titles: Titles have to be good enough that someone reading a list of titles knows what each issue is about. Descriptions are optional; if you add one, keep it short and precise so everyone understands |
15+
| **Todo/Progress** | See priority table below for handling guidelines |
16+
17+
### Priority Levels
418

5-
Goal: we should make sure to have as few loops as possible per change.
19+
| Priority | Timeline | Description | Examples |
20+
|----------|----------|-------------|----------|
21+
| **P0 (urgent)** | Drop everything and fix | - Security incidents (e.g. data breach)<br/>- Performance issues (ingestion delay, clickhouse CPU/memory issues)<br/>- Issues that have large scale impact and break our application | Traces table does not load, login broken |
22+
| **P1 (high)** | Same week resolution | - Issues with smaller scale impact<br/>- Improvements that are \<1h work and have a big impact for many customers; great to move fast on these to make users share more feedback as they are excited that we ship<br/>- Delight a user same day for a small change that helps them | Some edge case does not work for dashboards |
23+
| **P2 (mid)** | Same month resolution | - Fixes for issues that are papercuts for our users but do not have wide-ranging impact | Trace tree UI breaks when users have many scores |
24+
| **P3 (low)** | Backlog | - Addition to Langfuse which is nice to have but not urgent | Create new prompt based on non latest prompt version |
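
The priority table above can be read as a small decision rule. The following is a minimal sketch under stated assumptions: the `IssueSignals` fields and the `suggestPriority` helper are hypothetical names invented for illustration and do not correspond to Linear's data model or to any existing Langfuse code. Only the four priority levels and their resolution targets are taken from the table.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Resolution targets copied from the priority table.
const RESOLUTION_TARGET: Record<Priority, string> = {
  P0: "Drop everything and fix",
  P1: "Same week resolution",
  P2: "Same month resolution",
  P3: "Backlog",
};

// Hypothetical signals an engineer might consider; not a Linear data model.
interface IssueSignals {
  securityIncident?: boolean;     // e.g. data breach
  performanceIncident?: boolean;  // e.g. ingestion delay
  breaksApplication?: boolean;    // large-scale impact, e.g. login broken
  smallerScaleBreakage?: boolean; // e.g. an edge case fails for dashboards
  quickHighImpactWin?: boolean;   // <1h of work, big impact for many customers
  papercut?: boolean;             // annoying but without wide-ranging impact
}

// Checks the most severe signals first and falls through to the backlog.
function suggestPriority(s: IssueSignals): Priority {
  if (s.securityIncident || s.performanceIncident || s.breaksApplication) return "P0";
  if (s.smallerScaleBreakage || s.quickHighImpactWin) return "P1";
  if (s.papercut) return "P2";
  return "P3"; // nice-to-have additions
}

console.log(suggestPriority({ breaksApplication: true }));
console.log(RESOLUTION_TARGET[suggestPriority({ papercut: true })]);
```

Real prioritization still needs judgment, and in case of uncertainty the guidance above (tag Max) applies; the sketch only makes the ordering of the levels explicit.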
625

7-
- Small/mid sized changes (e.g. adding a new filter to a table, adding a new field to a form…)
8-
- Engineer leads, very brief notes on linear or just a descriptive title
9-
- Open to do brief discussions with Max/Marc if helpful to speed process up and reduce risk/uncertainty
26+
## Specification on how to build things
27+
28+
- We want to ship fast with a small team. To achieve this and maintain quality, we need to balance individual agency and speed with making smart design decisions early in the process.
29+
- As Langfuse has scaled to a platform processing billions of events, the implementation strategy of a feature can have a substantial impact on implementation effort, maintainability, user experience, cloud cost, and performance.
30+
- For such features, getting the right people in one room early is a massive time saver and allows us to ship fast and with high quality.
31+
32+
### Small/mid sized changes (e.g. adding a new filter to a table, adding a new field to a form…)
33+
- Engineer creates a Linear ticket with very brief notes or just a descriptive title
34+
- Have a brief discussion with Max if that helps speed up the process and reduce risk/uncertainty
1035
- If you plan to have a reviewer of a change, please make sure to involve the reviewer in the planning process.
11-
- Large projects (e.g. supporting Agents in Langfuse, rebuilding SDKs ..)
12-
1. Engineer does initial investigation and then schedules meeting with Marc/Max and other team members who have a lot of relevant context
13-
2. Meeting goal: Derisking and clarifying all important topics
36+
- Asking Claude Code or ChatGPT "what am I missing?" or "how would you build this?" is a great way to get feedback and reduce risk/uncertainty.
37+
38+
### Large projects (e.g. supporting Agents in Langfuse, rebuilding SDKs ..)
39+
1. Engineer does initial research and then schedules a meeting with Marc/Max and other team members who have a lot of relevant context. Ideally, they create a Google Doc or Linear ticket with the initial research and a rough specification.
40+
2. Meeting to make decisions and plan the implementation:
41+
- Meeting usually starts with everyone reading and commenting to make sure we are all on the same page.
1442
- Recorded \-\> meeting includes lots of details, good to generate spec/issue-description/working with Claude Code
1543
- Discussion on timelines and how we can cut requirements to build faster
1644
- Implementation plan afterwards, spin with Max if needed
1745
- If things change in a meeting (also follow-up discussions), small note logged to Linear ticket / project.
1846
- Engineer needs to make sure to have relevant people in the room to make a decision ([DRI](https://docs.google.com/spreadsheets/d/1gOvWf_uSAtcXxWkR_gMuWX8-OYmSJNIPXTmPGfZ2DGY/edit?gid=0#gid=0)).
1947
- Engineer needs to manage Linear project based on the outcome of the meeting.
20-
3. Engineer may need input on UI/UX or Clickhouse queries
21-
- Get quick feedback on implementation thoughts from respective owners ([DRI](https://docs.google.com/spreadsheets/d/1gOvWf_uSAtcXxWkR_gMuWX8-OYmSJNIPXTmPGfZ2DGY/edit?gid=0#gid=0)).
22-
- Based on the initial discussion, ask the owner for PR reviews if needed.
23-
24-
## Prioritization
25-
26-
In case of uncertainty: Tag Max (24h SLA on Linear inbox). Everyone should have a clear view on what priorities are. The best way to achieve this is by having one Linear project or issue for the current main task. For bugs and smaller improvements, use the [bug view](https://linear.app/langfuse/view/bugs-confirmedopen-97856f9f745c) or “My issues”. Tickets in Linear should always have a prioritization:
2748

28-
- **State Triage**
29-
- State: unassigned, no labels or prios
30-
- Created via
31-
- GitHub Issue integration
32-
- In Linear by non-engineering team members
33-
- Marc/Max subscribe, triage and assign, and add labels, and refine title
34-
- After assignment, Engineer dedupes with existing tickets and handles communications with users.
35-
- **State Backlog**
36-
- Everyone manages their backlogs.
37-
- Issues always have a label. Use Linear views to look at all tickets of a label. We only create projects for work that has a clear end (no endless bucket of tickets).
38-
- Add user feedback to tickets or projects
39-
- If via Plain, use the “Link thread” feature of plain. Thereby you get all those threads in “close the loop” once the issue is resolved
40-
- Also, link Linear issues to Plain threads straight from Plain.
41-
- Snooze issues in Plain for which we want to review again e.g. next week.
42-
- Only backlog what you plan/hope to do over the next 1-2 Quarters, rest GitHub Discussions
43-
- Add labels to product areas
44-
- Issue titles: Titles have to be good enough so that someone who reads a list of titles knows what each is about. Descriptions are optional. Write short and precise descriptions so everyone understands.
45-
- **State Todo/progress**
46-
- P0 (urgent) – drop everything and fix
47-
- Incident
48-
- Security incident (e.g. data breach)
49-
- Performance issues (ingestion delay, clickhouse CPU / memory issues)
50-
- issues that have large scale impact and break our application (e.g. traces table does not load, login broken..).
51-
- P1 (high) – same week resolution
52-
- issues with smaller scale impact (e.g. some edge case does not work for dashboards)
53-
- Improvements that are \<1h work and have a big impact for many customers; great to move fast on these to make users share more feedback as they are excited that we ship.
54-
- Delight a user same day for a small change that helps them.
55-
- P2 (mid) – same month resolution
56-
- Fixes which are papercut for our users but do not have a wide range impact (e.g. trace tree UI breaks when users have many scores)
57-
- P2 (low) – backlog
58-
- Addition to Langfuse which is nice to have but not urgent (e.g. create new prompt based on non latest prompt version)
49+
### Ways of pulling help from others
50+
Engineers can ask others for help with their work at any time. 15 minutes of shared discussion can significantly improve the overall output.
51+
- UI/UX: For larger UI/UX changes, it is helpful to get input from Max or Marc. Otherwise, draw a small sketch and ask anyone from the team for feedback. We have to take time to polish the UI/UX.
52+
- Clickhouse: For more complex Clickhouse queries, it is helpful to get input from the Clickhouse [DRI](https://docs.google.com/spreadsheets/d/1gOvWf_uSAtcXxWkR_gMuWX8-OYmSJNIPXTmPGfZ2DGY/edit?gid=0#gid=0). Otherwise, we may end up with anti-patterns that hurt performance and maintainability.
5953

6054
## Implementation
6155

@@ -80,19 +74,20 @@ Slack is extremely busy with many noisy channels. We do not want you to have too
8074
We all need to do busy work and at the same time need to make progress on the most important project we want to drive forward. Hence:
8175

8276
- You should spend max 2h/day on coding bug fixes, support tickets and similar. Sometimes it is necessary and super important for our company to fix bugs. Yet, if you continuously spend more than 2h/day on bugs, talk to Max to get buy-in or distribute work in the team.
83-
- If you are a reviewer on a PR, you have to review the same day. It's important to not block others.
77+
- If you are a reviewer on a PR, you have to review within 24h. Please see [code review](/handbook/product-engineering/how-we-work/code-review) for more details.
8478
- You are expected to clear out your Plain inbox 2 times a day. Always acknowledge a request by a user ASAP and then look into it / work on a fix.
8579
- You are expected to clear out your Linear inbox once a day.
8680

8781
## Linear
8882

89-
Let us all heavily use Linear for planning and for technical and product discussions. I think that Linear can help us all individually and as a team:
83+
We use Linear as our internal project planning and ticketing tool. It helps us to:
9084

9185
- Collect user feedback in one place
9286
- Discuss product requirements and implementation details
9387
- Understand what work is left to finish a project
9488
- Triage and prioritize bugs
9589
- Reduce Linear/Slack knowledge split. Keep as much knowledge in Linear as possible.
90+
- Integrate with different tools (Plain, GitHub, Cursor, etc.) to make the workflow smoother.
9691

9792
### Conventions
9893
