tH-Wiki will provide an easy solution for having my personal wiki, task tracker, and issue tracker – all in one. YouTrack would fit closest (I never tried their Knowledge Base feature, but back then the issue tracker was nice), but I do not want proprietary software.
## General
- Development with putting it on GitHub later in mind, maybe
  - ⇒ State: …not maybe, but for sure 🥳
- API first
- For this project, I will separate backend and frontend into two independent applications (not build an all-in-one JAR)
- Abstraction of persistence, so we may start file-based (JSON files, H2, SQLite), but can easily extend to Postgres later
  - ⇒ State: Partly implemented. The database uses JDBC for different engines; file-based with JSON is not possible. However, `DemoDataInitializer` goes in that direction. Attachments use a `Storage` interface, which is just a wrapper around Java's `FileSystem`.
- If not persisted that way, easy import/export with JSON or YAML
  - ⇒ State: There is no specialized import/export feature yet. However, backups are easy when using the H2 database: it's just "stop and copy the whole folder".
## Wiki
- Simple input of data using Markdown
- Attachments (e.g. Screenshots or configuration files)
- Powerful full-text search (looking at you, Lucene 😄)
  - ⇒ State: Not yet. On the client side we have a powerful ANTLR query language, but it only works that well because the client loads all texts of all wiki pages/issues.
- Hierarchical storage of pages
  - E.g. level 1 "Server", "Programming", "Personal" – level 2 "Proxmox", "Kotlin", "Living Room"
  - No fixed hierarchies (e.g. "Books" → "Pages"), but arbitrary depths of "folders"
- Tags as an additional mechanism to cluster content across different folders
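The arbitrary-depth folder idea can be sketched as follows. This is a hypothetical illustration, not project code: each page optionally points at a parent, and a breadcrumb path is resolved by walking up the parent chain.

```kotlin
// Pages form a tree through an optional parent reference (cf. WikiPage.parent).
data class Page(val id: String, val title: String, val parentId: String?)

// Walk the parent chain upwards, then reverse to build a breadcrumb like "Server / Proxmox".
fun breadcrumb(pages: Map<String, Page>, id: String): String {
    val titles = generateSequence(pages[id]) { page -> page.parentId?.let { pages[it] } }
        .map { it.title }
        .toList()
        .reversed()
    return titles.joinToString(" / ")
}
```

Because there is no fixed hierarchy, the same code handles one level ("Server") and many ("Server / Proxmox / …") without special cases.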
## Issue Tracker
- Basic features of an issue tracker: projects, issues, types, statuses, comments, relations
  - ⇒ State: Comments are not implemented yet. Editing is sufficient for now.
## Task Management
- Should be an intermediate between Wiki and Issue Tracker
- Tasks need to be ordered into hierarchies like the Wiki
- Tasks need statuses/comments/relations like the Issue Tracker
- We need a "Due" field
⇒ State: A separate task management was implemented, but did not pan out. We have integrated everything we need into the issue tracker; tasks are a separate issue type.
## Notes Management
- Similar to Wiki and Tasks
- Notes could have a hierarchy to cluster them if needed.
- Some notes are just "immediate brain dumps" without any structure.
- Notes can also have tags (like in the Wiki).
- Notes are like Wiki pages, but not as structured. (Technically, they may be the same.)
⇒ State: There is no separate notes management. For now, the issue tracker has a separate issue type for notes.
### WikiPage

- `id`: `UUID` (unique!)
- `title`: `String` (unique/non-empty)
- `content`: `String` (Markdown, can be blank)
- `parent`: `UUID?` (optional parent to form trees)
- `creationTime`: `LocalDateTime`
- `modificationTime`: `LocalDateTime`
### Attachment

- `id`: `UUID` (unique! represents the filename we save the attachment under in storage)
- `wikiPageId` / `issueId`: `UUID?`
  - An attachment belongs to either a wiki page or an issue (not none, not both!), so it can be referenced relative to it, e.g. an image in Markdown.
  - But we keep options open for global attachments with a `NULL` value.
- `filename`: `String` (original filename; we allow the same filename multiple times, but only once per entry, so there is no clash in Markdown rendering)
- `description`: `String` (description or comment the user adds to the upload)
- `lastModifiedTime`: `Datetime?` (file's mtime, can be `NULL` if unknown)
- `uploadTime`: `Datetime` (time of upload)
- `size`: `Long` (size of the file)
- `mimeType`: `String` (MIME type the client said; we don't validate for now, as detection is hard)
  - Can be empty if the client did not send any; in this case we deliver with `application/octet-stream`.
- `imageWidth` / `imageHeight`: `Int?` (for images we save the dimensions of the image)
- `sha256Sum`: `String` (SHA-256 checksum as hex digits)
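To illustrate the `sha256Sum` format, here is a minimal sketch using the JDK's `MessageDigest`. The helper name `sha256Hex` is ours, not from the codebase:

```kotlin
import java.security.MessageDigest

// Compute the SHA-256 checksum of a file's bytes as lowercase hex digits,
// matching the format described for Attachment.sha256Sum.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256")
        .digest(bytes)
        .joinToString("") { "%02x".format(it) }
```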
### Project

- `id`: `UUID` (unique!)
- `prefix`: `String` (prefix for the issue key; we only allow uppercase characters)
- `title`: `String` (unique)
- `description`: `String` (description or comment)
- `nextIssueNumber`: `Int` (strictly monotonically increasing counter to build issue keys from, e.g. `prefix` = "`DEMO`", `nextIssueNumber` = 1 → the next issue gets key "`DEMO-1`")
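The issue-key mechanics can be sketched like this (hypothetical helper, not the actual backend code): the key is built from the prefix and the counter, and `nextIssueNumber` is bumped afterwards.

```kotlin
// Sketch of how an issue key could be derived from a project's prefix and counter.
data class ProjectCounter(val prefix: String, var nextIssueNumber: Int)

fun nextIssueKey(project: ProjectCounter): String {
    val key = "${project.prefix}-${project.nextIssueNumber}"
    project.nextIssueNumber++ // strictly monotonically increasing, numbers are never reused
    return key
}
```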
### IssueType

- `id`: `UUID` (unique!)
- `title`: `String` (unique)
- `sortIndex`: `Int`
- `icon`: `String`
- `iconColor`: `String`
### IssuePriority

- `id`: `UUID` (unique!)
- `title`: `String` (unique)
- `sortIndex`: `Int`
- `icon`: `String`
- `iconColor`: `String`
- `showIconInList`: `Boolean`
### IssueStatus

- `id`: `UUID` (unique!)
- `title`: `String` (unique)
- `description`: `String` (used for tooltips to explain the status)
- `sortIndex`: `Int`
- `icon`: `String`
- `iconColor`: `String`
- `doneStatus`: `Boolean`
### Issue

- `id`: `UUID` (unique!)
- `projectId`: `UUID` (an issue belongs to a project; we don't allow project-less issues)
- `issueNumber`: `Int` (unique per project, part of the issue key)
  - The issue key is the project's `prefix` + "`-`" + the issue's `issueNumber`.
  - `issueNumber` is incremented with each new issue.
- `issueKey`: `String` (the issue key is saved redundantly, so there is no need to JOIN the project table)
- `issueTypeId`: `UUID` (the issue's type, e.g. Feature, Bug, Task)
- `issuePriorityId`: `UUID` (each issue has a mandatory priority)
- `issueStatusId`: `UUID` (each issue has a status)
  - There are no status workflows like in JIRA. Not sure whether we want that later.
  - Later, statuses can be configured, but you can transition from any status to any other one.
- `title`: `String` (non-empty; multiple issues with the same title are allowed)
- `description`: `String` (can be empty, Markdown)
- `creationTime`: `LocalDateTime`
- `modificationTime`: `LocalDateTime`
- `progress`: `Int?` (can be set optionally, e.g. on issueType=Task)
- `dueDate`: `LocalDate?` (soon-due/overdue issues will be highlighted in the UI)
- `doneTime`: `LocalDateTime?` (set when the issue transitions into a status with `doneStatus`)
### IssueLinkType

- `id`: `UUID` (unique!)
- `type`: `String` (magic string for the UI to identify the different issue link types)
  - The UI can use this to display different styles in the dependency graph, or different icons in lists.
  - We don't use an enum, to be easily open for future additions.
- `sortIndex`: `Int`
- `wording`: `String` (reading the link forward: "Issue 1 wording Issue 2")
- `wordingInverse`: `String` (reading the link backward: "Issue 2 wordingInverse Issue 1")
  - Can be empty if the order of the issues does not matter, e.g. "relates to".
  - In general, `wording` should be active (e.g. "blocks"), `wordingInverse` passive (e.g. "is blocked by").
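How `wording`/`wordingInverse` read in practice, as a small sketch (the helper name and signature are illustrative, not project code):

```kotlin
// Render a link sentence in the forward or backward reading direction.
// Forward:  "<issue1> <wording> <issue2>",        e.g. "DEMO-1 blocks DEMO-2"
// Backward: "<issue2> <wordingInverse> <issue1>", e.g. "DEMO-2 is blocked by DEMO-1"
fun renderLink(
    issue1Key: String,
    issue2Key: String,
    wording: String,
    wordingInverse: String,
    forward: Boolean,
): String =
    if (forward) "$issue1Key $wording $issue2Key"
    else "$issue2Key $wordingInverse $issue1Key"
```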
### IssueLink

- `id`: `UUID` (unique!)
- `issue1Id`: `UUID`
- `issue2Id`: `UUID`
- `issueLinkTypeId`: `UUID`
### Tag

- `id`: `UUID` (unique!)
- `projectId`: `UUID?` (null = global tag, non-null = project tag; cannot be changed after creation)
- `scope`: `String` (can be empty)
- `scopeIcon`: `String` (can be empty)
- `scopeColor`: `String` (can be empty)
- `title`: `String`
- `titleIcon`: `String` (can be empty)
- `titleColor`: `String`
- `description`: `String` (can be empty)
### TagAssociation

- `tagId`: `UUID`
- `wikiPageId`: `UUID?` (can be null; exactly one of the nullable UUIDs must be set)
- `issueId`: `UUID?` (can be null; exactly one of the nullable UUIDs must be set)
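The "exactly one of `wikiPageId`/`issueId`" invariant could be checked like this (a hypothetical validation helper; in the real schema the database CHECK constraints enforce it):

```kotlin
import java.util.UUID

// A tag association must point to exactly one target:
// either a wiki page or an issue, never both and never neither.
fun isValidTagAssociation(wikiPageId: UUID?, issueId: UUID?): Boolean =
    (wikiPageId != null) xor (issueId != null)
```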
`Entry` is flexible enough to hold anything. The different presentations are done by the UI, for example rendering a checkbox next to a task by loading/saving the custom field "done".

Because the "Powerful full-text search" must understand the content of the data, the BE could/should validate nonsense data like "`type=wiki, due=2024-06-07`". However, such examples are not harmful and could even make sense, for example "Please check and rework this wiki page until 2024-06-07.".

With such an approach we could easily implement "convert note to task" by changing the type. We could (should?!) even keep the ID. The UI is responsible for cleaning/enforcing certain custom fields, for example forcing/defaulting a status on "note ⇒ issue" and, vice versa, deleting the status on "issue ⇒ note".

Field definitions could be added later, for example assisting "Status" with a set of pre-defined values, or "Due" having a "Date/Time" format.
⇒ Discontinued and replaced: We refactored `Entry` back to `WikiPage` and `Issue`. Having these "flexible fields" caused us more trouble than it was worth. If we ever need a "super-object" again, we can build that with GraphQL interfaces or union types.
Projects serve as a basis for the issue tracker.

Previously, there were entries. Now, issues are disjoint from wiki pages, so it's debatable whether there will be different wikis, one per project, or not. The "everything is an entry" approach did NOT serve us well and was discontinued. Tasks have been replaced by issues, as the issue tracker is able to handle all task use cases. Notes and other future extensions will be separate as well. Shared functionality like attachments, tags, or custom fields can be implemented with different base tables like `wiki_attachment` and `issue_attachment`, or a single table with separate columns like `attachment.wiki_page_id` and `attachment.issue_id`.
For projects we have both an `id` and the `prefix`, which is unique. It's not yet clear whether we want frontend URLs like `/issues/FOO-1` or `/issues/536215eb-a23c-4b14-a0ea-c68c4ada351a` (or both). To separate the two, the wording is "issue ID" (the UUID) vs. "issue key" (prefix + "`-`" + running number). For the API, only the UUID is primarily relevant.
We won't (or only much later) track moving issues between projects like e.g. JIRA does. Since it's allowed to delete an issue, it's also allowed to move an issue from project A ("delete it there") to project B ("create it there").
`sortIndex` allows arranging the items in a dropdown list. It's only used for sorting, but never communicated to a client.
Issue type, priority, and status have an `icon` field. It's a reference to a Font Awesome icon. We use 48 chars as column length; the longest Font Awesome icon currently has 32 characters. Check https://github.com/FortAwesome/Font-Awesome/blob/6.x/js/all.js with the regex `\s+"[^"]{38,}":`.
`iconColor` can either be a "`#rrggbb`" string for a specific color, or a Bootstrap CSS class suffix like `primary` or `danger-emphasis`.
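A hedged sketch of how a client might interpret `iconColor`: the split into an inline `style` attribute vs. a Bootstrap `text-*` class is our assumption about the UI, not something the backend specifies.

```kotlin
// Interpret iconColor: "#rrggbb" becomes an inline CSS color,
// anything else is treated as a Bootstrap class suffix (e.g. "primary" -> "text-primary").
// Returns a hypothetical (attributeName, attributeValue) pair.
fun iconColorToAttribute(iconColor: String): Pair<String, String> =
    if (iconColor.startsWith("#")) "style" to "color: $iconColor"
    else "class" to "text-$iconColor"
```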
All three tables are automatically filled by the backend with default values. For now, there are no endpoints to alter them. We will extend the functionality later, so the data can be altered to provide more flexibility to the user.
Additional special fields are provided:

- `IssuePriority.showIconInList`: indicator to the UI not to show the icon in the list. It's there so that "normal" priorities have no icon, only lower/higher priorities. In the issue's detail view the icon is always shown.
- `IssueStatus.description`: a description explaining what a status means. The UI can render this as a tooltip on the status.
- `IssueStatus.doneStatus`: marks a status as "done". When an issue transitions into such a status, the issue's `doneTime` is set. We use this to mark end states, which can be for example "Done" or "Declined".
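The `doneTime` rule can be sketched as follows (hypothetical helper; the real transition code lives in the backend's mutation handling, and clearing the value when an issue leaves a done status again is our assumption):

```kotlin
import java.time.LocalDateTime

// doneTime is set when the issue enters a status with doneStatus = true and kept
// if it was already set. Clearing it when the issue leaves a done status again
// is our assumption, not documented behavior.
fun updateDoneTime(current: LocalDateTime?, newStatusIsDone: Boolean, now: LocalDateTime): LocalDateTime? =
    when {
        !newStatusIsDone -> null       // assumption: leaving a done status clears doneTime
        current == null -> now         // first transition into a done status
        else -> current                // already done: keep the original doneTime
    }
```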
Tags are designed to be associable with all kinds of entities. For now, we have issues and wiki pages. It's possible to use them for attachments and future things.

There are project tags (associated with a particular project) and global tags (without a project).

By design, wiki pages can be associated with global tags. There is no special category "wiki tags". In theory, in a bigger setup, wiki pages would be associated with a project, i.e. forming one "wiki" per project. For now, we don't do that and only have one "global wiki". Thus it's consistent to use the global tags.
For associations we decided to have foreign key constraints enforced by the database; therefore each entity associable with a tag has a column in the table. The alternative (`tagId, type, targetId`) looks easier, but does not allow FKs. `TagAssociation` is the first table without a surrogate key. We don't have to keep metadata there, and we consider a tag association as "belonging" to the owning entity: if an issue gets deleted, so do its tag associations, automatically.
- Complicated users, groups, and permissions ⇒ First, it's only for me.
- We may start with fixed issue types/statuses/relations. Later this can be extended to be configurable, then even configurable per project.
- "Scrum"ish issue tracker with dashboards and reports
- We will start without repository/service layers, putting all code into the controllers. Additional layers/classes will only be introduced if there is a need for such code.
  - Consequence: Testing will be done on HTTP level, testing the controllers directly.
- GraphQL API
- The initially used REST API has been replaced by a GraphQL API. Reasons:
- With the previous REST approach, the frontend did a lot of (copy&pasted) requests to get all the data, e.g. loading issue types, issue priorities, and issue statuses to fill dropdowns or render an issue. GraphQL allows fetching everything in one request.
- The frontend had different use cases fetching different subsets of properties: e.g. for a list, expensive fields like `content`/`description` are not needed, while when rendering a single issue the `description` was fetched. Providing `?fields` functionality had proven quite annoying to implement, and had to be done endpoint by endpoint. GraphQL provides field-wise selection to the client out of the box (yes, with the general work we had to do for GraphQL, once).
- We do GraphQL right! In contrast to many implementations out there (because of Hibernate, or even the Spring GraphQL docs showing examples with full objects), we don't load full objects and let the framework throw away a lot of data, but rather tailor our SQL queries to what's really requested by the GraphQL request.
- Important: Sometimes ID columns must be loaded regardless of whether they were requested! Imagine the following query:

  ```graphql
  {
    issues {
      title
      project { prefix }
      issueLinks {
        # id  <-- without "id" requested, IssueLink.id is not loaded.
        issueLinkType {
          wording
          # <-- But without knowing IssueLink.id, how would we load the associated issueLinkType?
          # The DataFetcher would not find anything, wording would be null, which is incorrect!
        }
      }
      issueNumber
    }
  }
  ```
- GraphQL easily lets us return multiple errors at once, e.g. two missing fields. (We could have done this with REST too, but GraphQL immediately offers us a) an `errors` property, b) an array for multiple errors.)
- Deletion mutations will have an `id` field only. There is no need to return the removed object, so we don't have to fetch it first.
- File uploads: GraphQL does not support file uploads. There are different workarounds available, see 1, 2, 3.
  - Multipart requests: would open CSRF vulnerabilities. Spring does not have direct support, redirecting to a (as of now) only-15-stars project multipart-spring-graphql. ⇒ Nope
  - Cloud services ⇒ Nooooope! 😱
  - Base64 string uploads: This is the easiest solution, and since we are not expecting lots of traffic and/or gigantic files, we go with this approach. No additional configuration needed, no deviation from the GraphQL spec.
    - Remark: We could try out Base85 (needs a special "quotes/backslash to something else" replacement to work in JSON data).
- File download: File uploads are done with Base64 strings. We don't go that way symmetrically for file downloads, but rather provide the usual GET endpoints. These URLs or file names will be provided by GraphQL responses.
  - (❗) NOTE: This has to be re-evaluated. First, we deleted the REST API and with it the above-mentioned download functionality. Now, we have re-introduced the GET endpoint. Users can access an attachment by its ID. No separate URL/filename in the GraphQL response. Reasons:
    - Browsers can easily render a file given a `Content-Type` header. With Base64 decode magic we would need to send the `Content-Type` separately and put effort into letting the browser know.
    - No change to existing code necessary.
    - Files can easily be accessed outside the API by just typing the URL into the browser.
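The Base64 string upload described above can be sketched with the JDK codec (illustrative helper names, not the project's API):

```kotlin
import java.util.Base64

// Client side: file bytes become a Base64 string inside the GraphQL JSON payload.
fun encodeUpload(bytes: ByteArray): String = Base64.getEncoder().encodeToString(bytes)

// Backend side: decode back to raw bytes before writing to storage.
fun decodeUpload(base64: String): ByteArray = Base64.getDecoder().decode(base64)
```

Note the ~33% size overhead of Base64, which is acceptable here because neither heavy traffic nor gigantic files are expected.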
- GraphQL types should match our database types. For example, having `issueKey` (= project's `prefix` + "`-`" + issue's `issueNumber`) needs an additional JOIN from `issue` to `project`. This was bumpy in the REST implementation already, and would be even harder in the GraphQL implementation. The solution is to a) not provide the field anymore, or b) persist it redundantly in the database (as `issue.issue_key`).
- "Testing will be done on HTTP level" also applies to GraphQL. We don't use Spring's `HttpGraphQlTester`.
- Naming:
  - Mutations start with a verb.
  - For mutations, an input argument is just called `input`.
  - A mutation can have different arguments instead of `input`, but in general we don't want many arguments. If it's more than two, there should be an `input` argument instead. However, we are not as strict as GitHub's API, where all mutations have only a single `input` argument.
    - Thoughts: An `input` type can be extended more easily. We only choose separate arguments when we are sure there won't be a change in the future, for example a delete mutation only accepting an `id` parameter. No need for a separate `input` argument with only one field.
  - All mutations shall have a `...Response` type as a result, not a direct result like `Project` or `DeletionResult`. No two mutations may share the same type as a result. This allows us non-breaking schema changes in the future (by adding a new field and deprecating an old one).
  - See https://www.apollographql.com/docs/graphos/schema-design/guides/naming-conventions
- Regarding "not found":
- Querying a non-existing entity is not an error when the schema specifies a nullable value.
No entry is added to
errors
. - When referencing a non-existing entity as input in a mutation, it's an error.
errors
will contain aNOT_FOUND
classified field (in contrast to other invalid input which is classifiedBAD_REQUEST
, e.g. "number out of range", "date in invalid format").
- Querying a non-existing entity is not an error when the schema specifies a nullable value.
No entry is added to
- `DataLoader`s do not necessarily go into the same file as their `@SchemaMapping`. `@SchemaMapping`s are grouped by their `typeName`, e.g. `@SchemaMapping(typeName = "WikiPage", field = "attachments")` needs to go into `WikiPageController`. `DataLoader`s go into "their" controller, i.e. the controller that knows how to resolve the fields. E.g. `@SchemaMapping(typeName = "WikiPage", field = "attachments")` (defined in `WikiPageController`) wants to load `Attachment`s. The `DataLoader` needed for these is defined in `AttachmentsController`.
- General `Controller` layout, in order:
  1. `init` block for defining `DataLoader`s and registering them into `batchLoaderRegistry`
  2. `@SchemaMapping`s (order like in the schema definition, `extend type`s last)
  3. `@QueryMapping`s (order like in the schema definition)
  4. `@MutationMapping`s (order like in the schema definition)
  5. `determineFieldsToLoad()` methods (from `DataFetchingFieldSelectionSet`, from `BatchLoaderEnvironment`, from `Set<String>`)
  6. `inner class` `DataLoader`s using `determineFieldsToLoad()`
- Issues have been live in my production wiki since the end of January 2025, and are performing extremely well 🥳 Starting right now (2025-02-09), issue references are used in commit messages if there is a corresponding issue in the tracker.
- Naming strategy for database `CONSTRAINT`s and `INDEX`es:
  - `pk__table` (`PRIMARY KEY`)
  - `fk__table__reference` (`FOREIGN KEY`): `reference` usually is the name of the referenced table, but can contain additional discriminators.
  - `uniq__table__columns` (`UNIQUE`): `columns` is the name(s) of the columns, separated by `__`
  - `index__table__columns` (`INDEX`): `columns` is the name(s) of the columns, separated by `__`
  - `check__table__param` (`CHECK`): `param` is a short description of the check
  - Generally,
    - Column names can be shortened when it's clear, e.g. `tag` instead of `tag_id`.
    - Multiple columns can be referenced by their meaning instead of listing all columns, e.g. `key` instead of `project_id__issue_number`.
    - References can be shortened as well, e.g. `issue1` instead of `issue__id`. This is even mandatory when the same column is referenced twice.
    - If needed, use `__` for separating table and columns, e.g. `referenced_table__column_1` vs. `referenced_table__column_2`.
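The convention can be captured in a few tiny builders (illustrative only; the actual names live in `schema.sql`):

```kotlin
// Build constraint/index names following the naming strategy above.
fun pk(table: String) = "pk__$table"
fun fk(table: String, reference: String) = "fk__${table}__$reference"
fun uniq(table: String, vararg columns: String) = "uniq__${table}__${columns.joinToString("__")}"
fun index(table: String, vararg columns: String) = "index__${table}__${columns.joinToString("__")}"
```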
- In controllers, we use jOOQ `Record` classes as an indirection, not the GraphQL POJO directly.
  - This helps if some column/field in the database does not directly map to the GraphQL POJO.
  - GraphQL POJO classes should implement a static `fromRecord()` function for this.
  - Common pattern:

    ```kotlin
    @QueryMapping
    fun foos(fieldSelectionSet: DataFetchingFieldSelectionSet): List<Foo> {
        val fields = determineFieldsToLoad(fieldSelectionSet, emptySet())
        return create
            .select(fields)
            .from(FOO)
            .fetchInto(FooRecord::class.java)
            .map { Foo.fromRecord(it) }
    }
    ```

  - `Record`s help us update an entity and, without looking at the requested fields, be safe by returning `Foo.fromRecord(it)`, e.g.

    ```kotlin
    @MutationMapping
    fun updateFoo(@Argument input: UpdateFooInput): UpdateFooResponse {
        val errors = GraphQLErrors()

        val fooRecord = create
            .selectFrom(FOO)
            .where(FOO.ID.eq(input.id))
            .fetchOne()
            ?: GraphQLErrors.throwApiExceptionWithNotFoundType(null, "There is no foo with ID '${input.id}'.")

        // ... validation

        errors.ifAnyThrowApiException()

        // updating the changed fields only
        fooRecord.text = input.text
        fooRecord.modificationTime = now

        create.executeUpdate(fooRecord)

        return UpdateFooResponse(Foo.fromRecord(fooRecord))
    }
    ```

- Field selection is done by `determineFieldsToLoad()` functions.
  - They map each GraphQL field manually, e.g.

    ```kotlin
    when {
        it == "id" -> listOf(ISSUE.ID)
        // ...
    }
    ```

  - Children fields must be mapped so that their `DataFetcher`s can load their association by foreign key, e.g.

    ```kotlin
    when {
        it == "nestedField" -> emptyList()
        it.startsWith("nestedField/") -> listOf(FOO.NESTED_FIELD_ID) // FK needed for children fields
        // ...
    }
    ```

  - A `when` block with an `else -> throw` will ensure we don't forget any field. Particularly handy when you overlooked an `extend type Foo` mapping in the GraphQL schema.

    ```kotlin
    when {
        // ...
        else -> throw AssertionError("Unknown field '$it'. Schema should not allow that.")
    }
    ```
- Connect DBeaver to H2 database: dbeaver/dbeaver#20676 (comment)
Fixed logging configuration, checked into VCS, is configured within `application.yaml`.

Additional logging configuration can be done individually via environment variables. For example

```shell
LOGGING_LEVEL_BIZ_THEHACKER=DEBUG
LOGGING_LEVEL_ORG_JOOQ_TOOLS=DEBUG
```

enables general `DEBUG` logging for the packages `biz.thehacker` (so the whole tH-Wiki) and `org.jooq.tools` (which e.g. includes `org.jooq.tools.LoggerListener` to output all queries and their results).

Note: It's not possible to configure specific loggers by environment variables that way, only full packages. See https://docs.spring.io/spring-boot/reference/features/logging.html#features.logging.log-levels.

Configuring individual loggers works like this:

```shell
SPRING_APPLICATION_JSON='{"logging.level.org.jooq.tools.LoggerListener": "DEBUG"}'
```
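The property-to-environment-variable mapping follows Spring Boot's relaxed binding; a simplified sketch of the transformation (ignoring dashes and other edge cases):

```kotlin
// Simplified version of Spring Boot's relaxed binding for environment variables:
// dots become underscores and everything is uppercased.
// Note the mapping is lossy -- camel-case logger names like "LoggerListener"
// cannot be expressed this way, which is why only full packages work.
fun toEnvVar(property: String): String =
    property.replace('.', '_').uppercase()
```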
Simply change the `src/main/resources/schema.sql` file, then run

```shell
./gradlew jooqCodegen
```

Afterwards, you can start/continue coding with the newly generated files.
```shell
./gradlew bootJar

THWIKI_CORS_ORIGIN="http://localhost:5173" \
java \
  -jar -Dspring.profiles.active=demo \
  build/libs/th-wiki-1.0-SNAPSHOT.jar
```
```shell
./gradlew bootJar

mkdir -p storage  # ensure directory exists

THWIKI_CORS_ORIGIN="http://localhost:5173" \
SPRING_DATASOURCE_URL=jdbc:h2:file:./storage/th-wiki \
SPRING_SQL_INIT_MODE=ALWAYS \
java -jar build/libs/th-wiki-1.0-SNAPSHOT.jar
```
```shell
./gradlew bootJar

THWIKI_CORS_ORIGIN="http://localhost:5173" \
java \
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005 \
  -jar -Dspring.profiles.active=demo \
  build/libs/th-wiki-1.0-SNAPSHOT.jar

# Use "Attach to process" in your debugger
```
(matches "in-memory" IntelliJ run configuration)

```shell
./gradlew bootJar
docker build -t th-wiki .
docker run --rm \
  -p 8080:8080 \
  -e THWIKI_CORS_ORIGIN="http://localhost:5173" \
  th-wiki
```
(matches "persisted" IntelliJ run configuration)

```shell
./gradlew bootJar
docker build -t th-wiki .
docker run --rm \
  -p 8080:8080 \
  -v /home/thehacker/IdeaProjects/th-wiki/th-wiki/storage:/th-wiki/storage \
  -e SPRING_DATASOURCE_URL=jdbc:h2:file:/th-wiki/storage/th-wiki \
  -e SPRING_SQL_INIT_MODE=ALWAYS \
  -e THWIKI_CORS_ORIGIN=http://localhost:5173 \
  th-wiki
```
On my private projects I usually keep merge commits to have a cleaner history of the bigger features split over multiple commits. Between bigger features there is usually a clean-up phase where I do smaller improvements or refactorings not directly related to the previous feature. They either go into some `feature/misc` branch or directly to `master`.
With tH-Wiki I tried a 100% linear history, state of the art nowadays. While this performs well in professional projects with big teams, it does not for private projects, where there are sometimes breaks of months until work continues.
I regretted the decision. On 2025-02-08 I rewrote the history of the `th-wiki` and `th-wiki-ui` repositories completely. Retroactively I reconstructed the branches and added the missing merge commits by `--no-ff` merging. Since these commits are new, they got a fresh author date equal to 2025-02-08. (I did not artificially set a fitting date, but rather kept the real one.) The other commits, which were only cherry-picked, got their commit date changed but kept their original author date by Git.