
tH-Wiki

tH-Wiki will provide an easy solution to having my personal wiki, task and issue tracker – all in one. YouTrack comes closest (I never tried its Knowledge Base feature, but its issue tracker was nice back then), but I do not want proprietary software.

Screenshots

Wiki

  • Wiki: Home
  • Wiki: Markdown rendering
  • Wiki: Mermaid support, e.g. mindmap
  • Wiki: Powerful search query with tags

Issues

  • Issues: List with powerful search query
  • Issues: Dependency graph of issue links with multiple depth levels
  • Issues: Issue links with tooltip on hover
  • Issues: Dependency graph shows different issue types, issue priorities, and link types

Attachments

  • Attachment: Table view
  • Attachment: Thumbnails view

Administration

  • Administration: Projects
  • Administration: Tags

Other

  • Integrated help
  • Version

Goals / Progress

General

  • Develop privately, maybe put it on GitHub later
    • ⇒ State: …not maybe, but for sure 🥳
  • API first
  • For this project, I will separate backend and frontend into two independent applications (rather than building an all-in-one JAR)
  • Abstraction of persistence, so we may start file-based (JSON files, H2, SQLite), but can easily extend to Postgres later
    • ⇒ State: Partly implemented. The database layer uses JDBC for different engines; a file-based JSON store is not possible. However, DemoDataInitializer goes in that direction. Attachments use a Storage interface, which is just a thin wrapper around Java's FileSystem.
  • If not persisted that way, easy import/export with JSON or YAML
    • ⇒ State: There is no specialized import/export feature yet. However, backups are easy when using the H2 database: just stop the application and copy the whole folder.

Wiki

  • Simple input of data using Markdown
  • Attachments (e.g. Screenshots or configuration files)
  • Powerful full-text search (looking towards you, Lucene 😄)
    • ⇒ State: Not yet. On the client side we have a powerful ANTLR query language, but it only works that well because the client loads the full text of all wiki pages/issues.
  • Hierarchical storage of pages
    • E.g. Level 1 "Server", "Programming", "Personal" – Level 2 "Proxmox", "Kotlin", "Living Room"
    • No fixed hierarchies (e.g. "Books" → "Pages"), but arbitrary depths of "folders"
  • Tags as an additional mechanism to cluster content across different folders

Issue Tracker

  • Basic features of an issue tracker: projects, issues, types, statuses, comments, relations
    • ⇒ State: Comments not implemented yet. Edit is sufficient for now.

Task Management

  • Should be an intermediate between Wiki and Issue Tracker
    • Tasks need to be ordered into hierarchies like the Wiki
    • Tasks need statuses/comments/relations like the Issue Tracker
  • We need a "Due" field

⇒ State: A separate task management was implemented, but did not pan out. We have integrated everything we need into the issue tracker; tasks are now a separate issue type.

Notes Management

  • Similar to Wiki and Tasks
    • Notes could have a hierarchy to cluster them if needed.
    • Some notes are just "immediate brain dumps" without any structure.
    • Notes also can have tags (like in the Wiki)
  • Notes are like Wiki, but not so structured. (Technically, they may be the same.)

⇒ State: There is no separate notes management. For now, the issue tracker has a separate issue type for notes.

Database Schema

Tables

WikiPage

  • id: UUID (unique!)
  • title: String (unique/non-empty)
  • content: String (Markdown, can be blank)
  • parent: UUID? (optional parent to form trees)
  • creationTime: LocalDateTime
  • modificationTime: LocalDateTime
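The parent column above forms arbitrary-depth trees. As a minimal sketch (hypothetical names, not the actual implementation), resolving a page's folder path could look like this:

```kotlin
import java.util.UUID

// Hypothetical in-memory model of the WikiPage table above,
// reduced to the fields needed to resolve a page's folder path.
data class WikiPage(val id: UUID, val title: String, val parent: UUID?)

// Walks the parent chain up to the root and returns the path,
// e.g. "Server / Proxmox" for a two-level hierarchy.
fun pagePath(pages: Map<UUID, WikiPage>, id: UUID): String {
    val parts = mutableListOf<String>()
    var current: WikiPage? = pages[id]
    while (current != null) {
        parts.add(0, current.title)
        current = current.parent?.let { pages[it] }
    }
    return parts.joinToString(" / ")
}
```

With the example hierarchy from the Goals section, a page "Proxmox" under "Server" resolves to "Server / Proxmox".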

Attachment

  • id: UUID (unique! represents the filename we save the attachment in storage)
  • wikiPageId/issueId: UUID?
    • an attachment belongs to either a wiki page or an issue (not none, not both!), so it can be referenced relative to it, e.g. an image in markdown
    • but we keep options open for global attachments with a NULL value
  • filename: String (original filename; the same filename may appear multiple times, but only once per owning entry, so there is no clash in Markdown rendering)
  • description: String (description or comment the user adds to the upload)
  • lastModifiedTime: Datetime? (file's mtime, can be NULL if unknown)
  • uploadTime: Datetime (time of upload)
  • size: Long (size of the file)
  • mimeType: String (MIME type as reported by the client; we don't validate it for now, as detection is hard)
    • Can be empty if the client did not send any; in this case we deliver application/octet-stream
  • imageWidth/imageHeight: Int? (for images we save the dimensions of the image)
  • sha256Sum: String (SHA-256 checksum as hex digits)
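For illustration, the sha256Sum value can be computed with the JDK's MessageDigest; this is a sketch, not necessarily the project's actual code:

```kotlin
import java.security.MessageDigest

// Computes the SHA-256 checksum of an attachment's bytes as
// lowercase hex digits, matching the sha256Sum column above.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256")
        .digest(bytes)
        .joinToString("") { "%02x".format(it) }
```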

Project

  • id: UUID (unique!)
  • prefix: String (prefix for issue key, we only allow uppercase characters)
  • title: String (unique)
  • description: String (description or comment)
  • nextIssueNumber: Int (strictly monotonically increasing counter to build issue keys from, e.g. prefix = "DEMO", nextIssueNumber = 1 -> the next issue gets key "DEMO-1")

IssueType

  • id: UUID (unique!)
  • title: String (unique)
  • sortIndex: Int
  • icon: String
  • iconColor: String

IssuePriority

  • id: UUID (unique!)
  • title: String (unique)
  • sortIndex: Int
  • icon: String
  • iconColor: String
  • showIconInList: Boolean

IssueStatus

  • id: UUID (unique!)
  • title: String (unique)
  • description: String (is used for tooltips to explain the status)
  • sortIndex: Int
  • icon: String
  • iconColor: String
  • doneStatus: Boolean

Issue

  • id: UUID (unique!)
  • projectId: UUID (an issue belongs to a project, we don't allow project-less issues)
  • issueNumber: Int (unique per project, part of the issue key)
    • The issue key is project's prefix + "-" + issue's issueNumber
    • issueNumber is incremented with each new issue
  • issueKey: String (issue key is saved redundantly, so there is no need to JOIN the project table)
  • issueTypeId: UUID (issue's type, e.g. Feature, Bug, Task)
  • issuePriorityId: UUID (each issue has a mandatory priority)
  • issueStatusId: UUID (each issue has a status)
    • There are no status workflows like in JIRA. Not sure whether we want that later.
    • Later, statuses will be configurable, but you can transition from any status to any other one.
  • title: String (non-empty, multiple issues with the same title are allowed)
  • description: String (can be empty, Markdown)
  • creationTime: LocalDateTime
  • modificationTime: LocalDateTime
  • progress: Int? (can be set optionally, e.g. on issueType=Task)
  • dueDate: LocalDate? (soon due/overdue issues will be highlighted in the UI)
  • doneTime: LocalDateTime? (will be set when status goes to a status being doneStatus)
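How the issue key is built from the project's prefix and counter (described above) can be sketched as follows; Project and nextIssueKey are illustrative names, not the actual implementation:

```kotlin
// Hypothetical sketch: deriving an issue key from the project's
// prefix and its strictly monotonically increasing counter.
data class Project(val prefix: String, var nextIssueNumber: Int)

// Consumes the project's counter and returns the new issue's key,
// e.g. prefix "DEMO" with nextIssueNumber 1 yields "DEMO-1".
fun nextIssueKey(project: Project): String {
    val number = project.nextIssueNumber
    project.nextIssueNumber = number + 1
    return "${project.prefix}-$number"
}
```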

IssueLinkType

  • id: UUID (unique!)
  • type: String (magic string for the UI to identify the different issue link types)
    • UI can use this to display different styles in the dependency graph, or different icons in lists.
    • We don't use an enum to be easily open for future additions.
  • sortIndex: Int
  • wording: String (Reading the link forward: "Issue 1 wording Issue 2")
  • wordingInverse: String (Reading the link backward: "Issue 2 wordingInverse Issue 1")
    • Can be empty if the order of the issues does not matter, e.g. "relates to".
    • In general, wording should be active (e.g. "blocks"), wordingInverse passive (e.g. "is blocked by").
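The forward/backward wording can be sketched like this (hypothetical names; the fallback to wording for symmetric link types is an assumption based on the "can be empty" rule above):

```kotlin
// Hypothetical rendering of an issue link in both directions, using
// the wording/wordingInverse fields described above.
data class IssueLinkType(val wording: String, val wordingInverse: String)

// Reading forward: "Issue 1 wording Issue 2". Reading backward uses
// wordingInverse; for symmetric types (empty wordingInverse, e.g.
// "relates to") we fall back to wording — an assumption.
fun renderLink(type: IssueLinkType, from: String, to: String, forward: Boolean): String {
    val verb = if (forward || type.wordingInverse.isEmpty()) type.wording else type.wordingInverse
    return "$from $verb $to"
}
```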

IssueLink

  • id: UUID (unique!)
  • issue1Id: UUID
  • issue2Id: UUID
  • issueLinkTypeId: UUID

Tag

  • id: UUID (unique!)
  • projectId: UUID? (null = global tag, non-null = project tag, cannot be changed after creation)
  • scope: String (can be empty)
  • scopeIcon: String (can be empty)
  • scopeColor: String (can be empty)
  • title: String
  • titleIcon: String (can be empty)
  • titleColor: String
  • description: String (can be empty)

TagAssociation

  • tagId: UUID
  • wikiPageId: UUID? (can be null; exactly one of the two nullable UUIDs must be set)
  • issueId: UUID? (can be null; exactly one of the two nullable UUIDs must be set)
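The "exactly one of wikiPageId/issueId" invariant could be expressed in application code like this sketch (in the database it would rather be a CHECK constraint):

```kotlin
import java.util.UUID

// Hypothetical model of the TagAssociation table above. The init
// block enforces that exactly one owning entity is referenced:
// (wikiPageId == null) XOR (issueId == null) must not both hold.
data class TagAssociation(val tagId: UUID, val wikiPageId: UUID?, val issueId: UUID?) {
    init {
        require((wikiPageId == null) != (issueId == null)) {
            "Exactly one of wikiPageId and issueId must be set"
        }
    }
}
```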

Additional Thoughts and Decisions

Entry (didn't pan out)

Entry is flexible enough to hold anything. The different presentations will be done by the UI, for example rendering a checkbox next to a task that loads/saves the custom field "done".

Because the "Powerful full-text search" must understand the content of the data, the backend could/should validate nonsensical data like "type=wiki, due=2024-06-07". However, such combinations are not harmful and could even make sense, for example "Please check and rework this wiki page until 2024-06-07.".

With such an approach we could easily implement "convert note to task" by changing the type. We could (should?!) even keep the ID. The UI is responsible for cleaning/enforcing certain custom fields, for example forcing/defaulting a status when "note ⇒ issue" and, vice versa, deleting the status when "issue ⇒ note".

Field definitions could be added later, for example assisting "Status" with a set of pre-defined values. Or "Due" having a "Date/Time" format.

Discontinued and replaced: We refactored Entry back into WikiPage and Issue. Having these "flexible fields" caused us more trouble than it was worth. If we ever need a "super-object" again, we can build one with GraphQL and interfaces or union types.

Project

Projects serve as a basis for the issue tracker.

Previously, there were entries. Now, issues are disjoint from wiki pages, so it's debatable whether there will be different wikis, one per project, or not. The "everything is an entry" approach did NOT serve us well and was discontinued. Tasks have been replaced by issues, as the issue tracker is able to handle all task use cases. Notes and other future extensions will be separate as well. Shared functionality like attachments, tags, or custom fields can be implemented with different base tables like wiki_attachment and issue_attachment, or a single table with separate columns like attachment.wiki_page_id and attachment.issue_id.

For projects, we have both an id and the prefix, which is unique. It's not yet clear whether we want frontend URLs like /issues/FOO-1 or /issues/536215eb-a23c-4b14-a0ea-c68c4ada351a (or both). To distinguish the two, the wording is "issue ID" (the UUID) vs. "issue key" (prefix + "-" + running number). For the API, only the UUID is primarily relevant.

We won't (or only much later) track moving issues between projects like e.g. JIRA does. Since it's allowed to delete an issue, it's also allowed to move an issue from project A ("delete it there") to project B ("create it there").

Issue Type, Issue Priority, and Issue Status

sortIndex allows arranging the items in a dropdown list. It's only used for sorting and is never communicated to a client.

Issue type, priority, and status have an icon field. It's a reference to a Font-Awesome icon. We use 48 chars as column length. The longest Font-Awesome icon currently has 32 characters. Check https://github.com/FortAwesome/Font-Awesome/blob/6.x/js/all.js with regex \s+"[^"]{38,}":.

iconColor can either be a "#rrggbb" string for a specific color, or a Bootstrap CSS class suffix like primary or danger-emphasis.

All three tables are automatically filled by the backend with default values. For now, there are no endpoints to alter these. We will later extend the functionality, so the data can be altered to provide more flexibility to the user.

Additional special fields are provided:

  • IssuePriority.showIconInList: Indicates to the UI whether to show the icon in the list view. This way, "normal" priorities have no icon there, only lower/higher priorities do. In the issue's detail view, the icon is always shown.
  • IssueStatus.description: a description explaining what a status means. UI can render this as a tooltip to the status.
  • IssueStatus.doneStatus: Marks a status as "done". When an issue transitions into such a status, the issue's doneTime is set. We use this to mark end states, which can be, for example, "Done" or "Declined".
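The doneTime rule for doneStatus can be sketched as a pure function (hypothetical names; whether doneTime is cleared when leaving a done status is an assumption, not stated above):

```kotlin
import java.time.LocalDateTime

// Hypothetical model of the relevant IssueStatus fields.
data class IssueStatus(val title: String, val doneStatus: Boolean)

// Computes the new doneTime after a status transition:
// - entering a done status sets doneTime (keeping an earlier one),
// - leaving a done status clears it (assumption, not confirmed).
fun doneTimeAfterTransition(
    newStatus: IssueStatus,
    previousDoneTime: LocalDateTime?,
    now: LocalDateTime,
): LocalDateTime? =
    when {
        newStatus.doneStatus && previousDoneTime == null -> now
        newStatus.doneStatus -> previousDoneTime
        else -> null // assumption: leaving a done status clears doneTime
    }
```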

Tags

Tags are designed to be associable with all kinds of entities. For now, these are issues and wiki pages. It's possible to use them for attachments and future entities.

There are project tags (associated with a particular project) and global tags (without a project).

By design, wiki pages can be associated with global tags. There is no special category "wiki tags". In theory, in a bigger setup, wiki pages would be associated with a project, i.e. forming one "wiki" per project. For now, we don't do that and only have one "global wiki". Thus it's consistent to use the global tags.

For associations, we decided to have foreign key constraints enforced by the database; therefore, each entity that can be associated with a tag has its own column in the table. The alternative (tagId, type, targetId) looks easier, but does not allow FKs. TagAssociation is the first table without a surrogate key. We don't have to keep metadata there, and we consider a tag association as "belonging" to the owning entity: e.g. if an issue is deleted, so are its tag associations, automatically.

Non-goals (or only later)

  • Complicated users, groups, and permissions ⇒ For now, it's only for me.
  • We may start with fixed issue types/statuses/relations. Later this can be extended to be configurable, then even configurable per project.
  • "Scrum"ish issue tracker with dashboards and reports

Architecture Decisions

  • We will start without repository/service layers, putting all code into the controllers. Additional layers/classes will only be introduced if there is a need for them.
    • Consequence: Testing will be done on HTTP level, testing the controllers directly.
  • GraphQL API
    • The initially used REST API has been replaced by a GraphQL API. Reasons:
      • With the previous REST approach, the frontend made a lot of (copy&pasted) requests to get all the data, e.g. loading issue types, issue priorities, and issue statuses to fill dropdowns or render an issue. GraphQL allows fetching everything in one request.
      • The frontend had different use cases fetching different subsets of properties, e.g. for a list, expensive fields like content/description are not needed, while rendering a single issue fetches the description. Providing ?fields functionality had proven quite annoying to implement and had to be done endpoint by endpoint. GraphQL provides field-wise selection to the client out of the box (yes, with the general work we had to do for GraphQL, once).
        • We do GraphQL right! In contrast to many implementations out there (because of Hibernate, or even the Spring GraphQL docs showing examples with full objects), we don't load full objects and let the framework throw away a lot of data, but rather tailor our SQL queries to what's really requested by the GraphQL request.
        • Important: Sometimes, ID columns must be loaded regardless of whether they were requested! Imagine the following query:
          {
            issues {
              title
              project {
                prefix
              }
              issueLinks {
                # id  <-- without "id" requested, IssueLink.id is not loaded.
                issueLinkType {
                  wording # <-- But without knowing IssueLink.id, how would we load the associated issueLinkType?
                          #     The DataFetcher would not find anything, wording would be null, which is incorrect!
                }
              }
              issueNumber
            }
          }
      • GraphQL easily lets us return multiple errors at once, e.g. two missing fields. (We could have done this with REST as well, but GraphQL immediately offers us a) an errors property, b) an array for multiple errors.)
        • In contrast to 1 and 2, we do not incorporate possible errors into the schema. Maybe later. We start simple: field errors carry a field entry inside extensions for each error, so the FE knows which field to colorize red; the message will be shown to the user.
    • Deletion mutations will have an id field only. There is no need to return the removed object, so we don't have to fetch it first.
    • File uploads: GraphQL does not support file uploads. There are different workarounds available, see 1, 2, 3.
      • Multipart requests: Would open CSRF vulnerabilities. Spring does not have direct support, pointing instead to multipart-spring-graphql, a project with (as of now) only 15 stars. ⇒ Nope
      • Cloud services ⇒ Nooooope! 😱
      • Base64 string uploads: This is the easiest solution, and since we are not expecting lots of traffic and/or gigantic files, we go with this approach. No additional configuration needed, no deviation from the GraphQL spec.
        • Remark: We could try out Base85 (needs a special "quotes/backslash to sth else" replacement to work in JSON data)
    • File downloads: File uploads are done via Base64 strings. We don't go the symmetric way for file downloads, but rather provide the usual GET endpoints. These URLs or file names will be provided in GraphQL responses. (❗) NOTE: This has been re-evaluated. First, we deleted the REST API and with it the above-mentioned download functionality. Now, we have re-introduced the GET endpoint. Users can access an attachment by its ID; there is no separate URL/filename in the GraphQL response. Reasons:
      • Browsers can easily render a file with a Content-Type header. With Base64 decode magic we would need to send the Content-Type separately, and put effort into letting the browser know.
      • No change to existing code necessary.
      • Files can easily be accessed outside the API by just typing the URL into the browser.
    • GraphQL types should match our database types. For example, having issueKey (= project's prefix + "-" + issue's issueNumber) needs an additional JOIN from issue to project. This was bumpy in the REST implementation already, and would be even harder in the GraphQL implementation. The solution is either a) not providing the field anymore, or b) persisting it redundantly in the database (as issue.issue_key).
    • "Testing will be done on HTTP level" also applies to GraphQL. We don't use Spring's HttpGraphQlTester.
    • Naming:
      • Mutations start with a verb.
      • For mutations an input argument is just called input.
      • A mutation can have different arguments as input, but in general we don't want many arguments. If there are more than two, there should be an input argument instead. However, we are not as strict as GitHub's API, where every mutation has only a single input argument.
        • Thoughts: An input type can be extended more easily. We only choose separate arguments when we are sure there won't be a change in the future, for example a delete mutation accepting only an id parameter. No need for a separate input argument with only one field.
      • All mutations shall have a ...Response type as a result, not a direct result type like Project or DeletionResult. No two mutations may share the same result type. This allows us non-breaking schema changes in the future (by adding a new field and deprecating an old one).
      • See https://www.apollographql.com/docs/graphos/schema-design/guides/naming-conventions
    • Regarding "not found":
      • Querying a non-existing entity is not an error when the schema specifies a nullable value. No entry is added to errors.
      • When referencing a non-existing entity as input in a mutation, it is an error. errors will contain an error classified as NOT_FOUND (in contrast to other invalid input, which is classified BAD_REQUEST, e.g. "number out of range", "date in invalid format").
    • DataLoaders do not necessarily go into the same file as their @SchemaMapping.
      • @SchemaMappings are grouped by their typeName, e.g. @SchemaMapping(typeName = "WikiPage", field = "attachments") needs to go into WikiPageController.
      • DataLoaders go into "their" controller, because that controller knows how to resolve the fields. E.g. @SchemaMapping(typeName = "WikiPage", field = "attachments") (defined in WikiPageController) wants to load Attachments; the needed DataLoader for these is defined in AttachmentsController.
    • General Controller layout in order:
      • init block for defining DataLoaders and registering them into batchLoaderRegistry
      • @SchemaMapping (order like in schema definition, extend types last)
      • @QueryMappings (order like in schema definition)
      • @MutationMappings (order like in schema definition)
      • determineFieldsToLoad() methods (from DataFetchingFieldSelectionSet, from BatchLoaderEnvironment, from Set<String>)
      • inner class DataLoaders using determineFieldsToLoad()
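The Base64 upload decision above boils down to plain java.util.Base64 on both sides; a minimal sketch with illustrative function names:

```kotlin
import java.util.Base64

// Sketch of the Base64 upload approach: the client encodes the file
// content into a string that travels as an ordinary GraphQL variable,
// and the server decodes it back to bytes. Function names are
// illustrative, not the project's actual API.
fun encodeAttachment(bytes: ByteArray): String =
    Base64.getEncoder().encodeToString(bytes)

fun decodeAttachment(base64: String): ByteArray =
    Base64.getDecoder().decode(base64)
```

No deviation from the GraphQL spec is needed, since the payload is just a String scalar.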

Other Decisions

  • Issues are released to my production wiki since end of January 2025, and are performing extremely well 🥳 Starting right now (2025-02-09), issue references are used in commit messages if there is a corresponding issue in the tracker.
  • Naming strategy for database CONSTRAINTs and INDEXes:
    • pk__table (PRIMARY KEY)
    • fk__table__reference (FOREIGN KEY): reference usually is the name of the referenced table, but can contain additional discriminators.
    • uniq__table__columns (UNIQUE): columns is the name(s) of the columns, separated by __
    • index__table__columns (INDEX): columns is the name(s) of the columns, separated by __
    • check__table__param (CHECK): param is a short description of the check
    • Generally,
      • Column names can be shortened when it's clear, e.g. tag instead of tag_id.
      • Multiple columns can be referenced by their meaning instead of listing all columns, e.g. key instead of project_id__issue_number.
      • References can be shortened as well, e.g. issue1 instead of issue__id. This is even mandatory when the same column is referenced twice.
      • If needed, use __ for separating table and columns, e.g. referenced_table__column_1 vs. referenced_table__column_2.

Correct Dealing with GraphQL

  • In controller, we use jOOQ Record classes as an indirection, not the GraphQL POJO directly.
    • This helps, if some column/field in the database does not directly map to GraphQL POJO.
    • GraphQL POJO classes should implement a static fromRecord() function for this.
    • Common pattern:
      @QueryMapping
      fun foos(fieldSelectionSet: DataFetchingFieldSelectionSet): List<Foo> {
          val fields = determineFieldsToLoad(fieldSelectionSet, emptySet())
      
          return create
              .select(fields)
              .from(FOO)
              .fetchInto(FooRecord::class.java)
              .map { Foo.fromRecord(it) }
      }
    • Records help us update an entity and, without looking at the requested fields, stay safe by returning Foo.fromRecord(it), e.g.
      @MutationMapping
      fun updateFoo(@Argument input: UpdateFooInput): UpdateFooResponse {
          val errors = GraphQLErrors()
      
          val fooRecord = create
              .selectFrom(FOO)
              .where(FOO.ID.eq(input.id))
              .fetchOne()
              ?: GraphQLErrors.throwApiExceptionWithNotFoundType(null, "There is no foo with ID '${input.id}'.")
      
          // ... validation
      
          errors.ifAnyThrowApiException()
      
          // updating the changed fields only
      
          fooRecord.text = input.text
          fooRecord.modificationTime = now
          create.executeUpdate(fooRecord)
      
          return UpdateFooResponse(Foo.fromRecord(fooRecord))
      }
  • Field selection is done by determineFieldsToLoad() functions
    • They map each GraphQL field manually, e.g.
      when {
          it == "id" -> listOf(ISSUE.ID)
          // ...
      }
    • Child fields must be mapped so that their DataFetchers can load their associations by foreign key, e.g.
      when {
          it == "nestedField" -> emptyList()
          it.startsWith("nestedField/") -> listOf(FOO.NESTED_FIELD_ID) // FK needed for children fields
          // ...
      }
    • A when block with an else -> throw ensures we don't forget any field. Particularly handy when you overlook an extend type Foo mapping in the GraphQL schema.
      when {
          // ...
          else -> throw AssertionError("Unknown field '$it'. Schema should not allow that.")
      }
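Putting the three fragments above together, a self-contained sketch of a determineFieldsToLoad() function (with plain strings standing in for jOOQ fields, and illustrative field names):

```kotlin
// Maps GraphQL selection paths to the database columns to load.
// Plain strings stand in for jOOQ Field references; "id", "title",
// and "nestedField" are illustrative names, not the real schema.
fun determineFieldsToLoad(selectedPaths: Set<String>): List<String> =
    selectedPaths.flatMap {
        when {
            it == "id" -> listOf("ISSUE.ID")
            it == "title" -> listOf("ISSUE.TITLE")
            it == "nestedField" -> emptyList()
            // FK needed so the child field's DataFetcher can resolve it
            it.startsWith("nestedField/") -> listOf("ISSUE.NESTED_FIELD_ID")
            else -> throw AssertionError("Unknown field '$it'. Schema should not allow that.")
        }
    }.distinct()
```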

Useful Links

How to Configure Logging

Fixed logging configuration, checked into VCS, is configured within application.yaml.

Additional logging configuration can be done individually by environment variables. For example

LOGGING_LEVEL_BIZ_THEHACKER=DEBUG
LOGGING_LEVEL_ORG_JOOQ_TOOLS=DEBUG

to enable general DEBUG logging for packages biz.thehacker (so the whole tH-Wiki) and org.jooq.tools (e.g. includes org.jooq.tools.LoggerListener to output all queries and their results).

Note: It's not possible to configure specific loggers by environment variables that way, only full packages. See https://docs.spring.io/spring-boot/reference/features/logging.html#features.logging.log-levels.

Configuring individual loggers works like this:

SPRING_APPLICATION_JSON='{"logging.level.org.jooq.tools.LoggerListener": "DEBUG"}'

How to Change the Database Schema

Simply change the src/main/resources/schema.sql file, then run

./gradlew jooqCodegen

Afterwards, you can start/continue coding with the newly created files.

How to Run the JAR

Everything (Storage and Database) in Memory, Filled with Demo Data

./gradlew bootJar
THWIKI_CORS_ORIGIN="http://localhost:5173" \
  java \
    -jar -Dspring.profiles.active=demo \
    build/libs/th-wiki-1.0-SNAPSHOT.jar

Configured with Default Persistent Storage in ./storage

./gradlew bootJar
mkdir -p storage  # ensure directory exists
THWIKI_CORS_ORIGIN="http://localhost:5173" \
  SPRING_DATASOURCE_URL=jdbc:h2:file:./storage/th-wiki \
  SPRING_SQL_INIT_MODE=ALWAYS \
  java -jar build/libs/th-wiki-1.0-SNAPSHOT.jar

Run with Debugger

./gradlew bootJar
THWIKI_CORS_ORIGIN="http://localhost:5173" \
  java \
    -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005 \
    -jar -Dspring.profiles.active=demo \
    build/libs/th-wiki-1.0-SNAPSHOT.jar
# Use "Attach to process" in your debugger

How to Docker

Defaults with no Storage and In-Memory Database

(matches "in-memory" IntelliJ run configuration)

./gradlew bootJar
docker build -t th-wiki .
docker run --rm \
  -p 8080:8080 \
  -e THWIKI_CORS_ORIGIN="http://localhost:5173" \
  th-wiki

Configured with Storage from Outside

(matches "persisted" IntelliJ run configuration)

./gradlew bootJar
docker build -t th-wiki .
docker run --rm \
  -p 8080:8080 \
  -v /home/thehacker/IdeaProjects/th-wiki/th-wiki/storage:/th-wiki/storage \
  -e SPRING_DATASOURCE_URL=jdbc:h2:file:/th-wiki/storage/th-wiki \
  -e SPRING_SQL_INIT_MODE=ALWAYS \
  -e THWIKI_CORS_ORIGIN=http://localhost:5173 \
  th-wiki

FAQ

Why are there some Strange Out-of-Sequence Commits on February 8th, 2025?

On my private projects I usually keep merge commits to have a cleaner history of the bigger features split over multiple commits. Between bigger features there is usually a clean-up phase where I do smaller improvements or refactoring not directly related to the previous feature. They either go into some feature/misc branch or directly to the master.

With tH-Wiki I tried a 100% linear history, state of the art nowadays. While this works well in professional projects with big teams, it does not for private projects, where there are sometimes breaks of months until work continues.

I regretted the decision. On 2025-02-08, I rewrote the history of the th-wiki and th-wiki-ui repositories completely. Retroactively, I reconstructed the branches and added the missing merge commits by --no-ff merging. Since these commits are new, they got a fresh author date of 2025-02-08. (I did not artificially set a fitting date, but rather kept the real one.) The other commits, which were cherry-picked, only got their commit date changed but kept their original author date, thanks to Git.
