Telegram Message Search

Usage

The app is currently hosted here: https://true-real-michael.github.io/tg-message-search

Export a Telegram chat in JSON format
Upload the result.json file (the website runs entirely in the browser, and no data is saved)
Search and browse threads and messages

Building

Install Trunk and add the wasm32 target:

cargo install trunk
rustup target add wasm32-unknown-unknown

Download the lemmatization-ru.tsv.gz file from the releases page and place it in the /data directory. Alternatively, download the morphological dictionary from OpenCorpora's website, place it in /data, run the scripts/preprocess_opcorpora.py script, and gzip the result.
Run the project:

trunk serve --port 3000 --release

The project will be available at localhost:3000/tg-message-search

To use this project for a different language, replace the lemmatization dictionary with one for your language. If you want more complex lemmatization, stemming, or embedding logic, look at the Lemmatizer struct in src/analysis/lemmatizer.rs and modify it accordingly.
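
For orientation, here is a minimal sketch of what a dictionary-backed lemmatizer could look like. It is not the project's actual Lemmatizer: the name DictLemmatizer, its methods, and the assumed TSV layout (one `form<TAB>lemma` pair per line) are illustrative assumptions, so check the real file format and src/analysis/lemmatizer.rs before relying on it.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-backed lemmatizer (not the project's `Lemmatizer`).
pub struct DictLemmatizer {
    /// Maps a lowercased word form to its lemma.
    forms: HashMap<String, String>,
}

impl DictLemmatizer {
    /// Builds the lookup table from already-decompressed TSV text.
    /// Assumed layout: one `form<TAB>lemma` pair per line.
    /// Lines without a tab are skipped.
    pub fn from_tsv(tsv: &str) -> Self {
        let forms = tsv
            .lines()
            .filter_map(|line| line.split_once('\t'))
            .map(|(form, lemma)| (form.trim().to_lowercase(), lemma.trim().to_lowercase()))
            .collect();
        Self { forms }
    }

    /// Returns the lemma for `word`, or the lowercased word itself
    /// when the dictionary has no entry for it.
    pub fn lemmatize(&self, word: &str) -> String {
        let key = word.to_lowercase();
        self.forms.get(&key).cloned().unwrap_or(key)
    }
}
```

Falling back to the lowercased word keeps names, slang, and typos searchable even when the dictionary does not know them. Swapping in a dictionary for another language would only change the data passed to `from_tsv`, provided that file follows the same assumed layout.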

Why?

Telegram's native message search was not convenient for me, especially:
- When I wanted to search for synonyms.
- When I wanted to search for combinations of words.
- When the information is scattered across multiple messages that form a reply chain.
- When there are many results, it is inconvenient to scroll through them in a tiny search results bar.
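
The reply-chain case above can be handled by grouping messages into threads. The following is a rough sketch of one way to do that, not necessarily how this project does it: each message is assigned to the root of its reply chain. The Message struct and the function names are hypothetical, and the sketch assumes each message carries its own id plus an optional id of the message it replies to.

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal stand-in for an exported message (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Message {
    pub id: i64,
    /// Id of the message this one replies to, if any.
    pub reply_to: Option<i64>,
    pub text: String,
}

/// Follows reply links upward until reaching a message whose parent is
/// absent from the export; that message is treated as the thread root.
fn root_of(id: i64, parents: &HashMap<i64, Option<i64>>) -> i64 {
    let mut current = id;
    // Bound the walk by the message count so malformed data containing
    // a reply cycle cannot loop forever.
    for _ in 0..parents.len() {
        match parents.get(&current).copied().flatten() {
            Some(parent) if parents.contains_key(&parent) => current = parent,
            _ => break,
        }
    }
    current
}

/// Groups message ids by thread root, keeping input order within a thread.
pub fn group_threads(messages: &[Message]) -> BTreeMap<i64, Vec<i64>> {
    let parents: HashMap<i64, Option<i64>> =
        messages.iter().map(|m| (m.id, m.reply_to)).collect();
    let mut threads: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for m in messages {
        threads.entry(root_of(m.id, &parents)).or_default().push(m.id);
    }
    threads
}
```

A reply to a message that is missing from the export (for example, a deleted one) simply starts its own thread. Searching the concatenated text of each thread then finds information that is split across several replies.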

Design choices

Why WASM?
- To maintain privacy by keeping all data client-side.
- To avoid round-trips for queries and data upload.
- I didn't want to spend money on a backend.
- Because it is a cool technology and I wanted to try it out.

Why no embedded db?
- Because I wanted bespoke lemmatization logic.
- I also wanted to keep the app lightweight and minimalistic.

Why Leptos?
- No particular reason; I just wanted to try it out.
- This project used to use wasm-bindgen + vanilla JS + HTML, but I tried building a reactive UI with Leptos and it worked well.
- Language unification was a nice bonus.

Why dictionary-based lemmatization?
- I initially considered using word embeddings, but I could not find a suitable model for Russian.
- A dictionary gets the job done and arguably does not take too much space: ≈9 MB compressed, ≈300 MB uncompressed.
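
To illustrate why an embedded database is not strictly needed, here is a small sketch of an in-memory inverted index that answers "all of these words" queries by intersecting posting sets. It is an assumption-laden illustration rather than the project's code: the Index type and its methods are hypothetical, and plain lowercasing stands in for the lemmatization step.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical inverted index: normalized term -> ids of messages containing it.
#[derive(Default)]
pub struct Index {
    postings: HashMap<String, BTreeSet<usize>>,
}

/// Splits text on non-alphanumeric characters and lowercases each token.
/// A real pipeline would map each token to its lemma at this point.
fn terms(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

impl Index {
    /// Records every term of `text` as occurring in message `id`.
    pub fn add(&mut self, id: usize, text: &str) {
        for term in terms(text) {
            self.postings.entry(term).or_default().insert(id);
        }
    }

    /// Returns ids of messages that contain every term of the query.
    /// An empty query matches nothing.
    pub fn search_all(&self, query: &str) -> BTreeSet<usize> {
        let mut result: Option<BTreeSet<usize>> = None;
        for term in terms(query) {
            let ids = self.postings.get(&term).cloned().unwrap_or_default();
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        result.unwrap_or_default()
    }
}
```

Because both documents and queries pass through the same `terms` function, replacing the lowercasing step with a lemma lookup would make different forms of a word match each other without any other change to the index.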

What I learned

How to use WASM in a web application.
How to use Leptos for building a reactive UI.
Refreshed my memory of parsing.

Potential improvements

Use Web Workers for background initialization. Currently, initialization blocks the main thread.
Revise the code, which contains a lot of clones and unwraps.

License

All the code is licensed under the MIT License.

The file lemmatization-ru.tsv.gz in this repository's GitHub releases is a derivative of OpenCorpora's Russian morphological dictionary and is licensed under Creative Commons Attribution-ShareAlike 3.0.
