ETL pipeline in pure LibreOffice Basic that transforms 10ten Japanese Reader output into normalized LibreOffice Calc spreadsheet data. Parses words and alternative forms, collects definitions, dynamically inserts rows, merges cells, and applies borders—everything heavily commented so you know exactly what's happening.

10tenKaizen

Overview

10tenKaizen is an ETL (Extract, Transform, Load) pipeline written in LibreOffice Basic. It takes the semi-structured dictionary text copied from the 10ten Japanese Reader browser extension by Birchill (whose dictionary data comes from JMdict/EDICT, KANJIDIC, and JMnedict/ENAMDICT) and turns it into normalized, structured LibreOffice Calc spreadsheet data.

Key Features

  • Complete ETL Workflow: Validates, parses, transforms, loads, and formats data in a single execution
  • Data Normalization: Converts semi-structured text into structured LibreOffice Calc columns
  • Error Handling: Multi-layer validation to maintain data integrity
  • Zero Dependencies: Pure UNO API implementation, no external requirements
  • Built-in Safeguards: 50-line limit per entry prevents infinite processing loops
  • Dynamic Row Insertion: Automatically creates rows when entry components exceed available space
  • Verbose Commentary: Built for LibreOffice Basic learning and ease of maintenance

Installation

(Tested on 10ten Japanese Reader 1.26.1, Mozilla Firefox 147.0.3, and LibreOffice 25.8.4.2)

Clone the repository

git clone https://github.com/sanchesfilho/10tenkaizen

Import the macro to LibreOffice Calc

  1. Open LibreOffice Calc
  2. In the program's main window, navigate to Tools → Macros → Organize Macros → Basic... (keyboard shortcut: Alt + F11).
  3. In the BASIC Macros dialog box, navigate to My Macros → Standard → Module1 and click "Edit". (Alternatively, create a new module by clicking "Organizer...", navigating to My Macros → Standard, clicking "New", and clicking "OK" in the "New Module" dialog box.)
  4. The "My Macros & Dialogs" window will appear. It contains template code that may look something like this:
     REM  *****  BASIC  *****

     Sub Main

     End Sub
  5. Select and delete all of the pre-existing code in the module (e.g. click inside the code editor and press Ctrl + A, then Delete). Make sure the cursor is positioned at line 1.
  6. Navigate to File → Import BASIC...
  7. In the Open File dialog, navigate to /10tenkaizen/, select "10tenKaizen_v1.0.0.bas", and click "Open".
  8. Click "Save" (keyboard shortcut: Ctrl + S). You can now run the macro by clicking the "Run" button (keyboard shortcut: F5) inside the "My Macros & Dialogs" window.

Set the macro as a shortcut (optional, but recommended)

  1. In the program's main window, navigate to Tools → Customize...
  2. Under the "Keyboard" tab, go to the "Category" panel and navigate to Application Macros → My Macros → Standard → Module1 (or whichever module you created).
  3. Select "Macro10tenKaizen" in the "Function" panel.
  4. Select the desired key or key combination in the "Shortcut Keys" list, click "Assign", and click "OK".

How to Use

  1. Copy Data: There are two main ways to copy an entry with the 10ten Japanese Reader browser extension:
    • Method 1 (Menu navigation): Hover over a Japanese word → wait for the popup → click the word inside the popup → click "Entry" under the "Copy" tab.
    • Method 2 (Keyboard shortcut): Hover over a word → wait for the popup → press C, then E on your keyboard.
      IMPORTANT: Other copying methods are NOT supported by this macro. Only the "Copy entry" format produces the bracketed readings (e.g. [かな]) that the macro requires.
  2. Paste: Paste the copied 10ten Japanese Reader output (keyboard shortcut: Ctrl + V) into any LibreOffice Calc cell. The Text Import dialog should appear automatically; click "OK". This spreads the output vertically across rows in the same column.
    (If the Text Import dialog does not appear, use Paste Special (keyboard shortcut: Ctrl + Shift + V) and select "Use text import dialog".)
    You can paste one or several 10ten Japanese Reader outputs, as long as they are all in the same column with no empty cells between them. The macro detects and processes all consecutive entries until it reaches an empty cell.
  3. Execute: Select the first cell of the first entry (i.e., the headword) and run Macro10tenKaizen as explained above, either via the "My Macros & Dialogs" window or the keyboard shortcut you have assigned.
  4. Result: The macro automatically parses, normalizes, structures, merges, numbers, and formats the data across columns A–E.
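The scan described in step 2 (consecutive entries, stopping at the first empty cell) can be pictured with a short sketch. It is written in Python purely for illustration and is not the macro's LibreOffice Basic code. It assumes that an entry's first line is recognised by its bracketed reading and that the 50-line limit counts the header line; `split_entries`, `HEADER`, and `MAX_LINES_PER_ENTRY` are hypothetical names.

```python
import re

# Assumption: a line containing a bracketed reading such as [かな] starts a new entry.
HEADER = re.compile(r"\[[^\[\]]+\]")
# Assumption: the 50-line limit from Key Features includes the header line.
MAX_LINES_PER_ENTRY = 50


def split_entries(column):
    """Group a pasted column (a list of cell strings) into entries.

    Each entry is a list: the header line followed by its definition lines.
    Scanning stops at the first empty cell.
    """
    entries = []
    for cell in column:
        text = cell.strip()
        if not text:
            break  # an empty cell ends the run of consecutive entries
        if HEADER.search(text):
            entries.append([text])  # a new entry starts here
        elif not entries:
            raise ValueError("the first cell is not a headword line")
        elif len(entries[-1]) >= MAX_LINES_PER_ENTRY:
            raise ValueError("entry exceeds the per-entry line limit")
        else:
            entries[-1].append(text)  # a definition line of the current entry
    return entries


if __name__ == "__main__":
    pasted = ["仮名, 仮字, 假名 [かな]", "(n) kana", "猫 [ねこ]", "(n) cat", "", "犬 [いぬ]"]
    for entry in split_entries(pasted):
        print(entry)
```

In this sketch the last cell is never reached, because the empty cell before it ends the scan. That mirrors the requirement that pasted entries must not be separated by empty cells.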

Processing Flow Example

Raw Data Output From 10ten Japanese Reader:

仮名, 仮字, 假名 [かな]
(n) kana; Japanese syllabaries (i.e. hiragana and katakana)

Normalized Data Output Schema via 10tenKaizen:

| A (auto-generated group identifier) | B (headword) | C (alternative forms) | D (reading) | E (definitions) |
|---|---|---|---|---|
| 1 | 仮名 | 仮字 | かな | (n) kana; Japanese syllabaries (i.e. hiragana and katakana) |
|   |      | 假名 |      |      |
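As a rough illustration of how the first line of an entry breaks down into headword, alternative forms, and reading, here is a minimal Python sketch. It is not the macro's LibreOffice Basic code. It assumes words and readings are comma-separated and that the reading sits in square brackets after the words; `parse_header` is a hypothetical name.

```python
import re

# Assumption: the header line has the shape "word, word, ... [reading, ...]".
HEADER_LINE = re.compile(r"^(?P<words>[^\[\]]+)\[(?P<readings>[^\[\]]+)\]")


def _split_commas(text):
    """Split on commas and drop empty or whitespace-only parts."""
    return [part.strip() for part in text.split(",") if part.strip()]


def parse_header(line):
    """Return the headword, alternative forms, and readings of a header line."""
    match = HEADER_LINE.match(line.strip())
    if match is None:
        raise ValueError("no bracketed reading found; was 'Copy entry' used?")
    words = _split_commas(match.group("words"))
    if not words:
        raise ValueError("no headword found before the bracketed reading")
    return {
        "headword": words[0],        # the first word goes to column B
        "alternatives": words[1:],   # remaining words go to column C
        "readings": _split_commas(match.group("readings")),  # column D
    }


if __name__ == "__main__":
    print(parse_header("仮名, 仮字, 假名 [かな]"))
```

A line without brackets raises an error in this sketch, which reflects why only the "Copy entry" format is supported.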

ETL Transformation Sequence

  1. Parse: Tokenize the first line from the output ("仮名, 仮字, 假名 [かな]") into semantic components
  2. Extract: Isolate headword, alternative writings, and enclosed reading ("かな") from parsed tokens
  3. Collect: Aggregate definition lines vertically below the main entry (row 2 onwards) in the source column
  4. Calculate: Determine required block height based on whichever is largest: word count, reading count, or definition count
  5. Expand: Dynamically insert rows if calculated height exceeds available space between entries
  6. Distribute: Map parsed elements to normalized columns B–E
  7. Number: Assign sequential Group ID numbers to column A (auto-incremented based on previous blocks)
  8. Structure: Apply cell merging (columns A–D) and border formatting for visual grouping
  9. Cleanup: Erase processed source data to maintain sheet cleanliness
  10. Iterate: Continue processing subsequent entries in the same manner until an empty cell is encountered
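The arithmetic behind the Calculate, Expand, and Number steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the macro's LibreOffice Basic code: the function names and the `available_rows` parameter are hypothetical, and the exact way the macro counts words or measures available space is not specified here.

```python
def block_height(word_count, reading_count, definition_count):
    """Rows needed for one entry: the largest of the three counts (at least 1)."""
    return max(word_count, reading_count, definition_count, 1)


def rows_to_insert(height, available_rows):
    """Extra rows to insert when the block is taller than the space it has."""
    return max(0, height - available_rows)


def next_group_id(previous_ids):
    """Next sequential group ID for column A, starting at 1 on an empty sheet."""
    return max(previous_ids, default=0) + 1


if __name__ == "__main__":
    height = block_height(word_count=3, reading_count=1, definition_count=1)
    print("height:", height)
    print("rows to insert:", rows_to_insert(height, available_rows=2))
    print("group id:", next_group_id([1, 2, 3]))
```

Clamping the insert count at zero means an entry that already has enough room leaves the sheet untouched, so rows are only ever added, never removed.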

Author / Contact

Jayme Sanches Filho

License

Distributed under the MIT License. See LICENSE for more information.
