Commit f2816d3

Sync with pipes.digital to v3: Data blocks and more
1 parent dc099d1 commit f2816d3

393 files changed, with 59357 additions and 15256 deletions


.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,2 +1,3 @@
-*.db
+*.db*
 vendor/
+**node_modules/

CLAUDE.md

Lines changed: 0 additions & 61 deletions
This file was deleted.

DEVELOPMENT.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
This document contains notes on the development of Pipes. It is meant as a reminder for me and as a starting point for other developers.

## Getting involved

Ideally, open an issue to discuss your plans or send me an email. I might be able to help plan your changes. Then you can send in a PR with your changes. I need to understand them, but if I do and they don't collide with what is there, I will merge them.

## Architecture

Pipes has a very small architecture, with few files and classes. It is thus easy to read the whole code, but it is conceptually dense. Read through this section to understand the core concepts.

### Key Implementation Details

- Uses Ruby/Sinatra on the backend, Raphaël on the frontend for the editor, otherwise Pure.CSS as CSS library
- Authentication via Portier (passwordless email-based login)
- Pipes are stored as JSON structures defining blocks and their connections
- When executed, a Pipe object creates a tree of Block objects based on the JSON
- The output block's `run()` method triggers recursive processing of all input blocks
- Results are cached for 10 minutes (600 seconds) to improve performance
- Uses SQLite for data storage (pipes, users, sessions, cache)
- Blocks can have both data inputs and text inputs (user parameters)
- RSS is the internal data format for feed type blocks - those blocks input/output RSS feeds
- Data blocks rely on a data layer called water instead - it contains a hash that gets converted to JSON or XML
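The 10-minute result cache from the list above can be pictured with a minimal sketch. The class name `TtlCache`, its `fetch` method and the in-memory hash are assumptions made for illustration; Pipes itself keeps its cache in SQLite, and its actual code may look quite different.

```ruby
# Hypothetical in-memory cache with a fixed time-to-live; not the actual Pipes code.
class TtlCache
  def initialize(ttl: 600, clock: -> { Time.now.to_f })
    @ttl = ttl
    @clock = clock # injectable, so expiry can be checked without sleeping
    @store = {}
  end

  # Returns the cached value for key while it is fresh,
  # otherwise runs the block, stores its result and returns it.
  def fetch(key)
    entry = @store[key]
    now = @clock.call
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @store[key] = { value: value, stored_at: now }
    value
  end
end
```

A caller would wrap the expensive pipe run in the block, for example `cache.fetch(pipe_id) { pipe.run }`, where `pipe_id` and `pipe` are again placeholder names.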

### Backend

On the backend, Pipes is a Ruby application. It uses Sinatra as a web framework, with Portier for logins. SQLite is used as a database, with all code that gets and sets data in **database.rb**. For the functionality, each block as seen in the frontend has a corresponding Ruby class under **blocks/**. Blocks have a function `process` where the custom functionality of each block is defined, again in Ruby code. A pipe is a graph of blocks, created in **pipe.rb**'s `createInputs` based on the JSON the frontend produced. There is one root (the output block) of the graph. The pipe class calls `run` on the output block, which then calls `run` on its inputs, which call `run` on their inputs and so on. In `run` the `process` function is called and its output returned.
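The recursion described above can be sketched in a few lines. Everything here is an assumption for illustration: the class names (`SketchBlock`, `SourceBlock`, `MergeBlock`, `OutputBlock`), the `build_pipe` helper and especially the JSON layout are made up and do not reflect the real format produced by the editor or the real code in **pipe.rb**.

```ruby
require 'json'

# Hypothetical base class: run() resolves all inputs first, then calls process.
class SketchBlock
  attr_reader :inputs

  def initialize(options = {})
    @options = options
    @inputs = []
  end

  def run
    process(@inputs.map(&:run))
  end

  def process(_input_results)
    raise NotImplementedError
  end
end

# Leaf block: has no inputs and just returns configured values.
class SourceBlock < SketchBlock
  def process(_input_results)
    @options.fetch("values")
  end
end

# Combines the results of all its inputs into one list.
class MergeBlock < SketchBlock
  def process(input_results)
    input_results.flatten
  end
end

# Root of the graph: passes through the result of its single input.
class OutputBlock < SketchBlock
  def process(input_results)
    input_results.first
  end
end

BLOCK_TYPES = { "source" => SourceBlock, "merge" => MergeBlock, "output" => OutputBlock }.freeze

# Builds block objects from the (made-up) JSON layout and returns the output block.
def build_pipe(json)
  spec = JSON.parse(json)
  blocks = spec["blocks"].to_h do |b|
    [b["id"], BLOCK_TYPES.fetch(b["type"]).new(b.fetch("options", {}))]
  end
  spec["connections"].each { |c| blocks.fetch(c["to"]).inputs << blocks.fetch(c["from"]) }
  blocks.values.find { |b| b.is_a?(OutputBlock) }
end
```

Calling `run` on the block returned by `build_pipe` walks the graph from the output block down to the sources, which is the same shape of recursion the paragraph describes.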

What that output is depends on the type of block. There are two.

Blocks inheriting directly from `Block` will return an RSS object (from the [ruby rss gem](https://github.com/ruby/rss)). They are called feed type blocks in the user documentation. The idea is that other blocks work with that object without always having to parse an RSS string, as was done initially. Blocks will usually iterate over the input feed, create a new RSS object with `RSS::Maker.make("rss2.0")`, use the `transferChannel` function (defined in **block.rb**) to copy the channel and then do their work on the items, copying items with `transferData` (also defined in **block.rb**) when possible. The feed block is the main entry point for those pipes and creates the initial RSS object, with the help of the feedparser gem if Ruby's RSS parser does not work with the fetched data. All other blocks of that type can now assume that they get a valid RSS object, and they function accordingly on channel items etc.
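As a rough illustration of that pattern, here is a sketch of a filtering step written directly against the rss gem. The function name `filter_feed` is hypothetical, and the channel and item fields are copied by hand here only because the signatures of `transferChannel` and `transferData` are not shown in this document; a real block would use those helpers instead.

```ruby
require 'rss'

# Hypothetical stand-in for a feed type block: keeps only items whose title matches.
def filter_feed(feed, pattern)
  RSS::Maker.make("rss2.0") do |maker|
    # Copy the channel (the step Pipes handles with transferChannel).
    maker.channel.title = feed.channel.title
    maker.channel.link = feed.channel.link
    maker.channel.description = feed.channel.description

    feed.items.each do |old_item|
      next unless old_item.title.to_s.match?(pattern)

      # Copy the item (the step Pipes handles with transferData).
      maker.items.new_item do |item|
        item.title = old_item.title
        item.link = old_item.link
        item.description = old_item.description
      end
    end
  end
end
```

The input and the return value are both RSS objects, so the next block in the chain never has to parse a string.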

Blocks inheriting from `WateredBlock` do not return an RSS object, but a `Water` object (**water.rb**) - it is what flows in a pipe. This is a data abstraction layer. They are called data blocks in the user documentation. `Water` can `absorb` XML or JSON (this could be extended for all hierarchical data representation formats) and saves this internally as a hash. That hash can in the end be `solidify`ed into XML or JSON. The idea is that blocks can work directly on the internal hash, without having to use tools specialized for either XML or JSON. But it turned out that JSONPath gems like Janeway were helpful to work with the hash. These types of blocks can assume nothing about the structure of the data they work on, so users have to select the relevant fields. Water has an `outline` function that emits all possible JSONPaths, with which the autocomplete function is implemented.
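To make the idea concrete, here is a miniature of the concept, limited to JSON. The class `MiniWater` and its methods `absorb_json`, `solidify_json` and `outline` are hypothetical names chosen for this sketch; the real `Water` class in **water.rb** also handles XML, and its method signatures and the exact paths it emits may differ.

```ruby
require 'json'

# Hypothetical miniature of the idea behind Water; the real class lives in water.rb.
class MiniWater
  attr_reader :data

  # Parses JSON text into the internal hash.
  def absorb_json(text)
    @data = JSON.parse(text)
    self
  end

  # Turns the internal hash back into JSON text.
  def solidify_json
    JSON.generate(@data)
  end

  # Lists a JSONPath-style path for every key and array found in the data.
  def outline(node = @data, path = "$")
    case node
    when Hash
      node.flat_map { |key, value| ["#{path}.#{key}"] + outline(value, "#{path}.#{key}") }
    when Array
      ["#{path}[*]"] + node.flat_map { |value| outline(value, "#{path}[*]") }.uniq
    else
      []
    end
  end
end
```

A list of paths like this is the kind of data an autocomplete field can offer to the user, who then picks the field a data block should work on.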

The **server.rb** defines all the web endpoints. It is supposed to not do too much work itself, but call other classes like `Pipe`, `Database` or `User`. The other relevant class is in **downloader.rb**. `Downloader` is a wrapper around the gem HTTParty. It is simple, but it does implement throttling, respects HTTP 429 responses (so we don't get banned as easily) and is a core functionality of almost all pipes.
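How a downloader can respect HTTP 429 is sketched below. This is not the actual `Downloader` class: the name `PoliteFetcher`, the injected `fetcher` and `sleeper` callables and the retry limit are assumptions for illustration, and the throttling part (done with throttle-queue in Pipes) is left out entirely. The sketch assumes the server sends a `Retry-After` header with a number of seconds and falls back to one second otherwise.

```ruby
# Hypothetical sketch of polite downloading; not the actual Downloader class.
class PoliteFetcher
  Response = Struct.new(:status, :headers, :body)

  # fetcher: callable taking a URL and returning a Response (an HTTP client in practice).
  # sleeper: callable used to wait, injectable so tests do not really sleep.
  def initialize(fetcher:, sleeper: ->(seconds) { sleep(seconds) }, max_retries: 3)
    @fetcher = fetcher
    @sleeper = sleeper
    @max_retries = max_retries
  end

  def get(url)
    attempts = 0
    loop do
      response = @fetcher.call(url)
      # Anything but a 429 is returned as is; so is a 429 once retries are used up.
      return response unless response.status == 429 && attempts < @max_retries

      attempts += 1
      # Retry-After may be absent or not a plain number; fall back to one second.
      wait = Integer(response.headers.fetch("Retry-After", "1"), exception: false) || 1
      @sleeper.call(wait)
    end
  end
end
```

Injecting the fetcher keeps the retry logic independent of the HTTP library, which also makes it testable without network access.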

#### Core helpers/gems

Partly a recap, but this backend architecture makes Pipes rely on a number of Ruby gems. Especially:

* Sinatra for the web functionality (with Rack)
* RSS as the data representation and tool used for regular `Block`s
* OXML, currently a fork of it, as the tool that parses XML into the hash in `WateredBlock`s (based on the very fast Ox gem)
* JSON to parse JSON input files in `WateredBlock`s, and for the pipe representation
* Janeway, the JSONPath gem used to implement the functionality of the existing `WateredBlock`s
* HTTParty for the downloads
* throttle-queue to limit the number of parallel downloads
* sinatra-portier for user logins

### Frontend

On the frontend, you can separate Pipes into two parts. Most of the pages - there aren't that many - are static HTML generated by the ERB templates under **views/**. The editor page is also created that way, but it is also the bigger second part: the editor functionality is implemented in JavaScript via the Raphaël library, with Raphaël creating SVG objects on a canvas. It is completely managed in the (overly) big JavaScript file **public/pipes-ui.js**. HTML input elements are absolutely positioned on that canvas to provide the user inputs, and manually culled or re-created when the user scrolls.

How the blocks are placed on that canvas, filled and connected gets serialized in a JSON object. That JSON object gets sent to the backend, where it is stored in the database on save or used to create the Ruby blocks when a pipe runs. That's the mechanism with which the user creates a pipe.

The pipes-ui.js has functions for each block, like `FilterBlock`. Those functions `.call` the `Block` function for shared functionality, like the input and output objects. Each of the `new`ed functions for blocks on the canvas is stored in a global `blocks` array, and connections between blocks are stored in a global `connections` array. These lists are later used to serialize the editor state.

The other pages, the HTML parts of the site, use the Pure.CSS framework. That was already a weird choice when the project started, but it was a bit of an easter egg, to reference the Yahoo! background of Pipes by using a Yahoo! CSS library. Pure provides some classes and default stylings that are used throughout the site, overridden in **public/style.css** where necessary.

Apart from that, the site has an interaction pattern of using vex.js' dialog boxes to ask for confirmation and to pop up text inputs.

#### Core libraries

The frontend thus depends on:

* Raphaël, as it paints and manages the blocks with their elements and connections
* Pure.CSS, the CSS framework used
* Interact.js for the drag'n drop functionality in the editor
* Font Awesome for most of the icons
* vex for dialog boxes
* vkBeautify to pretty print JSON in the block inspector
* XMLDisplay to pretty print XML in the block inspector

## Possible future steps

Pipes being a single monolith might be strange. There is very separate functionality: On one side the web application that handles incoming requests and renders the HTML the user sees, on the other the Ruby code that runs the pipes. This could be separated.
It hasn't been done yet because attempts in that direction have failed so far. One of them failed very late: after launch, the performance impact was too high for the existing usage, so the change had to be reverted. But a better implementation might even help with managing server resources. Or it is possible that, since all pipe requests involve the webserver aspect - and the webserver requests that don't run pipes are too rare - doing it all with the webworkers has an inherent performance benefit. To be investigated.

It might be a good idea to implement the pipe editor with different technology. Raphaël is quite old and hasn't seen a release in years. That is not really a problem since SVGs are very stable, and so is JavaScript, but browser compatibility might still become a problem. Not only with Raphaël itself: the approach of mixing SVGs with absolutely positioned HTML input elements is also not bulletproof. We have already seen issues [with Safari](https://github.com/pipes-digital/pipes/issues/86), and in the past there were similar issues with Firefox and Chrome.
One option for that rebuild is Flutter - building only one part of a webapp is supported (so, one page, like the editor), the way it paints UI elements on a canvas would fit and in general it would allow for a more modern UI (e.g. with more animations), plus I do like the Dart programming language, it meshes well with Ruby. On the other hand, Flutter and its ecosystem not being as stable as vanilla JS and Raphaël could make this change become a future maintenance problem.

The other parts of the frontend could also use modernization. If it results in a more modern looking design or if it makes some design elements easier to implement, it might be time to replace the Pure.CSS modules. Either with a more modern CSS framework or library or just with modern HTML and CSS.

The data blocks are new and not complete. More feed type blocks should get a data block equivalent. Not all of them - some are too focused on feed structure, but others would work well with the new approach, like the webhook block. There also seem to be opportunities for some old and new blocks to work better together, especially the extract and the feed builder block should have options there. And there might be new kinds of blocks that are possible now, that were not before.

One possibility for a new block is an LLM block, though that does not even depend on the new data abstraction approach. Pipes should never jump on the AI hype, it collides way too much with the history of this software and the stability that is needed here. But LLMs are very good at changing data structures, transforming between them for example or adding missing parts. And blocks might actually be a nice UI abstraction to work with an LLM, the inputs being the data to be worked on, together with a written prompt from the user input. There is potential here to make actually good use of the "AI" technology.

Test coverage is not sufficient to be confident in changes. That was not too much of an issue when it was just about keeping the software and the server stable, but it is not helpful when doing bigger changes, like [when I reworked](https://github.com/pipes-digital/pipes/issues/141) what feed type blocks output. All blocks should get some tests to at least secure basic functionality. Pipes internals could also use some (though frankly, the surface is so small that this is quite optional).

Gemfile

Lines changed: 15 additions & 8 deletions
@@ -1,27 +1,34 @@
 source "https://rubygems.org"
 gem 'sinatra'
-gem 'sinatra-portier'
+gem 'sinatra-portier', '~>2.0'
 gem 'sinatra-contrib'
 gem 'puma'
+gem 'rack'
 gem 'open_uri_redirections'
 gem 'feedparser'
-gem 'sqlite3', '~> 1.6'
+gem 'sqlite3'
 gem 'hashids', '~>1.0'
 gem 'rack-rewrite'
 gem 'nokogiri'
 gem 'lru_redux'
-gem 'moneta'
-gem 'thread'
 gem 'jsonpath'
 gem 'to_regexp'
 gem 'american_date'
-gem 'twitter'
 gem 'throttle-queue'
-gem 'capybara'
-gem 'selenium-webdriver'
+gem 'ferrum'
 gem 'oga'
 gem 'strings'
 gem 'httparty'
 gem 'addressable'
+gem 'securerandom'
 gem 'rss'
-gem 'test-unit'
+gem 'scylla'
+gem 'dalli'
+gem 'sanitize'
+gem 'base64'
+gem 'finishing_moves'
+gem 'oxml', git: 'https://github.com/onli/oxml', branch: 'merged'
+gem 'janeway-jsonpath', '~> 0.6.0'
+
+gem 'test-unit'
+gem 'rake'
