
Sometimes you just have to externalize the thoughts by implementing them. Zero downtime upgrade of a server-side rendered animation of svg. No k8s, extra databases or message brokers needed (trade-offs may apply)


d-led/ssr-robust-live-svg


Robust SVG-based Live View Animated

Disclaimer

Motivation

  • practice patterns for SVG animation on the server
  • practice OTP processes that externalize their state to support a minimal-interruption restart, demonstrable via fault injection
  • practice injecting larger faults such as node failures
  • investigate GitHub Copilot-assisted coding in Elixir
  • practice writing mix tasks, e.g. to bump the app version
  • practice a whole application hot code upgrade without directly using a known "advanced" approach.

Functionality

  • a virtual ball is flying around in a box
  • its behavior can be changed at run-time
  • various system failures can also be triggered/simulated
  • the ball is expected to keep moving without a noticeable interruption as long as at least one machine is available

demo

How to Run

In Docker Locally

docker compose up

http://localhost:4000

With Elixir

# once
mix setup

# one node
mix phx.server

# 3 nodes
process-compose

http://localhost:4000

  • use process-compose to start 3 nodes locally (additional node ports: 4001, 4002)

To crash a node: triple click on an x.

Hot Code Updates

  • the demo fix: the new version contains a new ball movement module: RandomReboundV2NonSticky
  • release-two-versions.sh simulates the build of two versions with one compiled without the new module. The new behavior name is pre-configured and added in a particular version for the demo.
  • demo
    • start two versions running side by side in a cluster
    • look at the views of both versions, noting the new module
    • try to switch over to the new module while the ball runs on the old instance → safe failure
    • take down the old node → the new node takes over but is still running the old behavior
    • switch over to the new behavior

using process-compose:

scripts/release-two-versions.sh
scripts/run-two-versions.sh
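
The "safe failure" step in the demo can be pictured as a guard that refuses to switch to a behavior module the current node cannot load. This is a minimal sketch under assumptions, not the repository's code: `SafeSwitch`, `switch/2`, `Demo.Linear` and the shape of the state map are hypothetical names chosen for illustration. Only `Code.ensure_loaded?/1` and `function_exported?/3` are standard Elixir functions.

```elixir
defmodule Demo.Linear do
  # Hypothetical movement module: advances the ball by its velocity.
  def move(%{x: x, y: y, vx: vx, vy: vy} = ball) do
    %{ball | x: x + vx, y: y + vy}
  end
end

defmodule SafeSwitch do
  # Hypothetical guard: swap the behavior only if the module can be loaded
  # on this node; otherwise keep the current one and report an error.
  def switch(state, module) when is_atom(module) do
    if Code.ensure_loaded?(module) and function_exported?(module, :move, 1) do
      {:ok, %{state | behavior: module}}
    else
      {:error, :behavior_not_available, state}
    end
  end
end

state = %{behavior: Demo.Linear, ball: %{x: 0, y: 0, vx: 1, vy: 2}}

# A module that exists on this node is accepted.
{:ok, state} = SafeSwitch.switch(state, Demo.Linear)

# A module missing from this node (e.g. on an older release) is rejected,
# and the ball keeps its previous behavior.
{:error, :behavior_not_available, state} =
  SafeSwitch.switch(state, NonExistentBehavior)

IO.inspect(state.behavior)
```

Returning the unchanged state together with the error keeps the caller's process alive, so a failed switch never interrupts the animation.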

hot code update demo

Seamless Simulation With a Whole Node Failing

node-crash-ball-reschedule.mp4

link

Architecture

  • the application is clustered
  • a singleton process Ball runs on one of the nodes in the cluster
  • the ball is flying around in a box with an injectable behavior, fulfilling a BallMovement protocol
  • the list of movement behavior modules can be found in the config :available_ball_behaviors
  • the config includes one non-existent module NonExistentBehavior which simulates a sub-system update fault
  • the nodes (dangerously → demoware!) expose a kill switch which stops a node with a non-zero exit code, triggering a restart of the ball process on another node
  • the state of the ball is continuously externalized to a simple process called StateGuardian, local to each node
  • when the ball starts, it may load its state from the StateGuardian
  • the SVG is rendered as a LiveView template; only the ball's position is updated
  • the list of nodes is updated periodically by ClusterInfoServer
  • upon start (with a short delay), and on detection of a new node by NodeListener, the compiled modules configured on one node are spread to other nodes via BehaviorModules. Prerequisite: the module doesn't depend on modules that are not present on the other nodes.
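
The state externalization described above can be sketched with two small processes. This is an illustrative approximation, not the project's implementation: the module names `Sketch.StateGuardian` and `Sketch.Ball`, their functions and the state map are hypothetical, and the ball writes to the guardian directly, whereas the list above describes a guardian that is local to each node.

```elixir
defmodule Sketch.StateGuardian do
  # Logic-less holder of the last known ball state (hypothetical API).
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> nil end, name: __MODULE__)
  end

  def put(state), do: Agent.update(__MODULE__, fn _ -> state end)
  def get, do: Agent.get(__MODULE__, & &1)
end

defmodule Sketch.Ball do
  use GenServer

  @default %{x: 0, y: 0, vx: 1, vy: 1}

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok)
  def tick(pid), do: GenServer.call(pid, :tick)

  @impl true
  def init(:ok) do
    # Resume from the externalized state if there is one.
    {:ok, Sketch.StateGuardian.get() || @default}
  end

  @impl true
  def handle_call(:tick, _from, %{x: x, y: y, vx: vx, vy: vy} = ball) do
    moved = %{ball | x: x + vx, y: y + vy}
    # Externalize after every step so a restart loses at most one tick.
    Sketch.StateGuardian.put(moved)
    {:reply, moved, moved}
  end
end

{:ok, _} = Sketch.StateGuardian.start_link()
{:ok, ball} = Sketch.Ball.start_link()
Sketch.Ball.tick(ball)
Sketch.Ball.tick(ball)

# Simulate a crash of the ball process; the guardian survives.
Process.unlink(ball)
ref = Process.monitor(ball)
Process.exit(ball, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

# A fresh ball continues from the guardian's copy instead of the default.
{:ok, new_ball} = Sketch.Ball.start_link()
IO.inspect(Sketch.Ball.tick(new_ball))
```

Keeping the guardian free of logic is the point of the pattern: the fewer things it does, the less likely it is to crash together with the process whose state it protects.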

Details

Architecture at a Glance

flowchart TB   
    subgraph Replica1
    HordeSupervisor1([HordeSupervisor]) -- schedules start of --> Ball
    Ball(("Ball (singleton)"))-- publishes state changes to -->BallStateTopic@{ shape: das, label: "state:ball" }
    %% Ball-- updates -->BallUpdatesTopic@{ shape: das, label: "updates:ball" }
    Ball-- publishes changes to -->BallCoordinatesTopic@{ shape: das, label: "coordinates:ball, updates:ball" }
    StateGuardian1[StateGuardian] -- subscribed to --> BallStateTopic
    Ball -. restores state from .-> StateGuardian1 
    LiveView1@{ shape: manual-input, label: "LiveView"} -- subscribed to --> BallCoordinatesTopic
    end
    subgraph Replica2
    BallStateTopic2@{ shape: das, label: "state..." } -- distributed --- BallStateTopic
    StateGuardian2[StateGuardian] -- subscribed to --> BallStateTopic2
    HordeSupervisor2([HordeSupervisor]) -- distributed ---  HordeSupervisor1
    LiveView2@{ shape: manual-input, label: "LiveView"} -- subscribed to --> BallCoordinatesTopic0
    BallCoordinatesTopic0@{ shape: das, label: "coordinates..." }-- distributed ---BallCoordinatesTopic
    end
    subgraph Users[" "]
    User1("fa:fa-user User") -- interacts with --> LiveView2
    User2("fa:fa-user User") -- interacts with --> LiveView1
    end
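
The topic arrows in the diagram boil down to a publish/subscribe relationship. The following is a minimal single-node sketch that uses Elixir's built-in `Registry` as a stand-in for whatever PubSub mechanism the project uses. The registry name `Sketch.PubSub`, the `publish` helper and the message shape are hypothetical; the topic string is taken from the diagram. Unlike the diagram, this sketch does not distribute topics across nodes.

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: Sketch.PubSub)

# Hypothetical helper: deliver a message to every subscriber of a topic.
publish = fn topic, message ->
  Registry.dispatch(Sketch.PubSub, topic, fn entries ->
    for {pid, _value} <- entries, do: send(pid, message)
  end)
end

# The current process plays the role of a LiveView subscribing to coordinates.
{:ok, _} = Registry.register(Sketch.PubSub, "coordinates:ball", [])

# The ball publishes its new position.
publish.("coordinates:ball", {:ball_moved, %{x: 10, y: 20}})

receive do
  {:ball_moved, coords} -> IO.inspect(coords, label: "render at")
after
  1_000 -> IO.puts("no update received")
end
```

The publisher never needs to know who is listening, which is what lets a StateGuardian and any number of views subscribe independently.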

Trade-Offs

  • Will it scale? → What do you mean by 'scale' exactly?
  • Why publish each ball state to the state guardian? Isn't it too chatty/expensive? → Yes. I wanted to demo simulating a whole node going down on which the singleton ball is running. Without it, the take-over of the ball by another node wouldn't look that spectacular.
  • Why not use technology XYZ for this? → Yes, that'd be nice. However, Phoenix LiveView, Elixir and Erlang provide so many primitives out of the box that such architectural sketches are effective and require fewer infrastructural moving parts.
  • Why not just use the standard Erlang/OTP mechanism for the hot code upgrade? → Yes, that'd be nice as well, and has been tried and tested all around the world. Many articles and docs on the subject suggest trying alternative approaches these days. Knowing something is possible and having tried it may lie far apart.
  • Why not demo XYZ as well? → Yes, that'd be nice too. There's no one to stop you from doing it.

More Failure Modes

  • If the last known state refers to a new module, downgrading will lead to a system failure unless the module has already been distributed by BehaviorModules to the node that starts the ball. last_known_good_module cannot be relied upon in that case, so some other fallback, such as a default behavior module or a stack of last known good modules, could be designed.
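
The "stack of last known good modules" idea could look roughly like the sketch below. It is a design illustration only and does not exist in the repository: `Sketch.Fallback`, `pick/2`, `Sketch.OlderBehavior` and `Sketch.DefaultBehavior` are hypothetical names. `RandomReboundV2NonSticky` is deliberately left undefined here to stand for a module that is unknown to the node.

```elixir
defmodule Sketch.DefaultBehavior do
  # Hypothetical default: leaves the ball where it is.
  def move(ball), do: ball
end

defmodule Sketch.OlderBehavior do
  # Hypothetical older movement module that every node already has.
  def move(%{x: x, vx: vx} = ball), do: %{ball | x: x + vx}
end

defmodule Sketch.Fallback do
  # Walk the candidates from most to least recently known good and pick
  # the first module this node can load; fall back to the default.
  def pick(candidates, default) do
    Enum.find(candidates, default, &Code.ensure_loaded?/1)
  end
end

# The newest module is not available on this node, so the next entry wins.
stack = [RandomReboundV2NonSticky, Sketch.OlderBehavior]
IO.inspect(Sketch.Fallback.pick(stack, Sketch.DefaultBehavior))

# With nothing loadable in the stack, the default is used.
IO.inspect(Sketch.Fallback.pick([RandomReboundV2NonSticky], Sketch.DefaultBehavior))
```

Storing the stack alongside the externalized ball state would let any node that takes over make this choice without consulting the failed node.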

Well-Known Patterns Identifiable in the Demo

  • Decoupling deployment from release
  • Keeping the state of a process outside of it (e.g. in another, logic-less process)
  • "The generic component should hide details of concurrency and mechanisms for fault-tolerance from the plugins. The plugins should be written using only sequential code with well-defined types."
