Release notes for Shuffle 2.1.0 with new features

frikky · web-flow · commit 029d1ea53bb8 · 2025-09-10T01:06:02.000+02:00
This release introduces significant updates including Datastore Categories for improved data organization, enhancements for Local AI Models, and various scalability improvements. It also emphasizes the importance of automation and insights in cybersecurity operations.
diff --git a/articles/2.1_release.md b/articles/2.1_release.md
@@ -0,0 +1,139 @@
+<img width="1600" height="836" alt="image" src="https://github.com/user-attachments/assets/15d54faa-4d66-44d4-8aca-b1b5cb52dbd8" />
+
+# Shuffle 2.1.0: Datastore Categories, Local AI Models and Singul
+As always, this release focuses on the things that matter most: Ease of use and Scalability. Our architecture still works at crazy scales, even with millions and millions of runs occurring every hour at this point, both in onprem and cloud environments. We are further having to debate determinism and control of AI Agents, and want to bring everyone up to speed about the do's and don'ts, without losing the security thread. 
+
+## Overview
+1. Architecture & Scaling controls
+2. Datastore Categories & uses
+3. Statistics & Observability
+4. Local AI Models
+5. Standards, Singul & AI Agents
+6. What comes next
+
+# Architecture & Scale
+It is not as easy to scale Shuffle yet as we would like it to be. We are however getting there. The big scale-change in 2.0 was that we allowed ANYONE to scale Shuffle themselves by open sourcing parts that we originally kept proprietary until it was mature enough. We believe in our mission of helping companies with collaborative cybersecurity globally, and without open sourcing and having clear transparency guidelines, our platform and community would never have grown to what it is today. 
+
+<img width="1006" height="1069" alt="Shuffle's one-server architecture" src="https://github.com/user-attachments/assets/20193025-5692-42e2-a222-a376df057fcb" />
+
+One of the major pain points over the last year has actually been the green "Opensearch" part seen in the image above. Opensearch can scale well if managed well, and the default settings should allow you to get quite far. We have therefore made default size rollover on certain indexes, and made replication and available shards utilisation better. If you are looking to scale however, we recommend that you move to hosting Opensearch by itself. This additionally makes it possible to use it for log management, which may be relevant in the future. 
+
+Some relevant blogposts we like about scaling Opensearch clusters:
+* [Improve OpenSearch cluster performance by separating search and indexing workloads | OpenSearch](https://opensearch.org/blog/improve-opensearch-cluster-performance-by-separating-search-and-indexing-workloads/)
+
+* [OpenSearch Operator: Deployment, Scaling, and Optimization | Last9](https://last9.io/blog/opensearch-operator/)
+
+* [Creating your first OpenSearch® cluster and pro tips for success | Instaclustr](https://www.instaclustr.com/education/opensearch/creating-your-first-opensearch-cluster-and-pro-tips-for-success/)
+
+Another area that has gotten some attention has been Apps. This is tricky, as Python does not tend to scale very well. But thankfully Docker swarm or Kubernetes) does! And that means we can scale up the amount of containers per server dramatically without any clear impact on the server itself. 
+
+Here are a couple of the things that have been done:
+
+* Manual control of App container scale
+* Auto-controlling apps based on request amount to each container
+* Tracking the queue within the app containers to see if they get backed up
+* Tracking CPU usage per server that the app runs on
+* Tracking File I/O
+* Modifying thread pool controls within Python itself
+* And a whole lot more
+
+And the reality is.. we need to manage all of these at once. Shuffle being an orchestrator , it shouldn't matter what goes down or what is available right now - it should just work. And that is the direction we will keep moving. If you use Kubernetes or follow IAC (Infrastructure as Code) principles, Shuffle is now better equipped to work within your environment, no matter where it may be through additional controls available for Workers only. 
+
+## Datastore Categories
+Shuffle's Datastore was originally added to allow for simple key:value storage. We over time found that it had two major usecases everyone wanted: and ended up using
+
+1. Deduplication (e.g. ignore an ID after the 1st time it is handled)
+2. Storing keys as global variables (and secrets)
+
+With this in mind, we built out the first version back in 2020, which allowed for exactly these things. The main issue however came with discoverability: How do you find the one key out of 10.000 keys? The answer: Datastore Categories. 
+
+<img width="1067" height="314" alt="A simple view of categories" src="https://github.com/user-attachments/assets/841b7ca4-dcfd-41c0-a0a4-5a32c81afb42" />
+
+Datastore categories allow you to store full alerts or tickets, endpoints, IOCs, users and anything else you would like, structured directly in Shuffle. This means that you in theory can store every single alert, 
+
+Categories have some additional features as well:
+* A new action in Shuffle Tools called "Search Datastore Category", entirely for deduplication
+* Protected Keys, which can be used in workflows without being read*
+* Set a category timeout for when keys should be deleted
+* Publish the full list of values (e.g. sharing IOCs or firewall subscriptions)
+* Run a workflow (and many other functions) when a key is added/edited
+* Automatic uploads from Singul ingestions (tickets in OCSF format)
+
+<img width="601" height="600" alt="How to automate what happens when a category is changed" src="https://github.com/user-attachments/assets/ee108947-0e12-4387-8883-c11aa8b7168e" />
+
+*Protected keys are attempted to be cleared with a basic fuzzy-hashing mechanism. This allows for you to use them as secret keys without a high chance of them being available in the Workflow itself. This is best effort, and is _not_ bulletproof. When using Protected Keys within workflows, we recommend also setting up Result Cleanup for the node. 
+
+Categories are support through Bulk API's as well, meaning you can upload and check if up to 500 keys exist at once, with pagination helping you past that point. They can be used within workflows directly with category=x, such as self.get_cache("keyname", category="category"). 
+
+<img width="589" height="364" alt="Result cleanup used to delete the data for a whole node after the data is no longer useful" src="https://github.com/user-attachments/assets/ad857e0b-cf17-4605-ba45-d6bf2d1bc02a" />
+
+There are more mechanisms coming to Datastore Categories, including automatic ingest mechanisms, graph-comparisons (n-grams) and pages - all relevant to how automation will be done in the future. 
+
+## Statistics & Observability 
+With Shuffle being a quick-to-production product, it very quickly becomes a necessity to keep it stable 24/7/365. That is why Shuffle continuously tracks a few important metrics, which we will keep improving upon:
+
+<img width="437" height="284" alt="A new view you can enable on the Workflow UI to show Workflow Executions" src="https://github.com/user-attachments/assets/9696e6ad-6e78-47f4-8a60-7a65b5cc94f8" />
+
+* [What workflows run the most](https://shuffler.io/admin?admin_tab=billingstats) (and why)
+* [What workflows fail the most](https://shuffler.io/admin?admin_tab=billingstats)
+* [Historical debug search](https://shuffler.io/debug)
+* [Notifications for realtime alerting](https://shuffler.io/admin?admin_tab=notifications)
+* [A multi-tenant view to dig into other orgs](https://shuffler.io/admin?admin_tab=billingstats)
+
+
+A **new feature** added to this that makes things more clear is on the Admin statistic pages. We have added features to both track your current organisation, along with any other sub-organisation directly from the parent. This makes it easier, especially in cases where you are providing Shuffle directly to your own customers, to know who is doing what, and what their limits are.
+White-labeling and full re-branding is available as well.
+
+<img width="1067" height="723" alt="A new Child Org app run tracker has been added to parent orgs" src="https://github.com/user-attachments/assets/8c0857ee-59ec-4368-bf5d-e3bfaea707d1" />
+
+In addition to this, the work on custom dashboards has started. The missing piece to this was primarily a way for YOU to control the stats in the dashboard, which now exists through the POST /api/v1/stats API, where you can pass in {"key": "amount_handled", "value": 10} which will increment the key "amount_handled" by 10. This gives full timeline controls to you as a user, and further allows us to track statistics for you in the background.
+
+## Local AI Models
+Since 2019, the main goal has been to make automation available to everyone. The main failure has been the time get started, which is what we have kept building towards. And we're finally finding clarity. Back in 2021–2022 we ran our first tests to generate Workflows based on custom AI models and next-node predictions. We also tried using early OpenAI LLMs (pre chatgpt). None of these options were as successful as we hoped back then, but we never stopped trying. With recent reasoning models however, things are changing, and it is becoming feasible to do both full workflow generations and node predictions. 
+This means you may no longer need to become good at automation to do automation. It's not about the limit of what you yourself know and can do, but instead about the ideas you have. By being good at automation you still have an edge, but this will help those who aren't there yet tremendously. 
+
+
+### Here are a few key ideas:
+1. Generate Workflows from Text and/or Flowcharts based on your already selected Apps
+
+<img width="698" height="302" alt="image" src="https://github.com/user-attachments/assets/ab4ef8c9-b1c3-4475-a5d0-0d0ad10176b3" />
+
+2. Make edits to existing workflows controllable with AI and keyboard shortcuts 
+3. When a workflow runs, we automatically track errors. We can now auto-fix them for you with step 2
+
+
+With these three simple stages, Shuffle has the potential of solving your problems for you, while also giving you full access to control what is going on yourself. It is built to make automation accessible, while allowing you more time to work on critical cybersecurity issues instead of constant bugfixing. 
+And best of all: **We made it work with locally hosted models**. Shuffle supports any OpenAI-format compatible AI provider, which thankfully is most of them. This means you can provide a URL, a model and an API-key, and we will use that model for you. This will work with Hybrid and Onprem environments through Orborus as well, meaning any model will work anywhere. 
+
+[Read more about how to set up local models here.](https://shuffler.io/docs/AI#how-to-set-up-a-self-hosted-ai-model-with-shuffle)
+
+## Standards, Singul & Agents
+Last, and most important - Standards & Singul. This section is the least straightforward, but most important. We have over time found the usecases people are looking to solve with, while also mapping out how security operations should be done at scale. This has made it possible to build Shuffle in a direction that makes us capable of solving these problems directly, rather than you having to manually set it up. A few key parts are:
+
+1. Everyone is already focused on products that provide monitoring capabilities, whether for endpoints, IAM or otherwise 
+2. These products don't work seamlessly and need a lot of love to actually work how you want them to work
+3. The output of the tools are in form of detections/alerts, which require some kind of investigation and closure
+4. The investigation requires additional information from other tools in the stack, such as the endpoints & users affected, along with what the initial access and persistence mechanisms looked like etc. 
+5. The response action is often done fully manually and hastily. 
+
+**What is being asked for**: Automation of the basics like ticket handling
+
+**What everyone wants**: Insights to help you protect yours or your customers' company
+
+<img width="964" height="201" alt="A simplified mechanism to ingest information from multiple tools all at once" src="https://github.com/user-attachments/assets/0940bbe7-5fe1-46e3-9bcf-bf39435465e9" />
+
+The basics in automation should be just that - basics. They should be handled for you. And with Singul, this is the start. Insights is a bi-effect. If the data from step #1 to #5 is standardised and cross-correlation across types is possible (logs <-> ticket <-> user <-> endpoint), while not having to worry about the tool, 
+
+**That is what Singul is**. That is why we keep mentioning it, without bringing in substance and clear usecases yet. It is all happening, and you are going to be using it without know that you are using it. When we implement systems, we don't implement them to be in-your-face and sound sexy. We implement them to be useful. And if it's not useful, it's hidden until read or scrapped entirely. AI Agents is a perfect example of this. Without controllability and determinism, they will simply cause chaos if they aren't able to learn from their own and others' mistakes. 
+
+You will see a lot more of these features soon.
+
+## What's next
+
+That's a wrap! There is more to the release than meets the eye, and we are always open to feedback from the great community that has been built around Shuffle. It has been clear that AI is critical to the future, but we will be careful as to not implement just a shitty chatbot. 
+
+We will be updating our roadmap continuously below, and would love feedback to understand the direction you would like us to move:
+
+[Shuffle Roadmap](https://shuffler.io/docs/about#roadmap)
+
+I want to thank everyone for your patience, and  I promise it will be worth it.