Commit acc2776

Improve clarity and consistency in immune architecture documentation
1 parent 2d6cdc5 commit acc2776

1 file changed, +102 -58 lines changed

docs/immune/immune_architecture.md

@@ -161,91 +161,135 @@ The two pathway activation works like this
4. The reception or generation of a PAMP triggers a simple defensive response: blocking the IP address of the threat in the local firewall.
5. The second activation comes from the DAMPs. DAMPs can be generated by the local Slips or received from the P2P network.
6. Only when both signals are received is the stronger defense used: Slips isolates or blocks the attacker using ARP cache poisoning techniques.
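The two-pathway activation rule above can be sketched as a small decision function. This is a minimal illustration, assuming each signal has already been reduced to a boolean; the function name `choose_responses` and the action labels are hypothetical and not part of the Slips codebase. It also assumes that a DAMP without a PAMP triggers nothing.

```python
def choose_responses(pamp_seen: bool, damp_seen: bool) -> set[str]:
    """Return the defensive actions triggered by the two activation signals.

    Hypothetical helper for illustration only.
    """
    actions: set[str] = set()
    if pamp_seen:
        # A PAMP alone triggers the simple innate response.
        actions.add("firewall_block")
        if damp_seen:
            # PAMP + DAMP: the stronger adaptive response.
            actions.add("arp_isolation")
    # A DAMP without a PAMP triggers nothing in this sketch (assumption).
    return actions


print(choose_responses(pamp_seen=True, damp_seen=False))          # {'firewall_block'}
print(sorted(choose_responses(pamp_seen=True, damp_seen=True)))   # ['arp_isolation', 'firewall_block']
```

Keeping the firewall block in the result when both signals are present reflects that the PAMP still triggers the innate response; the ARP isolation is added on top of it.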
### Evolution of Pattern Matching Detectors Upon Search
When the adaptive system receives a PAMP detection together with some context flows, it will try to determine whether it has stored Zeek detector scripts that can match the context.

Depending on how the Zeek detectors are created, they may output either the percentage of matches or the confidence level of the detection. These values estimate how good the match is; in other words, they measure the detector’s goodness of fit.

For example, the URL https://www.test.com might be matched by the regex:

`\b(https?://\w+\.\w{2,})\b`

However, this regex is too generic: it only matches the prefix `https://www.test`, and it would also match many other URLs. A more restrictive regex such as:

`\b(https?:\/\/\w+\.([a-zA-Z]{2,})\.\w{2,})\b`

would provide a tighter match to the specific URL.

Another idea to explore is detector complexity: more complex detectors that still match the target string may be considered more restrictive.
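One simple way to turn "how good the match is" into a number is to measure how much of the target string the detector's match covers. The sketch below applies that idea to the two example regexes using Python's `re` module. Using coverage as the goodness-of-fit metric is an assumption for illustration; it is not the value Zeek detectors actually output, and `coverage` is a hypothetical helper.

```python
import re

TARGET = "https://www.test.com"
GENERIC = r"\b(https?://\w+\.\w{2,})\b"
RESTRICTIVE = r"\b(https?:\/\/\w+\.([a-zA-Z]{2,})\.\w{2,})\b"


def coverage(pattern: str, target: str) -> float:
    """Fraction of `target` covered by the first match of `pattern` (0.0 if no match)."""
    match = re.search(pattern, target)
    if match is None:
        return 0.0
    return (match.end() - match.start()) / len(target)


print(coverage(GENERIC, TARGET))      # 0.8  (matches only "https://www.test")
print(coverage(RESTRICTIVE, TARGET))  # 1.0  (matches the whole URL)
```

Coverage only rewards fitting the target string; it does not penalise a pattern for also matching unrelated URLs. A complexity measure or a re-check against benign traffic is still needed to capture how restrictive a detector is.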

If a suitable metric can be defined, it can then serve as the foundation for a simple evolutionary algorithm to improve the Zeek detectors.

The algorithm could work as follows:

1. Select a Zeek detector that achieves more than X% match (or only the best-matching detector).
2. Ask the LLM module to adapt the detector, given the context, to make it more restrictive so that it likely detects only those specific strings.
3. The most restrictive option would be to use the context strings directly and block them, but this may not be the best approach. The original context of an alert cannot be used directly for detection without a Zeek detector that matches it, because the context might be wrong or might capture benign patterns. The only reliable way to use the context is by extending a currently verified Zeek detector that has already passed negative selection.
4. Once the Zeek detector is improved, it is tested again against the context flows. The loop continues until a Zeek detector whose match exceeds a threshold (likely between 95% and 99%) is created.

Interestingly, the human immune system has a small gap in this process: newly created detectors (B-cells or T-cells) are not re-checked against benign “self” patterns after the adaptation. The body lacks the time and resources to do this, leading to many autoimmune diseases. In the Slips adaptive system, however, there may be extra time to run the improved Zeek detector against benign traffic again to confirm it does not produce false positives.
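The refinement loop above, including the extra re-check against benign traffic, can be sketched as follows. This is a minimal sketch under several assumptions. Detectors are modelled as Python regexes instead of Zeek scripts, and fitness is the average fraction of each context string covered by the match. `refine` is a hypothetical stand-in for the LLM module. The threshold values are placeholders, and the starting detector is assumed to have already passed negative selection.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-in for the LLM module: (pattern, context) -> refined pattern.
Refiner = Callable[[str, list[str]], str]


def coverage(pattern: str, target: str) -> float:
    """Fraction of `target` covered by the first match of `pattern` (0.0 if none)."""
    match = re.search(pattern, target)
    return 0.0 if match is None else (match.end() - match.start()) / len(target)


def fitness(pattern: str, context: list[str]) -> float:
    """Average coverage of a detector over the context strings."""
    return sum(coverage(pattern, s) for s in context) / len(context)


def evolve_detector(
    pattern: str,
    context: list[str],
    benign: list[str],
    refine: Refiner,
    start_threshold: float = 0.5,    # the "X%" gate of step 1 (assumed value)
    target_threshold: float = 0.95,  # step 4: somewhere between 95% and 99%
    max_rounds: int = 10,
) -> Optional[str]:
    """Tighten an already verified detector; return None if no safe version is found."""
    if fitness(pattern, context) < start_threshold:
        return None  # Step 1: only detectors that already match well are adapted.
    current = pattern
    for _ in range(max_rounds):
        if fitness(current, context) >= target_threshold:
            # Extra check: run negative selection again on benign ("self") traffic.
            if any(re.search(current, sample) for sample in benign):
                return None
            return current
        # Steps 2 and 4: ask the LLM module for a tighter version and measure again.
        current = refine(current, context)
    return None


GENERIC = r"\b(https?://\w+\.\w{2,})\b"
RESTRICTIVE = r"\b(https?:\/\/\w+\.([a-zA-Z]{2,})\.\w{2,})\b"


def toy_refiner(pattern: str, context: list[str]) -> str:
    """Pretend LLM module: swaps the generic regex for the restrictive one."""
    return RESTRICTIVE if pattern == GENERIC else pattern


context = ["https://www.test.com"]
print(evolve_detector(GENERIC, context, ["http://intranet.local"], toy_refiner) == RESTRICTIVE)  # True
print(evolve_detector(GENERIC, context, ["https://www.example.org"], toy_refiner))  # None
```

In the second call the refined regex also matches a benign URL, so the final negative-selection check rejects it instead of deploying a detector that would raise false positives.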



### Decommission of Detectors
The performance of each detector will be recorded, especially when feedback from false positives is available.

This performance data, together with the time-to-live value, will be used to decide whether to decommission or retain detectors. If a detector does not perform well enough, it will be deleted after its time-to-live period (measured in days). If it does perform well, see [Memory Cells](#memory-cells).
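A minimal sketch of the decommission rule, assuming performance is summarised as precision (the share of a detector's detections not flagged as false positives) with a 0.9 cutoff. `DetectorRecord`, `should_decommission`, the field names and the cutoff are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DetectorRecord:
    """Performance record of one detector (hypothetical structure)."""
    name: str
    created: datetime
    ttl_days: int
    true_positives: int = 0
    false_positives: int = 0

    def precision(self) -> float:
        """Share of detections not flagged as false positives (0.0 with no detections)."""
        total = self.true_positives + self.false_positives
        return self.true_positives / total if total else 0.0


def should_decommission(record: DetectorRecord, now: datetime, min_precision: float = 0.9) -> bool:
    """Delete a detector only once its time-to-live has expired and it still underperforms."""
    expired = now >= record.created + timedelta(days=record.ttl_days)
    return expired and record.precision() < min_precision


weak = DetectorRecord("detector_a", datetime(2024, 1, 1), ttl_days=7, true_positives=2, false_positives=3)
print(should_decommission(weak, datetime(2024, 1, 5)))   # False: time-to-live not yet expired
print(should_decommission(weak, datetime(2024, 1, 10)))  # True: expired and precision is 0.4
```

In this sketch a detector with no detections at all has a precision of 0.0, so it is also removed once its time-to-live expires.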
### Memory Cells

Since the performance of each detector is tracked, well-performing detectors will be stored for faster future use by keeping them in the Zeek detection configuration. For the initial implementation, a fixed threshold will determine what qualifies as a “good” detector.

A potential problem arises if too many good detectors are retained, causing detection speed to degrade. If this happens, Slips will rank detectors by performance and by their most recent detection time. Detectors that continue to generate detections will have higher priority in the database, which will be pruned to maintain a fixed maximum number of remembered detectors.
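The ranking and pruning of memory cells could look like the sketch below. It assumes a numeric performance score in [0, 1], a fixed "good" threshold of 0.9, and a lexicographic ranking in which performance comes first and recency breaks ties. A weighted score combining the two would be an equally valid reading. `MemoryCell` and `prune_memory` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MemoryCell:
    """A retained detector (hypothetical structure)."""
    name: str
    performance: float       # e.g. precision in [0, 1]
    last_detection: datetime


def prune_memory(cells: list[MemoryCell], max_cells: int, good_threshold: float = 0.9) -> list[MemoryCell]:
    """Keep at most `max_cells` good detectors, best and most recently active first."""
    good = [c for c in cells if c.performance >= good_threshold]
    # Rank by performance, breaking ties by the most recent detection.
    good.sort(key=lambda c: (c.performance, c.last_detection), reverse=True)
    return good[:max_cells]


cells = [
    MemoryCell("a", 0.95, datetime(2024, 3, 1)),
    MemoryCell("b", 0.95, datetime(2024, 3, 5)),
    MemoryCell("c", 0.99, datetime(2024, 3, 2)),
    MemoryCell("d", 0.50, datetime(2024, 3, 6)),
]
print([c.name for c in prune_memory(cells, max_cells=2)])  # ['c', 'b']
```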


### Anergy

Anergy, in the human immune system, is the elimination of detector cells that were once effective but later began to detect “self.”

In Slips, this will be implemented by tracking the performance of each detector once enough information is received from the human operator about a false positive. When a detector is determined to be sufficiently unreliable, it will either be deactivated or its threshold lowered.
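A minimal sketch of the anergy decision, assuming operator feedback is reduced to a count of confirmed alerts and a count of false positives per detector. The minimum number of verdicts and the two false-positive-rate cutoffs are hypothetical values, as are the names `OperatorFeedback` and `anergy_action`. How the threshold is actually adjusted depends on the detector and is left out.

```python
from dataclasses import dataclass


@dataclass
class OperatorFeedback:
    """Operator verdicts on one detector's alerts (hypothetical structure)."""
    confirmed: int = 0        # alerts the operator confirmed as real threats
    false_positives: int = 0  # alerts the operator marked as benign ("self")

    def record(self, is_false_positive: bool) -> None:
        if is_false_positive:
            self.false_positives += 1
        else:
            self.confirmed += 1

    def fp_rate(self) -> float:
        total = self.confirmed + self.false_positives
        return self.false_positives / total if total else 0.0


def anergy_action(
    feedback: OperatorFeedback,
    min_feedback: int = 10,        # assumed: verdicts needed before acting
    adjust_rate: float = 0.1,      # assumed: FP rate that triggers a threshold change
    deactivate_rate: float = 0.3,  # assumed: FP rate that deactivates the detector
) -> str:
    """Return 'keep', 'adjust_threshold' or 'deactivate' for one detector."""
    if feedback.confirmed + feedback.false_positives < min_feedback:
        return "keep"  # not enough information from the operator yet
    rate = feedback.fp_rate()
    if rate >= deactivate_rate:
        return "deactivate"
    if rate >= adjust_rate:
        return "adjust_threshold"
    return "keep"


fb = OperatorFeedback()
for verdict in [False] * 8 + [True] * 2:
    fb.record(is_false_positive=verdict)
print(anergy_action(fb))  # adjust_threshold
```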

There are many reasons why a good detector may start to fail, the most common being different forms of concept drift.

In a sense, Slips already performs a form of Anergy when its designers detect a false positive and adjust or remove a detector. However, this process is manual and slow.

## Immunoregulation

Immunoregulation refers to all the actions taken by the immune system to:

1. **Amplification**: Ensure the entire system is aware of the threat so it is not overlooked.
2. **Control the Power of the Response**: Avoid overreacting to a threat.
3. **Ensure the Response Is Timely**: Act quickly enough to counter the threat.
4. **Slow Down After the Threat Is Gone**: Stop actions once the threat has been removed.

In Slips, immunoregulation will be implemented in two ways:

1. Inside the local Slips host.
2. Through communication with other peers in the P2P network.

### Slips Host
#### Amplification
There is no need for amplification inside a single Slips host, since the whole system already has the information.
#### Control the Power of the Response
- The innate system blocks via the firewall.
- The adaptive system blocks via an ARP poisoning attack.
#### Be Sure the Answer Is on Time
- Firewall rules are added to the local firewall for fast response.
- Zeek scripts are injected into the Zeek process.
- The ARP poisoning attack is executed as soon as it is approved by the adaptive system.
#### Slowdown After the Threat Is Gone

The slowdown depends on the number of evidences and alerts generated in the current time window.

Currently, Slips implements the following behaviour:

- After the attacker is blocked, all new alerts from that attacker are still stored.
- When the attacker stops attacking, it enters a probation period of one time window. During this period, the attacker host is expected not to generate any alerts.
- If no alerts are generated in that probation period, then in the second time window after the last block, the attacker is unblocked.
- If the attacker continues to attack and generate evidences and alerts, it remains blocked.
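The probation behaviour described above can be sketched with integer time-window indices. This is a minimal illustration, not Slips' implementation. `BlockedAttacker` and its methods are hypothetical. The sketch also assumes that any new alert from a blocked attacker renews the block, which restarts the probation clock and matches "it remains blocked".

```python
from dataclasses import dataclass, field


@dataclass
class BlockedAttacker:
    """Block state of one attacker, tracked per time window (hypothetical structure)."""
    last_block_tw: int  # time window of the most recent block
    stored_alerts: list[tuple[int, str]] = field(default_factory=list)

    def record_alert(self, tw: int, description: str) -> None:
        # Alerts from a blocked attacker are still stored.
        self.stored_alerts.append((tw, description))
        # Assumption: a new alert renews the block, restarting the probation clock.
        self.last_block_tw = max(self.last_block_tw, tw)

    def should_unblock(self, current_tw: int) -> bool:
        # Window last_block_tw + 1 is the probation window and must pass without
        # alerts, so the earliest unblock happens in window last_block_tw + 2.
        return current_tw >= self.last_block_tw + 2


quiet = BlockedAttacker(last_block_tw=5)
print(quiet.should_unblock(6), quiet.should_unblock(7))  # False True

busy = BlockedAttacker(last_block_tw=5)
busy.record_alert(6, "horizontal port scan")
print(busy.should_unblock(7), busy.should_unblock(8))    # False True
```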
### P2P Network
#### Amplification
For alerts generated from PAMPs and profile violations (DAMPs):
- Every time the local Slips generates an alert, send it into the P2P network.
- Every time the local Slips receives an alert in the P2P network from another peer, and that alert originated in that peer, resend it to the P2P network.
  - This ensures that alerts generated and sent by peer A are amplified, while alerts that peer A only forwards on behalf of peer B are ignored.
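The amplification rule can be sketched as a filter on each alert's origin and sender. The message format below (`P2PAlert` with `origin_peer` and `sender_peer` fields) and the `amplify` function are hypothetical. The sketch assumes a locally generated alert is created with origin and sender both set to the local peer.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class P2PAlert:
    """An alert as seen in the P2P network (hypothetical message format)."""
    alert_id: str
    origin_peer: str  # peer whose Slips generated the alert
    sender_peer: str  # peer that sent this copy


def amplify(alert: P2PAlert, local_peer: str) -> Optional[P2PAlert]:
    """Return the copy to send into the P2P network, or None to drop it.

    Local alerts have origin == sender == local_peer, so they are always sent.
    Received alerts are resent only if the sender also generated them.
    """
    if alert.sender_peer != alert.origin_peer:
        return None  # forwarded on behalf of another peer: ignore
    return replace(alert, sender_peer=local_peer)


local = P2PAlert("a1", origin_peer="me", sender_peer="me")
from_a = P2PAlert("a2", origin_peer="A", sender_peer="A")
relayed = P2PAlert("a3", origin_peer="B", sender_peer="A")
print(amplify(local, "me") is not None)   # True
print(amplify(from_a, "me").sender_peer)  # me
print(amplify(relayed, "me"))             # None
```

Because a resent copy carries the local peer as its sender, the peers that receive it drop it. In this sketch, that stops an alert from being re-amplified indefinitely across the network.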
#### Control the Power of the Response
- The innate system blocks via the firewall.
- The adaptive system blocks via an ARP poisoning attack.
#### Be Sure the Answer Is on Time
- Firewall rules are added to the local firewall for fast response.
- Zeek scripts are injected into the Zeek process.
- The ARP poisoning attack is executed as soon as it is approved by the adaptive system.

#### Slowdown After the Threat Is Gone
The slowdown of the response depends on the number of PAMPs and DAMPs still received by peers in the network.

- When the attack stops, no more evidences are generated by peers and no further alerts should be sent or amplified.
- When PAMPs stop, the firewall block and ARP poisoning should also stop. (The firewall only requires PAMPs to be activated, but the ARP poisoning attack requires both PAMPs and DAMPs.)
- When the attack stops, the host’s behaviour should also return to normal, which will eventually stop DAMPs. This typically takes more time.

A waiting function, based on the time window, will be applied so these changes do not happen immediately after flows stop being received.
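A minimal sketch of this slowdown logic, assuming the waiting function is a fixed grace period counted in time windows. The PAMP and DAMP counts from all peers are collapsed into "the last window in which any were reported". `ThreatSignals`, `active_responses` and the default grace of one window are hypothetical. A fuller version could weight the actual counts instead of only their presence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreatSignals:
    """Last time windows in which any peer reported PAMPs or DAMPs (hypothetical)."""
    last_pamp_tw: Optional[int] = None
    last_damp_tw: Optional[int] = None

    def observe(self, tw: int, pamps: int, damps: int) -> None:
        # pamps/damps: counts received from all peers during window `tw`.
        if pamps > 0:
            self.last_pamp_tw = tw
        if damps > 0:
            self.last_damp_tw = tw


def _recent(last_tw: Optional[int], current_tw: int, grace_tw: int) -> bool:
    return last_tw is not None and current_tw - last_tw <= grace_tw


def active_responses(signals: ThreatSignals, current_tw: int, grace_tw: int = 1) -> set[str]:
    """Responses that stay active, with a grace period of `grace_tw` time windows."""
    pamp_active = _recent(signals.last_pamp_tw, current_tw, grace_tw)
    damp_active = _recent(signals.last_damp_tw, current_tw, grace_tw)
    actions: set[str] = set()
    if pamp_active:
        actions.add("firewall_block")      # needs PAMPs only
        if damp_active:
            actions.add("arp_poisoning")   # needs both PAMPs and DAMPs
    return actions


s = ThreatSignals()
s.observe(10, pamps=3, damps=2)
s.observe(11, pamps=0, damps=1)
print(sorted(active_responses(s, 11)))  # ['arp_poisoning', 'firewall_block']
print(active_responses(s, 12))          # set()
```

In window 12 the DAMPs are still recent, but the PAMPs are not, so both responses stop. This mirrors the rule that the firewall block and the ARP poisoning depend on PAMPs.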


# Stopping the Threats

The main goal of the human immune system is to protect us by killing pathogens, neutralising them, or expelling them. Slips needs to do the same in order to be effective.

To stop threats, Slips implements two actions:

1. **Block in the local firewall**
   - When an alert is generated for a host (PAMPs are detected by the innate system only), the host is blocked in the firewall of the local machine.

2. **ARP Cache Poison Attack**
   - When both PAMPs and DAMPs are detected, the attacker host is isolated through an ARP cache poisoning attack.

