Commit 899ba8d

Consolidate caches (#202)
* #199 Consolidating caches.
1 parent 26d8cc5 commit 899ba8d

127 files changed, +988 -2400 lines changed

docs/docs/other_features/anonymization.md

Lines changed: 17 additions & 9 deletions
@@ -1,19 +1,27 @@
 # Consistent Anonymization

-Anonymization in the context of Phileas is the process of replacing certain values with random but similar values. For example, the identified name of “John Smith” may be replaced with “David Jones”, or an identified phone number of 123-555-9358 may be replaced by 842-436-2042. A [VIN](vins.md) number will be replaced by a 17 character randomly selected VIN number that adheres to the standard for VIN numbers.
+Anonymization in the context of Phileas is the process of replacing certain values with random but similar values. For
+example, the identified name of “John Smith” may be replaced with “David Jones”, or an identified phone number of
+123-555-9358 may be replaced by 842-436-2042. A [VIN](vins.md) number will be replaced by a 17 character randomly
+selected VIN number that adheres to the standard for VIN numbers.

-Anonymization is useful in instances where you want to remove sensitive information from text without changing the meaning of the text. Anonymization can be enabled for each type of sensitive information in the policy by setting the filter strategy to `RANDOM_REPLACE`. (See [Policies](policies_README.md) for more information.)
+Anonymization is useful in instances where you want to remove sensitive information from text without changing the
+meaning of the text. Anonymization can be enabled for each type of sensitive information in the policy by setting the
+filter strategy to `RANDOM_REPLACE`. (See [Policies](policies_README.md) for more information.)

 ## Consistent Anonymization

-Consistent anonymization refers to the process of always anonymizing the same sensitive information with the same replacement values. For example, if the name "John Smith" is randomly replaced with "Pete Baker", all other occurrences of "John Smith" will also be replaced by "Pete Baker."
+Consistent anonymization refers to the process of always anonymizing the same sensitive information with the same
+replacement values. For example, if the name "John Smith" is randomly replaced with "Pete Baker", all other occurrences
+of "John Smith" will also be replaced by "Pete Baker."

-Consistent anonymization can be done on the document level or on the context level. When enabled on the document level, "John Smith" will only be replaced by "Pete Baker" in the same document. If "John Smith" occurs in a separate document it will be anonymized with a different random name. When enabled on the context level, "John Smith" will be replaced by "Pete Baker" whenever "John Smith" is found in all documents in the same context.
+Consistent anonymization can be done on the document level or on the context level. When enabled on the document
+level, "John Smith" will only be replaced by "Pete Baker" in the same document. If "John Smith" occurs in a separate
+document, it will be anonymized with a different random name. When enabled on the context level, "John Smith" will be
+replaced by "Pete Baker" whenever "John Smith" is found in all documents in the same context.

-Enabling consistent anonymization on the context level requires a cache to store the sensitive information and the corresponding replacement values. If a single instance of Phileas is running, its internal cache service (enabled by default) is the best choice and no additional configuration is required.
+Enabling consistent anonymization on the context level requires a cache to store the sensitive information and the
+corresponding replacement values. If a single instance of Phileas is running, its internal cache service (enabled by
+default) is the best choice and no additional configuration is required.

-If multiple instances of Phileas are deployed together, Phileas requires access to a Redis cache service as shown below. See Phileas' [Settings](settings.md) on how to configure the cache.

-**When Phileas is deployed in a cluster, a Redis cache is required to enable consistent anonymization.**
-
-The anonymization cache will contain PHI. It is important that you take the necessary precautions to secure the cache and all communication to and from the cache.
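
Note on the anonymization.md change: the context-level behavior it documents (the same value always maps to the same replacement within a context) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Phileas code: `ConsistentAnonymizer`, its `anonymize` method, and the nested in-memory map are all hypothetical, and the real `CacheService` interface is not shown in this diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not the Phileas CacheService API.
class ConsistentAnonymizer {

    // context -> (sensitive value -> replacement chosen the first time it was seen)
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    // Returns the stored replacement for (context, token); on first use the
    // supplier generates a replacement, which is then stored for later lookups.
    String anonymize(String context, String token, Supplier<String> randomReplacement) {
        return cache
                .computeIfAbsent(context, c -> new HashMap<>())
                .computeIfAbsent(token, t -> randomReplacement.get());
    }
}
```

Under this sketch, a second call with the same context and token ignores the supplier and returns the earlier replacement, while a different context gets its own independent mapping.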

docs/docs/settings.md

Lines changed: 16 additions & 31 deletions
@@ -2,7 +2,10 @@

 Phileas has settings to control how it operates. The settings and how to configure each are described below.

-> The configuration for the types of sensitive information that Phileas identifies are defined in [filter policies](filter_policies/filter_policies.md) outside of Phileas' configuration properties described on this page.
+> The configuration for the types of sensitive information that Phileas identifies are defined
+> in [filter policies](filter_policies/filter_policies.md) outside of Phileas' configuration properties described on
+> this
+> page.

 ## Configuring Phileas

@@ -14,57 +17,39 @@ Phileas looks for its settings in an `application.properties` file.

 Properties set via environment variables take precedence over properties set in Phileas' settings file.

-All following properties can also be set as environment variables by prepending `PHILTER_` to the property name and changing periods to underscores. For example, the property `filter.profiles.directory` can be set using the environment variable `PHILTER_FILTER_PROFILES_DIRECTORY` by:
+All following properties can also be set as environment variables by prepending `PHILTER_` to the property name and
+changing periods to underscores. For example, the property `filter.profiles.directory` can be set using the environment
+variable `PHILTER_FILTER_PROFILES_DIRECTORY` by:

 ```
 export PHILTER_FILTER_PROFILES_DIRECTORY=/profiles/
 ```

-Using environment variables to configure Phileas instead of using Phileas' settings file can allow for easier configuration management when deploying Phileas.
+Using environment variables to configure Phileas instead of using Phileas' settings file can allow for easier
+configuration management when deploying Phileas.

 ## Policies

 | Setting | Description | Allowed Values | Default Value |
-| --------------------------- | -------------------------------------------- | ------------------------- | ------------- |
+|-----------------------------|----------------------------------------------|---------------------------|---------------|
 | `filter.policies.directory` | The directory in which to look for policies. | Any valid directory path. | `./policies/` |

 ## Span Disambiguation

-These values configure Phileas' span disambiguation feature to determine the most appropriate type of sensitive information when duplicate spans are identified. In a deployment of multiple Phileas instances, you must enable the [cache service](Settings#cache) for span disambiguation to work as expected.
+These values configure Phileas' span disambiguation feature to determine the most appropriate type of sensitive
+information when duplicate spans are identified. In a deployment of multiple Phileas instances, you must enable
+the [cache service](Settings#cache) for span disambiguation to work as expected.

 | | Description | Allowed Values | Default Value |
-| ----------------------------- | --------------------------------------------- | --------------- | ------------- |
+|-------------------------------|-----------------------------------------------|-----------------|---------------|
 | `span.disambiguation.enabled` | Whether or not to enable span disambiguation. | `true`, `false` | `false` |

-## Cache Service
-
-The cache service is required to use [consistent anonymization](anonymization.md) and policies stored in Amazon S3. Phileas supports Redis as the backend cache. When Redis is not used, an in-memory cache is used instead. The in-memory cache is not recommended because all contents will be stored in memory on the local Phileas instance.
-
-The cache will contain sensitive information. It is important that you take the necessary precautions to secure the cache itself and all communication between Phileas and the cache.
-
-| Setting | Description | Allowed Values | Default Value |
-| ------------------------ | ----------------------------------------------------------------- | ------------------------- | ------------- |
-| `cache.redis.enabled` | Whether or not to use Redis as the cache. | `true`, `false` | `false` |
-| `cache.redis.host` | The hostname or IP address of the Redis cache. | Any valid Redis endpoint. | None |
-| `cache.redis.port` | The Redis cache port. | Any valid port. | `6379` |
-| `cache.redis.auth.token` | The Redis auth token. | Any valid token. | None |
-| `cache.redis.ssl` | Whether or not to use SSL for communication with the Redis cache. | `true`, `false` | `false` |
-
-The following Redis settings are only required when using a self-signed SSL certificate.
-
-| Setting | Description | Allowed Values | Default Value |
-| --------------------------------- | ---------------------------- | -------------------- | ------------- |
-| `cache.redis.truststore` | The path to the trust store. | Any valid file path. | None |
-| `cache.redis.truststore.password` | The trust store password. | Any valid file path. | None |
-| `cache.redis.keystore` | The path to the keystore. | Any valid file path. | None |
-| `cache.redis.keystore.password` | The keystore password. | Any valid file path. | None |
-
 ## Advanced Settings

-> In most cases the settings below do not need changed. Contact us for more information on any of these settings.
+> In most cases, the settings below do not need changed. Contact us for more information on any of these settings.

 | Setting | Description | Allowed Values | Default Value |
-| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------- |
+|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-------------------|---------------|
 | `ner.timeout.sec` | Controls the timeout in seconds when performing name entity recognition. Longer text may require longer processing times. | An integer value | `600` |
 | `ner.max.idle.connections` | The maximum number of idle connections to maintain for the named entity recognition. More connections may improve performance in some cases. | An integer value. | `30` |
 | `ner.keep.alive.duration.ms` | The amount of time in milliseconds to keep named entity recognition connections alive. Longer text may require longer processing times. | An integer value. | `60` |
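
Note on settings.md: the environment-variable naming rule it describes (prepend `PHILTER_`, change periods to underscores) and the stated precedence of environment variables over the settings file can be sketched as below. `EnvVarNames`, `toEnvVar`, and `resolve` are hypothetical names used only for illustration and are not part of Phileas. The upper-casing step is an assumption inferred from the single documented example, `PHILTER_FILTER_PROFILES_DIRECTORY`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Phileas.
class EnvVarNames {

    // Converts a property name to its environment variable name,
    // e.g. "filter.profiles.directory" -> "PHILTER_FILTER_PROFILES_DIRECTORY".
    // Upper-casing is an assumption inferred from the documented example.
    static String toEnvVar(String property) {
        return "PHILTER_" + property.replace('.', '_').toUpperCase(Locale.ROOT);
    }

    // Returns the environment value when one is set; otherwise falls back to
    // the settings-file value, which may be null if the property is absent.
    static String resolve(String property, Map<String, String> env, Properties settings) {
        String fromEnv = env.get(toEnvVar(property));
        return fromEnv != null ? fromEnv : settings.getProperty(property);
    }
}
```

A caller could pass `System.getenv()` as the `env` argument and a `Properties` object loaded from `application.properties` as `settings`.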

phileas-core/pom.xml

Lines changed: 0 additions & 6 deletions
@@ -114,11 +114,5 @@
 <version>${junit.version}</version>
 <scope>test</scope>
 </dependency>
-<dependency>
-<groupId>com.github.codemonstur</groupId>
-<artifactId>embedded-redis</artifactId>
-<version>${embedded.redis.version}</version>
-<scope>test</scope>
-</dependency>
 </dependencies>
 </project>

phileas-core/src/main/java/ai/philterd/phileas/services/FilterPolicyLoader.java

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@
 import ai.philterd.phileas.model.policy.filters.Identifier;
 import ai.philterd.phileas.model.policy.filters.Section;
 import ai.philterd.phileas.model.services.AlertService;
-import ai.philterd.phileas.model.services.AnonymizationCacheService;
+import ai.philterd.phileas.model.services.CacheService;
 import ai.philterd.phileas.model.services.MetricsService;
 import ai.philterd.phileas.model.services.SpanValidator;
 import ai.philterd.phileas.services.anonymization.AgeAnonymizationService;
@@ -99,13 +99,13 @@ public class FilterPolicyLoader {

 private static final Logger LOGGER = LogManager.getLogger(FilterPolicyLoader.class);

-private final AnonymizationCacheService anonymizationCacheService;
+private final CacheService anonymizationCacheService;
 private final AlertService alertService;
 private final MetricsService metricsService;
 private final Map<String, DescriptiveStatistics> stats;
 private final PhileasConfiguration phileasConfiguration;

-public FilterPolicyLoader(final AlertService alertService, final AnonymizationCacheService anonymizationCacheService,
+public FilterPolicyLoader(final AlertService alertService, final CacheService anonymizationCacheService,
 final MetricsService metricsService, final Map<String, DescriptiveStatistics> stats,
 final PhileasConfiguration phileasConfiguration) {

phileas-core/src/main/java/ai/philterd/phileas/services/PhileasFilterService.java

Lines changed: 13 additions & 17 deletions
@@ -21,6 +21,7 @@
 import ai.philterd.phileas.model.domain.LegalDomain;
 import ai.philterd.phileas.model.enums.FilterType;
 import ai.philterd.phileas.model.enums.MimeType;
+import ai.philterd.phileas.model.services.Classification;
 import ai.philterd.phileas.model.filter.Filter;
 import ai.philterd.phileas.model.objects.Explanation;
 import ai.philterd.phileas.model.objects.PdfRedactionOptions;
@@ -33,8 +34,7 @@
 import ai.philterd.phileas.model.responses.FilterResponse;
 import ai.philterd.phileas.model.serializers.PlaceholderDeserializer;
 import ai.philterd.phileas.model.services.AlertService;
-import ai.philterd.phileas.model.services.AnonymizationCacheService;
-import ai.philterd.phileas.model.services.Classification;
+import ai.philterd.phileas.model.services.CacheService;
 import ai.philterd.phileas.model.services.DocumentProcessor;
 import ai.philterd.phileas.model.services.FilterService;
 import ai.philterd.phileas.model.services.MetricsService;
@@ -45,8 +45,7 @@
 import ai.philterd.phileas.model.services.SplitService;
 import ai.philterd.phileas.processors.unstructured.UnstructuredDocumentProcessor;
 import ai.philterd.phileas.service.ai.sentiment.OpenNLPSentimentDetector;
-import ai.philterd.phileas.services.alerts.AlertServiceFactory;
-import ai.philterd.phileas.services.anonymization.cache.AnonymizationCacheServiceFactory;
+import ai.philterd.phileas.services.alerts.DefaultAlertService;
 import ai.philterd.phileas.services.disambiguation.VectorBasedSpanDisambiguationService;
 import ai.philterd.phileas.services.metrics.NoOpMetricsService;
 import ai.philterd.phileas.services.policies.InMemoryPolicyService;
@@ -98,11 +97,11 @@ public class PhileasFilterService implements FilterService {
 // PHL-223: Face recognition
 //private final ImageProcessor imageProcessor;

-public PhileasFilterService(final PhileasConfiguration phileasConfiguration) throws IOException {
-this(phileasConfiguration, new NoOpMetricsService());
+public PhileasFilterService(final PhileasConfiguration phileasConfiguration, final CacheService cacheService) throws IOException {
+this(phileasConfiguration, new NoOpMetricsService(), cacheService);
 }

-public PhileasFilterService(final PhileasConfiguration phileasConfiguration, final MetricsService metricsService) throws IOException {
+public PhileasFilterService(final PhileasConfiguration phileasConfiguration, final MetricsService metricsService, final CacheService cacheService) throws IOException {

 LOGGER.info("Initializing Phileas engine.");

@@ -112,23 +111,20 @@ public PhileasFilterService(final PhileasConfiguration phileasConfiguration, fin
 final Gson gson = new GsonBuilder().registerTypeAdapter(String.class, new PlaceholderDeserializer()).create();

 // Set the policy services.
-this.policyService = buildPolicyService(phileasConfiguration);
+this.policyService = buildPolicyService(phileasConfiguration, cacheService);
 this.policyUtils = new PolicyUtils(policyService, gson);

-// Set the anonymization cache service.
-final AnonymizationCacheService anonymizationCacheService = AnonymizationCacheServiceFactory.getAnonymizationCacheService(phileasConfiguration);
-
 // Set the alert service.
-this.alertService = AlertServiceFactory.getAlertService(phileasConfiguration);
+this.alertService = new DefaultAlertService(cacheService);

 // Instantiate the stats.
-Map<String, DescriptiveStatistics> stats = new HashMap<>();
+final Map<String, DescriptiveStatistics> stats = new HashMap<>();

 // The filter loader for policies.
-this.filterPolicyLoader = new FilterPolicyLoader(alertService, anonymizationCacheService, metricsService, stats, phileasConfiguration);
+this.filterPolicyLoader = new FilterPolicyLoader(alertService, cacheService, metricsService, stats, phileasConfiguration);

 // Create a new unstructured document processor.
-this.unstructuredDocumentProcessor = new UnstructuredDocumentProcessor(metricsService, new VectorBasedSpanDisambiguationService(phileasConfiguration));
+this.unstructuredDocumentProcessor = new UnstructuredDocumentProcessor(metricsService, new VectorBasedSpanDisambiguationService(phileasConfiguration, cacheService));

 }

@@ -367,12 +363,12 @@ public BinaryDocumentFilterResponse filter(final List<String> policyNames, final

 }

-private PolicyService buildPolicyService(final PhileasConfiguration phileasConfiguration) {
+private PolicyService buildPolicyService(final PhileasConfiguration phileasConfiguration, final CacheService cacheService) {

 if(StringUtils.equalsIgnoreCase(phileasConfiguration.policyService(), "memory")) {
 return new InMemoryPolicyService();
 } else {
-return new LocalPolicyService(phileasConfiguration);
+return new LocalPolicyService(phileasConfiguration, cacheService);
 }

 }
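
Note on the PhileasFilterService change: the constructor now receives a `CacheService` from the caller and passes that one instance to the policy service, alert service, filter policy loader, and span disambiguation service, instead of building caches internally through factories. The shape of that change can be sketched as below. Every type here is a local stand-in invented for illustration (`SketchCache`, `InMemorySketchCache`, `SketchAlertService`, `SketchPolicyService`, `SketchFilterService`); the real `CacheService` methods and its implementations are not shown in this diff, so none of this should be read as the actual Phileas API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a cache abstraction; the real CacheService methods are not shown in this diff.
interface SketchCache {
    void put(String key, String value);
    String get(String key);
}

// Simple in-memory stand-in implementation.
class InMemorySketchCache implements SketchCache {
    private final Map<String, String> entries = new HashMap<>();

    public void put(String key, String value) {
        entries.put(key, value);
    }

    public String get(String key) {
        return entries.get(key);
    }
}

// Stand-ins for collaborators that each receive the shared cache.
class SketchAlertService {
    final SketchCache cache;

    SketchAlertService(SketchCache cache) {
        this.cache = cache;
    }
}

class SketchPolicyService {
    final SketchCache cache;

    SketchPolicyService(SketchCache cache) {
        this.cache = cache;
    }
}

// The caller supplies the cache; the service hands the same instance to every collaborator.
class SketchFilterService {
    final SketchAlertService alertService;
    final SketchPolicyService policyService;

    SketchFilterService(SketchCache cache) {
        this.alertService = new SketchAlertService(cache);
        this.policyService = new SketchPolicyService(cache);
    }
}
```

One consequence of this design is that the choice of cache backend moves to whoever constructs the service, which is consistent with the Redis-specific settings and the `embedded-redis` test dependency being removed elsewhere in this commit.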
