Embedding data causing cache failures

### Describe the bug

We have a handful of Features that can be powered by embeddings generated by an LLM. These embeddings are currently stored in either post meta or term meta and then used to run comparisons.

It's a known issue that this doesn't scale very well, as running these comparisons within WordPress starts to slow down significantly once you have hundreds or thousands of items. We probably haven't done a good enough job of making that limitation known though.
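As a rough illustration of why this scales poorly, here is a minimal sketch of a brute-force comparison. It assumes cosine similarity as the metric (the issue does not say which one is used) and the function names are hypothetical, not the plugin's actual code. Every stored vector has to be loaded and scored for each lookup, so the work grows linearly with the number of items.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, items, top_n=3):
    """Score every stored vector against the query (a full linear scan)."""
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in items.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```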

Another issue that came up recently is that this embedding data can get quite large. Currently, we take the content of an item (say, a post) and break it down into smaller chunks. Each chunk is sent to the LLM to generate an embedding, and all of those embeddings are then stored together under a single meta key.

For long content, this data can easily exceed 1MB. In certain situations (such as when running `get_posts` or `get_post_meta`), WordPress runs a database query to get all meta for that item and stores it in the cache, with the idea that this makes any subsequent requests for this data faster.

The problem is that, in those situations, the embedding data gets pulled into the cache along with everything else, and it can easily be large enough to overwhelm the cache size limit, which then forces all cached data to be purged. For sites with lots of traffic, this can lead to performance issues, as more requests need to make database queries to get the data they need.
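To get a feel for how quickly the size grows with the number of chunks, here is a rough estimate. It is only a sketch under stated assumptions: vectors of 1536 dimensions (the length returned by some OpenAI embedding models), random values in place of real embeddings, and JSON as a stand-in for whatever serialization the meta value actually uses.

```python
import json
import random


def estimate_meta_size(num_chunks, dimensions=1536, seed=0):
    """Estimate the serialized size, in bytes, of num_chunks embeddings.

    Assumptions: random floats stand in for real embeddings, and JSON
    stands in for the real storage format of the meta value.
    """
    rng = random.Random(seed)
    embeddings = [
        [rng.uniform(-1.0, 1.0) for _ in range(dimensions)]
        for _ in range(num_chunks)
    ]
    return len(json.dumps(embeddings).encode("utf-8"))


if __name__ == "__main__":
    for chunks in (1, 10, 40):
        print(chunks, "chunks:", estimate_meta_size(chunks), "bytes")
```

Under these assumptions, a few dozen chunks are enough to pass the 1MB mark, and all of it sits under one meta key.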

### Approaches

I think there are two approaches we should look at implementing here:

1. For any Feature that uses embeddings but doesn't currently support storing them in Elasticsearch (Classification and Recommended Content), add that functionality
2. For sites that don't have access to Elasticsearch, add a new database table to store embeddings instead of using the meta tables

### Elasticsearch

Right now, the Smart 404 and Term Cleanup Features can take advantage of Elasticsearch (through ElasticPress) to store and query embeddings. This leads to significant performance improvements on the query side and means we don't need to store the data in the meta tables, which fixes the issue described above.

We should bring this same functionality to all other Features that use embeddings, and adjust the current approach to store only in Elasticsearch (right now, those two existing Features store in both places).
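For reference, storing and querying vectors in Elasticsearch could look roughly like the sketch below. It assumes an Elasticsearch 8.x-style `dense_vector` field and top-level `knn` search; the field name `chunk_embedding` and the helper functions are hypothetical, and the way ElasticPress actually builds its mappings and queries may differ.

```python
def build_mapping(dimensions, field="chunk_embedding"):
    """Build an index mapping with one indexed dense_vector field.

    The field name is a placeholder, not one the plugin is known to use.
    """
    return {
        "properties": {
            field: {
                "type": "dense_vector",
                "dims": dimensions,
                "index": True,
                "similarity": "cosine",
            }
        }
    }


def build_knn_query(query_vector, k=5, field="chunk_embedding"):
    """Build a search body asking for the k nearest stored vectors."""
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            # Candidates examined per shard; must be at least k.
            "num_candidates": max(k * 10, 50),
        }
    }
```

The point of the design is that the nearest-neighbor search runs inside Elasticsearch, so WordPress never has to load every stored vector to compare it.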

### New DB table

In addition to the above, we should look at introducing a new database table designed for this embedding data. This prevents the problem discussed above and lets us design the table specifically to handle embeddings, whereas the meta tables are built to hold many kinds of data. This will likely lead to better-performing queries, but it will take some experimentation to work out how best to structure it (I would start by looking at https://github.com/Jameswlepage/wpvdb and seeing if there are things there we can use or learn from). We will also need to consider backwards compatibility here, including whether to migrate existing embedding data from the meta tables to this new table.

I would recommend we tackle this part first and the Elasticsearch part second, as I think this has more applicable use cases.
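As a starting point for that experimentation, here is one possible shape for such a table. This is a sketch only: it uses SQLite purely so the example is runnable, and the table name, columns, and helper functions are all hypothetical. It stores one row per chunk, with the vector packed as 32-bit floats rather than as serialized text.

```python
import sqlite3
import struct


def pack_vector(vector):
    """Pack a list of floats into little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_vector(blob):
    """Reverse pack_vector (4 bytes per value)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def create_table(conn):
    # Hypothetical schema: one row per chunk, keyed by the object it
    # belongs to (e.g. a post or a term) and the chunk's position.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS embeddings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            object_id INTEGER NOT NULL,
            object_type TEXT NOT NULL,
            chunk_index INTEGER NOT NULL,
            embedding BLOB NOT NULL,
            UNIQUE (object_id, object_type, chunk_index)
        )
        """
    )


def save_chunks(conn, object_id, object_type, chunks):
    """Insert or overwrite the chunk embeddings for one object."""
    conn.executemany(
        "INSERT OR REPLACE INTO embeddings "
        "(object_id, object_type, chunk_index, embedding) "
        "VALUES (?, ?, ?, ?)",
        [
            (object_id, object_type, index, pack_vector(vector))
            for index, vector in enumerate(chunks)
        ],
    )


def load_chunks(conn, object_id, object_type):
    """Load the chunk embeddings for one object, in chunk order."""
    rows = conn.execute(
        "SELECT embedding FROM embeddings "
        "WHERE object_id = ? AND object_type = ? ORDER BY chunk_index",
        (object_id, object_type),
    ).fetchall()
    return [unpack_vector(row[0]) for row in rows]
```

Because the embeddings live outside the meta tables, a request that loads all meta for an item would no longer pull them in as a side effect. Packing as 32-bit floats trades some precision for size, which is a design choice that would need checking against comparison quality.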

### Steps to Reproduce

1. Enable a Feature that uses embeddings
2. Create a long post and trigger embedding generation for it
3. Check the size of the `classifai_openai_embeddings` post meta item in your database
4. If desired, set up an environment that has caching enabled and see how the above impacts that

### Screenshots, screen recording, code snippet

_No response_

### Environment information

_No response_

### WordPress information

_No response_

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
