Limit user fields which are indexed by default #408

@kadamwhite

On a large multisite network, we are encountering many errors of this type when indexing users:

{
    "index": {
        "_index": "ep-sitename-user",
        "_type": "_doc",
        "_id": "338",
        "status": 400,
        "error": {
            "type": "illegal_argument_exception",
            "reason": "Limit of total fields [5000] has been exceeded"
        }
    }
}

We have manually excluded some fields using the ep_prepare_user_meta_data hook, but are still running into this error. This ticket proposes further limiting the fields which get indexed by default, so that it is less likely that user indexing will fail on a large network.

Specifically, I wonder whether these fields need to be indexed:

  • hm.workflows.*
  • metaboxhidden_*
  • meta-box-order_*
  • wp_##_capabilities
  • wp_##_user_level
  • wp_##_user_settings
  • wp_##_user-settings-time
  • wp_##_media_library_mode
  • wp_##_yoast_notifications
  • wp_##_dashboard_quick_press_last_post_id
  • Actually, any wp_##_... field
  • Possibly more...?
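As a rough illustration of what excluding these keys could look like, here is a minimal Python sketch of a deny-list matcher. It is not ElasticPress code: the names `EXCLUDED_KEY_PATTERNS`, `is_excluded` and `filter_user_meta` are hypothetical, the `##` in the list above is read as a numeric site ID, and the default `wp_` table prefix is assumed.

```python
import re

# Hypothetical deny-list built from the key names listed above.
# Assumptions: "##" is a numeric site ID and the table prefix is "wp_".
EXCLUDED_KEY_PATTERNS = [
    re.compile(r"^hm\.workflows\."),   # hm.workflows.*
    re.compile(r"^metaboxhidden_"),    # metaboxhidden_*
    re.compile(r"^meta-box-order_"),   # meta-box-order_*
    re.compile(r"^wp_\d+_"),           # any wp_##_... field
]


def is_excluded(meta_key: str) -> bool:
    """Return True if the meta key matches any deny-list pattern."""
    return any(pattern.search(meta_key) for pattern in EXCLUDED_KEY_PATTERNS)


def filter_user_meta(meta: dict) -> dict:
    """Return a copy of the meta dict without the excluded keys."""
    return {key: value for key, value in meta.items() if not is_excluded(key)}
```

Note that the `wp_\d+_` pattern deliberately leaves the main site's unnumbered keys (for example `wp_capabilities`) alone, so whether roles and capabilities are needed for user queries can be decided separately.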

@rmccue suggested the roles and capabilities fields may be needed for user queries, although we weren't sure offhand whether ElasticPress involves itself in the normal queries Core runs. But these fields represent a significant percentage of the meta stored against a user, and as a layperson I struggle to imagine wanting to search for somebody based on these values.

Acceptance criteria:

  • Check field mapping for the user index on 3 of the larger client sites and note down the most common patterns that make up the bulk of the keys - use wp elasticpress get-mapping to find this
  • Ensure we are excluding specific meta key patterns from being indexed, at least:
    • hm.workflows.*
    • any meta keys that include the user ID
    • any key patterns found from step 1 that account for more unique key names than the number of sites on a multisite network
  • Document how to filter user meta keys from being indexed
  • Document the error message noted above under an FAQs or troubleshooting section (add this if it doesn't exist) with the possible mitigations e.g. using filters to exclude certain meta keys and how to find common meta key patterns
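The pattern-finding step in the criteria above could be sketched roughly as follows in Python. This assumes the meta key names have already been extracted from the `wp elasticpress get-mapping` output into a flat list (the exact JSON layout of that output is not covered here), and the function names are hypothetical. Collapsing every run of digits to `#` is a simplification that groups keys differing only by site ID or user ID.

```python
import re
from collections import defaultdict


def key_pattern(meta_key: str) -> str:
    """Collapse each run of digits to '#', e.g. wp_12_capabilities -> wp_#_capabilities."""
    return re.sub(r"\d+", "#", meta_key)


def count_patterns(meta_keys):
    """Map each pattern to the number of distinct keys that collapse to it."""
    groups = defaultdict(set)
    for key in meta_keys:
        groups[key_pattern(key)].add(key)
    return {pattern: len(keys) for pattern, keys in groups.items()}


def patterns_over_threshold(meta_keys, site_count):
    """Return (pattern, unique key count) pairs with more unique keys than sites,
    largest first."""
    counts = count_patterns(meta_keys)
    over = [(pattern, n) for pattern, n in counts.items() if n > site_count]
    return sorted(over, key=lambda item: (-item[1], item[0]))
```

Calling `patterns_over_threshold(keys, site_count)` with the network's site count then yields the candidate patterns for exclusion, ordered by how many mapped fields each one accounts for.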

Labels: should have (Should be done, medium priority for now)
