Skip to content

Comments

feat: Add support for /decide?v=4 along with start capturing $feature_flag_called events#22

Merged
haacked merged 31 commits intov1from
haacked/decide-v4
Apr 17, 2025
Merged

feat: Add support for /decide?v=4 along with start capturing $feature_flag_called events#22
haacked merged 31 commits intov1from
haacked/decide-v4

Conversation

@haacked
Copy link
Contributor

@haacked haacked commented Apr 17, 2025

💡 Motivation and Context

Prior to this PR, this library didn't capture $feature_flag_called events. Now it does it with the additional metadata that /decide?v=4 provides (see PostHog/posthog#29751). Similar to how it's done in PostHog/posthog-js-lite#427.

Updates to the $feature_flag_called event

When capturing the $feature_flag_called event, additional information are now captured:

The end result is $feature_flag_called events will have additional properties:

Property Name Descirption
$feature_flag_version The version of the feature flag. This is visible in the diffs of the feature flag activity log.
$feature_flag_reason The reason the feature flag matched or didn't match.
$feature_flag_id The database id of the feature flag (Just in case a flag was deleted and then recreated with the same key, we can determine the difference.
$feature_flag_request_id The id of the /decide request.

Also, we use an LRU to ensure that we don't raise the $feature_flag_called event twice for the same distinct_id and flag key combination, the same way posthog-python and other libraries do it.

Backwards compatibility:

The changes are all backwards compatible with /decide?v=3.

💚 How did you test it?

Unit tests and manual tests.

@haacked haacked requested review from Copilot, dmarticus and rafaeelaudibert and removed request for rafaeelaudibert April 17, 2025 02:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for utilizing the new /decide?v=4 endpoint and enhances the feature flag capture mechanism by renaming the value property to payload and including additional metadata on the $feature_flag_called event.

  • Updated documentation examples to use "payload" instead of "value" for the feature flag's return object
  • Added instructions for code formatting via bin/fmt in the main README

Reviewed Changes

Copilot reviewed 4 out of 15 changed files in this pull request and generated no comments.

File Description
examples/feature_flag_demo/README.md Updated feature flag demo to show "Payload: true" instead of "Value: true"
README.md Revised code examples to reflect the change from "value" to "payload" and added a code formatting section
Files not reviewed (11)
  • examples/feature_flag_demo/lib/feature_flag_demo.ex: Language not supported
  • lib/posthog.ex: Language not supported
  • lib/posthog/application.ex: Language not supported
  • lib/posthog/client.ex: Language not supported
  • lib/posthog/feature_flag.ex: Language not supported
  • mix.exs: Language not supported
  • test/posthog/client_test.exs: Language not supported
  • test/posthog_test.exs: Language not supported
  • test/support/fixtures/decide-v3.json: Language not supported
  • test/support/fixtures/decide.json: Language not supported
  • test/support/hackney_stub.ex: Language not supported
Comments suppressed due to low confidence (2)

README.md:112

  • [nitpick] Ensure consistency in property naming across the documentation. The example now uses 'payload' instead of 'value', so please confirm that related documentation (such as the event schema table in the PR description) is updated accordingly.
# Returns: %Posthog.FeatureFlag{name: "new-dashboard", payload: true, enabled: true}

README.md:118

  • [nitpick] Ensure naming consistency for the feature flag return value. The change from 'value' to 'payload' should be reflected across all documentation and examples to avoid confusion.
#   payload: %{"price" => 99, "period" => "monthly"},

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

Hmm, the CI server is reporting:

Run mix format --check-formatted
** (Mix) mix format failed due to --check-formatted.
The following files are not formatted:

  * test/posthog_test.exs

But when I run mix format --check-formatted locally, it doesn't report anything.

Going to try using an older version of elixir on my local machine to see if it fixes the formatting issue.

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

Ok, formatting issues are fixed, but some of the tests for older versions of elixir are broken. How far back do we need to go?

@rafaeelaudibert
Copy link
Member

Ok, formatting issues are fixed, but some of the tests for older versions of elixir are broken. How far back do we need to go?

1.12 is reasonable, I think we should keep it. Tests are failing because you're using Process. Can you something else rather than that?

@rafaeelaudibert
Copy link
Member

About formatting, we should make sure the CI is only checking formatting on the most recent version

children = [
# Start Cachex for feature flag event deduplication.
# The 50,000 entries limit is the same used for posthog-python, but otherwise arbitrary.
{Cachex, name: :posthog_feature_flag_cache, limit: 50_000, policy: Cachex.Policy.LRW}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense? Do we not wanna send $feature_flag_called in case we have this in the cache? This lasts across requests - it's a separate process.

Wont this mean that users with less than 50000 clients wont ever have more than 50000 FFs charged to them? (Users can make these supervisors survive servers restarts, Elixir is crazy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's the combination of distinct_id and feature_flag. So yeah, if they only have 1 flag and less than 50,000 distinct users, they'd never be charged for more than 50,000.

I took this approach from posthog-python. Many of the other clients do something similar. It seems to me that we should put a time limit on each cache entry. Something like one minute. @dmarticus, do you have context on why we do this in posthog-python?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the hash includes the feature_flag too, so it's a slightly smaller problem.
The problem is that servers probably restart pretty often in Python; that isn't necessarily true in Elixir: you can keep the Cachex process running indefinitely, so... it might fill the cache pretty quickly.

Will leave that decision up to y'all, just letting you know the quirkyness of working with a distributed language :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing to consider, we don't charge based on the number of $feature_flag_called events, we charge based on the number of /decide calls made. So this cache doesn't affect the cost. Just reporting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know that. In that case, I'm fine with keeping it this way. I'm working on improving the setup, I don't think the LRU we added actually works how we think, expect changes to this PR through the day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I worry about with such a big cache is people being confused why they don't see these events. My gut tells me the reason for this is if they're calling a feature flag in a tight loop, we don't want 10 of the same events all of a sudden.

So I'd suggest we add a default_ttl to the Cachex declaration. Something like 5 minutes would be fine. The 50,000 limit then just ensures we don't ever take up too much memory.

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

Tests are failing because you're using Process

Any suggestions?

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

Any suggestions?

Hmm, I have a GenServer approach that seems to work.

@rafaeelaudibert
Copy link
Member

@haacked I'm taking a stab at this. I dont think it was the Process thing, but rather Mimic acting poorly on older Elixir versions

Elixir has been improving their typesystem, and if we run these doctests with more recent Elixir (1.18+) then it fails at the type level - which is pretty cool, actually!
Our CI matrix is much more comprehensible of Elixir's actual support matrix, at the cost of being more complex. I think it's a fair tradeoff.
erlef/setup-beam@v1 doesn't support OTP 23
@rafaeelaudibert rafaeelaudibert changed the base branch from master to v1 April 17, 2025 15:38
@rafaeelaudibert
Copy link
Member

@haacked I've changed this to point to a new v1 branch. How soon do you want this to be merged? I can work on getting the rest of the v1 stuff over the line today

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

How soon do you want this to be merged? I can work on getting the rest of the v1 stuff over the line today

I'm not in any hurry. I appreciate the help getting this over the line!

Copy link
Member

@rafaeelaudibert rafaeelaudibert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haacked This should be good now! I've pointed it to a v1 branch where I'm working on more stuff. Take a look at my recent changes and let me know! Feel free to merge whenever you're ready

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

@rafaeelaudibert what do you think about giving cache entries a ttl? I think Cachex supports that?

@rafaeelaudibert
Copy link
Member

@rafaeelaudibert what do you think about giving cache entries a ttl? I think Cachex supports that?

Up to you, really. I particularly think it's to your best interest to keep implementation across SDKs the most similar possible, but if you believe a TTL would be benefitial here, yeah, we can do it

https://hexdocs.pm/cachex/expiring-records.html

@haacked
Copy link
Contributor Author

haacked commented Apr 17, 2025

Up to you, really. I particularly think it's to your best interest to keep implementation across SDKs the most similar possible

Good point. I'll leave as-is. If folks complain, we can revisit.

I also pushed some changes to get the sample app working again. Look good to you?

@haacked haacked merged commit ecfabf1 into v1 Apr 17, 2025
16 checks passed
@haacked haacked deleted the haacked/decide-v4 branch April 17, 2025 18:47
@rafaeelaudibert
Copy link
Member

@haacked Did you need that new Application for it to work? I though the existing extra_applications: [:logger, :posthog], would be enough for it to automatically start our Posthog.Application. It worked just fine for me. It shouldn't hurt, though.

@haacked
Copy link
Contributor Author

haacked commented Apr 18, 2025

Here's the error I get when I run the demo app without the new Application:

mix run run.exs --flag "simple-test" --distinct-id "user12345"
==> uniq
Compiling 4 files (.ex)
Generated uniq app
==> posthog
Compiling 8 files (.ex)
Generated posthog app
==> feature_flag_demo
Generated feature_flag_demo app

17:00:12.403 [notice] Application feature_flag_demo exited: exited in: FeatureFlagDemo.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function FeatureFlagDemo.Application.start/2 is undefined (module FeatureFlagDemo.Application is not available)
            FeatureFlagDemo.Application.start(:normal, [])
            (kernel 10.2.6) application_master.erl:295: :application_master.start_it_old/4

17:00:12.407 [notice] Application posthog exited: :stopped

17:00:12.407 [notice] Application cachex exited: :stopped

17:00:12.407 [notice] Application sleeplocks exited: :stopped

17:00:12.407 [notice] Application jumper exited: :stopped

17:00:12.408 [notice] Application ex_hash_ring exited: :stopped

17:00:12.408 [notice] Application eternal exited: :stopped

17:00:12.408 [notice] Application unsafe exited: :stopped

17:00:12.408 [notice] Application jason exited: :stopped

17:00:12.408 [notice] Application hackney exited: :stopped

17:00:12.408 [notice] Application metrics exited: :stopped

17:00:12.408 [notice] Application ssl_verify_fun exited: :stopped

17:00:12.408 [notice] Application parse_trans exited: :stopped

17:00:12.408 [notice] Application syntax_tools exited: :stopped

17:00:12.408 [notice] Application certifi exited: :stopped

17:00:12.408 [notice] Application mimerl exited: :stopped

17:00:12.408 [notice] Application idna exited: :stopped

17:00:12.408 [notice] Application unicode_util_compat exited: :stopped
** (Mix) Could not start application feature_flag_demo: exited in: FeatureFlagDemo.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function FeatureFlagDemo.Application.start/2 is undefined (module FeatureFlagDemo.Application is not available)
            FeatureFlagDemo.Application.start(:normal, [])
            (kernel 10.2.6) application_master.erl:295: :application_master.start_it_old/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants