-
Notifications
You must be signed in to change notification settings - Fork 8
feat: implement posit workbench credentials strategy and make credentials strategy fallback options more explicit #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| """ | ||
| def __init__(self, | ||
| client: Optional[Client] = None, | ||
| user_session_token: Optional[str] = None, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """ | |
| def __init__(self, | |
| client: Optional[Client] = None, | |
| user_session_token: Optional[str] = None, | |
| ): | |
| """ | |
| def __init__(self, | |
| client: Optional[Client] = None, | |
| *, | |
| user_session_token: Optional[str] = None, | |
| content_session_token: Optional[str] = None, | |
| ): |
Would keep these explicitly keyword arguments and then allows the dev to pass the token with an appropriately named param.
|
(1). "Posit", "Credentials", "Strategy", "posit_strategy", "credentials_strategy", "connect_strategy", "PositCredentials", "PositConnectCredentials", "PositWorkbenchCredentials" ... |
|
(2). |
|
(3). |
|
(4) |
|
(5). |
|
(6). |
|
(7) here's where I completely lost the logic trail. |
| Shiny for Python example application that shows user information and | ||
| the first few rows from a table hosted in Databricks. | ||
| """ | ||
| session_token = session.http_conn.headers.get("Posit-Connect-User-Session-Token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit, and this applies to all the examples:
From the developer’s POV, I am in local/workbench first. It feels out of order to have the Connect-specific session token defined so early. My mental model is to get through the parts about defining how to handle creds in development vs deployment, then I’d start putting Connect-specific info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a constraint given the custom nature of this integration. The way it is shown here, this code can be written once and then adapt to the environment it is running it. But that means there is a lowest common denominator in terms of DX as a tradeoff. Ideally, devs are thinking about how their code would work in production as well if that is where they plan to deploy to.
Referencing your other comments though, if we had a way to grab the user session token for the dev if/when it is needed then that could be done behind the scenes at the appropriate time simplifying this immensely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posit_strategy = PositCredentialsStrategy(
local_strategy=databricks_cli,
workbench_strategy=PositWorkbenchCredentialsStrategy(Config(profile="workbench")),
connect_strategy=PositConnectCredentialsStrategy(user_session_token=session_token),
)could become:
posit_strategy = PositCredentialsStrategy()allowing for overrides to defaults, but otherwise it just grabs what it needs under the hood.
|
🪱 🥫 |
| from posit.connect.external.databricks import ( | ||
| PositCredentialsStrategy, | ||
| PositConnectCredentialsStrategy, | ||
| PositWorkbenchCredentialsStrategy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a passing observation: it's odd to see PositWorkbenchCredentialsStrategy inside the posit.connect module. They aren't really relevant for Connect, right? Is it time to create posit.workbench to contain these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move this before the PR becomes final. For now it's easier to iterate with everything inside a single module.
Great point! I think the trouble with doing this for people though is that its location is dependent on the app/framework they are using. Would be neat to automatically find it though, I agree. |
Any of the various |
We can do my best to hide some of this from the user but this is an artifact of the databricks SDK, not something we are imposing in our client.
Right. As Matt said, this is framework-dependent. We could consider adding a helper for each framework but that feels like it will lead to even more confusion like we see with
Good call. I think we can do
We can revisit this but if the choice is The way this is implemented at the moment, if
If you want to write content that only works on workbench then you don't need the More broadly, the word choices "CredentialsStrategy" and "CredentialsProvider" are constructs from the databricks-sdk, not something we came up with here. We tried to be consistent with their naming in the SDK to avoid confusion but we can call these things whatever we want.
This one is just a var name so an easy fix. How about we just call this
This is a major part of the friction with implementing these helpers. We are trying to make something easy to do for our users when some of these libraries (databricks-sdk and databricks-sql) aren't even compatible inside of Databricks' own ecosystem of tools. |
|
Oh an regarding the circular reference mentioned in |
d695abf to
3d0e55b
Compare
|
@kmasiello could you take another look at this when you have a minute? I've done some refactoring based on your feedback. One of the main issues we had before was that there was this circular dependency between a import streamlit as st
from databricks.sdk.core import ApiClient
from databricks.sdk.service.iam import CurrentUserAPI
from posit.connect.external.databricks import (
new_config,
ConnectStrategy,
)
session_token = st.context.headers.get("Posit-Connect-User-Session-Token")
cfg = new_config(
posit_connect_strategy=ConnectStrategy(
user_session_token=session_token
),
)
databricks_user = CurrentUserAPI(ApiClient(cfg)).me()
st.write(f"Hello, {databricks_user.display_name}!")Unfortunately we can't really get rid of the need to pass in the session_token arg when specifying the connect strategy using viewer auth but hopefully this is an improvement. If you want to use a Service Account oauth integration when running on Connect then the empty default configuration should be sufficient. This code should work locally, on workbench, and on Connect: import streamlit as st
from databricks.sdk.core import ApiClient
from databricks.sdk.service.iam import CurrentUserAPI
from posit.connect.external.databricks import new_config
cfg = new_config()
databricks_user = CurrentUserAPI(ApiClient(cfg)).me()
st.write(f"Hello, {databricks_user.display_name}!") |
d747ea0 to
349a888
Compare
47c15ca to
a291455
Compare
☂️ Python Coverage
Overall Coverage
New Files
Modified Files
|
9cccf77 to
072472d
Compare
tdstein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Just a few minor questions.
| databricks_user_info = CurrentUserAPI(ApiClient(cfg())).me() | ||
| return f"Hello, {databricks_user_info.display_name}!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; extract variables to improve readability.
| databricks_user_info = CurrentUserAPI(ApiClient(cfg())).me() | |
| return f"Hello, {databricks_user_info.display_name}!" | |
| current_user_api = CurrentUserAPI(ApiClient(cfg())) | |
| databricks_user_info = current_user_api.me() | |
| return f"Hello, {databricks_user_info.display_name}!" |
|
|
||
|
|
||
| class PositContentCredentialsProvider: | ||
| class _PositConnectContentCredentialsProvider: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this inherit from CredentialsProvider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CredentialsProvider is a type not a base class
CredentialsProvider = Callable[[], Dict[str, str]]| user_session_token: Optional[str] = None, | ||
| ): | ||
| self._local_strategy = local_strategy | ||
| self._cp = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; add a type signature for clarity on the non-null state.
| self._cp = None | |
| self._cp: CredentialsProvider | None = None |
| @abc.abstractmethod | ||
| def __call__(self, *args, **kwargs) -> CredentialsProvider: | ||
| raise NotImplementedError | ||
| logger = logging.getLogger("posit.sdk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure we can use logger = logging.getLogger(__name__) here. Which will be equivalent to getLogger("posit.sdk.external.databricks").
https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial
|
This still feels like a lot of complexity for an app developer to have to wrestle with (or, more likely, just cargo cult): What if instead it were and everything else were set by defaults in the function signature? Seems totally reasonable to use conventions for env vars for defaults, and as for |
Agreed. You have all the building blocks now. Applying sensible defaults to |
That's basically how this is implemented with a minor difference. The example in the PR description shows how to override the default values. The env vars also also already handled the way that you describe, but it's useful to show them in our docs because if they aren't set then you get a rather opaque error message from Databricks SDK/SQL. The minimal viewer auth example would be: return databricks_config(
posit_connect_strategy=ConnectStrategy(user_session_token=session_token)
)And the minimal service account auth example would be: return databricks_config()However note that the minimal examples would not work when Workbench-managed Databricks credentials are used. You have to enable it by passing `WorkbenchStrategy(). There's some awkwardness with how the Databricks Config object is initialized. |
Databricks sdk helpers now include a helper `databricks_config()` which allows for more fine grained control over how the credentials_strategy is selected in content that needs to run in multiple environments (Workbench, Connect, local laptop). We now do our best to "discover" the user's environment before selecting a strategy. If no strategy was provided for the current environment then we use the Databricks SDK's `DefaultCredentials()` class as a fallback option. This is a breaking change but should simplify the API considerably for users.
3a5027a to
6d5cfe5
Compare
|
@tdstein FYI - I've moved the workbench credentials helper into a new module Planning to merge this later today and update the release notes for the next release. |
These changes have been reviewed at #384 --------- Co-authored-by: David <[email protected]>

Closes #382
These changes make the separation between the different credential strategies more explicit and tries to be better about detecting the difference between local and workbench content.
It also combines the
ContentandViewerstrategy into a single new class:ConnectStrategy. Ifuser-session-tokenis provided then we use the viewer implementation. If not then we should fall back to the content credentials (service account) implementation.I also modified some of the naming in the hopes of the choice of which strategy to use more obvious to end users of these helpers.
This example shows how to construct a Databricks SDK
Configthat is compatible with:Testing
This needs QA in Connect and Workbench if possible.
Workbench should test using the
WorkbenchStrategy()with Workbench-managed credentials.Connect should test with
ConnectStrategy()with both a Viewer and a Service Account OAuth integration.It would be nice to test the
azure_service_principalandoauth_service_principalstrategies as a replacement for the deprecatedPositLocalContentCredentialsProviderhelper that was removed.