Session refactor #6
Conversation
Make session management explicitly managed by callers to `access_url`. This enables each caller to protect the session from threading issues in a way that best matches its particular model. For `SnowflakeRestful` this is managed as a pool of sessions; for `chunk_downloader` I will implement something similar that provides good reuse and cleanup of each session while achieving as much connection pooling as can be done safely.
Fairly large refactor in the spirit of consolidating HTTP call patterns and configuration. Changed `SnowflakeRestful.access_url` from a staticmethod to a regular instance method, reborn as `fetch`. The name change helps distinguish the type of activity being performed, but is otherwise cosmetic. Requests session pooling is managed in one place by the `SnowflakeRestful` instance regardless of the type of HTTP activity. This provides the best session reuse possible with consistent thread-safety characteristics. Note that the `requests.Session` objects are created outside their final execution thread, but they are never used by more than one request thread at a time. Other changes:
- Moved the network.py logger instance to module level.
- DRY optimizations for `_get_request`, `_post_request`, etc., namely consolidation of proxy and timeout settings.
- Defaulted `fetch`'s keyword arg `token` to `network.NOTOKEN`. I believe this is semantically acceptable (or desirable) but it should be reviewed.
- Renamed `chunk_downloader`'s `_get_request` to `_fetch_chunk` to more aptly associate it with `SnowflakeRestful.fetch()`.
- Renamed the `request_thread` function to `request_exec` as it is not always a thread target.
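For readers skimming the diff, the pooling pattern described above amounts to roughly the sketch below. The class name, `_idle_sessions`, and `_use_session` are placeholders made up for illustration and do not necessarily match the PR's actual code:

```python
import contextlib
import itertools

import requests
from requests.adapters import HTTPAdapter

REQUESTS_RETRY = 1  # illustrative; the real retry policy lives in network.py


class SnowflakeRestfulSketch(object):
    """Illustrative sketch of instance-level session pooling, not the actual class."""

    def __init__(self):
        self._idle_sessions = []  # sessions not currently handed out to a request

    def _make_session(self):
        s = requests.Session()
        s.mount(u'http://', HTTPAdapter(max_retries=REQUESTS_RETRY))
        s.mount(u'https://', HTTPAdapter(max_retries=REQUESTS_RETRY))
        s._reuse_count = itertools.count()
        return s

    @contextlib.contextmanager
    def _use_session(self):
        # A session may be created on one thread and later used on another,
        # but it is only ever handed to one request at a time.
        session = self._idle_sessions.pop() if self._idle_sessions else self._make_session()
        try:
            yield session
        finally:
            self._idle_sessions.append(session)

    def fetch(self, method, full_url, headers, data=None, timeout=None, **kwargs):
        with self._use_session() as session:
            return session.request(method, full_url, headers=headers,
                                   data=data, timeout=timeout, **kwargs)
```

In a real implementation the idle list would need a lock (or a thread-safe queue) if multiple threads can check sessions in and out concurrently.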
Thanks @mayfield! I'm OOO this week, so I'll check it out by next week.
network.py
Outdated
self._session = None
sessions = list(self._active_sessions)
if sessions:
    self.logger.warn("Closing %d active sessions" % len(sessions))
Let's use parameterized logging, e.g., `self.logger.warn("Closing %s active sessions", len(sessions))`
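For reference, the difference is eager string formatting versus formatting deferred to the logging framework (a generic example, not code from this PR):

```python
import logging

logger = logging.getLogger(__name__)
sessions = ['s1', 's2']

# Eager: the message string is built even if WARNING is filtered out.
logger.warning("Closing %d active sessions" % len(sessions))

# Parameterized: formatting is deferred until a handler actually emits the record.
logger.warning("Closing %s active sessions", len(sessions))
```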
I only avoid that syntax because it only works for logging functions and I sometimes switch back and forth with other printers like print(). But I can use printf style formatting from now on.
Overall looks good to me. I'll fix the logger and syntax error later.
s = requests.Session()
s.mount(u'http://', HTTPAdapter(max_retries=REQUESTS_RETRY))
s.mount(u'https://', HTTPAdapter(max_retries=REQUESTS_RETRY))
s._reuse_count = itertools.count()
Would this be used later, or is it just for monitoring?
Yeah, I was using it before these commits were pushed. I'm probably going to implement session garbage collection and may use `_reuse_count` as a mechanism for deciding when to close out old sessions. But if not, I'll yank it before the next PR. Thanks!
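Purely as an illustration of that idea (the threshold, function name, and `idle_sessions` pool are hypothetical, not part of this PR), reuse-count-based retirement could look something like:

```python
MAX_SESSION_REUSES = 100  # hypothetical threshold, not from the PR


def release_session(idle_sessions, session):
    """Return a session to the idle pool, or close it once it has been reused enough."""
    # next() reads and advances the per-session itertools.count() attached at
    # creation time, so each release increments the session's reuse tally by one.
    if next(session._reuse_count) >= MAX_SESSION_REUSES:
        session.close()  # retire the session so old connections get cleaned up
    else:
        idle_sessions.append(session)
```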
request_thread_timeout = 60  # one request thread timeout

def request_thread(result_queue):
def fetch(self, method, full_url, headers, *, data=None, timeout=None, **kwargs):
This causes an exception in python 2.7.
Traceback (most recent call last):
File "/home/stakeda/basic_test.py", line 13, in <module>
import snowflake.connector
File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/__init__.py", line 21, in <module>
from .connection import SnowflakeConnection
File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/connection.py", line 14, in <module>
from . import network
File "/home/stakeda/Snowflake/trunk/Python/pvenv_2.7/lib/python2.7/site-packages/snowflake/connector/network.py", line 607
def fetch(self, method, full_url, headers, *, data=None, timeout=None, **kwargs):
^
SyntaxError: invalid syntax
Maybe just remove `*` for now.
Interesting, I was pretty sure I tested that particular signature on 2.7, but I do recall that keyword-only argument handling changed in py3k. (Disclaimer: I rarely use Python 2.)
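Aside from simply dropping the `*`, another Python 2-compatible option (shown here only as a rough sketch, with `self` and the real body left out) is to pop the would-be keyword-only arguments out of `**kwargs`:

```python
def fetch(method, full_url, headers, **kwargs):
    """Sketch of a Python 2.7-compatible signature; body omitted for brevity."""
    # Emulate keyword-only arguments by popping them from **kwargs;
    # callers keep passing data= and timeout= by keyword exactly as before.
    data = kwargs.pop('data', None)
    timeout = kwargs.pop('timeout', None)
    return method, full_url, headers, data, timeout, kwargs


# Call sites are unchanged relative to the py3-only signature:
print(fetch('get', 'https://example.com/session', {}, timeout=30))
```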
This is a review-seeking PR for @smtakeda (and @blandonnimrat).
It does test well for me on Python 3.5 and Python 2.7, but I would consider it part of ongoing work to increase download concurrency safely. Please feel free to provide feedback and criticism here, and/or merge if you are happy enough with the intent.
I took some liberty in renaming existing mechanisms where it helped me make sense of the various pieces falling together, but I can backtrack on them if they're undesired (e.g. `access_url` -> `fetch`). There is more overview in the commit logs as well.