
Commit 181f214

Tests and documentation

Also removes duplicate records before updating cache

1 parent bd0c58f

File tree

7 files changed: +74 -154 lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion

@@ -41,7 +41,8 @@ Also adds a new ALFPath class to replace alf path functions and now returns UUID
 - one.alf.io.remove_uuid_recursive
 - one.util.ensure_list; use iblutil.util.ensure_list instead
 - one.remote.globus.create_globus_client; use one.remote.globus.Globus class instead
-- 'auto' cache mode has been removed
+- 'auto' and 'refresh' cache modes have been removed
+- One.refresh_cache

 ## [2.11.1]

docs/FAQ.md

Lines changed: 5 additions & 6 deletions

@@ -28,13 +28,12 @@ context manager when calling ONE list, search and load methods. You can pass the
 argument AlyxClient: `one.alyx.rest(..., no_cache=True)`. More information can be found
 [here](https://int-brain-lab.github.io/ONE/notebooks/one_modes.html#REST-caching).

-**Auto mode**
-Remote cache tables are downloaded used when ONE is in 'auto' mode (or when `query_type='auto'` is passed).
+**Local mode**
+Local cache tables may be used when ONE is in 'local' mode (or when `query_type='local'` is passed).
 These table contain info about all sessions and their associated datasets and is used instead of querying
 the database.
-For the IBL Alyx, the tables are generated every 6 hours, however by default ONE will only download a
-new cache once per day. To force a download you can run `ONE().refresh_cache('remote')`. More
-information, including increasing refresh frequency, can be found
+For the IBL Alyx, the tables are generated every 6 hours and can be downloaded using the `one.load_cache` method
+to only download when new data are available. More information, including increasing refresh frequency, can be found
 [here](https://int-brain-lab.github.io/ONE/notebooks/one_modes.html#Refreshing-the-cache).

 ## I made a mistake during setup and now can't call setup, how do I fix it?

@@ -83,7 +82,7 @@ one = ONE(tables_dir=Path.home() / 'tables_dir')
 # 2. Specify location after instantiation
 one.load_cache(Path.home() / 'tables_dir')
 ```
-**Note**: Avoid using the same location for different database cache tables: by default ONE will automatically overwrite tables when a newer version is available. To avoid automatic downloading, set `mode='local'`.
+**Note**: Avoid using the same location for different database cache tables: ONE will overwrite tables when `load_cache` is called in remote mode.

 ## How do check who I'm logged in as?
 ```python

docs/notebooks/one_modes.ipynb

Lines changed: 6 additions & 65 deletions

@@ -69,15 +69,18 @@
 "source": [
 "## Query modes\n",
 "In 'local' mode, the list, search and load methods will use the local cache tables and not\n",
-"connect to Alyx, however the option to use Alyx is always there when a database is provided. \n",
+"connect to Alyx, however the option to use Alyx is always there when a database is provided.\n",
+"When instantiating ONE in local mode, any cache tables on disk are loaded.\n",
 "\n",
 "If 'remote' mode is specified, ONE will only query the remote database and will not use the\n",
 "local cache tables. Avoiding the database whenever possible is recommended as it doesn't rely\n",
 "on a stable internet connection and reduces the load on the remote database.\n",
 "\n",
 "While in 'remote' mode, the local cache may be used by providing the query_type='local' keyword\n",
-"argument to any method. Likewise, in 'local' mode, a remote query can be made by\n",
-"specifying `query_type='remote'`\n",
+"argument to any method. This will then search based on the results of previous remote queries.\n",
+"Likewise, in 'local' mode, a remote query can be made by specifying `query_type='remote'` (if a\n",
+"database has been configured). The local cache tables will then be supplemented with the result\n",
+"of this remote query. \n",
 "\n",
 "<div class=\"alert alert-info\">\n",
 "NB: The 'remote' query type is not valid in offline mode as there is no database associated to\n",

@@ -232,53 +235,6 @@
 "one = ONE(cache_rest=None, mode='remote')"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"collapsed": false,
-"pycharm": {
-"name": "#%% md\n"
-}
-},
-"source": [
-"## Refreshing the cache tables\n",
-"While in 'auto' mode, ONE will try to update the cache tables once every 24 hours.\n",
-"This can be changed by changing the 'cache_expiry' property:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 4,
-"metadata": {
-"collapsed": false,
-"pycharm": {
-"name": "#%%\n"
-}
-},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"{'expired': False,\n",
-" 'created_time': datetime.datetime(2021, 9, 14, 13, 0),\n",
-" 'loaded_time': datetime.datetime(2021, 9, 14, 18, 15, 54, 384591),\n",
-" 'raw': {'datasets': {'date_created': '2021-09-14 13:00', 'origin': 'alyx'},\n",
-"  'sessions': {'date_created': '2021-09-14 13:00', 'origin': 'alyx'}}}"
-]
-},
-"execution_count": 4,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"from datetime import timedelta\n",
-"one.cache_expiry = timedelta(hours=1)  # Check for new remote cache every hour\n",
-"\n",
-"# The time when the cache was generated can be found in the cache metadata:\n",
-"one._cache._meta"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {

@@ -292,21 +248,6 @@
 "download. The cache can be explicitly refreshed in two ways:"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 5,
-"metadata": {
-"collapsed": false,
-"pycharm": {
-"name": "#%%\n"
-}
-},
-"outputs": [],
-"source": [
-"loaded_time = one.refresh_cache('refresh')  # Explicitly refresh the cache\n",
-"eids = one.search(lab='cortexlab', query_type='refresh')  # Calls `refresh_cache` before searching"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {

one/api.py

Lines changed: 20 additions & 42 deletions

@@ -87,7 +87,12 @@ def __init__(self, cache_dir=None, mode='local', wildcards=True, tables_dir=None
         self.uuid_filenames = False
         # init the cache file
         self._reset_cache()
-        self.load_cache()
+        if self.mode == 'local':
+            # Ensure that we don't call any subclass method here as we only load local cache
+            # tables on init. Direct calls to load_cache can be made by the user or subclass.
+            One.load_cache(self)
+        elif self.mode != 'remote':
+            raise ValueError(f'Mode "{self.mode}" not recognized')

     def __repr__(self):
         return f'One ({"off" if self.offline else "on"}line, {self.cache_dir})'
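The explicit `One.load_cache(self)` call above bypasses dynamic dispatch, so a subclass override (such as `OneAlyx.load_cache`, which may hit the network) is not invoked during `__init__`. A minimal, self-contained sketch of that pattern, with invented class bodies and return strings for illustration:

```python
class One:
    def load_cache(self):
        return 'loaded local tables'

class OneAlyx(One):
    def load_cache(self):
        # subclass override that may download remote tables
        return 'downloaded remote tables'

one = OneAlyx()
# A normal method call dispatches to the subclass override...
assert one.load_cache() == 'downloaded remote tables'
# ...whereas calling through the base class pins its implementation,
# which is why __init__ uses One.load_cache(self):
assert One.load_cache(one) == 'loaded local tables'
```

Calling the method through the class keeps the instance's state but guarantees the base behaviour, regardless of what a subclass overrides.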
@@ -237,39 +242,6 @@ def _save_cache(self, save_dir=None, force=False):
         _logger.debug(f'Saved (unknown)')
         meta['saved_time'] = datetime.now()

-    def refresh_cache(self, mode=None):
-        """Check and reload cache tables.
-
-        Parameters
-        ----------
-        mode : {'local', 'refresh', 'remote'}
-            Options are 'local' (reload if expired); 'refresh' (reload); 'remote' (don't reload).
-
-        Returns
-        -------
-        datetime.datetime
-            Loaded timestamp.
-
-        """
-        # NB: Currently modified table will be lost if called with 'refresh';
-        # May be instances where modified cache is saved then immediately replaced with a new
-        # remote cache. Also it's too slow :(
-        # self.save_cache()  # Save cache if modified
-        mode = mode or self.mode
-        if mode == 'remote':
-            pass
-        elif mode == 'local':
-            loaded_time = self._cache['_meta']['loaded_time']
-            if not loaded_time or (datetime.now() - loaded_time >= self.cache_expiry):
-                _logger.info('Cache expired, refreshing')
-                self.load_cache()
-        elif mode == 'refresh':
-            _logger.debug('Forcing reload of cache')
-            self.load_cache(clobber=True)
-        else:
-            raise ValueError(f'Unknown refresh type "{mode}"')
-        return self._cache['_meta']['loaded_time']

     def _update_cache_from_records(self, strict=False, **kwargs):
         """Update the cache tables with new records.
@@ -309,6 +281,8 @@ def _update_cache_from_records(self, strict=False, **kwargs):
             if isinstance(records, pd.Series):
                 records = pd.DataFrame([records])
                 records.index.set_names(self._cache[table].index.names, inplace=True)
+            # Drop duplicate indices
+            records = records[~records.index.duplicated(keep='first')]
             if not strict:
                 # Deal with case where there are extra columns in the cache
                 extra_columns = list(set(self._cache[table].columns) - set(records.columns))

@@ -323,7 +297,7 @@ def _update_cache_from_records(self, strict=False, **kwargs):
                     records.insert(n, col, val)
                 # Drop any extra columns in the records that aren't in cache table
                 to_drop = set(records.columns) - set(self._cache[table].columns)
-                records.drop(to_drop, axis=1, inplace=True)
+                records = records.drop(to_drop, axis=1)
                 records = records.reindex(columns=self._cache[table].columns)
             assert set(self._cache[table].columns) == set(records.columns)
             records = records.astype(self._cache[table].dtypes)
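The two edits above, deduplicating the incoming records' index and replacing the in-place column drop with a reassignment, can be sketched with a stand-alone pandas example (the table contents and column names here are invented for illustration):

```python
import pandas as pd

# Records with a duplicated index entry, as might arrive from a remote query
records = pd.DataFrame(
    {'exists': [True, False, True], 'extra': [1, 2, 3]},
    index=pd.Index(['a', 'b', 'a'], name='id'))

# Drop duplicate indices, keeping the first occurrence (as in the commit)
records = records[~records.index.duplicated(keep='first')]
assert list(records.index) == ['a', 'b']

# Drop columns absent from the cache table; reassignment instead of
# inplace=True avoids mutating what may be a view of the caller's frame
cache_columns = ['exists']
to_drop = set(records.columns) - set(cache_columns)
records = records.drop(to_drop, axis=1)
assert list(records.columns) == ['exists']
```

Because the boolean-mask step already returns a new frame, the subsequent `inplace=True` drop would have operated on a copy anyway; reassigning makes the data flow explicit.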
@@ -1767,8 +1741,6 @@ def load_cache(self, tables_dir=None, clobber=False, tag=None):
         location and creation date of the remote cache. If newer, it will be download and
         loaded.

-        Note: Unlike refresh_cache, this will always reload the local files at least once.
-
         Parameters
         ----------
         tables_dir : str, pathlib.Path
@@ -1802,7 +1774,7 @@ def load_cache(self, tables_dir=None, clobber=False, tag=None):
         if not (clobber or different_tag):
             super(OneAlyx, self).load_cache(tables_dir)  # Load any present cache
             expired = self._cache and (cache_meta := self._cache.get('_meta', {}))['expired']
-            if not expired or self.mode in {'local', 'remote'}:
+            if not expired:
                 return

         # Warn user if expired
@@ -2598,9 +2570,10 @@ def _dset2url(self, dset, update_cache=True):
         if isinstance(dset, str) and dset.startswith('http'):
             url = dset
         elif isinstance(dset, (str, Path)):
-            url = self.path2url(dset)
-            if not url:
-                _logger.warning(f'Dataset {dset} not found in cache')
+            try:
+                url = self.path2url(dset)
+            except alferr.ALFObjectNotFound:
+                _logger.warning(f'Dataset {dset} not found')
                 return
         elif isinstance(dset, (list, tuple)):
             dset2url = partial(self._dset2url, update_cache=update_cache)
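The rewritten branch above treats a missing dataset as an exception (`alferr.ALFObjectNotFound`) rather than a falsy return value. A toy sketch of this control flow, using stand-in names and a plain dict lookup rather than the real ONE classes:

```python
class ALFObjectNotFound(Exception):
    """Stand-in for one.alf.exceptions.ALFObjectNotFound."""

def path2url(path, table):
    """Hypothetical lookup that now raises instead of returning None."""
    try:
        return table[path]
    except KeyError:
        raise ALFObjectNotFound(path) from None

def dset2url(path, table):
    """Mirrors the commit: catch the exception instead of testing a falsy URL."""
    try:
        return path2url(path, table)
    except ALFObjectNotFound:
        # the real method logs a warning here before giving up
        return None

urls = {'alf/spikes.times.npy': 'https://example.com/spikes.times.npy'}
assert dset2url('alf/spikes.times.npy', urls) == 'https://example.com/spikes.times.npy'
assert dset2url('missing.npy', urls) is None
```

Raising lets `path2url` distinguish "not found" from a legitimately empty URL, and pushes the decision about how to react up to each caller.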
@@ -2914,7 +2887,12 @@ def path2url(self, filepath, query_type=None) -> str:
             return super(OneAlyx, self).path2url(filepath)
         eid = self.path2eid(filepath)
         try:
-            dataset, = self.alyx.rest('datasets', 'list', session=eid, name=Path(filepath).name)
+            params = {'name': Path(filepath).name}
+            if eid is None:
+                params['django'] = 'session__isnull,True'
+            else:
+                params['session'] = str(eid)
+            dataset, = self.alyx.rest('datasets', 'list', **params)
             return next(
                 r['data_url'] for r in dataset['file_records'] if r['data_url'] and r['exists'])
         except (ValueError, StopIteration):
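The new parameter construction above lets `path2url` find datasets even when no session ID can be parsed from the path, by falling back to a Django `isnull` filter. A sketch of just that step, extracted into a hypothetical helper (not part of the real ONE API):

```python
from pathlib import Path

def build_dataset_params(filepath, eid):
    """Hypothetical helper mirroring the commit's parameter construction."""
    params = {'name': Path(filepath).name}
    if eid is None:
        # No session could be parsed from the path: match session-less datasets
        params['django'] = 'session__isnull,True'
    else:
        params['session'] = str(eid)
    return params

assert build_dataset_params('alf/probe00/spikes.times.npy', None) == \
    {'name': 'spikes.times.npy', 'django': 'session__isnull,True'}
assert build_dataset_params('spikes.times.npy', 'abc-123') == \
    {'name': 'spikes.times.npy', 'session': 'abc-123'}
```

Building the keyword arguments in a dict and unpacking with `**params` keeps the single `alyx.rest` call while varying the filter per case.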

one/tests/test_converters.py

Lines changed: 1 addition & 0 deletions

@@ -243,6 +243,7 @@ class TestOnlineConverters(unittest.TestCase):
     def setUpClass(cls) -> None:
         # Create ONE object with temp cache dir
         cls.one = ONE(**TEST_DB_2)
+        cls.one.load_cache()  # load local cache tables
         cls.eid = UUID('4ecb5d24-f5cc-402c-be28-9d0f7cb14b3a')
         cls.pid = UUID('da8dfec1-d265-44e8-84ce-6ae9c109b8bd')
         cls.session_record = cls.one.get_details(cls.eid)
