40 changes: 39 additions & 1 deletion Doc/library/dbm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,16 @@
* :mod:`dbm.ndbm`

If none of these modules are installed, the
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
the Oracle Berkeley DB.

.. note::
None of the underlying modules will automatically shrink the disk space used by
the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
provide a :meth:`!reorganize` method that can be used for this purpose.
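Because only some backends provide the method, a caller going through the generic :func:`dbm.open` interface may want to guard the call. The following is a minimal sketch of that pattern, not part of the change itself; the helper name ``reorganize_if_supported`` and the file name ``example_db`` are hypothetical.

```python
import dbm
import os
import tempfile


def reorganize_if_supported(path):
    """Open *path* for writing and compact it if the backend allows it.

    Returns True if reorganize() was called, False if the backend in use
    does not provide the method. (Hypothetical helper for illustration.)
    """
    with dbm.open(path, "c") as db:
        if not hasattr(db, "reorganize"):
            return False
        db.reorganize()
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_db")  # hypothetical file name
    with dbm.open(path, "c") as db:
        for i in range(100):
            db[f"key{i}"] = b"x" * 1000
        # Deleting entries leaves unused space behind in the file.
        for i in range(50):
            del db[f"key{i}"]
    called = reorganize_if_supported(path)
    with dbm.open(path, "r") as db:
        remaining = len(db.keys())
```

Whether ``called`` ends up true depends on which backend :func:`dbm.open` selects on the running interpreter; the stored data is unaffected either way.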


.. exception:: error

A tuple containing the exceptions that can be raised by each of the supported
Expand Down Expand Up @@ -186,6 +192,17 @@ or any other SQLite browser, including the SQLite CLI.
The Unix file access mode of the file (default: octal ``0o666``),
used only when the database has to be created.

.. method:: sqlite3.reorganize()

If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will be kept and reused as new (key, value) pairs are added.

.. note::
Reorganizing may require free disk space of up to twice the size of the original
database. This factor differs between :mod:`dbm` submodules.

.. versionadded:: next

:mod:`dbm.gnu` --- GNU database manager
---------------------------------------
Expand Down Expand Up @@ -284,6 +301,10 @@ functionality like crash tolerance.
reorganization; otherwise, deleted file space will be kept and reused as new
(key, value) pairs are added.

.. note::
Reorganizing may require free disk space of up to the size of the original
database. This factor differs between :mod:`dbm` submodules.
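The disk-space notes for the individual submodules (up to two times the database size for :mod:`dbm.sqlite3`, one time for :mod:`dbm.gnu`, none for :mod:`dbm.dumb`) suggest a simple pre-check before reorganizing. The sketch below is one possible way to do that; ``EXTRA_SPACE_FACTOR`` and ``enough_free_space`` are hypothetical names, and the factors are copied from those notes rather than queried from the modules.

```python
import dbm
import dbm.dumb
import os
import shutil
import tempfile

# Worst-case extra space as a multiple of the database size, copied from the
# documentation notes. This table is an assumption of the sketch.
EXTRA_SPACE_FACTOR = {"dbm.sqlite3": 2, "dbm.gnu": 1, "dbm.dumb": 0}


def enough_free_space(directory, db_size, factor):
    """Return True if *directory* has at least factor * db_size bytes free."""
    return shutil.disk_usage(directory).free >= factor * db_size


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_db")  # hypothetical file name
    with dbm.dumb.open(path, "c") as db:
        db[b"key"] = b"value" * 1000
    # whichdb() reports which submodule created the database.
    backend = dbm.whichdb(path)
    # Sum every file in the directory, since some backends use several files.
    db_size = sum(
        os.path.getsize(os.path.join(tmp, name)) for name in os.listdir(tmp)
    )
    ok = enough_free_space(tmp, db_size, EXTRA_SPACE_FACTOR[backend])
```

The check is only advisory: free space can change between the check and the call to ``reorganize()``.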

.. method:: gdbm.sync()

When the database has been opened in fast mode, this method forces any
Expand Down Expand Up @@ -438,6 +459,11 @@ The :mod:`!dbm.dumb` module defines the following:
with a sufficiently large/complex entry due to stack depth limitations in
Python's AST compiler.

.. warning::
:mod:`dbm.dumb` does not support concurrent read/write access. (Multiple
simultaneous read accesses are safe.) When a program has the database open
for writing, no other program should have it open for reading or writing.

.. versionchanged:: 3.5
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.

Expand All @@ -460,3 +486,15 @@ The :mod:`!dbm.dumb` module defines the following:
.. method:: dumbdbm.close()

Close the database.

.. method:: dumbdbm.reorganize()

If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will not be reused.

.. note::
Reorganizing requires no additional free disk space. This differs between
:mod:`dbm` submodules.

.. versionadded:: next
16 changes: 14 additions & 2 deletions Doc/library/shelve.rst
Expand Up @@ -75,8 +75,15 @@ Two additional methods are supported:

Write back all entries in the cache if the shelf was opened with *writeback*
set to :const:`True`. Also empty the cache and synchronize the persistent
dictionary on disk, if feasible. This is called automatically when the shelf
is closed with :meth:`close`.
dictionary on disk, if feasible. This is called automatically when
:meth:`reorganize` is called or the shelf is closed with :meth:`close`.

.. method:: Shelf.reorganize()

Calls :meth:`sync` and attempts to shrink space used on disk by removing empty
space resulting from deletions.

.. versionadded:: next
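A minimal usage sketch for a shelf, assuming nothing beyond the standard library. Since the method is new in this change, the call is guarded with ``hasattr`` so the sketch also runs on interpreters that predate it; the file name ``example_shelf`` is hypothetical.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_shelf")  # hypothetical file name
    with shelve.open(path) as shelf:
        for i in range(100):
            shelf[f"item{i}"] = list(range(100))
        # Deleting most entries leaves unused space in the underlying file.
        for i in range(90):
            del shelf[f"item{i}"]
        # Guarded call: Shelf.reorganize() may not exist on older versions.
        if hasattr(shelf, "reorganize"):
            shelf.reorganize()
        surviving = sorted(shelf.keys())
```

The remaining entries are unchanged by the call; only the on-disk layout may differ.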

.. method:: Shelf.close()

Expand Down Expand Up @@ -116,6 +123,11 @@ Restrictions
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
which can cause hard crashes when trying to read from the database.

* :meth:`Shelf.reorganize` may not be available for all database packages and
may temporarily increase resource usage (especially disk space) when called.
Additionally, it will never run automatically and instead needs to be called
explicitly.


.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')

Expand Down
17 changes: 17 additions & 0 deletions Doc/whatsnew/3.15.rst
Expand Up @@ -89,13 +89,30 @@ New modules
Improved modules
================

dbm
---

* Added new :meth:`!reorganize` methods to :mod:`dbm.dumb` and :mod:`dbm.sqlite3`
which allow recovering unused free space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)


difflib
-------

* Improved the styling of HTML diff pages generated by the :class:`difflib.HtmlDiff`
class, and migrated the output to the HTML5 standard.
(Contributed by Jiahao Li in :gh:`134580`.)


shelve
------

* Added new :meth:`!reorganize` method to :mod:`shelve` used to recover unused free
space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)


ssl
---

Expand Down
32 changes: 29 additions & 3 deletions Lib/dbm/dumb.py
Expand Up @@ -9,16 +9,14 @@
- seems to contain a bug when updating...

- reclaim free space (currently, space once occupied by deleted or expanded
items is never reused)
items is not reused unless .reorganize() is called)

- support concurrent access (currently, if two processes take turns making
updates, they can mess up the index)

- support efficient access to large databases (currently, the whole index
is read when the database is opened, and some updates rewrite the whole index)

- support opening for read-only (flag = 'm')

"""

import ast as _ast
Expand Down Expand Up @@ -289,6 +287,34 @@ def __enter__(self):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        if self._readonly:
            raise error('The database is opened for reading only')
        self._verify_open()
        # Ensure all changes are committed before reorganizing.
        self._commit()
        # Open file in r+ to allow changing in-place.
        with _io.open(self._datfile, 'rb+') as f:
            reorganize_pos = 0

            # Iterate over existing keys, sorted by starting byte.
            for key in sorted(self._index, key=lambda k: self._index[k][0]):
                pos, siz = self._index[key]
                f.seek(pos)
                val = f.read(siz)

                f.seek(reorganize_pos)
                f.write(val)
                self._index[key] = (reorganize_pos, siz)

                blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
                reorganize_pos += blocks_occupied * _BLOCKSIZE

            f.truncate(reorganize_pos)
        # Commit changes to index, which were not in-place.
        self._commit()
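The in-place compaction above can be tried in isolation. The following sketch reproduces the same idea on an in-memory buffer; it is an illustration, not the module's code. ``BLOCKSIZE``, ``blocks_for`` and ``compact`` are hypothetical names, and the sketch assumes that records start on block boundaries and do not overlap, so a record moved toward the front never overwrites one that has not been visited yet.

```python
import io

BLOCKSIZE = 512  # assumed block size for this sketch


def blocks_for(size):
    """Number of whole blocks needed to hold *size* bytes."""
    return (size + BLOCKSIZE - 1) // BLOCKSIZE


def compact(f, index):
    """Slide live records toward the start of *f* and cut off the tail.

    *index* maps key -> (offset, size) and is updated in place.
    Returns the new end position.
    """
    write_pos = 0
    # Visit records in order of their current offset.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, size = index[key]
        f.seek(pos)
        data = f.read(size)
        f.seek(write_pos)
        f.write(data)
        index[key] = (write_pos, size)
        write_pos += blocks_for(size) * BLOCKSIZE
    f.truncate(write_pos)
    return write_pos


# Build a block-aligned buffer holding four records.
buf = io.BytesIO()
index = {}
pos = 0
for key, val in [(b"a", b"A" * 10), (b"b", b"B" * 2000),
                 (b"c", b"C" * 600), (b"d", b"D" * 5)]:
    buf.seek(pos)
    buf.write(val)
    index[key] = (pos, len(val))
    pos += blocks_for(len(val)) * BLOCKSIZE

del index[b"b"]  # simulate a deletion: the bytes stay, the index entry goes
size_before = len(buf.getvalue())
size_after = compact(buf, index)
```

Sorting by offset is what makes the single pass safe: every record is written at or before its current position.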



def open(file, flag='c', mode=0o666):
"""Open the database file, filename, and return corresponding object.
Expand Down
4 changes: 4 additions & 0 deletions Lib/dbm/sqlite3.py
Expand Up @@ -15,6 +15,7 @@
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
ITER_KEYS = "SELECT key FROM Dict"
REORGANIZE = "VACUUM"


class error(OSError):
Expand Down Expand Up @@ -122,6 +123,9 @@ def __enter__(self):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        self._execute(REORGANIZE)
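The effect of the ``VACUUM`` statement can be observed directly with the :mod:`sqlite3` module. The sketch below is an illustration under stated assumptions: the table layout is an assumption made for the example, the file name is hypothetical, and the connection uses autocommit mode because ``VACUUM`` cannot run inside a transaction.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sqlite")  # hypothetical file name
    cx = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    # Table layout assumed for this sketch.
    cx.execute(
        "CREATE TABLE Dict (key BLOB UNIQUE NOT NULL, value BLOB NOT NULL)"
    )
    cx.executemany(
        "INSERT INTO Dict (key, value) VALUES (?, ?)",
        ((str(i).encode(), b"x" * 10_000) for i in range(200)),
    )
    # Deleting rows frees pages inside the file without shrinking it.
    cx.execute("DELETE FROM Dict")
    size_before = os.path.getsize(path)
    # VACUUM rebuilds the database file without the free pages.
    cx.execute("VACUUM")
    size_after = os.path.getsize(path)
    cx.close()
```

SQLite builds the compacted copy before replacing the original, which is consistent with the extra free disk space mentioned in the documentation note for this submodule.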


def open(filename, /, flag="r", mode=0o666):
"""Open a dbm.sqlite3 database and return the dbm object.
Expand Down
5 changes: 5 additions & 0 deletions Lib/shelve.py
Expand Up @@ -171,6 +171,11 @@ def sync(self):
        if hasattr(self.dict, 'sync'):
            self.dict.sync()

    def reorganize(self):
        self.sync()
        if hasattr(self.dict, 'reorganize'):
            self.dict.reorganize()


class BsdDbShelf(Shelf):
"""Shelf implementation using the "BSD" db interface.
Expand Down
61 changes: 61 additions & 0 deletions Lib/test/test_dbm.py
Expand Up @@ -135,6 +135,67 @@ def test_anydbm_access(self):
assert(f[key] == b"Python:")
f.close()

    def test_anydbm_readonly_reorganize(self):
        self.init_db()
        with dbm.open(_fname, 'r') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            self.assertRaises(dbm.error, lambda: d.reorganize())

    def test_anydbm_reorganize_not_changed_content(self):
        self.init_db()
        with dbm.open(_fname, 'c') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            keys_before = sorted(d.keys())
            values_before = [d[k] for k in keys_before]
            d.reorganize()
            keys_after = sorted(d.keys())
            values_after = [d[k] for k in keys_before]
            self.assertEqual(keys_before, keys_after)
            self.assertEqual(values_before, values_after)

    def test_anydbm_reorganize_decreased_size(self):

        def _calculate_db_size(db_path):
            if os.path.isfile(db_path):
                return os.path.getsize(db_path)
            total_size = 0
            for root, _, filenames in os.walk(db_path):
                for filename in filenames:
                    file_path = os.path.join(root, filename)
                    total_size += os.path.getsize(file_path)
            return total_size

        # This test requires relatively large databases to reliably show
        # a difference in size before and after reorganizing.
        with dbm.open(_fname, 'n') as f:
            # Early stopping.
            if not hasattr(f, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            for k in self._dict:
                f[k.encode('ascii')] = self._dict[k] * 100000
            db_keys = list(f.keys())

        # Calculate the size of the database only after the file is closed,
        # to ensure its contents are flushed to disk.
        size_before = _calculate_db_size(os.path.dirname(_fname))

        # Delete some elements from the start of the database.
        keys_to_delete = db_keys[:len(db_keys) // 2]
        with dbm.open(_fname, 'c') as f:
            for k in keys_to_delete:
                del f[k]
            f.reorganize()

        # Calculate the size of the database only after the file is closed,
        # to ensure its contents are flushed to disk.
        size_after = _calculate_db_size(os.path.dirname(_fname))

        self.assertLess(size_after, size_before)

def test_open_with_bytes(self):
dbm.open(os.fsencode(_fname), "c").close()

Expand Down
1 change: 1 addition & 0 deletions Misc/ACKS
Expand Up @@ -1365,6 +1365,7 @@ Milan Oberkirch
Pascal Oberndoerfer
Géry Ogam
Seonkyo Ok
Andrea Oliveri
Jeffrey Ollie
Adam Olsen
Bryan Olson
Expand Down
@@ -0,0 +1,2 @@
:mod:`shelve` as well as underlying :mod:`!dbm.dumb` and :mod:`!dbm.sqlite3` now have :meth:`!reorganize` methods to
recover unused free space previously occupied by deleted entries.
12 changes: 10 additions & 2 deletions configure


13 changes: 11 additions & 2 deletions configure.ac
Expand Up @@ -8001,6 +8001,15 @@ AC_SUBST([LIBHACL_CFLAGS])
LIBHACL_LDFLAGS= # for now, no specific linker flags are needed
AC_SUBST([LIBHACL_LDFLAGS])

dnl Check if universal2 HACL* implementation should be used.
if test "$UNIVERSAL_ARCHS" = "universal2" -o \
\( "$build_cpu" = "aarch64" -a "$build_vendor" = "apple" \)
then
use_hacl_universal2_impl=yes
else
use_hacl_universal2_impl=no
fi

# The SIMD files use aligned_alloc, which is not available on older versions of
# Android.
# The *mmintrin.h headers are x86-family-specific, so can't be used on WASI.
Expand All @@ -8017,7 +8026,7 @@ then
# available on x86_64. However, performance of the HACL SIMD128 implementation
# isn't great, so it's disabled on ARM64.
AC_MSG_CHECKING([for HACL* SIMD128 implementation])
if test "$UNIVERSAL_ARCHS" == "universal2"; then
if test "$use_hacl_universal2_impl" = "yes"; then
[LIBHACL_BLAKE2_SIMD128_OBJS="Modules/_hacl/Hacl_Hash_Blake2s_Simd128_universal2.o"]
AC_MSG_RESULT([universal2])
else
Expand Down Expand Up @@ -8049,7 +8058,7 @@ then
# implementation requires symbols that aren't available on ARM64. Use a
# wrapped implementation if we're building for universal2.
AC_MSG_CHECKING([for HACL* SIMD256 implementation])
if test "$UNIVERSAL_ARCHS" == "universal2"; then
if test "$use_hacl_universal2_impl" = "yes"; then
[LIBHACL_BLAKE2_SIMD256_OBJS="Modules/_hacl/Hacl_Hash_Blake2b_Simd256_universal2.o"]
AC_MSG_RESULT([universal2])
else
Expand Down