v2.4.0

@unexpectedpanda unexpectedpanda released this 11 Aug 10:51

Consider this update a bonus, as I wanted to correct a few things. It isn't a sign of future updates to come.

Redump clone lists have been updated. No-Intro clone lists... maybe, maybe not. Pulling No-Intro data is just as arduous as it's ever been, with its bot protection and missing downloads frustrating any attempt to automate the process.

The primary focus of this update is fixing coding sins of the past, while making memory and speed improvements.

Benchmarking platform:

  • CPU: Intel Core i7 14700K (28 threads across 8 P-cores and 12 E-cores)
  • RAM: 64GB DDR5-6400
  • Disk: Samsung SSD 990 Pro 2TB

Windows 10, Python 3.13.3 results:

DAT processing time is in seconds, averaged over five runs. Peak memory usage is resident size in MB, measured by memory-profiler.

| DAT file | Number of titles | Options | Time v2.3.9 | Time v2.4.0 | Time improvement | Memory v2.3.9 | Memory v2.4.0 | Memory improvement |
|---|---|---|---|---|---|---|---|---|
| Commodore - Amiga CD | 567 | Default | 1.45 | 0.35 | 4.14x | 54.70 | 47.60 | 1.15x |
| Nintendo - Nintendo 3DS (Digital) (CDN) | 10,152 | Default | 66.51 | 4.67 | 14.24x | 298.10 | 197.80 | 1.51x |
| Nintendo - Nintendo Entertainment System | 6,965 | Default | 3.71 | 2.68 | 1.38x | 143.00 | 120.20 | 1.19x |
| Sony - PlayStation | 10,776 | Default | 4.88 | 2.83 | 1.72x | 286.50 | 207.70 | 1.38x |
| Sony - PlayStation | 10,776 | Exclude aABcdDefmMopPruv | 10.58 | 3.04 | 3.48x | 271.20 | 192.50 | 1.41x |

Ubuntu 24.04.2 LTS on WSL, Python 3.12.3 results:

DAT processing time is in seconds, averaged over five runs. Peak memory usage is reported as both heap size and resident size in MB, each measured by Memray.

| DAT file | Number of titles | Options | Time v2.3.9 | Time v2.4.0 | Time improvement | Heap v2.3.9 | Heap v2.4.0 | Heap improvement | Resident v2.3.9 | Resident v2.4.0 | Resident improvement |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Commodore - Amiga CD | 567 | Default | 0.35 | 0.34 | 1.03x | 15.47 | 12.64 | 1.22x | 67.13 | 62.92 | 1.07x |
| Nintendo - Nintendo 3DS (Digital) (CDN) | 10,152 | Default | 63.75 | 3.70 | 17.23x | 235.40 | 70.96 | 3.32x | 336.40 | 126.10 | 2.67x |
| Nintendo - Nintendo Entertainment System | 6,965 | Default | 5.93 | 1.56 | 3.80x | 52.50 | 28.01 | 1.87x | 113.30 | 79.79 | 1.42x |
| Sony - PlayStation | 10,776 | Default | 3.63 | 2.24 | 1.62x | 181.20 | 65.76 | 2.76x | 255.80 | 118.10 | 2.17x |
| Sony - PlayStation | 10,776 | Exclude aABcdDefmMopPruv | 8.27 | 2.31 | 3.58x | 182.10 | 66.83 | 2.72x | 257.60 | 118.80 | 2.17x |

While I've done my best to equalize performance between Windows and Linux, the Python interpreter on Linux is much better than on Windows for multiprocessing, even when Linux is running under Windows. Mostly this is because of the different ways Python starts a process on different operating systems. Perhaps things will change a little once the free-threaded build becomes the default in Python and has been optimized, but that's not going to happen for at least another year.
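
To see which process start method a given interpreter uses, the standard library can report it directly. This is a minimal sketch for illustration, not Retool's code; the function name `describe_start_methods` is made up here. On Windows only `spawn` is available, while Linux also offers `fork`.

```python
import multiprocessing as mp


def describe_start_methods() -> dict[str, object]:
    """Report which process start methods this interpreter supports."""
    return {
        # 'spawn' starts a fresh interpreter for each process, while
        # 'fork' (POSIX only) copies the parent process instead.
        'default': mp.get_start_method(),
        'available': mp.get_all_start_methods(),
    }


if __name__ == '__main__':
    info = describe_start_methods()
    print(f"Default start method: {info['default']}")
    print(f"Available start methods: {info['available']}")
```

Running this on both Windows and Linux (including WSL) shows which start methods each platform offers and which one is the default.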

Out of curiosity, I ran the same Sony - PlayStation DAT file on Retool 0.53, which dates back to 2020. It took 2m, 10.51s to complete. Now it's down to 2.83s, a 46x speed improvement in five years. Knowledge is a crazy thing. At this point in time there probably aren't many big performance wins left to squeeze out, just a collection of infinite tiny tweaks of diminishing returns that are likely not worth making.

Here are the changes for 2.4.0:

  • Feature: Retool can now assign titles as RetroAchievements compatible by adding a retroachievements="yes" attribute on game or machine tags. You can also set your 1G1R title selection to prefer RetroAchievements titles. RetroAchievements data is retrieved from an external source. If the source stops updating or becomes unavailable, using RetroAchievements features won't be effective.

  • Feature: Unrecognized attributes in game and machine elements are now passed through to the output DAT file.

  • Feature: Unrecognized child elements in the game and machine elements are now passed through to the output DAT file.

  • Change: There's now a versionIgnore array in internal-config.json, which details the titles that shouldn't be picked up by automatic version detection. Retool's version detection originally caused confusion in creating clone lists, where you'd have to get tricky with workarounds for titles like Pokemon - Black Version 2, as Retool would see it as version 2.0 of Pokemon - Black — not its own game.

    Now, so long as those problem titles are in the versionIgnore array, you can refer to them directly in the clone lists instead of using workarounds.

  • Change: Windows no longer uses the maximum number of CPU cores available to it in all scenarios. The cost of spinning up a process in Python under Windows is very high, so using more cores can often mean worse performance than using fewer. Instead, Retool makes a ballpark guess at the number of processes that will perform best. While there is a penalty for adding processes in Linux as well, the total processing time is still so small you may as well just use all cores anyway. macOS likely suffers the same fate as Windows as it also uses the spawn method instead of fork to create a process, but I don't have the hardware to test, so no changes have been made there.

  • Change: Removed the Include titles without hashes or sizes option, as it was tied to old code that was no longer used.

  • Change: CLRMAMEPro DAT files are no longer converted to LogiqX before processing. Instead, data gets ingested directly.

  • Change: Retool no longer checks for XML external entity attacks, as its DAT file parsing doesn't resolve these entities anyway.

  • Change: DTD validation is no longer performed against a LogiqX DAT file. It's clear people aren't really following the DTD, and the validation just takes up processing time.

  • Change: Output files now terminate with an empty line for easier diff comparisons.

  • Change: MIAs are no longer labelled by default. The --nolabelmia flag has been inverted to be --labelmia.

  • Change: MIA data is now downloaded from an external source. If the source stops updating or becomes unavailable, marking MIAs won't be effective.

  • Change: The option to remove MIAs now removes the entire title if it's missing a file, not just the individual file from the title.

  • Change: Numbered DAT files are now output sorted according to their number, not their name without the number.
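
A minimal sketch of sorting by number rather than by name, assuming numbered titles begin with digits followed by " - ". The title names, the `NUMBER_PREFIX` pattern, and the `numbered_sort_key` function are invented for illustration and may not match how Retool does it.

```python
import re

# Assumption: numbered titles look like "0001 - Some Title".
NUMBER_PREFIX = re.compile(r'^(\d+) - ')


def numbered_sort_key(name: str) -> tuple[int, int, str]:
    """Sort by the leading number; names without a number sort last, by name."""
    match = NUMBER_PREFIX.match(name)
    if match is None:
        return (1, 0, name.lower())

    return (0, int(match.group(1)), name.lower())


# Made-up titles. Sorting by name alone would give Alpha, Beta, Zeta.
titles: list[str] = ['0003 - Alpha', '0001 - Zeta', '0002 - Beta']
sorted_titles: list[str] = sorted(titles, key=numbered_sort_key)
```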

  • Change: Since Retool's code base has matured enough, and there are now enough tests to find problems, Retool no longer checks for clones that are also assigned as parents. Files should process faster as a result.

  • Fix: Redump now uses two sets of language tags for some titles. For example, Gears of War 2 (Europe) (En,Fr,De,Es,It,Zh,Ko,Pl,Ru,Cs,Hu) (En,Es,It) and Ultimate Action Triple Pack (Europe) (En,Fr,De,Es,It,Nl,Pt) (En,Fr,De). The second set of languages unfortunately is used to mean more than one thing, so is not useful for filtering based on the filename alone. For example, in Gears of War 2 (Europe) (En,Fr,De,Es,It,Zh,Ko,Pl,Ru,Cs,Hu) (En,Es,It), English, Spanish, and Italian are the spoken languages available for the title. For Ultimate Action Triple Pack (Europe) (En,Fr,De,Es,It,Nl,Pt) (En,Fr,De) however, the second set of languages represents the common languages found in each of the games in the compilation:

    • Deus Ex: Human Revolution (BLES-01151) (v01.00) (Fr,De,Es,It)
    • Hitman: Absolution (BLES-01403) (v01.00) (En,Fr,Es)
    • Thief (BLES-01982) (v01.01) (En,Fr,De,Es,It,Pl,Ru)

    In the compilation's case, the second language block is being used as a marker of where the title was intended to be distributed, since Redump is unable to add new regions to their system.

    Since this data is not presented in a syntax that's useful for filtering, the second language tag is now stripped by Retool, and only the first is used to determine language.
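
The stripping described above could be sketched with a regular expression along these lines. The pattern and the function name `strip_second_language_block` are assumptions for illustration (two-letter codes only), not Retool's actual implementation.

```python
import re

# Assumption: a language block is a parenthesised, comma-separated list of
# two-letter codes such as "(En,Fr,De)". Regions like "(Europe)" don't match.
LANGUAGE_BLOCK = r'\([A-Z][a-z](?:,[A-Z][a-z])*\)'

# Two language blocks in a row, separated by a single space.
DOUBLE_LANGUAGE_BLOCK = re.compile(rf'({LANGUAGE_BLOCK}) {LANGUAGE_BLOCK}')


def strip_second_language_block(name: str) -> str:
    """Keep the first language block and drop the one directly after it."""
    return DOUBLE_LANGUAGE_BLOCK.sub(r'\1', name)


cleaned = strip_second_language_block(
    'Gears of War 2 (Europe) (En,Fr,De,Es,It,Zh,Ko,Pl,Ru,Cs,Hu) (En,Es,It)'
)
```

Titles with only one language block pass through unchanged, since the pattern needs two adjacent blocks to match.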

  • Fix: You could turn on a system setting override in the GUI, but turning it off again wouldn't save the off state in the config. This has been fixed.

  • Fix: When you enabled Prefer titles ripped from modern rereleases, the output file name added a code of -r, which is actually for Prefer licensed over unlicensed titles. A code of -z is now used, as intended. The content of the output DAT file was always correct, so no fix was required there.

  • Fix: Fixed a crash when using include overrides.

  • Fix: If one line was marked in the include or exclude overrides to remove related titles, Retool removed the related titles for all lines in the include or exclude overrides. This has been fixed.

  • Fix: If titles can be removed due to more than one exclusion setting, they now always show up in the same exclusion category in the report.

  • Fix: Fixed a stat counting bug to do with supersets.

  • Fix: Fixed the cloneof property in legacy mode being set as a non-numbered name in numbered DAT files.

  • Fix: Fixed CLRMAMEPro BIOS DAT files from Redump exporting with a category of Console instead of BIOS.

  • Fix: Fixed the exclude and include options comment in system config files, so it now correctly says "SYSTEM" instead of "GLOBAL".

  • Fix: Some version comparison bugs were squashed.

  • Chore: Created custom checkboxes as SVGs, as Qt's default checkboxes don't scale properly. Amusingly I'd done this before for Retool's first GUI, so was able to take some of that work as a starting point. Also fixed the down arrow on dropdown boxes, as the default moves weirdly on 4K monitors when you mouse over it.

  • Chore: Retool now more aggressively skips some title comparisons to avoid doing work it doesn't have to, improving performance for large groups. This was put in place due to Doko Demo Honya-san in the Nintendo 3DS CDN DAT file, which has 1,519 titles bundled into the one group with no clones, and dramatically slowed down processing.

  • Chore: Multiprocessing performance improvements. When each new process is spawned/forked, Python does a bunch of serializing of data for each process. The more data that goes in, the more performance tax there is for starting a process. As such, things have been refactored to push as little as possible into each process.
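
To get a feel for why the amount of data matters: multiprocessing uses pickle to send objects to worker processes, so the pickled size is a rough proxy for the cost. The sketch below uses made-up data and a hypothetical `payload_size` helper, and it only measures serialization size, not Retool's actual workload.

```python
import pickle


def payload_size(obj: object) -> int:
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj))


# Hypothetical data: a large lookup table, and the small slice a worker needs.
all_titles: dict[int, str] = {i: f'Title {i}' for i in range(100_000)}
needed_ids: list[int] = [1, 2, 3]
slice_for_worker: dict[int, str] = {i: all_titles[i] for i in needed_ids}

# Sending the whole table plus the IDs, versus sending only the slice.
full_cost: int = payload_size((all_titles, needed_ids))
slim_cost: int = payload_size(slice_for_worker)

print(f'Whole table: {full_cost:,} bytes; slice only: {slim_cost:,} bytes')
```

Passing only the slice keeps the per-process payload small, which is the general shape of the refactor described above.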

  • Chore: Memory usage improvements.

    • Entire DAT files are no longer loaded into memory to read them. Instead, one game, set, or machine element is ingested at a time.
    • A full copy of the original data read in from the DAT file is no longer kept to make include overrides work.
    • Metadata and clone list content are discarded when they're no longer used.
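
One common way to read a DAT file an element at a time is incremental parsing. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` with made-up sample data; it's an assumption about the general technique, and Retool's own parser may work differently. The `iter_titles` function is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections.abc import Iterator
from typing import BinaryIO

ELEMENT_TAGS: set[str] = {'game', 'set', 'machine'}


def iter_titles(source: BinaryIO) -> Iterator[dict[str, str]]:
    """Yield one title's attributes at a time, then empty the element."""
    for _event, element in ET.iterparse(source, events=('end',)):
        if element.tag in ELEMENT_TAGS:
            # Copy the attributes first, because clear() wipes them too.
            yield dict(element.attrib)
            # Drop the element's children, text, and attributes.
            element.clear()


# Made-up sample standing in for a LogiqX-style DAT file.
sample = io.BytesIO(
    b'<datafile>'
    b'<game name="Example Title A"><rom name="a.bin" size="1"/></game>'
    b'<game name="Example Title B"><rom name="b.bin" size="2"/></game>'
    b'</datafile>'
)

names: list[str] = [title['name'] for title in iter_titles(sample)]
```

Because each element is cleared after it's yielded, only a small part of the document has to be held in memory at any one time.
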
  • Chore: The progress bar now increments per title during the Selecting 1G1R titles stage, instead of incrementing milestone style after each sub-stage completes.

  • Chore: Deduped search strings found in both internal-config.json and Retool's code.

  • Chore: Moved most of the version normalization code out of the comparison loop, so it's only performed once per title instead of for every comparison.

  • Chore: Overhauled how Retool handles exclusions and stats, and got a nice speed boost out of it.

  • Chore: Overhauled how Retool handles config file incompatibilities between versions.

  • Chore: Fixed docstrings so they format properly in Visual Studio Code when hovering over the function name.

  • Chore: Corrected multiple typing hints.

  • Chore: Added multiple tests.

  • Chore: Did some profiling with Austin, Memray, and memory-profiler to reduce performance issues.

  • Chore: Updated dependencies. Unpinned Qt version, as the bug that interfered with testing was fixed.

  • Chore: Almost six years into Python and apparently I missed the basics of setting defaults in a function. Check out this function:

    def message(
        message_add: str = '',
        message_list: list[str] = []
        ) -> list[str]:
            message_list.append(message_add)
    
            return message_list
    
    a: list[str] = message('Hello')
    b: list[str] = message('Goodbye')

    What is the value of a going to be?

    >>> print(a)
    ['Hello', 'Goodbye']

    Huh, that's weird. What about b?

    >>> print(b)
    ['Hello', 'Goodbye']

    The same?!

    It turns out that when you set a default for a function argument to a mutable value like an empty list, Python only evaluates that default once, when the function is defined, not each time the function is called.

    So the empty list, [], is created a single time and stored with the function. Every time message is called without specifying message_list, it says "oh hey, I've already got a list for this... I'm going to use that!"

    So when a = message('Hello') is called, message_list is that shared [], and then 'Hello' is added to it. When b = message('Goodbye') is called... as far as Python is concerned, the default for message_list is the same list, and it's now ['Hello']... so it appends 'Goodbye' to it. And because the function returns that same list each time, both a and b are pointing to it... they now both equal ['Hello', 'Goodbye'].

    To work around this, we need some Python boilerplate that's apparently well known... that somehow I missed.

    def message(
        message_add: str = '',
        message_list: list[str] | None = None
        ) -> list[str]:
            message_list = message_list if message_list is not None else []
    
            message_list.append(message_add)
    
            return message_list
    
    a: list[str] = message('Hello')
    b: list[str] = message('Goodbye')

    Now let's see how things go:

    >>> print(a)
    ['Hello']
    
    >>> print(b)
    ['Goodbye']

    Much better, much more predictable, and this is now fixed in Retool's code.