Reverse-engineering vdot (and friends) #21070

@fp64

The previous topic (how it started) is old, so let's consider it archived, and restart the discussion here.

The story so far:

  1. The transcendental FPU instructions from the previous topic (sin/cos/sqrt/rsqrt/asin/exp2/rexp2/log2/rcp) are assumed bitwise exact (with the understanding that NaN bit patterns may stop being exact the moment they are loaded into a register: PSP's NaNs are interpreted as signaling by e.g. x86).
  2. The "assumed" part is important: they have not been exhaustively tested against their HW PSP counterparts. The test data they were developed against is several gigabytes, but not exhaustive (that would be 32 GB per function, which takes a while to test on PSP and also to transmit over the internet, although the data is fairly compressible). The available test data was used to form an intelligent guess of what range reduction looks like, and then the (presumed) range-reduced inputs were tested exhaustively. Ideally, this produces correct results for all inputs, but that hasn't been tested. Perhaps the most obvious suspects are the junk outputs of vsin/vcos for large inputs (x>2^32): the code for these looks a bit strange, although it does match all of the test data.
  3. If anyone is willing to undertake the exhaustive testing mentioned above, it would be much appreciated (it may take a long time; not sure about wear on the PSP from running that much number-crunching). The implementation in PPSSPP should be fairly straightforward to extract into something self-contained (you also need the tables in assets/vfpu, obviously) that can be turned into code for a real PSP (or used to generate full input/output tables locally, to be compared against tables produced on a PSP).
  4. The current implementation is "opaque" in that we do not gain much knowledge about what the HW is actually doing (the current implementation is basically a compression exercise). We do have something that indicates what the underlying behavior might be, but not nearly the full picture. Figuring out what the HW does would be interesting, but might not be that important to emulation (it might make things smaller - getting rid of assets/vfpu - and/or faster).
  5. The vrnd emulation is assumed bitwise exact. It does seem to match all test data (though, there are some funky things going on with the data itself).
  6. The situation with vrnd is in a sense the opposite of the transcendentals: we do have a fairly good idea what the HW is doing, but less confidence about bitwise exactness (the carry behavior of the E pseudo-register is simply a guess).
  7. Again, there hasn't been much testing of vrnd on real HW, separate from the test data used to create the current implementation. Exhaustive testing is infeasible, of course (the state space is 2^256), but some verification would be welcome.
  8. The vfpu_dot function is confirmed to not exactly match the PSP HW.
  9. Not sure what other instructions need investigating.
  10. Besides the instructions themselves, there might be other floating-point peculiarities, not sure what.
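
For anyone attempting the table comparison from item 3, here is a minimal sketch of what the comparison step could look like. It is not code from PPSSPP: the function names and the table layout (a flat run of little-endian 32-bit outputs, one per consecutive input bit pattern) are assumptions made for illustration. NaN outputs are optionally treated as equal to each other, since per item 1 their exact bit patterns may not survive a trip through a register. The closing comment notes that the 32 GB figure is consistent with storing a 4-byte input plus a 4-byte output for every 32-bit input, which is again an assumption about the layout.

```python
import struct

def is_nan_bits(u):
    """True if the 32-bit pattern u encodes an IEEE-754 binary32 NaN."""
    return (u & 0x7F800000) == 0x7F800000 and (u & 0x007FFFFF) != 0

def compare_tables(expected, actual, nan_equal=True, limit=10):
    """Compare two byte strings holding little-endian uint32 outputs.

    Hypothetical layout: entry i is the output for the i-th input of the dump.
    Returns up to `limit` tuples (index, expected_bits, actual_bits).
    """
    if len(expected) != len(actual) or len(expected) % 4 != 0:
        raise ValueError("tables must have equal length, a multiple of 4 bytes")
    mismatches = []
    pairs = zip(struct.iter_unpack("<I", expected),
                struct.iter_unpack("<I", actual))
    for i, ((e,), (a,)) in enumerate(pairs):
        if e == a:
            continue
        # Optionally ignore differences between two NaN encodings.
        if nan_equal and is_nan_bits(e) and is_nan_bits(a):
            continue
        mismatches.append((i, e, a))
        if len(mismatches) >= limit:
            break
    return mismatches

# Size of a full table under the assumed layout (4-byte input + 4-byte output):
# 2**32 * 8 bytes == 32 * 2**30 bytes.
```

In practice the two tables would be read from files in chunks rather than held in memory, and the index would be offset by the first input bit pattern of the chunk.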

And that brings us to the current position.
The setup is the same as before: I'm willing (and interested) to look at the behavior of these instructions, but do not have access to a physical PSP, and am solely reliant on data dumped by other people.
If anyone is able and willing to get said data - that would be nice.

The obvious candidate is the vdot instruction (likely a heavy contributor to e.g. the "Ridge Racer" replay desynchronization issue).
One issue raised is that this might need a lot of test data, and it is not obvious which data that should be (the transcendentals are univariate, and dumping the entirety of selected floating-point exponents worked well; vdot is an octavariate function though, so exhaustive testing is not feasible).
If that is a concern, I can simply write and post a self-contained program that generates (hopefully enough of) the test cases of interest (or just the datafile generated by it, but that may be a bit large).
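
To give a rough idea of what such a generator might look like, here is a small sketch. It is not the program offered above, and the choice of case categories (unconstrained normal inputs, inputs with nearby exponents, and exact cancellation between two lanes) is only a guess at what could be informative for vdot. All names and the record layout (eight little-endian 32-bit words per case, first vector then second vector) are hypothetical.

```python
import random
import struct

def f32_bits(sign, exponent, mantissa):
    """Assemble a binary32 bit pattern from its three fields."""
    return (sign << 31) | (exponent << 23) | mantissa

def random_finite(rng, exp_lo=1, exp_hi=254):
    """Random normal (finite, nonzero) float with biased exponent in [exp_lo, exp_hi]."""
    return f32_bits(rng.getrandbits(1), rng.randint(exp_lo, exp_hi), rng.getrandbits(23))

def generate_cases(n, seed=0):
    """Return n cases, each a tuple of 8 bit patterns: s[0..3] then t[0..3]."""
    rng = random.Random(seed)  # seeded, so the PC and PSP sides can regenerate the same inputs
    cases = []
    for i in range(n):
        kind = i % 3
        if kind == 0:
            # Unconstrained normal inputs: wide exponent spread between products.
            case = [random_finite(rng) for _ in range(8)]
        elif kind == 1:
            # Exponents near the bias: products of similar magnitude,
            # which should exercise alignment and rounding of the sum.
            case = [random_finite(rng, 120, 134) for _ in range(8)]
        else:
            # Cancellation: s[1] = -s[0] and t[1] = t[0], so the first two
            # products cancel exactly and only lanes 2 and 3 contribute.
            s = [random_finite(rng, 120, 134) for _ in range(4)]
            t = [random_finite(rng, 120, 134) for _ in range(4)]
            s[1] = s[0] ^ 0x80000000
            t[1] = t[0]
            case = s + t
        cases.append(tuple(case))
    return cases

def pack_cases(cases):
    """Serialize cases as 32-byte records of eight little-endian uint32 words."""
    return b"".join(struct.pack("<8I", *c) for c in cases)
```

Because the generator is seeded, only the outputs would need to be dumped on the PSP side, provided the same inputs can be reproduced there. Zeros, denormals, infinities and NaNs are deliberately left out of this sketch and would need their own categories.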
