✨ Mcdoc bigint#1932

Open
NeunEinser wants to merge 9 commits into SpyglassMC:main from NeunEinser:mcdoc-bigint

Conversation

@NeunEinser
Contributor

@NeunEinser NeunEinser commented Dec 21, 2025

Mcdoc long literals, and ranges on long types, are now read as bigints, retaining maximum precision.

A toJSON function is patched onto the BigInt class according to the best practices outlined here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt#use_within_json and here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/BigInt_not_serializable#providing_a_custom_serialization_method

By default, any bigint is serialized as a raw JSON number. Consumers of the JSON need to read integers without precision loss in order to benefit from this. If this is not desired and the value should instead be serialized as, e.g., a JSON string, that has to be done explicitly.

The SymbolTable has been altered so that bigints on its data field can be serialized and deserialized consistently. For this purpose, during unlink, all bigints in the data field are replaced with an object of the following structure: {"$$type": "bigint", "value": "123456789"}.

When linking an unlinked table (or a deserialized table), this replacement is undone again, retaining the original structure.
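As an illustration (a sketch only; the function names and recursion details are my assumption, not the actual implementation), the unlink/link round trip of the data field could look like this:

```javascript
// Sketch of the unlink/link transform described above: bigints in the
// data field become tagged objects on unlink and are restored on link.
function unlinkBigints(value) {
	if (typeof value === 'bigint') {
		return { $$type: 'bigint', value: value.toString() }
	}
	if (Array.isArray(value)) {
		return value.map(unlinkBigints)
	}
	if (value !== null && typeof value === 'object') {
		return Object.fromEntries(
			Object.entries(value).map(([k, v]) => [k, unlinkBigints(v)]),
		)
	}
	return value
}

function linkBigints(value) {
	if (value !== null && typeof value === 'object') {
		if (value.$$type === 'bigint' && typeof value.value === 'string') {
			return BigInt(value.value)
		}
		if (Array.isArray(value)) {
			return value.map(linkBigints)
		}
		return Object.fromEntries(
			Object.entries(value).map(([k, v]) => [k, linkBigints(v)]),
		)
	}
	return value
}

const data = { range: { min: -9223372036854775808n, max: 9223372036854775807n } }
const restored = linkBigints(JSON.parse(JSON.stringify(unlinkBigints(data))))
// restored.range.min and restored.range.max are the original bigints again
```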

The SymbolTable serialization is important for our cache. Without this, the precision would be lost by JS's JSON.parse implementation, so nothing would be gained after a cache write and read cycle.

@NeunEinser NeunEinser marked this pull request as ready for review December 21, 2025 21:44
@NeunEinser NeunEinser changed the title Mcdoc bigint ✨ Mcdoc bigint Dec 21, 2025
@misode misode self-requested a review December 23, 2025 00:03
Member

@misode misode left a comment


I am not convinced that we need this. On the surface, using bigint for long ranges might seem cleaner and better, but the reality is that there is no clean way to implement this, while keeping mcdoc types serializable to JSON. I really dislike the special $$type JSON format and rawJSON patching.

I understand there is a benefit in using bigint for these, but I don't think that weighs up against all the disadvantages.

@SPGoding
Member

I really dislike the special $$type JSON format and rawJSON patching.

The rawJSON replacement of BigInts could technically be done in a replacer in JSON.stringify() so that no prototype patching is needed, and we could access the raw text through the reviver argument of JSON.parse() to restore the number literal in JSON back to a BigInt. The challenge would be to know which number literals to convert to BigInts and which to keep as numbers in the reviver.

@MulverineX
Member

MulverineX commented Jan 20, 2026

The challenge would be to know which number literals to convert to BigInts and which to keep as numbers in the reviver.

IMO we move all long literals to BigInt, and proceed with the {"$$type": "bigint", "value": "123456789"} serialization. Checking for bounds during the serialization step and the parse step is pretty annoying.

As for the implementation path, I agree with SPG here: we should just use the built-in JSON method arguments.

@SPGoding
Member

SPGoding commented Jan 24, 2026

To clarify, my point was that it's also possible to avoid the {"$$type": "bigint", "value": "123456789"} serialization as I am not a fan of it either.

As an illustrative example (not that I am for or against it), we could stringify all JavaScript bigints as integers in JSON and all JavaScript numbers as floats in JSON:

function wokeStringify(data) {
	return JSON.stringify(data, (_key, value) => {
		if (typeof value === 'bigint') {
			return JSON.rawJSON(value.toString())
		} else if (typeof value === 'number') {
			let valStr = value.toString()
			if (!valStr.includes('.')) {
				valStr = `${valStr}.0`
			}
			return JSON.rawJSON(valStr)
		} else {
			return value
		}
	})
}

function wokeParse(text) {
	return JSON.parse(text, (_key, value, { source }) => {
		if (typeof value === 'number') {
			return source.includes('.')
				? value
				: BigInt(source)
		}
		return value
	})
}

This allows us to stringify and parse BigInts losslessly to/from JSON:

> const thefuck = { foo: 10000000000000000000000000000000000000000000000000000001n, bar: 2, baz: { qux: 6.7 } }
undefined
> wokeStringify(thefuck)
'{"foo":10000000000000000000000000000000000000000000000000000001,"bar":2.0,"baz":{"qux":6.7}}'
> wokeParse(wokeStringify(thefuck))
{
  foo: 10000000000000000000000000000000000000000000000000000001n,
  bar: 2,
  baz: { qux: 6.7 }
}

This comes with the problem that semantic integers that use the number type in JavaScript have to be encoded as floats in JSON; otherwise we wouldn't know which integers in JSON to parse as numbers and which to parse as bigints (what I meant by "the challenge would be to know which number literals to convert to BigInts and which to keep as numbers in the reviver"). This problem could be solved by (1) using bigint instead of number for all semantic integers in our types, or (2) hardcoding some keys (which are the only other context that the reviver has access to) that need to be revived as bigints.

@MulverineX
Member

Oh, that's actually a pretty clean solution; I was envisioning something way more cursed.

@NeunEinser
Contributor Author

NeunEinser commented Feb 27, 2026

I am not convinced that we need this. On the surface, using bigint for long ranges might seem cleaner and better, but the reality is that there is no clean way to implement this, while keeping mcdoc types serializable to JSON. I really dislike the special $$type JSON format and rawJSON patching.

I understand there is a benefit in using bigint for these, but I don't think that weighs up against all the disadvantages.

The rawJSON patching needs to be solved, because we are using this as a library.

I don't see any disadvantage in a special encoding like this for internal, non-user-facing caches. Could you elaborate?

I, on the contrary, strongly dislike saving JSON floats for integers; that's semantically completely wrong, feels hacky, and relies on the encoding being done correctly (i.e. if the .0 is missing for some reason, you get a corrupt cache that will cause exceptions down the line; this would happen right now with existing caches from old versions). In my case, it's explicit when you want a bigint, and retrieving a number instead doesn't break anything.

@NeunEinser
Contributor Author

NeunEinser commented Feb 27, 2026

I updated this now to the following:

  • Removed prototype patching
  • Use of two different implementations for JSON revivers and replacers

bigintJsonNumberReplacer and bigintJsonNumberReviver

This is the general one, which simply writes bigints as raw JSON numbers and should be usable in all circumstances without disadvantages.

The corresponding reviver first reads every JSON number as a JS number. If it encounters an integer whose JS number representation differs from the source text that was originally serialized, it uses a bigint instead. This should only ever happen when the stored number was originally a bigint. Smaller bigints that can be represented losslessly as a normal JS number will be deserialized as a number instead of a bigint.

I did not think this would be possible btw, because the { source: string } parameter of the reviver is not documented anywhere I looked. Thanks to SPG for making me aware of that! <3

bigintJsonLoslessReplacer and bigintJsonLoslessReviver

The second implementation, which I am using specifically when serializing symbol data for e.g. the cache, will store bigints as a string of the following format: $$type:bigint;$$value:<n>. This allows a lossless conversion when it is desired. I disagree with SPG's suggestion to store numbers as floats, as ints from old caches would always become bigints, and I don't trust this procedure in general. For internal things I think this is totally fine. For external things, expecting non-bigint integers to be serialized with a .0 is dangerous anyway.
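As a minimal sketch of such a string-based pair (the format string comes from the description above; the function names and bodies here are my assumption, not the actual PR code):

```javascript
// Sketch of a string-based lossless replacer/reviver, assuming the
// "$$type:bigint;$$value:<n>" format described above.
const PREFIX = '$$type:bigint;$$value:'

function losslessReplacer(_key, value) {
	return typeof value === 'bigint' ? PREFIX + value.toString() : value
}

function losslessReviver(_key, value) {
	return typeof value === 'string' && value.startsWith(PREFIX)
		? BigInt(value.slice(PREFIX.length))
		: value
}

const roundTripped = JSON.parse(
	JSON.stringify({ n: 123456789123456789123456789n }, losslessReplacer),
	losslessReviver,
)
// roundTripped.n is the original bigint again, with no precision loss
```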

Potential further changes

Personally I prefer having a lossless 1:1 serialization and deserialization for our cache, as it ensures types stay exactly like they were before.

If you don't like this for some reason (please do name those reasons in that case), I would in general be open to simply not using the lossless replacer/reviver implementation at all.

*/
export function bigintJsonNumberReplacer(_key: string, value: any) {
return typeof value === 'bigint'
? (<any> JSON).rawJSON(value.toString())
Member


Modern TypeScript uses JSON as any; <any> JSON is not usually used, and I had to look up the syntax lol

* @param value The value to encode
* @returns Replaced value
*/
export function bigintJsonLoslessReplacer(_key: string, value: any) {
Member


The spelling is Lossless btw

@MulverineX
Member

MulverineX commented Feb 27, 2026

Why do we have the "lossless" replacer/reviver when we have these?

export function bigintJsonNumberReplacer(_key: string, value: any) {
	return typeof value === 'bigint'
		? (<any> JSON).rawJSON(value.toString())
		: value
}

export function bigintJsonNumberReviver(_key: string, value: any, data?: { source?: string }) {
	return typeof value === 'number'
			&& data?.source !== undefined
			&& !data.source.includes('.')
			&& value.toString() !== data.source
		? BigInt(data.source)
		: value
}

Are we trying to support environments that do not support JSON.rawJSON and the reviver's { source } context in JSON.parse? In that situation, shouldn't we just gracefully fall back to the "lossless" method within these same functions?

@MulverineX
Member

MulverineX commented Feb 27, 2026

[screenshot: browser compatibility table]

source is supported everywhere, and web deployments that need rawJSON support on iOS can use https://github.com/zloirock/core-js?tab=readme-ov-file#ecmascript-json

@NeunEinser
Contributor Author

NeunEinser commented Feb 28, 2026

Why do we have the "lossless" replacer/reviver when we have these?

Have you read my comment above your review? If so, please explain what you do not understand about what I am saying there.

@MulverineX
Member

I hadn't read your entire comment, that's my bad.

I still think it's fine for the cache to lose the fact that a long that fits inside a JS number was initially created as a BigInt.
