
Status of value semantics? #387

Open
AprilArcus opened this issue Dec 24, 2023 · 68 comments

Comments

@AprilArcus

According to @littledan on JS Party Episode 305, value semantics for records and tuples are being reconsidered by the committee. Could you expand a little more on the concerns here?

@acutmore
Collaborator

acutmore commented Jan 8, 2024

Hi @AprilArcus, I'm hoping something more complete can be written up on the status of this soon.

In lieu of that, a short summary: we have received feedback from multiple major JS engine implementation teams that their investigations into === equality for R&T suggest it will be more complicated than we initially hoped, and that even if implemented there is doubt the performance of the operation would match developer expectations. Additionally, due to the way most JS engines are architected, it would add an extra load instruction to existing usages of ===, so existing apps that haven't adopted R&T could still be negatively impacted.
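
For context, a minimal sketch of the === behaviour being reconsidered, as described in the proposal README (illustrative only; it assumes the original value-semantics design):

#[1, 2] === #[1, 2];          // true under the original design: compared by contents
#{ a: 1 } === #{ a: 1 };      // true
[1, 2] === [1, 2];            // false: today's objects are compared by identity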

This was not the only feedback, but it is perhaps the most significant factor in the proposal now looking at other potential designs for how the equality could work.

I'll be sure to post an update here when there is a link to more information.

@acusti
Contributor

acusti commented Jan 11, 2024

@acutmore thanks for the update! Does this preclude different R&T instances with equivalent values being SameValue equal? My immediate expectation would be that if strict equality isn't possible, they could be compared using Object.is.

@demurgos

What about using these types as keys in Map and Set? Using a tuple such as [x, y] as a key currently is a bit awkward since you have to convert it to a primitive first. Being able to represent composite keys more easily was a great benefit of this proposal.
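
For illustration, the kind of userland workaround being referred to; the helper names here are made up for the example, not part of any proposal:

const cache = new Map();
function cacheSet(x, y, value) {
  cache.set(`${x},${y}`, value);   // the pair must be serialised to a primitive key
}
function cacheGet(x, y) {
  return cache.get(`${x},${y}`);
}
// Under the original proposal this would simply be cache.get(#[x, y]).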

@acutmore
Collaborator

Does this preclude different R&T instances with equivalent values being SameValue equal? My immediate expectation would be that if strict equality isn't possible, they could be compared using Object.is.

@acusti most likely it does preclude that. SameValue and Object.is are very exact comparisons. Ignoring NaN canonicalisation, when two values are considered "same" then there is no way to tell them apart. They are literally the same value from the language's perspective.

That said, there could be a new API, Something.equals, that would be a new form of equality that includes checking R&T.

@demurgos, similar to above. It's likely that by default R&T would behave the same as any other object in Map and Set, but there could be either new collection types or a flag passed to the constructor, providing a kind of Map or Set that would allow R&T to be used as a more descriptive key.
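
A rough sketch of what those two directions could look like. Something.equals is the placeholder name used above, and the constructor flag shown is purely illustrative, not settled API:

// Hypothetical shapes only:
Something.equals(#[1, 2], #[1, 2]);              // true: a new, deeper form of equality
const m = new Map([], { compositeKeys: true });  // an imagined opt-in flag at construction
m.set(#[1, 2], "value");
m.has(#[1, 2]);                                  // true under such an opt-in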

@demurgos

Thank you for your reply. It feels like R&T will end up being syntax sugar for Object.freeze (and no prototype). If so, it will still be useful but it's disappointing that perf concerns caused such a scope reduction ☹️

@mhofman
Member

mhofman commented Jan 17, 2024

@demurgos the R&T champions and other delegates are still very much interested in supporting the use cases that R&T would solve. For example, composite keys as you mention is definitely something I would like to be better supported natively as it's extremely painful to build in userland. However for various compatibility and performance reasons it likely will require some kind of explicit opt-in.

@AprilArcus
Author

If === is too performance-sensitive to be extended to cover a special case for Records & Tuples, could a useful set of semantics be reclaimed from the already slower and more complicated == operator? Or could a novel operator be introduced such as =#=?

@acutmore
Collaborator

Hi @AprilArcus, are there particular situations where you would find having an operator useful over having to use a function call?

Adding a new overload to == is, I feel, also unlikely. While it does have slow paths, comparing two objects is not one of them. There is also a general theme in the ecosystem to prefer === over == (though == null is sometimes seen as an OK exception), so adding new functionality to it might muddy the water in terms of advice on which to use.

Adding a new operator, like any new syntax, tends to need a stronger justification compared to adding new API. This is likely because the remaining design space is somewhat limited (only so many non-letter ASCII characters to choose from), and documentation for syntax can be harder to look up (IDEs currently tend to expose docs for APIs but not syntax). A new operator could also be researched as a separate follow-on proposal.

@gregmartyn

If the problem (interning) already exists with strings, (does it?) why is it not acceptable for R&T? Shouldn't the problem be addressed in the same way? I.e. instead of introducing an e.g. RnT.sometimesSlowCompare(), have === be the sometimes-slow one -- like it already is for strings -- and introduce a constantTimeCompare() function for interned strings & R&T?

I'm sure there's something(s) I'm missing though because I didn't follow the bit about why that would "add an extra load instruction to existing usages of ===".

@errorx666

I'd rather not have R&T at all than have them without value comparison semantics.

@acutmore
Collaborator

Hi @gregmartyn, I'll try and respond as best I can!

If the problem (interning) already exists with strings,

You are correct that comparing strings with === has a similar risk of being a 'slow' (non-constant-time) comparison. The concerns are a mix of:

  • strings already having this risk does not necessarily make it OK to increase the number of types with this risk
  • applying the same lazy string-interning optimisations to R&T makes it much harder for R&T to re-use the existing optimisations that exist for Objects and Arrays, as the memory layouts won't match.

I didn't follow the bit about why that would "add an extra load instruction to existing usages of ===".

This does not apply to all JS engines; it depends on the information that comes with their 'tagged pointers'. In some engines the pointer itself encodes that it points to a JS object or a JS string. This means the instructions for === have a fast path: if the two pointers point to JS objects, they can immediately return the comparison of the two addresses. If R&T are built by re-using most of the existing mechanisms for JS objects, so that they can benefit from the existing optimisations and GC, then whenever two objects are compared with === the instructions lose that fast path and need to load/follow the two addresses to inspect whether the two operands are regular objects or R/T, to know whether they can be compared by address or by content. This extra load instruction, while small on its own, could add observable overhead when used in a hot path of an application.

@lautarodragan

What about using these types as keys in Map and Set? Using a tuple such as [x, y] as a key currently is a bit awkward since you have to convert it to a primitive first. Being able to represent composite keys more easily was a great benefit of this proposal.

I'd rather not have R&T at all than have them without value comparison semantics.

@demurgos

Myself, I'd still prefer this to go forward, even without them.

FWIW, even though I agree it'd be ideal to have strict equality, a Map[Symbol.keyEquality] would more or less address this issue. Less than ideal since it moves the problem to userland and will probably perform worse, but it could work. Or a new, native RecordTupleMap that implements a new key-equality algorithm, provided by JS engines, which would be more performant.

None of these options are ideal, but moving forward with R&Ts without strict equality doesn't close these doors, and decoupling it from this requirement seems to be an acceptable path forward.

A future, separate proposal could address === comparing by value rather than reference for records and tuples, which may have much less friction if records and tuples are already implemented in engines, or at least the proposal is in stage 3 or 4.

Also, I'm not sure changing === would affect Same-value-zero equality — isn't that an internal engine-specific algorithm that is not necessarily using === internally? If so, even if this proposal did change how this operator works, that might still not affect how Maps and Sets compare keys. These problems might be decoupled from each other — proposing that the Same-value-zero equality compares records and tuples by value regardless of what ===, Object.is, etc do, might be a more addressable proposal.

@ljharb
Member

ljharb commented Feb 2, 2024

A future proposal would never ever be capable of changing how any operator behaves with a specific kind of object. Once R&T ships as a non-primitive, that is forever an unavailable path for them.

@demurgos

demurgos commented Feb 19, 2024

Myself, I'd still prefer this to go forward, even without them.

If we take a step back, what is even the goal that this proposal is trying to achieve? Is it a new feature or new syntax? In a reply above I was disappointed in the scope reduction but felt the sugar may still be useful. Since then I've had some time to think more about it, and I don't want to give up the value semantics. Please keep them; they are the core of this proposal.

The first paragraphs of the README are (emphasis mine):

Records and Tuples can only contain primitives and other Records and Tuples. You could think of Records and Tuples as "compound primitives". By being thoroughly based on primitives, not objects, Records and Tuples are deeply immutable.

Records and Tuples support comfortable idioms for construction, manipulation and use, similar to working with objects and Arrays. They are compared deeply by their contents, rather than by their identity.

The expectation from the start was that the goal was to get compound primitives, aka value semantics. These are an incredibly powerful feature because they fully integrate with regular operators and methods. ===, Object.is, Array#indexOf, etc. just work, and they let you avoid round-tripping through a string serialization. Compound values become regular values with R&T.
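
To make that concrete, a small sketch assuming the original value-semantics design from the README:

const points = [#[0, 0], #[1, 1]];
points.indexOf(#[1, 1]);                       // 1: found by content, no serialisation needed
new Set([#[0, 0]]).has(#[0, 0]);               // true
new Map([[#{ x: 1 }, "a"]]).get(#{ x: 1 });    // "a"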

I don't want to be too harsh on all the people working on this who want to drop value semantics, but it's also one of the most important ongoing proposals for JS. Syntax sugar can be great; for example, the class keyword did not unlock new features initially but made an awful lot of patterns more convenient. Without value semantics, #[] and #{} could be an Object.deepFreeze method, saving a few keystrokes. But is it really what the proposal intended to achieve?

I understand that compound primitives are harder to support, but it may lead to a way better language in the long run. The JS module system is one of the best of any language I ever used, it required huge amounts of effort but the result was well worth it. Native async support was also great and deeply transformed JS for the better. R&T with value semantics is one of those landmark proposals that should be fully realized IMO. JS does not have stuff like operator overloading, so R&T is the best shot to get first class language support for composite values without stringifying the world. Compound primitives are also a solid foundation to expand on in the future; so it's better to take a bit more time and have the best version of them that we can.

@Samuel-Martineau

The most common case I run into where I feel like records and tuples would be a godsend is when I am trying to use structured data as a map key, without having to encode it as a string or resorting to any other hacks. However, this doesn't work at all if records/tuples are just frozen objects instead of primitives. Although I appreciate the technical difficulty of adding them, I feel like it isn't worth it without this key feature.

@ljharb
Member

ljharb commented Feb 20, 2024

That use case - being able to control the "key" of Maps and Sets - is valuable to solve, and doesn't need R&T to do so.

@Samuel-Martineau

That use case - being able to control the "key" of Maps and Sets - is valuable to solve, and doesn't need R&T to do so.

Whilst you are right that R&T is not needed for this use case, it feels natural to use an immutable, compared-by-value data structure for a Map key, doesn't it? It is simply my opinion that the real value of R&T isn't so much the immutability as the possibilities that are opened up by a complex data primitive.

@mhofman
Member

mhofman commented Feb 20, 2024

Immutability is a requirement for any stable keying (it's also the reason the value must not be key-comparable before being stable). What is not automatically needed is for the composite key value to expose the values composing it.

The original R&T proposal solved this by making values born deeply immutable, which made it itself usable as a composite key. What the champions are now exploring is the problem space if the immutable R/T and the composite key are not the same thing.

@demurgos

demurgos commented Feb 20, 2024

Having key semantics different from equality leads to many inconsistencies. It's possible to build many examples, but they all boil down to expecting that a Map/Set having a key implies that there exists an item equal to the key. In particular, I expect all three functions below to be equivalent*.

function setHas1(s: Set<unknown>, x: unknown): boolean {
  return s.has(x)
}

function setHas2(s: Set<unknown>, x: unknown): boolean {
  for (const v of s) {
    if (v === x) {
      return true;
    }
  }
  return false;
}

function setHas3(s: Set<unknown>, x: unknown): boolean {
  return [...s].indexOf(x) >= 0;
}

The strength of value semantics is that it's not a special case for Map and Set, but that it fully integrates with the language at all levels.

*: Yes, I'm aware of NaN, it does not mean that we should add more edge cases

@ljharb
Member

ljharb commented Feb 20, 2024

@demurgos that's not an assumption that holds for subclasses, so it's a false assumption. Only the first is valid.

@mhofman
Member

mhofman commented Feb 20, 2024

boil down to expecting that a Map/Set having a key implies that there exists an item equal to the key

I don't think anyone has suggested changing the default key semantics of Map/Set. However one line being explored is that Map/Set could be configured at construction with a keying function to derive the keying value from the provided "key" (This is roughly equivalent to an alternative Map/Set implementation). The keying value would have normal equality / SameValue semantics.

I expect all three functions below to be equivalent*.

I don't believe a program can assume the behavior of dynamically invoked methods on parameters. Anyone today can already provide a custom map/set/array instance with overridden methods.

The program can only trust the behavior of instances it itself creates (modulo prototype pollution, which is an orthogonal concern). Since no one is proposing to change the default behavior of the existing Map/Set, I don't believe there is a problem.

@demurgos

I understand that JS is dynamic by nature, everything can be patched, etc. But I would still argue that any subclass or method override where iteration disagrees with membership no longer provides a consistent Set interface.

JS already has multiple ways to check for equality (Object.is, ==, ===), so I get why parametrizing a map/set so it uses a different behavior is a reasonable idea. It's actually a good idea, but I don't think it's the best for R&T. I'll wait to see what this exploration will lead to.

@Maxdamantus

The keying function idea seems a bit awkward to me, because it's possible that the keying function will return different values for the same original key over time (eg, k => JSON.stringify(k) will vary if someone modifies the original k object, or an object referred to by it, etc).

This means that either the mapped key needs to be stored indefinitely in the Map, or else the Map would be able to get into an undefined state due to inconsistencies in derived key hashing (eg, if the implementation uses a hash map or similar) or key ordering (eg, if the implementation uses a tree map or similar).

const map = new Map(k => JSON.stringify(k));
const k = { foo: 42 };
map.set(k, "hello");
k.foo++;
map.set(k, "world");

Presumably the Map above now contains two entries with the same original key value k, but if it's backed by a hash map and the mapped values ("{\"foo\":42}", "{\"foo\":43}") are not stored, there's a chance that both set operations hash to the same bucket so it only ends up with one entry, thus exposing behaviour of the underlying hashing/bucket.

I can imagine there might be some use for it, but I think if it becomes the main way to implement "compound keys", it could be a big footgun.

Having a separate constructor for compound keys just seems more sound, but it does indeed just sound like a less useful version of tuples (since they're basically just tuples that can't be read).

@mhofman
Member

mhofman commented Feb 20, 2024

The keying function idea seems a bit awkward to me, because it's possible that the keying function will return different values for the same original key over time (eg, k => JSON.stringify(k) will vary if someone modifies the original k object, or an object referred to by it, etc).

It's the responsibility of the creator of the Map/Set to use a keying function that's stable. Relying on the data in the key itself while the key is mutable is obviously not compatible with stability, hence why I mentioned that immutability is a requirement for any stable keying.

the mapped key needs to be stored indefinitely in the Map

Yes, the Map implementation would not rely on the keying function for existing entries, not just because of stability, but because otherwise it would expose the internal implementation of the Map to the keying function.

Having a separate constructor for compound keys just seems more sound

Right, a composite key is a useful building block for custom collection keying: the keyBy option can just return a composite key. I personally believe it should actually be the only accepted return value of keyBy, to avoid potential footguns like you describe.
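
A sketch of that idea. CompositeKey and the keyBy option are exploratory names from this discussion, and the constructor signature is illustrative, not settled API:

// Hypothetical: keyBy derives a stable composite key from the provided "key" object.
const positions = new Map([], { keyBy: (p) => CompositeKey(p.x, p.y) });
positions.set({ x: 1, y: 2 }, "tree");
positions.get({ x: 1, y: 2 });                 // "tree": both calls derive the same composite key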

@Maxdamantus

Maxdamantus commented Feb 20, 2024

I personally believe it actually should be the only accepted return values of keyBy, to avoid potential footguns like you describe.

How does that avoid the footgun? Presumably k => Map.key(k.foo) isn't any more stable than k => JSON.stringify(k) in the example I gave. Even if it returns an object, I would expect the underlying equality (and hash) operation to continue to be stable (eg, SameValue or SameValueZero isn't going to give different results for the same pair of values, even if the values are mutable objects).

@ljharb
Member

ljharb commented Feb 20, 2024

My expectation is that a keying function would only ever be called once for any given value, and the result memoized.

@acutmore
Collaborator

While a dedicated key creating API does not prevent mutated objects being unstable sources of key generation, it does still have benefits compared to JSON.stringify - it could help avoid worrying about key order (JSON.stringify({ a: 1, b: 2 }) !== JSON.stringify({ b: 2, a: 1 })), and support more values than JSON (JSON.stringify([1n]) // throws).

While the primitive, value-semantics R&T design guarantees stability, the same indirection issues can arise because R&T could only store other primitives. So indirection (via symbols in WeakMaps) was required to store non-primitive values (e.g. Temporal.ZonedDateTime); just as someone can provide a non-stable keyBy function to a Map or Set, someone could have a non-stable lookup for the objects referenced by their R&T.

So to me the question is less "is it possible to create an odd scenario with the API?" and more "do the APIs encourage good patterns by default, and do less safe usages of the API clearly stand out when reading the code?"

@Maxdamantus

My expectation is that a keying function would only ever be called once for any given value, and the result memoized.

If that's taken literally, it means it would need to remember multiple original key values, eg:

const map = new Map(k => {
    console.log("called");
    return JSON.stringify(k);
});
const k0 = { foo: 42 };
const k1 = { foo: 42 };
map.set(k0, "hello"); // "called"
map.set(k1, "world"); // "called"
map.get(k0); // ??
map.get(k1); // ??

Either both distinct key objects (k0 and k1) would need to be stored in the map, or at least one of the get operations will need to call the keying function again.

The former sounds very weird to me, since repeated sets of the "same" key will grow the map. Perhaps the implementation can mitigate this by having an implicit weak map associated (so temporary objects at least are not maintained), but this seems like a lot of undesirable overhead.

@ljharb
Member

ljharb commented Feb 20, 2024

yes, i'd expect k0 and k1 to both be stored somewhere, mapping to the json stringification of the key. Object keys would indeed need to be held weakly in the "somewhere".

@sethfowler

I'm just a JS developer, and not part of the standards body, so this is just my two cents, but my preference would be to either retain the value semantics or abandon the entire proposal.

Records and tuples with value semantics would be transformative. As I'm sure everyone posting here already knows, many popular libraries and frameworks in the JavaScript ecosystem (React, Redux, RxJS, etc) are "reactive" in the sense that they involve computing the difference between a new state and a prior state and taking some action based on the changes. Any programmer who works professionally with these libraries always needs to be aware of the reference semantics of objects and arrays; carelessly replacing one object or array with another, identical object or array can trigger a large amount of unnecessary downstream work or even an infinite loop. Records and tuples with value semantics would make it much easier to write correct, performant code in these situations.

Records and tuples without value semantics, by contrast, don't really help with this problem at all. It doesn't seem to me that they would offer much value in terms of solving my day-to-day problems programming in the JS ecosystem. Since there's a limited amount of nice syntax available in the language, if value semantics are not on offer, my preference would be to save the syntax for something more impactful.

@bakkot
Contributor

bakkot commented Apr 4, 2024

That wouldn't be a breaking change, at least not the way those words are usually used. Do you have a link to the discussion where browsers refused a "canBeHeldWeakly" predicate? I'd hope that if we went this route a such a predicate would be palatable.

@ljharb
Member

ljharb commented Apr 4, 2024

The reason they rejected it, as I understand, is because they didn't want the answer such a predicate gives, for a given value, to change over time. The counterargument I and others offered was that that was the entire benefit of such a predicate - that you didn't have to look at the object, you just would branch on the predicate, and so it would be safe to migrate all current checks to use this predicate since it would always do what's expected even if the set of things changed.

I can't locate the specific point in the notes, but I believe @syg was among the more vocal folks on the other side of that discussion?

@bakkot
Contributor

bakkot commented Apr 4, 2024

The reason they rejected it, as I understand, is because they didn't want the answer such a predicate gives, for a given value, to change over time.

Ah, sure. But now that we have symbols as WeakMap keys, it wouldn't change over time; there are no more values for which the answer to that question could plausibly go from "false" to "true", and of course you can't change the answer from "true" to "false" for anything without that actually being a breaking change (because people could have been using those values in a WeakMap). So if that was the only objection I don't see a reason not to add the predicate now, especially if we take the "tuples are objects but can't be held weakly" route.

@ljharb
Member

ljharb commented Apr 4, 2024

Because current code will already exist that says:

if (x && typeof x === 'object') {
  weakset.add(x);
}

and rely on it not throwing for any x.

@bakkot
Contributor

bakkot commented Apr 4, 2024

That code will not break unless someone passes one of the new values in. That's not a breaking change any more than adding a new kind of primitive would be - in a very similar vein, lots of code assumes that they have an exhaustive switch over possible typeof values.

"Old code will break if you give it a new kind of value" isn't something we regard as a breaking change.

@syg

syg commented Apr 4, 2024

Right, the compat requirement is that the predicate does not change its return value for any particular value over time. Adding new kinds of values means you have to decide on what the new return value ought to be for the predicate, which isn't the same thing as changing the result for existing values.

@ljharb
Member

ljharb commented Apr 4, 2024

Fair enough. I still think that making the description of "what can be weakly held" even more complex, absent a predicate, is not a positive change in the language.

@bakkot
Contributor

bakkot commented Apr 8, 2024

WeakSets and WeakMaps do guarantee that values are held as long as the key is reachable, so no, it can't just do nothing. But throwing is fine.

@travislwatson

If we wanted to avoid impacting performance of existing applications, couldn't the performance penalty of equality be paid when #[] and #{} are used, rather than waiting for them to be compared? The constructor could first look for an existing reference before allocating new memory. That could then fall back to standard reference equality. That seems to be what the userland offerings do.

@acutmore
Collaborator

couldn't the performance penalty of equality be paid when #[] and #{} are used, rather than waiting for them to be compared

Yes, this approach is referred to as "interning" and is how the polyfill works. There are downsides to this approach. -0 can't be both stored in the structure and also equal to 0, so one of those would have to give. It also makes the values more expensive in cases where they are not being compared and are only being used for the immutability. There was also concern from implementors that this approach would be difficult for the garbage collector to handle, as it could cause frequent invalidation of "the generational hypothesis" that most objects are short-lived.
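
A highly simplified sketch of interning, to make those trade-offs concrete. This is not the actual polyfill code (a real implementation needs weak references and handles more than JSON-serialisable values); it only shows the shape of the idea:

// Each structurally-new tuple is registered once; constructing "the same" tuple again
// returns the existing reference, so === falls back to ordinary identity comparison.
const registry = new Map();                    // a real implementation must hold these weakly
function internTuple(values) {
  const key = JSON.stringify(values);          // note: collapses -0 to 0, one of the trade-offs above
  let existing = registry.get(key);
  if (existing === undefined) {
    existing = Object.freeze(values.slice());
    registry.set(key, existing);
  }
  return existing;
}
internTuple([1, 2]) === internTuple([1, 2]);   // true: same reference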

@travislwatson

There are downsides of this approach. -0 can't be both stored in the structure and also equal to 0, so one of those would have to give.

Isn't that just a limitation of having to use Weak{Map,Set,Ref} for userland/polyfill?

this makes the values more expensive in cases where the values are not being compared, if they are only being used for the immutability.

Are we certain on this? If the lookup is not performed, then new objects have to be created that may not be needed, causing both unnecessary memory allocations and garbage collections.

@AprilArcus
Author

AprilArcus commented May 10, 2024

-0 can't be both stored in the structure and also equal to 0, so one of those would have to give. Also this makes the values more expensive in cases where the values are not being compared, if they are only being used for the immutability.

these seem like reasonable and easily documented trade-offs to me.

@Maxdamantus

I agree about interning being generally undesirable performance-wise (otherwise engines would be interning strings and bigints, and they demonstrably don't: #292 (comment)), but it would be nice if someone could explain the actual performance reasons why another type of content-compared value can't be added.

Was this not a concern when adding bigint (another type of arbitrary-sized value that in general isn't interned)? Does this mean that there will never be another similar type (eg, decimal)?

Is the performance issue contingent on typeof #[] === "object", or does it apply even if typeof #[] === "tuple"? In the former case I imagine someone would need to look at the proportion of comparisons where a similar expression condition has been applied (I suspect this is very low), and in the latter case I think some explanation would be needed as to why R/T values are fundamentally different to bigint or string.

@ljharb
Member

ljharb commented May 10, 2024

@Maxdamantus yes, my understanding is that the lack of adoption of bigint has made browsers feel that the effort isn't worth it, which means that (unless that changes) there will never again be another new primitive type, which precludes both R&T and Decimal as primitives.

@Maxdamantus

@ljharb That doesn't really answer the question though. Was there a significant performance impact from adding bigint to the language? All major browsers currently support it. And if there was an impact, will it be increased by R/T?

As for usefulness, I suspect the addition of R/T values is more important than bigint values was, since emulation of compound comparisons is fairly common (confer React.useMemo, or the discussions above about "keying functions" in Map).

I suspect another issue with bigint usability is that it was harder to polyfill (maybe this doesn't apply nowadays, but I feel like that probably limited adoption when people should have been excited to use it), whereas R/T can already be polyfilled (with the minor caveat of normalising -0 to 0).

@ljharb
Member

ljharb commented May 10, 2024

I'm not clear on how much of the large impact was performance and how much was implementation difficulty, but the impact was large, it would be again for R&T, and they aren't willing to repeat it.

I 100% agree that R&T (and Decimal) are each much more useful than BigInt, and would have gotten way more adoption, and that it's turned out to be a shame that BigInt went first - but this is where we are.

@errorx666

I don't really have a use-case that needs BigInts, but R&T helps with the ubiquitous problem that is state management.

I also don't feel that implementation difficulty should be a driving factor in steering the language design. My primary concern is always what's best for the language itself.

@jayaddison

@Maxdamantus I'm just chiming in to share my sense of alignment (that might change if I learn more, I'll admit..) that non-object-based implementations could provide significant benefits.

To respond nitpickily to one paragraph of yours:

Is the performance issue contingent on typeof #[] === "object", or does it apply even if typeof #[] === "tuple"? In the former case I imagine someone would need to look at the proportion of comparisons where a similar expression condition has been applied (I suspect this is very low), and in the latter case I think some explanation would be needed as to why R/T values are fundamentally different to bigint or string.

My understanding from the current spec (refs: README.md mention, expressions.html spec) is that compliant implementations would return typeof results for Record and Tuple that are distinct from Object -- strictly speaking I suppose an implementation could still use an object-like representation internally, though (at the cost of a somewhat-dynamic implementation of typeof comparison, I expect).

@Maxdamantus

Maxdamantus commented Nov 7, 2024

#387 (comment)

compliant implementations would return typeof results for Record and Tuple that are distinct from Object

Right, but it was pointed out that that could cause security issues in user implementations of membranes that assume that, eg, typeof x !== "object" && typeof x !== "function" implies that x contains no object (more specifically, that having direct access to x doesn't provide access to objects such as Function and Object.prototype, which could be called/manipulated to cause cross-realm effects).

Hopefully I'm not creating too many levels of indirection, but here's a post with links to earlier discussions about this: (#390 (comment)), where the outer post has some follow up discussion about this topic. #292 probably has more of it, but there's more in general to read through.

I'll also copy this thought I've expressed somewhere else:

Personally, I'm still not 100% convinced that the assumptions made from the above condition are important enough to cause such a change to the proposal, but I don't see the value of typeof #[] as being a usability issue. Code that needs to check the types of things is a bit smelly to me, but in cases where you do need to check the type, typeof v === "tuple" and Tuple.isTuple(v) both seem usable to me, so just making typeof #[] === "object" should be fine and it solves this hypothetical issue. This is similar to array objects, which are also fundamentally special (Object.create(Array.prototype) is not an array object) and are detected using Array.isArray(v).

#387 (comment)

strictly speaking I suppose an implementation could still use an object-like representation internally, though (at the cost of a somewhat-dynamic implementation of typeof comparison, I expect).

I think of it the other way around. I don't think the value of typeof x or Object.is(x, Object(x)) should have much impact on how x is implemented, so the implementation will likely treat it the way it treats "primitives" in all other cases.

@jayaddison

#387 (comment)

compliant implementations would return typeof results for Record and Tuple that are distinct from Object

Right, but it was pointed out that that could cause security issues in user implementations of membranes that assume that, eg, typeof x !== "object" && typeof x !== "function" implies that x contains no object (more specifically, that having direct access to x doesn't provide access to objects such as Function and Object.prototype, which could be called/manipulated to cause cross-realm effects).

Ok, yep - those are valid concerns. As I understand it currently, Function values are not permitted within either Tuple or Record per the spec (ref; [[Value]] is any primitive value) -- and neither Tuple nor Record provides access to a prototype (ref; unlike the other primitives, there is no prototype to lookup the appropriate methods on).

So, I think that the safety level provided by !(typeof x in {'object': true, 'function': true}) is unchanged by this proposal (rewriting it for no other reason than to try to demonstrate that I understand the general intention of it) -- but that's probably worth confirming, as I'm fairly new to this.

FWIW: for the use case that gathered my interest in this proposal, the ability to reliably (deterministically) round-trip a datastructure to a language-agnostic representation, that has neither a prototype nor mutability when the contents are loaded, would be a benefit.

@acutmore
Collaborator

acutmore commented Nov 7, 2024

strictly speaking I suppose an implementation could still use an object-like representation internally

This is one of the challenges. It would be beneficial for engines to re-use the existing machinery they have for objects, e.g. hidden classes and inline caches that make property access fast. However the current specification for R&T makes them different enough from objects that it is a non-trivial task to implement them in terms of objects.

One of the other concerns is that a commonly stated use case for R&T from the community is to increase performance, but, similar to strings, as R&T grow in size (imagine a very long tuple) comparing two of them is a linear operation. I.e. it might be the case that switching an entire app over to R&T would actually make it slower. Instead they would need to be used only when appropriate, and it's not entirely clear how to clearly state the guidance for appropriate use, as it's very much an "it depends" situation.

One of the alternative designs that reduces the largest amount of concern is dropping the === equality. However, that is also the part that allows their equality to "fit" with the rest of the language, e.g. Array.prototype.includes and Map.prototype.has, so it certainly has downsides, which is why it has been challenging to make meaningful progress.

@jayaddison

One of the other concerns is that a common stated use case for R&T from the community is to increase performance but, similar to strings, as R&T grow in size (imagine a very long tuple) comparing two of them is a linear operation. I.e. it might be the case that switching an entire app over to R&T would actually make it slower. Instead they would need to be used only when appropriate and it's not entirely clear how to clearly state the guidance on for appropriate use as it's very much a "it depends" situation.

That makes sense. I would probably suggest not following the possibilities to optimize those O(n) situations (such as requiring hash-equality or interning) initially -- but to ensure that they're possible in future and/or on a per-implementation basis. Reasoning: sufficiently-motivated folks will benchmark/profile their code, and that'll help to inform whether R&T provide a benefit despite the naive comparisons -- and also in cases where people question why an anticipated benefit hasn't materialized, they can trace that to their engine and request/contribute improvements there.

One of the alternative designs that reduces the largest amount of concern is dropping the === equality. However that is also the part that allows their equality to "fit" with the rest of the language e.g. Array.prototype.includes and Map.prototype.has so it certainly has downsides and is why it has been challenging to make meaningful progress.

From reading a number of requests for something that could replace Map keys and/or stringify operations currently used as a workaround -- initially I'd say that there is significant demand for the === operator there. I don't personally have a need for it in my use-case. Strictly speaking I suppose it could also be deferred to some future version -- but I think it could create situations where both engine implementors and also developers write code that -- intentionally or otherwise -- expects that comparison not to be available (something like an instance of Hyrum's law -- if a particular-enough behaviour is implemented, it becomes relied upon, regardless of whether it is robustly-specified or not).

@acutmore
Collaborator

acutmore commented Nov 8, 2024

initially I'd say that there is significant demand for the === operator there. I don't personally have a need for it in my use-case

Would you be able to describe your use case? It would be much appreciated.

Reasoning: sufficiently-motivated folks will benchmark/profile their code, and that'll help to inform whether R&T provide a benefit despite the naive comparisons -- and also in cases where people question why an anticipated benefit hasn't materialized, they can trace that to their engine and request/contribute improvements there.

The counter to that, though, is: if developers benchmark, find R&T too slow, and so don't adopt them, then the engines would have done a lot of work for little gain. And if developers benchmark, find them slow, and respond by raising a bug, that doesn't change the fact that implementations have already warned that these will be hard to optimise and that it may not be possible.

In other words, if a majority of developers expect them to be fast, then we need to ensure the design is one that we are confident can be fast without significant re-writing of the existing engines.

This is the general catch-22 of language design. It's hard to predict how a new feature will be used at scale across large projects because large projects are unlikely to adopt an unstable language feature. So we have to make educated guesses based on smaller samples, or looking at other languages that already have similar features. Other languages tend to do equality by an equals(other) method dispatch, not interning.

@jayaddison

initially I'd say that there is significant demand for the === operator there. I don't personally have a need for it in my use-case

Would you be able to describe your use case? It would be much appreciated.

Certainly; I've made a few contributions to the client-side HTML/JS search implementation in the Sphinx documentation generator, and that's where this originates. When Sphinx builds a documentation set to HTML, the entire output is a static, self-contained website, and when using client-side search, the search index is represented as a JavaScript file loaded when the search page is opened. Most of the content of that index is a JSON blob, but there is a small JS function call around it.

Because the use case involves user-controlled input (the search query), and also because the search index contents itself should not change between or during page viewing, I'd like to sanitise the data that the code operates on. Sanitise is a term that I've made up there (vaguely derived from home-brewing experience) - it involves freezing the objects in the index and removing their prototype references.

To some extent I'd be glad for any runtime/interpreter optimizations based on an immutable search index - but that's not my main objective; primarily I'd like to improve the assurance that the search index content retains its integrity and does not provide access to other runtime code.

R&T would, I think, allow much of this in an elegant and effective way, and JSON.parseImmutable would allow for some further refinement. Re-loading the search index from Python code might gain some short-term complexity (because the file format representation would no longer be a subset of JSON) - but I think that's manageable and acceptable as a tradeoff.
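
For what it's worth, a rough sketch of that "sanitise" step, assuming the index is a plain JSON-like tree of objects, arrays and primitives (the function name is the commenter's made-up term, not an API):

function sanitise(value) {
  if (value === null || typeof value !== "object") return value;   // primitives pass through
  if (!Array.isArray(value)) Object.setPrototypeOf(value, null);   // drop the prototype reference
  for (const key of Object.keys(value)) sanitise(value[key]);      // recurse into children
  return Object.freeze(value);                                     // freeze each level
}
// With R&T, JSON.parseImmutable(searchIndexText) would give this deep immutability by construction.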

@jayaddison

A self-correction:

From reading a number of requests for something that could replace Map keys and/or stringify operations currently used as a workaround -- initially I'd say that there is significant demand for the === operator there.

From reading the ecma262 spec for Map.get (as of 2024-11-08), I now understand that the Map key operations don't rely on === itself, but instead on key normalization and a SameValueZero check (as specified to include BigInt comparisons).

@gregmartyn

I wanted to note that there can be a performance improvement even if R&T's === operator is O(n) from the savings outside of that comparison. E.g. it may avoid useMemo allocations & lookups entirely. It may also be beneficial for the conceptual simplicity alone. Also I think it's expected that things that start off slow can improve over time.

@Maxdamantus

Maxdamantus commented Nov 8, 2024

I wanted to note that there can be a performance improvement even if R&T's === operator is O(n) from the savings outside of that comparison.

To add to this point, I think it's worth thinking about how much worse (in terms of both performance and DX) the language would be if we didn't have string values with content-based equality.

String equality is also O(n), but operations such as using strings as Map keys or searching for strings using Array.prototype.indexOf or Array.prototype.includes are cheaper in performance and code complexity compared to if we had to implement that ourselves, even if the implementation only involved calling a hypothetical .equals method (since we'd have to do it over and over again for every concept that involves equality).

#387 (comment)

Other languages tend to do equality by an equals(other) method dispatch, not interning.

Tying this to my point above, this is already more-or-less how JS works for strings, where comparison involves an O(n) operation, and the O(n) operation is the one used throughout the standard library. In the languages that distinguish between == and .equals, the former will likely be an O(1) comparison (even for variable-length data such as strings or unbounded integers), and the latter is more like JS equality, potentially performing O(n) comparisons after dereferencing pointers.

I think the main difference between JS and other languages is that the non-O(1) comparison can be overridden, but I'm not sure this is a good idea, since if the language simply distinguishes between pointers to mutable data (eg, {}, []) and pointers to immutable data (eg, #{}, #[], "foo", 99999999999999999999999n), the language can use an "obvious" implementation for equality, where the former is a pointer comparison and the latter is a content comparison after dereferencing the pointers. This allows the language to guarantee stability of the equality operation.

In the past in Java, there hasn't been a very clear distinction between mutable and immutable things, and people would tend to make their IDE generate the "obvious" implementation for .equals and .hashCode in classes in the latter category, but now the language includes "record classes" that basically do that for you, so in general you shouldn't need to provide custom implementations anymore.

Just for clarity, when you say "interning", do you mean eager deduplication of allocations containing the same content, or do you mean where the language doesn't provide a way to distinguish between allocations of the same content? Usually "interning" means the former, but JS only does the latter.

I've pointed out in other posts (eg, #292 (comment)) that string implementations in JS are not based on interning, and I wouldn't expect R/T values to be either.

@acutmore
Collaborator

acutmore commented Nov 9, 2024

I now understand that the Map key operations don't rely on === itself, but instead on key normalization and a SameValueZero check (as specified to include BigInt comparisons).

While Map, Set, and Array.prototype.includes are not built on ===, the predicates StrictEqual (===), SameValueZero ([].includes), and SameValue (Object.is) all give the same result for all types apart from number. Ignoring NaN and -0, all three predicates are the same. This consistency would be broken if R&T were treated as equal in Map but not by ===. Some may see that breaking this consistency is worth the value add; others disagree and would prefer to retain the consistency. One way to not break the existing consistency would be to introduce a new type of equality into the language with new semantics, but introducing a 5th type of equality (LooselyEqual being the 4th) does push up the complexity bar. It also means it wouldn't automatically apply to existing packages, such as React. They would need to incorporate this new equality, which they may not want to do.
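
For illustration, the consistency in question with today's semantics:

// StrictEqual (===), SameValueZero ([].includes), and SameValue (Object.is)
// agree everywhere except NaN and -0:
NaN === NaN;              // false
[NaN].includes(NaN);      // true
Object.is(NaN, NaN);      // true
0 === -0;                 // true
[-0].includes(0);         // true
Object.is(0, -0);         // false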

@Maxdamantus - lots of great points here

I think it's worth thinking about how much worse (in terms of both performance and DX) the language would be if we didn't have string values with content-based equality.

I completely agree that strings being immutable and === equal in JavaScript is great. While they are also O(n) to compare, they are a simpler data structure than records, so in real terms I think the performance of comparing records will be slower than comparing strings. But I do also agree that the comparison will be faster when implemented natively and not in userland. For one thing, userland JavaScript can't access the cached hash value of a string; it needs to re-compute it on each request. That isn't to say that a native implementation would definitely live up to developer expectations of how fast R&T should be. So again there are two equally valid sides to this.

It also comes back to the 'ability to re-use existing object machinery'. String optimisations such as ropes could also apply to tuples, making it more efficient to keep concatting new values to the end. However this means that tuples would have their own new representation - instead of being able to leverage the existing optimisations that exist for arrays.

Just for clarity, when you say "interning", do you mean eager deduplication

Yes, eager deduplication so equality becomes pointer equality.

I've pointed out in other posts (eg, #292 (comment)) that string implementations in JS are not based on interning, and I wouldn't expect R/T values to be either.

Yep, the way this proposal was designed and was being implemented in SpiderMonkey was by not using interning. However this revealed the level of implementation complexity this approach had. While equality is defined in only a few places in the spec, equality in implementations is spread out and inlined into different places as they have evolved and added multi-tier JITs. It's not technically impossible but is a large undertaking, and the feedback we received is that it is being considered too large an undertaking. Interning would simplify things, but does bring downsides such as not being lazy.
