Should we continue pursuing the stable spec idea? #557

gregsdennis · 2023-11-30T22:00:13Z

Maintainer

For the past year, we've followed along with an idea to make a "stable"/"living" specification. In that time, we've identified multiple aspects of managing the specification that had to change, including some changes to the specification itself and how JSON Schema works fundamentally. Below is some of the fallout of making this decision:

- disallow unknown keywords (people hated this, probably wasn't a good idea to publicize it the way we did)
  - had to backtrack to allow unknown keywords with prefix x- (how to do this was divisive)
  - optional vocabularies become meaningless, so we'd need to change how $vocabulary works
- remove "unstable" features from the main spec
  - add a feature introduction process (divisive)
- add a feature deprecation process
- how to identify a changing meta-schema?
  - there's no single source of truth for the meta-schema since it comes down to what the implementation supports
- how can implementations deal with a changing feature set?
  - how can an implementation convey what it supports?

After a year of trying to put everything in place to transition to a stable spec, the only thing I've witnessed is how it creates cascading complications where providing a solution for one just leads to identifying another. It's very likely there will be more fundamental changes as we continue down this road.

Publishing disjoint specifications (as we have until now) doesn't need any of this. Furthermore, with the creation of alterschema, there's less of a need to have a stable spec since converting between versions is as simple as running a tool. Updating to later versions is less of a hassle.

If we abandon the stable spec idea, none of the above is in question. Additionally, we can still:

make an effort not to break things between publications
identify features as experimental or unfinished
commit to annual releases
provide open/closed meta-schemas

My conclusion is that we should return to iterative releases, abandoning the stable spec concept. It was a good exercise to explore the idea, and there are some things we can take away from it, but I don't think that this is the right direction for the project in the long term.

jdesrosiers · 2023-12-01T20:52:10Z

Maintainer

I think introducing a stable spec is one of the most valuable things we can do for our community and for us as both implementers and as people who provide support for the community. Yes, it's going to require some changes and change is never easy, but the benefit in the long run I think is huge.

Our reasons for wanting to have a stable spec haven't changed. The proliferation of versions of JSON Schema is a problem and I think it would be a mistake to continue on that path. It's confusing for users and a burden on implementers as well as schema maintainers. I believe this is the biggest complaint we have about the health of JSON Schema as a whole. People are tired of the churn and are sticking with older versions like draft-07 because they feel it's stable (there are other reasons people are choosing draft-07, but that's definitely one of them). With a stable version of JSON Schema, people no longer have to worry about updating their legacy schemas or the libraries they depend on dropping support for the version they're using. Those schemas will continue to work the way they always have with no effort on their part. Sure, they could use alterschema to update their schemas, but that requires that they make the effort to do that upgrade and many will consider it risky to change their schemas even if the process is automated. Alterschema isn't a solution, it's a workaround for the problem. A stable spec solves the problem so that the workaround is no longer necessary.

disallow unknown keywords (people hated this, probably wasn't a good idea to publicize it the way we did)

had to backtrack to allow unknown keywords with prefix x- (how to do this was divisive)

optional vocabularies become meaningless, so we'd need to change how $vocabulary works

This is something we needed to do regardless of whether we have a stable spec. We addressed this problem in the context of solving for forward compatibility, but it also addresses one of the biggest problems and biggest complaints we have. It allows for detection of typos in schema keywords as well as many structural errors we see in schemas.

That said, I don't think it's fair to say that people hated this change. The feedback from Hacker News was almost entirely uninformed or unrelated to the issue. The few productive comments did lead to us adding the x- prefix, but I don't see that result as a backtrack. It was never the purpose of the change to disallow custom annotations. The intention was that the vocabulary system would address that need. Community feedback convinced us that the vocabulary system wasn't usable enough for this purpose and we needed to introduce something else to address this case. I see x- as a result of problems with vocabulary system usability, not as a result of problems with disallowing unknown keywords.
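
As a concrete illustration of the rule being discussed, here is a minimal sketch of a check that rejects unknown keywords unless they start with x-. It rests on several assumptions: `KNOWN_KEYWORDS` is a small illustrative subset rather than the real keyword list, `unknown_keywords` is a hypothetical name, and only the top level of the schema is inspected.

```python
# Illustrative subset only; a real dialect defines many more keywords.
KNOWN_KEYWORDS = {"$schema", "$id", "$ref", "type", "properties", "required", "items"}


def unknown_keywords(schema: dict) -> list[str]:
    """Return top-level keywords that are neither known nor x- prefixed."""
    return sorted(
        key for key in schema
        if key not in KNOWN_KEYWORDS and not key.startswith("x-")
    )


schema = {
    "type": "object",
    "proprties": {"name": {"type": "string"}},  # typo for "properties"
    "x-ui-hint": "collapsed",                   # custom annotation, allowed
}
print(unknown_keywords(schema))  # ['proprties']
```

The typo is reported while the x- prefixed annotation passes, which is the combination the x- prefix was meant to make possible.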

As for issues this change creates for the vocabulary system, the vocabulary system was always experimental and has many issues that need to be resolved regardless of what we do with the rest of the spec, so I'm not worried about any issues this creates with the vocabulary system because it needs a full rework anyway.

the only thing I've witnessed is how it creates cascading complications where providing a solution for one just leads to identifying another.

I wouldn't describe what's happened as "cascading complications". I think the only thing we've discussed that wasn't anticipated was optional vocabularies becoming meaningless and I'm not concerned about that for now. There are certainly many things we need to work out, but I don't think we're seeing unexpected problems arise from the decisions we're making. I feel like we're working through an expected set of changes one at a time. I don't perceive that set expanding as we go, although it wouldn't be unexpected to see some of that happen.

Also, I see all of the changes we're discussing as positive changes, not compromises, concessions, or complications necessary to get us to a stable spec. They make JSON Schema less complicated for users, implementers, and schema maintainers alike. I think it feels complicated because things are changing, but I think these changes will all serve to reduce complication as a whole for everyone.

how to do this was divisive

Agreeing on how to move to a stable spec has certainly been more difficult than I expected, but I don't think that's a sign we're on the wrong track. I think it's just that we've been doing a poor job working together in a constructive and productive way. I'd hate to see us abandon this initiative that would be so beneficial for JSON Schema because we're too dysfunctional to get it done and it's easier to stick with the status quo.

4 replies

gregsdennis Dec 12, 2023
Maintainer Author

I think introducing a stable spec is one of the most valuable things we can do for our community and for us as both implementers and as people who provide support for the community.

I don't disagree that stability will benefit the community greatly. However, I don't think that the proposed approach to achieving stability, i.e. a living spec, is the right solution. This is evidenced by the fact that we've been working full time on this process for a year and haven't made significant progress toward releasing a new version.

If two or more people working full-time on a project for a year can't publish something in accordance with a particular vision, then there's likely a problem with the vision.

There are other ways that we can provide stability, and I think it's time that we moved on from the living spec idea and explored those other options.

The proliferation of versions of JSON Schema is a problem and I think it would be a mistake to continue on that path. It's confusing for users and a burden on implementers as well as schema maintainers.

The problem isn't proliferation of versions; it's the pressure to support the features of older versions, which I brought up 18 months ago. This vision of a stable spec isn't going to fix that; in fact, it will likely make things worse.

Suppose in our stable spec, we have a keyword, foo. After some time, we decide that it's not doing what we want or it could be better, so we make improvements, but we can't just change it because we need backward compatibility because we're stable, so we have to release the improvement as a new keyword, foo2. Now, as an implementor, I have to maintain both foo and foo2... FOREVER.

But if we're still making iterative releases, then we're perfectly fine updating foo between releases. As an implementor, I'm only required to support the older version of foo for as long as I choose to support that release.

Regarding stability, we strive to make sure that each change is non-breaking, but in the case we can't, we provide tooling that helps users migrate, and we announce very loudly that the change is coming (possibly with a policy that such changes must not be made in certain proximity to the release that contains them).

As a more concrete example, JsonSchema.Net currently supports drafts 6 through 2020-12. $ref's behavior changed in there, and because I choose to support that range of releases, I have to support both behaviors, and in the proper context. If I drop support for 6 & 7, my life becomes easier because I can remove the code for the old support (the code that ignores sibling keywords). I never have to maintain that code again. If one of my users decides to remain with draft 6/7, they can; they just need to stay with an older version of my lib that supports them, and that's completely fine!
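
To show what supporting both behaviors "in the proper context" can involve, here is a minimal Python sketch (not JsonSchema.Net's actual code). It assumes the dialect has already been determined from $schema; `effective_keywords` and `REF_OVERRIDES_SIBLINGS` are hypothetical names.

```python
# Dialects in which keywords next to "$ref" are ignored (draft-07 and earlier).
REF_OVERRIDES_SIBLINGS = {
    "http://json-schema.org/draft-06/schema#",
    "http://json-schema.org/draft-07/schema#",
}


def effective_keywords(schema: dict, dialect: str) -> dict:
    """Return the keywords an evaluator should process for this dialect."""
    if "$ref" in schema and dialect in REF_OVERRIDES_SIBLINGS:
        # Old behavior: "$ref" replaces the whole subschema.
        return {"$ref": schema["$ref"]}
    # 2019-09 and later: "$ref" is evaluated alongside its siblings.
    return dict(schema)


subschema = {"$ref": "#/definitions/name", "maxLength": 10}
print(effective_keywords(subschema, "http://json-schema.org/draft-07/schema#"))
# {'$ref': '#/definitions/name'}
print(effective_keywords(subschema, "https://json-schema.org/draft/2020-12/schema"))
# {'$ref': '#/definitions/name', 'maxLength': 10}
```

Dropping the older drafts would let the first branch, and the dialect lookup feeding it, be deleted outright.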

A fully stable spec under this vision will increase the burden on maintainers. This approach leads to bloat, which generally has a knock-on effect on performance in load times and memory usage. I don't think users would be pleased that the spec is requiring implementations to do this.

C# and .Net (which are versioned separately, albeit in step now) don't have a problem with independent releases. They have a "not unless completely necessary" policy against breaking changes, but they do occasionally happen. When they do, they're quite overt about it. There are many examples of stable ecosystems that don't involve "forever compatibility," and maybe we should follow.

With a stable version of JSON Schema, people no longer have to worry about updating their legacy schemas or the libraries they depend on dropping support for the version they're using. Those schemas will continue to work the way they always have with no effort on their part.

This presumes that a user always wants the latest version of a library. In my experience, if there's no reason to update, people tend not to. (If it ain't broke...) Pressure to update comes from:

desire to use new features
required dependencies updating (e.g. other systems with which you require interoperability)

Updating to a new version (even patch versions) incurs risk, and most developers (especially in the enterprise sector) will avoid the risk when they can.

Also, an implementation should be identifying breaking changes (e.g. dropping support for an older JSON Schema version) in either their version number (if using SemVer) or in their docs, or both. It's then up to the developer to determine whether they want to do the work to update to the new version. This is just everyday software development.

[Disallowing unknown keywords] is something we needed to do regardless of whether we have a stable spec... It allows for detection of typos in schema keywords as well as many structural errors we see in schemas.

My problem here isn't so much that we disallowed unknown keywords. My problem is that this vision of a stable spec requires us to disallow unknown keywords. It removes the option.

The solutions we've historically used to respond to this specific complaint are:

We need better tooling to catch these kinds of mistakes (e.g. a linter), which we've not built because.... reasons.
You can build a custom meta-schema yourself that disallows custom keywords... it's easy!

Well, we can do the second one very easily by just publishing such a meta-schema ourselves in addition to the open one. But that doesn't solve the full problem because maybe people want to use custom keywords as annotations, but they also want things like proprties (missing e) caught.
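
One shape the linter idea could take is sketched below: custom keywords stay allowed, and only unknown keywords that closely resemble a known one are reported. This is an illustration under stated assumptions, not an existing tool; `KNOWN_KEYWORDS` is a small subset, `lint_keywords` is a hypothetical name, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib

# Illustrative subset only; a real linter would load the full keyword list.
KNOWN_KEYWORDS = ["type", "properties", "required", "items", "additionalProperties"]


def lint_keywords(schema: dict) -> list[str]:
    """Warn about unknown top-level keywords that closely resemble a known one."""
    warnings = []
    for key in schema:
        if key in KNOWN_KEYWORDS:
            continue
        close = difflib.get_close_matches(key, KNOWN_KEYWORDS, n=1, cutoff=0.8)
        if close:
            warnings.append(f"'{key}' is not a known keyword; did you mean '{close[0]}'?")
    return warnings


schema = {
    "type": "object",
    "proprties": {"name": {"type": "string"}},  # likely typo
    "uiHint": "collapsed",                      # custom annotation, left alone
}
for warning in lint_keywords(schema):
    print(warning)
```

Because the check only warns, it can run alongside an open meta-schema without taking custom annotations away from anyone.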

We don't need to disallow custom keywords to have stability. We need it for this vision's stability.

jdesrosiers Dec 13, 2023
Maintainer

This is evidenced by the fact that we've been working full time on this process for a year and haven't made significant progress toward releasing a new version. If two or more people working full-time on a project for a year can't publish something in accordance with a particular vision, then there's likely a problem with the vision.

I really don't see this as evidence of a problem with the approach. I think it's clear that the problem is our inability to agree on details and make compromises when necessary and in a timely fashion. Unless we address that problem, changing approach isn't going to help, we're just going to get stuck arguing details of a different approach.

This vision of a stable spec isn't going to fix that; in fact, it will likely make things worse.

The problems you're bringing up here aren't taking into account the concept of new features going through a maturity process before being declared stable. We'll have to work out the details of that process as we go, but the point is that we no longer just add things that are unproven or experimental. They don't get to be considered stable until they've been used and the issues have been flushed out and resolved. The case of a feature needing to be deprecated and replaced with something very similar should be extremely rare. When it does happen, we should be doing retrospectives to figure out why it happened and trying to figure out how we can be more careful in the future.

Now, as an implementor, I have to maintain both foo and foo2... FOREVER.

Even in this case, which should be so rare it almost never happens, we decided that deprecation and removal would be allowed in some cases.

I have to support both behaviors, and in the proper context

Dealing with different behaviors in different contexts is a problem that's solved with the new model. Under the model we've been working toward, there would be only one context, the stable dialect (not counting custom dialects).

If one of my users decides to remain with draft 6/7, they can; they just need to stay with an older version of my lib that supports them, and that's completely fine!

There are important reasons not to pin to old versions. Most importantly, you don't get bug fixes and security updates. Yes, there's risk to updating a dependency, but there's also risk to not updating. If you're properly managing the risk of updates through testing, the risk is low and consequences are generally minor. But, if a security vulnerability is discovered on an old unsupported version you're stuck on, you have no choice but to update, but now it's a bigger job because you have to update all cumulative changes you haven't been keeping up with and you're probably rushing because you need to eliminate the vulnerability asap. Both of those things make that update much riskier than if you kept up with changes regularly.

We don't need to disallow custom keywords to have stability. We need it for this vision's stability.

I think we need to disallow custom keywords for any vision of stability. At least we'd have to define stability differently than we've already decided. If unknown keywords are allowed, any addition of a new keyword is a breaking change. If we add foo, it could change the validation result of someone's schema that uses foo for their own purposes.
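
A toy example may make the collision concrete. The two "releases" below are hypothetical, as is the meaning given to foo (a minimum string length); the validator is a deliberately tiny stand-in that ignores keywords it doesn't know.

```python
from typing import Any, Callable

Handler = Callable[[Any, Any], bool]


def check_type(value: Any, instance: Any) -> bool:
    # Only the "string" type is modeled in this toy validator.
    return value != "string" or isinstance(instance, str)


def check_foo(value: Any, instance: Any) -> bool:
    # Hypothetical semantics for a newly standardized "foo": minimum length.
    return not isinstance(instance, str) or len(instance) >= value


RELEASE_1: dict[str, Handler] = {"type": check_type}
RELEASE_2: dict[str, Handler] = {"type": check_type, "foo": check_foo}


def validate(schema: dict, instance: Any, keywords: dict[str, Handler]) -> bool:
    """Apply known keywords; silently ignore unknown ones."""
    return all(
        keywords[key](value, instance)
        for key, value in schema.items()
        if key in keywords
    )


# The author used "foo" as a private annotation (say, a display priority).
schema = {"type": "string", "foo": 5}
print(validate(schema, "abc", RELEASE_1))  # True: "foo" is ignored
print(validate(schema, "abc", RELEASE_2))  # False: "foo" now constrains length
```

The schema and the instance are unchanged between the two calls; only the set of defined keywords differs.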

gregsdennis Dec 13, 2023
Maintainer Author

I think we need to disallow custom keywords for any vision of stability.

Disallowing unknown keywords isn't necessarily a problem, and any vision of a stable spec can do this. This one requires it, and I'm uncomfortable with not being able to choose whether this is the right direction.

A different vision of a stable spec could allow unknown keywords but warn of possible future collision. Historically, JSON Schema has followed this "warning" approach, allowing users to do what they want so that their individual use case could better be served, and explicitly disallowing unknown keywords feels like we're tying our users' hands.

This isn't a general requirement of any stable spec.

The problems you're bringing up here aren't taking into account the concept of new features going through a maturity process before being declared stable.

A maturity process doesn't solve the problem of keyword churn. A feature can make it through the entire maturity process, be added to the spec, and we could still find a problem with it. And others will still find ways that they wished it worked but doesn't. We then have to create a completely new keyword rather than augment or tweak the existing one in a follow-up release.

There are important reasons not to pin to old versions. Most importantly, you don't get bug fixes and security updates. Yes, there's risk to updating a dependency, but there's also risk to not updating. If you're properly managing the risk of updates through testing, the risk is low and consequences are generally minor. But, if a security vulnerability is discovered on an old unsupported version you're stuck on, you have no choice but to update, but now it's a bigger job because you have to update all cumulative changes you haven't been keeping up with and you're probably rushing because you need to eliminate the vulnerability asap. Both of those things make that update much riskier than if you kept up with changes regularly.

This is a dependency management philosophy that you (and undoubtedly others) hold, but not everyone does. The fact that I said "in my experience" proves that. I will not, as a representative of JSON Schema, support pushing people into a particular philosophy. It's incorrect to assume that we have any sway on how developers manage their dependencies. JSON Schema is a tool.

On the maturity process itself

You initially stated that this vision is inspired by how ECMAScript manages adding features, and I can see how that's the case. In particular, the feature introduction process (from here) is:

Idea
Proposal
Draft
Candidate
Finished

Importantly, at the "candidate" stage, browsers implement support for the feature before it's officially added to the next release (yeah, they do releases). This is similar to your requirement that at least two implementations support the feature before it's added to the spec.

But our implementations are not browsers. They're not auto-updating applications that are used by billions of people worldwide, running code written by millions of people worldwide. We don't have that kind of coverage, and TWO implementations supporting a feature isn't even close, especially when you consider that our implementations are completely siloed by programming language.

Historically we don't get feedback on a feature until it's released with the new version of the spec, even when two implementations (typically yours and mine, sometimes Julian's) support that feature ahead of the release. We would need something more like 80-90% of commonly used implementations supporting a feature, and a significant amount of targeted feedback from the community (both users and implementors) on that feature specifically. I just don't see that happening given that our community (aside from a handful of individuals) is largely silent.

This strategy works for ECMAScript because of a high user-to-implementation ratio (billions to 6*?), and that browsers basically force their users to stay updated to the latest version. Generously, we have moderate use of a LOT of implementations with little to no feedback, and as a result, I think we need a different strategy.

* Chrome, Firefox, Safari, Edge, Opera (maybe?), ... I'm struggling to think of a sixth that people would regularly use. The point is that it's not over 100.

The problem with a constant-`$id` meta-schema

Having a constant-$id meta-schema at first appears like it implies that a user can write a schema and have that schema be valid forever. But that's not any different from versioned meta-schemas: a schema that uses a versioned meta-schema will also be valid forever. A version defines a set of features. If you want newer features, you need to use a newer version. (This is how literally every piece of software works.)

People aren't complaining that they have to change http://json-schema.org/draft-07/schema# to https://json-schema.org/draft/2020-12/schema. They're complaining that they have to change items:[] to prefixItems:[] and additionalItems:{} to items:{}, and that it's super confusing, and why the hell would we make this change. The complaints of stability are that we keep changing function, not that they have to change a $schema value. This constant-$id meta-schema idea is trying to fix a problem that doesn't exist.
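
For reference, the keyword change being described can be sketched as a small rewrite. This is only an illustration of the two renames at the top level of a schema, not how alterschema works; `migrate_array_keywords` is a hypothetical name.

```python
import copy

DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def migrate_array_keywords(schema: dict) -> dict:
    """Rewrite draft-07 array keywords into their 2020-12 form (top level only)."""
    result = copy.deepcopy(schema)
    if isinstance(result.get("items"), list):
        # Tuple form: items:[...] becomes prefixItems:[...],
        # and additionalItems:{...} becomes items:{...}.
        result["prefixItems"] = result.pop("items")
        if "additionalItems" in result:
            result["items"] = result.pop("additionalItems")
    else:
        # In draft-07, additionalItems has no effect without the tuple form.
        result.pop("additionalItems", None)
    if result.get("$schema") == DRAFT_07:
        result["$schema"] = DRAFT_2020_12
    return result


old = {
    "$schema": DRAFT_07,
    "type": "array",
    "items": [{"type": "string"}, {"type": "number"}],
    "additionalItems": {"type": "boolean"},
}
new = migrate_array_keywords(old)
print(new["prefixItems"])  # [{'type': 'string'}, {'type': 'number'}]
print(new["items"])        # {'type': 'boolean'}
```

The $schema swap is a single line; the part that needs care is that items changes meaning depending on which dialect is in effect.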

More importantly, it doesn't actually fix the problem that we keep changing things. We'd still be able to make the items change above, except that we wouldn't be able to re-use items. It's still a frustrating functional change for users, and it's still frustrating to implementors who now have to support both until items and additionalItems can be deprecated. I recognize that versioned meta-schemas don't solve this problem either, but they're a better dev experience than a constant-$id meta-schema.

A constant-$id meta-schema will inevitably lead to stale implementations which say they support the stable JSON Schema (by recognizing the $id) but ultimately end up not supporting newer keywords. This will only add to user frustration because not finding out until runtime that a library doesn't support your schema is a HORRIBLE dev experience. With versioned meta-schemas, it's easy at dev time to compare the $schema value that you're using to the support of the library.
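
The dev-time comparison mentioned above can be as small as the following sketch. `SUPPORTED_DIALECTS` is a made-up list standing in for whatever a given library documents, and `check_dialect` is a hypothetical helper.

```python
# Hypothetical list of dialects that a given library release claims to support.
SUPPORTED_DIALECTS = {
    "http://json-schema.org/draft-07/schema#",
    "https://json-schema.org/draft/2019-09/schema",
    "https://json-schema.org/draft/2020-12/schema",
}


def check_dialect(schema: dict, supported: set[str] = SUPPORTED_DIALECTS) -> None:
    """Raise early if the schema declares a dialect the library cannot handle."""
    dialect = schema.get("$schema")
    if dialect is None:
        raise ValueError("schema does not declare $schema")
    if dialect not in supported:
        raise ValueError(f"unsupported dialect: {dialect}")


check_dialect({"$schema": "https://json-schema.org/draft/2020-12/schema"})  # passes

try:
    check_dialect({"$schema": "http://json-schema.org/draft-04/schema#"})
except ValueError as error:
    print(error)  # unsupported dialect: http://json-schema.org/draft-04/schema#
```

With a single constant identifier, this lookup would always succeed and could no longer tell a current implementation from a stale one.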

Above I stated, "A different vision of a stable spec could allow unknown keywords but warn of possible future collision." Versioned meta-schemas will uphold this because a developer will be able to use their "unknown keyword" for as long as they want, even if a future version defines it. They'll only be up against a wall if and when they decide to update.

jdesrosiers Dec 15, 2023
Maintainer

A different vision of a stable spec could allow unknown keywords but warn of possible future collision. Historically, JSON Schema has followed this "warning" approach, allowing users to do what they want so that their individual use case could better be served, and explicitly disallowing unknown keywords feels like we're tying our users' hands.

I would have a hard time calling that a vision of a stable spec. If there's a possibility of future collision, it's not stable. It's a different vision we could follow, but it would be something other than a stable spec approach. I think that's a fundamentally different thing and doesn't address the problems we set out to solve.

This is a dependency management philosophy that you (and undoubtedly others) hold, but not everyone does. The fact that I said "in my experience" proves that. I will not, as a representative of JSON Schema, support pushing people into a particular philosophy. It's incorrect to assume that we have any sway on how developers manage their dependencies. JSON Schema is a tool.

The approach we've been working toward doesn't require people to keep their dependencies up-to-date, it allows for it. The status quo doesn't give you a choice. If you're using a version that your implementation drops support for, you have no choice but to either stick with an unsupported older version or make changes to all your schemas in order to update, which many will consider very risky, even with alterschema.

A maturity process doesn't solve the problem of keyword churn.

The entire point of a maturity process is to address the problem of keyword churn. Is it going to be perfect? No. It's certainly possible that we'll miss things, but the whole point of having a maturity process is to dramatically reduce or eliminate cases where we feel like we need to make changes to a stable keyword. A maturity process won't necessarily fully solve the problem of keyword churn, but it should make it so rare that it's not a concern.

Historically we don't get feedback on a feature until it's released with the new version of the spec

Yes, but we're introducing the maturity process specifically to address that problem. The maturity process isn't intended to be something that we do internally before releasing a stable feature. It's intended to allow us to release something without having to commit to it being stable in order to elicit the feedback we need to be confident we got it right. Releasing the feature and getting people to implement/use before it's stable is the cornerstone of this process. I agree with your concerns that getting people to implement features that are labeled as unstable and not required will be a challenge. That's why I argued for releasing unstable extension features directly in the spec and making them required to implement. I've agreed to not doing that yet since you didn't think it would be a problem at the time, but I wouldn't be surprised if we decide to revisit that decision in the future to more aggressively encourage implementations.

This constant-$id meta-schema idea is trying to fix a problem that doesn't exist.

The problem the stable dialect URI is solving is supporting schemas written for a range of releases. It allows any schema, no matter how old, to be consumed by modern tooling without any modification. Consider IDE tooling that provides validation and code completion using JSON Schemas. That tooling can be written with only the latest release in mind, yet be able to support any compatible release without any special consideration for what version the schema was originally written for. Evergreen schemas are a much better developer experience than having your tooling randomly break because the language server dropped support of the release you were using, especially when it's just the $schema declaration that broke and the rest of the schema works fine. It may not be hard to fix, but it's an unnecessary distraction and will cause anxiety because why would it break if nothing actually changed.

A constant-$id meta-schema will inevitably lead to stale implementations which say they support the stable JSON Schema (by recognizing the $id) but ultimately end up not supporting newer keywords.

We would still have yearly releases that implementations would have to claim support for. It would be no different than what we have now.

This will only add to user frustration because not finding out until runtime that a library doesn't support your schema is a HORRIBLE dev experience. With versioned meta-schemas, it's easy at dev time to compare the $schema value that you're using to the support of the library.

The plan is to still release meta-schemas for each release. People can use those meta-schemas to check at dev time whether their schemas are constrained to a particular release just like they always have.

mwadams · 2023-12-08T10:05:15Z

Collaborator

It's good to see both of these perspectives so clearly (and reasonably!) articulated.

Without coming to a complete conclusion, I would like to draw out three separate things:

Making sure we understand what declaring a "stable/non-draft" spec means internally from a process perspective
(I think the work there is excellent, and I feel we're pretty much there; YMMV)
Making sure we understand what declaring a "stable/non-draft" spec means externally from a marketing perspective
(I think it is essential or we will never really move on from Draft7 as per Jason's point; but Greg is right - alterschema and "guaranteed" ways of migrating forward from version to version is a very important part of that story.)
The much thornier question of which features need to be in/out/generally shaken about
(Discussions are clearly ongoing!)

One thing that is challenging is that we will never have the perfect spec, and we are really deciding how many "known issues" we are going to "ship with", including things which we know are going to get a total overhaul. That's all fine by me - but how that is messaged (to the marketing point) is critical to containing the future (inevitable) HN pile-on.

jviotti · 2023-12-09T14:43:18Z

Collaborator

I think this is a complex issue. The "stable" and "living" approaches both have clear pros & cons. While I certainly don't represent the entirety of the JSON Schema community (which has an incredibly wide set of use cases), the outcomes I look forward to are:

I want to see the JSON Schema fragmentation problem go away. As @jdesrosiers pointed out, I really think this is one of the biggest problems with JSON Schema right now. It is confusing to users, most implementations (beyond validators from the core team) have extremely different and disjoint version support, and consumers (Postman being a prime example) can often not even upgrade to latest JSON Schema versions even if they want to, because they might be relying on JSON Schema tooling that only supports certain older versions of the specification.
I want to continue seeing the fundamental improvements that have recently landed or have been discussed, as they all fix important on-going issues with JSON Schema. Various things that @gregsdennis mentioned, like disallowing unknown keywords, and improvements to vocabularies are really great. Of course, they might initially annoy people (like HN) as most users just dislike change (and might not even truly understand the why behind it), but they are crucial to keep JSON Schema evolving into the best schema language.

I think the problem that @gregsdennis is trying to point out is that those two above points might conflict. On one side, JSON Schema could really use some stability, but at the same time, stability might hinder the ability to innovate and shake things up in a positive way. If JSON Schema stops making major changes for the sake of backwards compatibility, then it might get "stuck" with current decisions instead of continuing to push forward. It'd mean often accepting the current status quo even if we all think it can be improved, which doesn't really make sense either.

I don't have any specific proposals in mind, but I do wonder if there is a third option, that is somewhat hybrid between the two, and that would ensure a high degree of stability while still welcoming breaking changes when they make sense.

Take HTTP as an example. There is rarely a new version of the HTTP protocol going out. But when there is, they are not afraid of truly shaking up how the protocol works (i.e. HTTP/2 vs HTTP/1.1). Maybe there could be a "living" specification of JSON Schema (that users opt-in for) that focuses on innovation and resolving foundational problems with JSON Schema without being afraid to break things, and only at certain points (and not very often, say every 5 years?), a "stable"/"long term support" version is cut out of it?

4 replies

jdesrosiers Dec 13, 2023
Maintainer

I do wonder if there is a third option, that is somewhat hybrid between the two, and that would ensure a high degree of stability while still welcoming breaking changes when they make sense.

We've discussed this and decided that removing a keyword would be allowed in extreme cases, but only when it's been deprecated for a reasonable period of time.

There is rarely a new version of the HTTP protocol going out. But when there is, they are not afraid of truly shaking up how the protocol works (e.g. HTTP/2 vs HTTP/1.1).

When we discussed whether or not to have the new stable dialect URI be the default (making $schema unnecessary in most cases), one of the reasons we decided not to was because we might decide in the future that a significant breaking change is desirable. In that case, we would mint a new stable dialect URI and continue with compatible releases using that URI. I think that's similar to the HTTP example, except we would continue to evolve the language in a compatible way in between those major changes.

So, I think we're already on the balanced and pragmatic path that you're looking for.

jviotti Jan 15, 2024
Collaborator

Makes sense. Thanks a lot for the clarifications and links

gregsdennis Jan 15, 2024
Maintainer Author

The "stable" and "living" approaches both have clear pros & cons.

Not sure what you mean. These are the same proposal.

jviotti Jan 15, 2024
Collaborator

Sorry, I might have messed up the wording. I meant the stable and (non stable?) approaches

Julian · 2023-12-11T14:28:21Z

Julian
Dec 11, 2023
Maintainer

To me this is an over-complex issue :)

As I tried to put forward quite a while ago, in my opinion we should be "living" this philosophy rather than spending all this time talking about it -- specifically I would love to see us put out a stable specification purely by it being stable, rather than talking about all of these things ahead of time.

If we do that 2 or 3 times in the next year or two, it will be clear we're getting in a groove, and can then formalize what we're doing, and address how it affects any existing behavior (e.g. by obviating the need for $schema, or by opening up possibilities for new concepts).

I would love to see a spec version with no breaking changes in it -- or with only the ones with overwhelming consensus -- purely to indicate we're moving forward.

Nearly none of what's listed in the original comment is necessary in order to do that -- all of what's there indeed are things we have chosen to discuss alongside it. So it's very odd to me to label those things as being what "stable spec" means rather than stable spec meaning what it means, and all those things being things that if we indeed find contentious then we don't do them, or we do them later. But there's absolutely no good reason why we couldn't put out a stable spec in the same way as we've put out all previous specifications, we've simply continued to choose not to do so in my view.

1 reply

jdesrosiers Dec 13, 2023
Maintainer

While I agree that we should only do the minimum necessary and release a new spec asap, I disagree that it's as simple as just doing it and figuring it out as we go. There are a few things we need to work out before we can say the spec is now stable.

There are three things I think need to happen before we can start being stable.

1. Disallow unknown keywords. This is necessary because if we allow unknown keywords, any new keyword is a breaking change. And, making the change to disallow unknown keywords is a breaking change so we have to do that before we start being stable.
2. Determine how dialect URIs work in a stable spec where we aren't releasing a new dialect with each release, but rather updating the current dialect.
3. Determine what shouldn't be considered stable.

That said, (1) is already resolved, (2) has been resolved with the exception of the specific URI that will be used, and (3) I believe has fairly broad consensus and we know generally what we're going to do with the unstable features. Therefore, I do think we are ready to put out a spec at this point and the rest of the details can be worked out as we go. We were finally at a point where we were preparing for that release when we hit the brakes. The only thing holding us back at this point is us.
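To illustrate the first point, here is a toy sketch of why ignoring unknown keywords turns every newly defined keyword into a potential breaking change. Everything in it is an assumption made for illustration: the `maxBytes` keyword is hypothetical, and the `validate` function is a minimal stand-in, not how any real implementation works.

```python
from typing import Any, Callable, Dict

# A keyword check takes (keyword value, instance) and returns True on success.
Check = Callable[[Any, Any], bool]


def validate(schema: Dict[str, Any], instance: Any, known: Dict[str, Check]) -> bool:
    """Apply every keyword this release knows about; silently ignore the rest."""
    return all(
        check(schema[name], instance)
        for name, check in known.items()
        if name in schema
    )


def check_type(expected: Any, instance: Any) -> bool:
    # Only the "string" type is modelled in this toy.
    return expected != "string" or isinstance(instance, str)


def check_max_bytes(limit: Any, instance: Any) -> bool:
    # Hypothetical keyword, invented for this example.
    return not isinstance(instance, str) or len(instance.encode("utf-8")) <= limit


# Release A does not define "maxBytes"; a hypothetical release B does.
release_a: Dict[str, Check] = {"type": check_type}
release_b: Dict[str, Check] = {**release_a, "maxBytes": check_max_bytes}

# The schema author used "maxBytes" as a private hint while it was still unknown.
schema = {"type": "string", "maxBytes": 3}

accepted_before = validate(schema, "hello", release_a)  # "maxBytes" is ignored
accepted_after = validate(schema, "hello", release_b)   # "maxBytes" is now enforced
```

The schema itself never changed, yet the same instance goes from accepted to rejected once the keyword gains a definition.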

benjagm · 2023-12-13T17:00:57Z

benjagm
Dec 13, 2023
Maintainer

We are discussing stable vs iterative releases when we still need to agree on the Specification Development Process, and this leaves us in a chicken-and-egg situation that is limiting our progress.

My suggestion is to drive our efforts towards defining the Specification Development Process, which is something we can define with fewer limitations. Once we have the spec development process clear, I am sure it will be much easier to manage the scope of the next release; otherwise we will stay in this chicken-and-egg situation.

What is our motivation to release a new version of JSON Schema?
Our only motivation should be to serve the JSON Schema users, and with that in mind we can better determine what to include in the next release. Why do we need to think about just one release? Why not think about the next 2 or 3 releases?

We can work on a backwards compatible release with changes that make sense for 2020-12 users
We should work on an experimental version with the breaking changes we plan to add in the future
We can work on a future version with all the breaking changes required to achieve a stable release

With a spec development process we will have someone championing each release and a way to discuss the scope, make it visible and track the progress.

My conclusion: This lack of a clearly defined spec development process is in my opinion the main hurdle.

1 reply

jdesrosiers Dec 13, 2023
Maintainer

we still need to agree on the Specification Development Process

This is where I tried to start, but no one else wanted to do that first. However, I do think we've agreed on the framework for our process and I agree with others that's enough to move forward. We can figure out the details within that framework as we go.

Why do we need to think about just one release? Why not think about the next 2 or 3 releases?

My concern about this approach is that it introduces one or two more incompatible releases before we get to the stable version. Adding more versions of JSON Schema adds to the problem that there are too many versions already. I don't want to do that if we don't have to and I don't think we have to. If we weren't so far along in this process, I might have a different opinion, but at this point I think we can get out the stable release in 3 months (6 months max) if we avoid arguing about every detail and just get it done. I don't think it's a big enough job that an intermediate release is necessary. I really think we've worked out enough to do a release. We just need to do it.

benjagm · 2023-12-14T08:16:14Z

benjagm
Dec 14, 2023
Maintainer

we've agreed on the framework

I wasn't sure that was official. If this is the decided approach we should update the contributing guidelines of the spec repo.

1 reply

gregsdennis Dec 14, 2023
Maintainer Author

This discussion is disputing the framework.

nyalex · 2023-12-17T13:57:26Z

nyalex
Dec 17, 2023

Hi. While being a newcomer to this project, I've known about it for quite some time. I have learned that the most recent draft had expired almost exactly a year ago [1] (?!). Additionally, based on what I am reading here, there doesn't seem to be much progress to get to an updated version of even a draft, let alone something stable. This is unfortunate as this project could be so useful in my work.

Am I correct in saying that this should not be/is not ready to be used in production at the moment?

The longer it takes to ship, the less likely it is to ship.

There are times when this is necessary to say, and this is one of those times. I see many technical reasons why things haven't progressed (and they may be valid) but it's all useless if users can't get use out of JSON Schema.

I'm new here, so I think I've said enough, but want to ask you all to just pick a direction. Bugs and pushback are inevitable whichever direction you take, anyway.

1: https://json-schema.org/draft/2020-12/json-schema-core#section-boilerplate.1-3

2 replies

jviotti Dec 17, 2023
Collaborator

I have learned that the most recent draft had expired almost exactly a year ago [1] (?!). Additionally, based on what I am reading here, there doesn't seem to be much progress to get to an updated version of even a draft, let alone something stable. This is unfortunate as this project could be so useful in my work.

On the draft status and IETF, you might enjoy reading https://json-schema.org/blog/posts/future-of-json-schema and https://github.com/json-schema-org/json-schema-spec/blob/main/adr/2022-09-decouple-from-ietf.md. TL;DR: the "draft" status applied by IETF and its expiration is irrelevant for JSON Schema. It doesn't mean JSON Schema is unfinished, that it is not production ready, or that it is not being used.

nyalex Dec 18, 2023

I stand corrected. Thank you for pointing that out. I was reacting to the language listed in the link + what is said in these comments.

Apologies for the misunderstanding.

gregsdennis · 2024-01-15T00:02:24Z

gregsdennis
Jan 15, 2024
Maintainer Author

I've now had a month to think about all of this, and it seems to me that the root problem we're trying to address is how to manage breaking changes between versions of JSON Schema.

The current "stable spec" proposal tries to address breaking changes by saying that we just shouldn't have them. I question the sustainability of such an approach.

Breaking changes aren't all that bad

Personally, I don't mind breaking changes so long as there is:

good reason
an obvious indicator that a breaking change exists
documentation on what the change is and how to handle it
perhaps migration tooling

A question I'd like to explore is, why does a dev update to newer versions of JSON Schema or to newer versions of a library? I can only think of three reasons:

Security/bug fixes.
Access to a new feature.
An "always use the latest" philosophy.

Security/bug fixes

This really only applies to libraries, and even then will only cause a break if the newer version (that contains the fix) drops some support that is needed. This is ultimately a decision for the implementor on how they want to manage their library, and, as JSON Schema, we can't do anything here.

New features

If you want access to a new feature, you need to use a version that supports that feature. But that doesn't mean that you need the latest version. You need the earliest version that supports the feature and contains any fixes that apply to your use case.

As an example, while we generally recommend that people use the latest version of JSON Schema, we also regularly recognize that people don't typically need the dynamic references or other features and can usually just use draft 7, as it usually contains the features they need and it's the most widely supported version. As such, a user doesn't need to reference the latest version of a library; they just need to reference some version that supports draft 7.
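As a rough illustration of "you need the earliest version that supports the feature", the sketch below scans a schema for a handful of keywords and reports the oldest draft that defines all of them. The lookup table is deliberately tiny, treating draft-04 as the floor is an assumption of this example, and the walk is naive: it inspects every object key, so a property that happens to be named like a keyword would be miscounted.

```python
from typing import Any

# Oldest to newest; draft-04 is simply the oldest draft this table tracks.
DRAFT_ORDER = ["draft-04", "draft-06", "draft-07", "2019-09", "2020-12"]

# Draft in which each keyword first appeared (small, illustrative subset).
INTRODUCED_IN = {
    "const": "draft-06",
    "if": "draft-07",
    "unevaluatedProperties": "2019-09",
    "prefixItems": "2020-12",
    "$dynamicRef": "2020-12",
}


def earliest_draft(schema: Any) -> str:
    """Return the oldest tracked draft that defines every keyword found."""
    needed = 0

    def walk(node: Any) -> None:
        nonlocal needed
        if isinstance(node, dict):
            for key, value in node.items():
                if key in INTRODUCED_IN:
                    needed = max(needed, DRAFT_ORDER.index(INTRODUCED_IN[key]))
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return DRAFT_ORDER[needed]


example = {
    "type": "object",
    "properties": {"kind": {"const": "a"}},
    "if": {"required": ["kind"]},
    "then": {"required": ["name"]},
}
needed_draft = earliest_draft(example)
```

For this example schema the answer is draft 7, since `if` is the newest keyword it uses; nothing in it requires 2019-09 or 2020-12.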

"Latest" mindset

The primary argument for maintaining references at their latest versions is that it reduces the workload (or at least spreads it out) for when you eventually do need those latest features while also keeping you updated on security patches and bug fixes.

However, this assumes that you will eventually need those new features and that all patches are applicable to your use case. I'd argue that in most cases, a dev isn't going to need new features in their day-to-day. Updating religiously creates an unnecessary workload when that time can be better spent on the work the business needs done. It's a YAGNI issue, and if new features aren't needed and none of the fixes apply, then there's no reason to update.

This is especially evident in the case where a library drops support for an older spec version (for whatever reason). Keeping the library reference perpetually updated means that the user has to update all of their schemas right then. This is waste because:

They likely don't need their schemas to do anything they're not already doing.
Changing schemas requires regression testing, and no matter how much automated testing they have in place, some amount of manual testing will be required.
They may not have the time or personnel to do the update right now.

It doesn't make sense to spend the time/effort/money performing continuous no-op updates; accumulated small efforts typically have a non-trivial cost. However, it does make sense to stay with the older library version that supports their existing and proven schemas. Later, when JSON Schema eventually updates with features they want to use, then it makes sense to update the library to a version that supports those new features because they're going to be updating their schemas anyway.

A case for breaking changes

From draft 7 to draft 2019-09, $ref changed to allow sibling keywords, which was a breaking change. This was done (after much discussion) because a significant number of people would use it alongside other keywords expecting all the keywords to be evaluated, only for those other keywords to be ignored. It was unintuitive for users, so we changed it. Importantly, it wasn't something that we chose to do on a whim, but something that we discussed at length, exploring the ramifications of the change.

If we had made this change under the proposed "no breaking changes" policy, we would have had to make a new keyword to do what $ref does but also support sibling keywords. Then, having two keywords that do basically the same thing with only this minor difference isn't ideal, so we'd probably deprecate $ref, pushing everyone to use the new keyword.

Sure, $ref would still work like it always has (as long as implementations kept supporting it), but if we're suddenly discouraging its use in favor of another keyword, then users will still need to update their schemas, which is what the proposal tries to avoid by requiring that we introduce a new keyword.

This doesn't solve the problem of people needing to update their schemas because of a breaking change. This approach hides the problem (which is arguably worse) by separating the old way and the new.

Instead, we kept the existing keyword and tweaked its behavior so that it worked in a way that many users already expected. It didn't change previous "proper" use (where there were no siblings), and the only breaking case was that it would evaluate siblings where previously it didn't, which would only break schemas by authors who expected siblings to be ignored and used them anyway.

This is a good breaking change, and I'd hate for us to handcuff ourselves via policy in a way that doesn't allow us to do something like this in the future.
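The difference in $ref behavior can be sketched with a toy evaluator. This is only an illustration under stated assumptions: it handles just `type: "string"`, `maxLength`, and local `#/...` references (without JSON Pointer escaping), and the `ref_ignores_siblings` flag is a made-up switch standing in for "draft 7 rules" versus "2019-09 rules".

```python
from typing import Any, Dict


def resolve(root: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Resolve a local reference such as '#/definitions/name' against root."""
    node: Any = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def validate(schema: Dict[str, Any], instance: Any, root: Dict[str, Any],
             ref_ignores_siblings: bool) -> bool:
    if "$ref" in schema:
        ok = validate(resolve(root, schema["$ref"]), instance, root,
                      ref_ignores_siblings)
        if ref_ignores_siblings:
            # Draft 7 style: keywords next to $ref are never evaluated.
            return ok
        if not ok:
            return False
    # 2019-09 style continues here and evaluates the sibling keywords too.
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if ("maxLength" in schema and isinstance(instance, str)
            and len(instance) > schema["maxLength"]):
        return False
    return True


schema = {
    "definitions": {"name": {"type": "string"}},
    "$ref": "#/definitions/name",
    "maxLength": 3,  # sibling of $ref
}

draft7_result = validate(schema, "abcdef", schema, ref_ignores_siblings=True)
later_result = validate(schema, "abcdef", schema, ref_ignores_siblings=False)
```

Under the draft 7 style rules the over-long string is accepted because `maxLength` is silently skipped; under the later rules it is rejected, which is what most authors writing this schema would have expected.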

Proper management

I'm not saying all of this to imply that we just throw all caution to the wind and make whatever changes we want whenever we want to. I still think that we should be decisive and very conservative in the changes we propose. But let's not completely throw out the idea of breaking changes; let's find a good way to manage them. Historically, that has been through extensive discussion and impact analysis, and I think it's fine to continue with that.

I think ADRs are a good mechanism to help manage and communicate these kinds of changes. But because we haven't published since we introduced them (we published 2020-12, but the first ADR is from Apr 2022), we haven't had a legitimate opportunity to trial them.

Future keywords

The proposal requires that we disallow unknown keywords because they might collide with keywords that either we (JSON Schema) or some third party vocabulary have yet to define (i.e. "future proofing" or "forward compatibility"). I think this is rather forceful and overly strict.

Instead, I'd like to simply add language that says something like:

Schemas SHOULD NOT use keywords which are not defined by their meta-schema's vocabularies as such keywords may be defined in future versions and could result in invalid schemas or unexpected behavior.

(We can even still protect the x- family if we want to create a safe space for custom annotations.)

This enables:

users to keep their custom keywords without requiring that they update to the x-, as the current proposal does.
supporting experimental functionality without having to jump through "stable meta-schema" hoops like each implementation defining its own meta-schema for what it supports (honestly, this scares me; this will lead to interoperability problems).
reduced need for a "process" for adding new keywords to the spec.
allowing for implementations to warn that such keywords are being used.

Moreover, I'd like to propose that we include language that says support for experimental features MUST be hidden behind a configuration, defaulted to off. This way, users opt-in to experimental support.
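A minimal sketch of how an implementation might combine these ideas: warn (rather than fail) on unknown keywords, leave the x- family alone, and accept experimental keywords only when the user opts in. The keyword sets, the `Options` class, and `lint_keywords` are all hypothetical names invented for this example, `propertyDependencies` is used purely as a stand-in for "some experimental keyword", and only top-level keywords are inspected.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Illustrative subsets only, not a complete vocabulary.
KNOWN = {"$schema", "$id", "$ref", "type", "properties", "required", "items",
         "title", "description"}
EXPERIMENTAL = {"propertyDependencies"}


@dataclass
class Options:
    # Experimental support is off unless the user explicitly turns it on.
    enable_experimental: bool = False


def lint_keywords(schema: Dict[str, Any],
                  options: Optional[Options] = None) -> List[str]:
    """Return warnings for top-level keywords that are not recognised."""
    options = options or Options()
    warnings = []
    for key in schema:
        if key in KNOWN or key.startswith("x-"):
            continue  # defined keyword, or protected custom annotation
        if key in EXPERIMENTAL and options.enable_experimental:
            continue  # user opted in to experimental features
        warnings.append(
            f"unknown keyword {key!r}: it may be defined by a future version"
        )
    return warnings


schema = {
    "type": "object",
    "x-note": "internal",
    "myCustomKeyword": True,
    "propertyDependencies": {},
}
default_warnings = lint_keywords(schema)
opted_in_warnings = lint_keywords(schema, Options(enable_experimental=True))
```

With the defaults, both the custom keyword and the experimental one produce a warning; after opting in, only the custom keyword does, and the schema is still usable in either case.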

Catching spelling errors

The other claimed benefit to disallowing unknown keywords is catching spelling errors, e.g. proprties instead of properties.

We don't need to disallow unknown keywords to do this; we can just publish a closed meta-schema in addition to the open one that we've been publishing. Creating and using a closed meta-schema is the advice that we've been giving people for as long as I can remember, and there's nothing stopping us from just publishing it ourselves.
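For readers who haven't seen one, a closed meta-schema for 2020-12 is commonly written along the following lines, shown here as a Python dict so it can be serialized to JSON. Treat it as a sketch: the `$id` URL is a placeholder, it has not been run against any validator here, and it relies on the implementation supporting `$dynamicAnchor` and `unevaluatedProperties`.

```python
import json

closed_meta_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    # Placeholder identifier, invented for this example.
    "$id": "https://example.com/closed-2020-12-schema",
    # Lets the standard meta-schema's dynamic references land back here,
    # so nested subschemas are held to the same closed rules.
    "$dynamicAnchor": "meta",
    # Start from everything the standard meta-schema already allows...
    "$ref": "https://json-schema.org/draft/2020-12/schema",
    # ...and reject any keyword it did not recognise.
    "unevaluatedProperties": False,
}

closed_meta_schema_json = json.dumps(closed_meta_schema, indent=2)
```

A schema containing `proprties` would then fail validation against this meta-schema, while the regular (open) meta-schema would still accept it.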

Deprecated keywords

We're still going to want to deprecate keywords for various reasons.

I'm okay with having a policy that deprecated keywords remain supported for one or two versions, but they should not be supported after that; otherwise, we would only be holding ourselves and implementors back, not moving forward, by having requirements that incur bloat.

(I recognize that since these deprecated keywords don't change, there's not much of a maintenance cost because the code to support them is written, but there is still a runtime cost for end users who have to load libraries into memory, e.g. .NET, or download them, e.g. JS. The maintenance cost that does exist is more around code refactors and such, which don't/shouldn't happen often.)

That said, and in conjunction with what I said above, re-using keywords by merely tweaking their function slightly, with deliberation and discussion, as was done with $ref, mitigates the need for much of the deprecation that would be required by a "replace" approach.

Process

I'm not sure that we need to define everything about the process right away. I find myself in agreement with @Julian's comment in that, at least for now, we continue releasing versioned specs without this idea of a live/active version (which I still don't think that we have the numbers for), while also keeping a mindset that we, being the responsible party, need to minimize breaking changes where we can.

I think we had done well up through the change from draft 7 to draft 2019-09, but some of the changes in draft 2020-12 were particularly disruptive.

removing support for array-form items
removing $recursive* in favor of $dynamic* (and now even that's changing)

I think we all agree that these were necessary changes, however I also question whether we were properly diligent in analyzing the impact that these changes would have. Perhaps better pre-publication communication of the changes could have helped.
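To make the first of those changes concrete: in 2020-12 the array form of `items` became `prefixItems`, and `additionalItems` became the (schema-form) `items`. Below is a small migration sketch of the kind a conversion tool might perform. It is not taken from any existing tool, the function name is made up, and the walk is naive: it recurses into every value, so data inside `enum`, `const`, or `default` that happens to look like a schema would be rewritten too.

```python
from typing import Any


def migrate_items(schema: Any) -> Any:
    """Return a copy with array-form `items` rewritten for 2020-12."""
    if isinstance(schema, list):
        return [migrate_items(entry) for entry in schema]
    if not isinstance(schema, dict):
        return schema  # booleans, strings, numbers, None

    out = {key: migrate_items(value) for key, value in schema.items()}
    if isinstance(schema.get("items"), list):
        # Tuple validation: positional schemas move to prefixItems...
        out["prefixItems"] = out.pop("items")
        # ...and the schema for the remaining elements takes over `items`.
        if "additionalItems" in out:
            out["items"] = out.pop("additionalItems")
    return out


old_schema = {
    "type": "array",
    "items": [{"type": "string"}, {"type": "number"}],
    "additionalItems": False,
}
new_schema = migrate_items(old_schema)
```

The schema-form `items` (a single schema rather than an array) is left untouched, since its meaning did not change.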

Conclusion

Overall, after extensively considering everything in the comments above and my own deliberation, I still don't think that the live stable spec proposal is right for this project, and I'd like to see this project start actually moving forward again under the publication approach that we've previously used, while also being more conscious of the impact of the changes and other decisions we make.

3 replies

jviotti Jan 15, 2024
Collaborator

As an example, while we generally recommend that people use the latest version of JSON Schema, we also regularly recognize that people don't typically need the dynamic references or other features and can usually just use draft 7, as it usually contains the features they need and it's the most widely supported version. As such, a user doesn't need to reference the latest version of a library; they just need to reference some version that supports draft 7.

One implication of this is that when people keep using whichever version worked for them when they adopted it, without any motivation to move to the latest version of JSON Schema (independently of how implementations do this), the community becomes fragmented and maintaining implementations/tooling becomes overly complex.

I feel that we should cultivate somehow a culture for people to use the latest version of JSON Schema, as it would simplify the ecosystem a lot.

benjagm Jan 15, 2024
Maintainer

I pretty much agree with @jviotti:

One implication of this is that when people keep using whichever version worked for them when they adopted it, without any motivation to move to the latest version of JSON Schema (independently of how implementations do this), the community becomes fragmented and maintaining implementations/tooling becomes overly complex.

Reduce the Ecosystem's fragmentation:
The JSON Schema Ecosystem is fragmented by nature, with hundreds of implementations done by hundreds of implementers, but with the right structure this fragmentation turns into a strong Ecosystem. However, the additional fragmentation that results if we keep delivering more coexisting releases can be a hurdle to materializing the JSON Schema Ecosystem vision.

Strengthen the Ecosystem:
The other aspect I think is critical is to prioritize JSON Schema extensibility/extensions to empower everyone to extend their own... instead of the current "centralized" approach. Empowering everyone to create extensions turns "Users" into "Complementors". (Complementors: downstream actors providing innovations that enhance the value of the core proposition.)

gregsdennis Jan 15, 2024
Maintainer Author

I don't disagree that we should promote using the latest version. I was using it as an illustration that people often don't need to.

The point of that section is that, while using the latest version of the spec while writing a schema is ideal, it's quite different from the scenario of having an existing schema that uses an older version. The existing schema (which may have been written to what was latest at the time) works for what the user needs. But if the library they're using updates and no longer supports that version, then the user has to decide whether to update the library (which forces an update to their schemas) or just continue using the older version of the library. I think continuing to use the older version is completely fine: update only when you want to do something new.

The argument that's being used for the living spec is that, without breaking changes, they'd just be able to update the library and not update their schemas. But...

It doesn't require a living spec, just that we are careful about breaking changes.
It doesn't consider deprecating features and library support of those features.

Regarding fragmentation, I think it's up to the implementations to produce pressure to use the latest versions, and my "spec lifetime" proposal aims to address that. A living spec isn't going to solve the problem of having to support deprecated/legacy features; in fact I think it's going to make it worse. The only way to ensure implementations can let go of old features (which provides pressure for users to stop using them) is for us to explicitly tell the implementors not to support them anymore. I feel the best way to do that is through expiring versions.

with hundreds of implementations done by hundreds of implementers

This isn't fragmentation. This is wide, albeit varied, support. Given the nature of open source, with independent developers creating implementations for various reasons (e.g. legit desire for good support, school projects, "toy"/hackathon projects, etc.), I don't think we can avoid the variation in support. Some projects aren't intended to be taken seriously; some projects are created to fulfill only a single person's niche need; some projects are just abandoned. I think the work we're doing to define metrics on implementations is working toward creating some notion of quality, but that's the best we can realistically hope for.

Strengthen the Ecosystem

This is off-topic for this discussion, but could be raised elsewhere.

gregsdennis · 2024-03-12T20:02:25Z

gregsdennis
Mar 12, 2024
Maintainer Author

Closing this in favor of https://github.com/orgs/json-schema-org/discussions/671

0 replies
