Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How extension mechanism is supposed to work when dimension type is unknown? #147

Open
OleksandrKvl opened this issue Apr 25, 2022 · 7 comments

Comments

@OleksandrKvl
Copy link

Extension mechanism is based on required blockLength and optional numGroups and numVarDataFields. The latter two tell how many groups/var-data fields we have but they tell nothing about their dimension types so I don't understand how such extension can be handled. Consider the first version of the message:

<field name="field1" id="1" type="uint32"/>
<group name="group1" id="10" dimensionType="groupSizeEncoding">
    <field name="field1" id="11" type="uint32"/>
</group>

And the second version which adds another group:

<field name="field1" id="1" type="uint32"/>
<group name="group1" id="2" dimensionType="groupSizeEncoding">
    <field name="field1" id="3" type="uint32"/>
</group>
<!-- new group with different dimensionType -->
<group name="group2" id="4" dimensionType="groupSizeEncoding2">
    <field name="field1" id="5" type="uint32"/>
</group>

The reader of the new version will see numGroups set to 2 but that group has different dimension type and it's not possible to correctly handle it without its definition.
The same happens with var-data fields. New ones can have different type which means different length type so it's not possible to correctly handle them.

For groups I can imagine potential "fix" in form of saying that since group dimension is unknown it should be treated as default groupSizeEncoding so only groups with this dimensionType can be safely added to the message. But the standard says nothing about default type for var-data encoding. Can you please clarify how it's supposed to work?

@donmendelson
Copy link
Member

A decoder generated from an earlier version of a message schema cannot decode the new group anyway. The downside is that it cannot correctly determine the message length by walking the blocks. For that purpose, a framing header such as Simple Open Framing Header or WebSocket Protocol is recommended.

The extension mechanism section already states the constraint: "Message header encoding cannot change." This should be amended to say "Message header encoding and group dimension of existing repeating groups cannot change."

@OleksandrKvl
Copy link
Author

OleksandrKvl commented Apr 25, 2022

@donmendelson This section says:

  • A repeating group may be added after existing groups at the root level or nested within another repeating group.
  • A variable-length data field may be added after existing variable-length data at the root level or within a repeating group.

Correct me if I'm wrong, the above is only partially true:

  • for groups: group can be added only at the root level and only if there are no existing var-data fields. Otherwise older client cannot get the proper offset of the first var-data field.
  • for var-data: new var-data field can be added only at the root level.
    Neither of them can be added inside a group because older client wouldn't be able to properly calculate the entry size and iterate over the group.

A decoder generated from an earlier version of a message schema cannot decode the new group anyway.

Well, if group header and var-data length field types are known (for example defaulted) and message/group header contains numGroups and numVarDataFields, it should be possible because you would have all the information to calculate the size of each message level. I thought that was the original intent... Then what's the purpose of those numGroups and numVarDataFields on the wire?

@OleksandrKvl
Copy link
Author

Actually, here's the direct quote from here:

Number of repeating groups and variable data.
Message headers and repeating group dimensions carry a count of the number of repeating groups and a count of variable-length data fields on the wire. This supports a walk by a decoder of all the elements of a message, even when the decoder was built with an older version of a schema. As for added fixed-length fields, new repeating groups cannot be interpreted by the decoder, but it still can process the ones it knows, and it can correctly reach the end of a message.

@donmendelson
Copy link
Member

@OleksandrKvl yes, it was the intent of the extension mechanism to be able to walk the blocks of a message even if the design of the blocks is unknown to the decoder. Previous proposals for version 2.0 did not anticipate that the repeating group dimension and length of variable-length fields could change for new blocks.

You proposed the possibility of a default dimension to be used for new blocks.

Another possibility would be that new blocks use the same dimension previous groups within a message. However, that would not cover the case of adding the first repeating group or variable-length field to a message. In that case there would have to some kind of default anyway.

I open to any other proposal that anyone would like to make.

@OleksandrKvl
Copy link
Author

@donmendelson I thought that SBE V1 is kinda "final" and complete but it contains exactly the same extension logic which seems not possible to implement (I'm actually writing SBE parser, that's why I want to know the details).

Another possibility would be that new blocks use the same dimension previous groups within a message.

This also doesn't cover a (rare) case where groups have different dimension types. There's already messageSchema.headerType, possible solution might be to add messageSchema.groupSizeEncoding and messageSchema.varDataEncoding with the same defaults as their names. Similar attributes can be added to the <message> element to allow per-message customization. Then the only limitation to extension would be that new groups/var-data fields should use default types and of course numGroups/numVarDataFields must exist.

@donmendelson
Copy link
Member

At high level, the intention of the extension mechanism was to support simple addition of new fields to a block without breaking old parsers. The proposal to allow addition of new repeating groups was initially rejected by the working group, and it was not included in SBE v1.0. The thought was that if a message is to be radically changed, then you should just issue a new template.

Since it was requested again by some users, the proposal was revived as SBE 2.x release candidate. I want to emphasize that a release candidate is provisional and subject to improvement. I would encourage @OleksandrKvl or any other interested parties to make proposals to resolve ambiguities in the spec. Any proposal entered as PR will be considered.

@OleksandrKvl
Copy link
Author

OleksandrKvl commented May 9, 2022

@donmendelson before making such a proposal, I'd like to understand whether it's possible to change <data> attributes in a way I proposed in #148 (in particular providing just length type instead of composite) since proposal directly depends on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants