-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The way of generating the genesis state root for the domain instance is not deterministic #1798
Comments
This is what we actually want right? The solution would be to use the same runtime on both scenarios @NingLin-P |
The runtime I referring to in "complied with the production evm runtime" is the native runtime, not the wasm runtime. We are always using the same wasm runtime, but as I mentioned above the If we want to use the same native runtime that literally means using the same release snapshot forever. |
I'm not sure why would anything use native runtime. Not only we specify to use WASM runtime in CLI, it is going away in the future completely. Also Substrate ensures that native runtime is only ever used if it matches WASM runtime version (including spec version), otherwise it will switch back to WASM automatically. I also don't understand what consensus runtime has to do with this. For context, genesis config is already possible to create within runtime itself in latest Substrate we have imported, it no longer requires Actually, I think we might even want to upload the genesis state as a blob during domain registration rather than introducing a host function. Host function should always be the last resort. |
We are not directly using the native runtime like executing an extrinsic or invoking a runtime API, what I mean above is that the
This is something new for me, but seems the
That will require the user to generate the genesis state manually and we are unable to check if it is well formatted. |
This makes no sense to be so far. I don't see
Storage building is deterministic process given the same input. I don't understand why you think it is not, it would break, as I mentioned, all Substrate-based chains.
It must decode properly, as long as it does and user pays the fee we shouldn't care. |
Take the following as an entry of how https://github.com/paritytech/substrate/blob/master/client/service/src/builder.rs#L144
Indeed, the problem now is we can't guarantee the same input, the wasm runtime is not involved during the storage building process, the input now is somehow the native runtime that compiles with the consensus chain client, to guarantee the same input means using the same release snapshot forever. That is what I mean by introducing some kind of block execution mechanism to get the wasm runtime involved during the storage building process. |
We went with host function so that we runtime know whats in the genesis config instead of relying on the domain creator to not add for example
IIUC, given a genesis config and a runtime, we should be able to deterministically generate the state root. If, for example like mentioned above, spec_version was incremented and compiled, and state root generation relies on Would be great to know what exactly is causing the change in state function @NingLin-P |
Okay, so it is
Why is WASM not involved? WASM that is included in chain spec is the only thing from which chain must always start. If domain doesn't have chain spec, it must have something equivalent. It makes no sense to me that it just blindly calls into the whatever native runtime that is a part of the current binary is. |
This can be confirmed in the code here https://github.com/paritytech/substrate/blob/master/frame/support/procedural/src/pallet/expand/genesis_build.rs#L38-L41 I guess substrate has some way to handle the mismatch genesis state root during the sync, perhaps @nazar-pc would know.
Sorry for the typo @nazar-pc,
The runtime code and domain id of a domain instance is already determined and maintained in the consensus chain state, we don't want the
Okay, it is involved but as an argument of |
No one is questioning that that part is done by the client. But no, there is nothing that handles genesis root mismatch. Different genesis root is by definition a completely different chain. The fact that operation is done in the client doesn't mean it is not deterministic. It is, given the same input, which is the case given the same chain spec as I mentioned above.
This is where I believe things are wrong. You should create genesis config with the runtime that corresponds to domain you're planning to use. And as mentioned above latest Substrate version is able to do that from Calling into arbitrary native runtime of arbitrary version is why you see mismatches, it should be fixed and the rest should fall into place I think.
Well, I think it should be the opposite. You take the runtime, you generate genesis config with it. Or at least patch itself with runtime since runtime will probably not be able to return itself as part of genesis config (unless you pass it as an argument alongside domain ID, which would be awkward, but possible). I think the root issue is that you need genesis config, but genesis config is inherently tied to the runtime. So you should really upload genesis config or chain spec of sorts (maybe simplified, but still) rather than just a runtime. Or the runtime must contain everything necessary to generate its own chains spec. The fact that client then takes that and builds storage and writes things into database on disk is irrelevant and not a source of non-determinism. |
To build the genesis state there are 2 inputs, the first is the
The
The #1824 Try to fix this by building the genesis state when domain is instantiated on the consensus node and the operator node can use the pre-build genesis state instead of building its own. I do realize it is still problematic since it involves the native runtime and different versions of consensus node may produce different genesis state. So the solution should be make |
It is not coming from native runtime either if I read the code correctly, it doesn't care about runtime at all. 6c325b3#diff-3d14087dd782f18801a80ab0168ce425161be6f42fc25dac99bc3a93116062a9R112-R115 I am not sure this is actually correct. This is probably one way to do it, but not the one we need. We should already have all storage items pre-created as key-value pair (the way they are in the raw chain spec), then you should be able to create an empty storage and fill it with data you actually want. Ideally, that key-value map would be a part of the runtime, probably as a separate section, then we don't need to instantiate and run WASM to extract it.
I think the problem is that you're uploading genesis config, not raw chain spec. They are not the same.
Why does it need to build something, why is it not uploaded just like chain spec is uploaded to the repo?
We should not use native runtime for this, at all. And version of the consensus node shouldn't matter. Polkadot was upgraded many times and they still have chain spec that is imported and results in the same genesis state just fine. I think the solution like with everything else is to upload an equivalent of genesis chain spec when domain is created. |
The code is just used to demonstrate the issue, as you can see the value is just used to compute a hash and print it out to show with the same
In the case of building genesis state it is the same if you take a look at substrate
That is the only part I haven't figured out will try to demonstrate the issue with polkadot. |
But it does that only once on a reproducible machine and then stored in raw form in the raw chain spec. If we do the same we bypass the issue in exactly the same way. That way it doesn't care about runtime since storage is just a key-value pairs that can be easily read and written to the database. |
Okay, this answers why polkadot doesn't have the issue, because polkadot only builds the genesis state once and stores it in the raw chain spec, as long as it uses the same raw chain spec it will use the same genesis state. So back to our issue, the mismatch of genesis state root issue still exists in the main branch, #1824 tried to fix it by building the genesis state on the consensus node and storing it onchain so the operator won't need to build it again, but because every consensus node will need to build the genesis state and the building process still invoke the native runtime which makes it unreproducible so it is still an issue. The solution we have so far:
|
Why not use something like a chain spec that you upload instead of raw runtime blob? |
When a domain is instantiated on the consensus chain runtime, we will generate the genesis state root of the domain instance through a host function and use that to construct and import the genesis ER into the block tree. The host function generates the genesis state root by building the genesis block with the
RuntimeGenesisConfig
and getting the state root from the header of the genesis block:https://github.com/subspace/subspace/blob/ff48b6e5bc1a6b24c3d89069de2769638cf4c317/crates/subspace-node/src/domain.rs#L108-L118
After the domain is instantiated, at some point of time, when someone starts running an operator node, it will also construct and import the genesis block when constructing the client of the node in the same way, the
RuntimeGenesisConfig
it used is fetch from the consensus chain state and is also the same:https://github.com/paritytech/substrate/blob/884a288caf2dfb29333ad2676c2281d31d543e49/client/service/src/builder.rs#L147-L152
https://github.com/paritytech/substrate/blob/884a288caf2dfb29333ad2676c2281d31d543e49/client/service/src/client/client.rs#L399
These 2 genesis state root is expected to be the same, otherwise, when the operator tries to submit its local genesis ER, it will fail due to the mismatch with the genesis state root on the consensus chain state. Unfortunately, the way we used to derive to genesis state root is not deterministic on both the host function and the operator client side.
This branch has added minor changes to the evm domain test runtime (change the
spec_version
from 0 to 1) and added log to demonstrate the issue. Running any domain test in this branch will fail due to genesis ER mismatch (i.e.cargo test --package domain-client-operator --lib -- tests::test_domain_block_production --exact --nocapture
) and from the log you can see even with the sameRuntimeGenesisConfig
the result genesis storage is different.My guess of the root cause is that the call of
<RuntimeGenesisConfig as BuildStorage>::build_storage
is not deterministic, this trait function is implemented for theRuntimeGenesisConfig
within theconstruct_runtime
macro, so the implementation can be different from runtime to runtime but it is not included by the runtime and is called outside of the runtime. Which implementation is used is determined by which runtime is compiled with the host function/operator node.In the above example branch, the host function has complied with the production evm runtime with
spec_version = 0
while the operator is compiled with test evm runtime withspec_version = 1
thus they will produce a different state root, if we also change thespec_version
of the production evm runtime to 1 then all the test will pass again.This issue will cause problems when we try to upgrade the runtime version as the above example does. A solution to this issue is to directly store the genesis storage in the consensus runtime instead of the
RuntimeGenesisConfig
so the operator won't need to call<RuntimeGenesisConfig as BuildStorage>::build_storage
and just use the genesis storage to construct the genesis block.cc @vedhavyas
The text was updated successfully, but these errors were encountered: