
switch runtime to wasm only #701

Merged · 14 commits merged into manta on Aug 1, 2022
Conversation

@flame4 (Contributor) commented Jul 21, 2022

Signed-off-by: Yi [email protected]

Description

Relates to / closes: paritytech/substrate#469.

This PR switches the runtime execution to wasm only, instead of NativeElseWasmExecutor.
This means the compiled binary will no longer contain a native runtime.
There is more related discussion in paritytech/substrate#469, but in brief the benefits are:

  • Because of forkless upgrades, only one specific runtime version can be sped up by the native runtime, since the native code is compiled only once.
  • It eliminates the abstraction complexity of supporting two runtimes.
  • It lowers binary size and compile time.
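For orientation, this is roughly what the change looks like in node/src/service.rs. It is a minimal sketch assuming the sc_executor API of the polkadot-v0.9.22 era; the HostFunctions alias and the build_executor helper are illustrative names, not this repository's actual code, and exact constructor arguments vary between Substrate releases.

```rust
// Illustrative sketch only; not this PR's exact diff.
use sc_executor::WasmExecutor;
use sc_service::Configuration;

// Assumed host-function set; a real node may extend this (e.g. with benchmarking host functions).
type HostFunctions = sp_io::SubstrateHostFunctions;

fn build_executor(config: &Configuration) -> WasmExecutor<HostFunctions> {
    // Before this PR: NativeElseWasmExecutor::<ExecutorDispatch>::new(...), which falls
    // back to the embedded native runtime whenever its version matches the on-chain one.
    // After this PR: a plain WasmExecutor, so the wasm runtime is always the one executed.
    WasmExecutor::new(
        config.wasm_method,           // interpreted vs. compiled wasm execution
        config.default_heap_pages,    // runtime heap pages
        config.max_runtime_instances, // concurrently instantiated runtimes
        None,                         // no on-disk cache path for compiled artifacts
        config.runtime_cache_size,    // in-memory runtime cache entries
    )
}
```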

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Added one line describing your change in <branch>/CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer.

Situational Notes:

  • If adding functionality, write unit tests!
  • If importing a new pallet, choose a proper module index for it and allow it in BaseFilter. Ensure every extrinsic works from the front-end. If there is a corresponding tool, ensure the two work with each other.
  • If needed, update our JavaScript/TypeScript APIs. These APIs are officially used by exchanges and community developers.
  • If modifying existing runtime storage items, make sure to implement storage migrations for the runtime and test them with try-runtime. This includes migrations inherited from upstream changes; you can search the diffs for modifications of #[pallet::storage] items to check for any.
  • If the runtime changes, update the version numbers properly (a sketch of these fields follows this checklist):
    • authoring_version: The version of the authorship interface. An authoring node will not attempt to author blocks unless this is equal to its native runtime.
    • spec_version: The version of the runtime specification. A full node will not attempt to use its native runtime in substitute for the on-chain Wasm runtime unless all of spec_name, spec_version, and authoring_version are the same between Wasm and native.
    • impl_version: The version of the implementation of the specification. Nodes are free to ignore this; it serves only as an indication that the code is different; as long as the other two versions are the same then while the actual code may be different, it is nonetheless required to do the same thing. Non-consensus-breaking optimizations are about the only changes that could be made which would result in only the impl_version changing.
    • transaction_version: The version of the extrinsics interface. This number must be updated in the following circumstances: extrinsic parameters (number, order, or types) have been changed; extrinsics or pallets have been removed; or the pallet order in the construct_runtime! macro or extrinsic order in a pallet has been changed. You can run the metadata_diff.yml workflow for help. If this number is updated, then the spec_version must also be updated.
  • Verify that benchmarks & weights have been updated for any modified runtime logic
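As a reference for the version-number bullet above, here is a hedged sketch of where these fields live in a typical Substrate runtime's lib.rs. The names and numbers are placeholders rather than this repository's actual values, and RUNTIME_API_VERSIONS is generated by the impl_runtime_apis! macro elsewhere in the runtime crate.

```rust
use sp_runtime::create_runtime_str;
use sp_version::RuntimeVersion;

#[sp_version::runtime_version]
pub const VERSION: RuntimeVersion = RuntimeVersion {
    spec_name: create_runtime_str!("example-runtime"), // placeholder chain name
    impl_name: create_runtime_str!("example-runtime"),
    authoring_version: 1,   // bump when the block-authoring interface changes
    spec_version: 3200,     // bump on any consensus-relevant runtime logic change
    impl_version: 1,        // bump for implementation-only, non-consensus changes
    apis: RUNTIME_API_VERSIONS, // generated by impl_runtime_apis!
    transaction_version: 4, // bump when extrinsic encoding, order, or pallets change
    state_version: 0,
};
```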

@flame4 (Contributor, Author) commented Jul 21, 2022

Binary size before removing the native runtime: 134 MB.
Binary size after removing the native runtime: 123 MB.

@ghzlatarev

Compiler env: cargo 1.61.0 (a028ae42f 2022-04-29) stable, on an M1 Mac (2022).

@flame4 (Contributor, Author) commented Jul 21, 2022

Used polkadot-launch to bring up a local network. It works well.

@ghzlatarev (Contributor) commented:

> Used polkadot-launch to bring up a local network. It works well.

This is also part of the publish-draft-release job, FYI.

@ghzlatarev (Contributor) left a comment

Can you add this force-debug feature for debugging: https://github.com/PureStake/moonbeam/blob/master/runtime/moonbase/Cargo.toml#L252
When we compile for wasm, a lot of the Debug impls are stripped, so we will be getting <wasm:stripped> in logs.
Until now we've been working around this by running with native while debugging, but this is a better solution.
This is the log implementation, if you're curious: https://paritytech.github.io/substrate/master/src/sp_debug_derive/impls.rs.html#46-175

We could probably also use the disable-logging feature for releases to shrink the size further.

Can you also write a description for the PR?

@flame4 flame4 closed this Jul 21, 2022
@flame4 flame4 reopened this Jul 21, 2022
stechu previously requested changes Jul 21, 2022
@flame4 (Contributor, Author) commented Jul 22, 2022

> Can you add this force-debug feature for debugging: https://github.com/PureStake/moonbeam/blob/master/runtime/moonbase/Cargo.toml#L252 When we compile for wasm, a lot of the Debug impls are stripped, so we will be getting <wasm:stripped> in logs. Until now we've been working around this by running with native while debugging, but this is a better solution. This is the log implementation, if you're curious: https://paritytech.github.io/substrate/master/src/sp_debug_derive/impls.rs.html#46-175
>
> We could probably also use the disable-logging feature for releases to shrink the size further.
>
> Can you also write a description for the PR?

For the force-debug feature: it leads to a compile error. The reason is that some code in pallet-xcm at polkadot-v0.9.22 is not compatible with force-debug. It has already been fixed in a newer version, but we can't upgrade that easily because of other potential problems.
So in this PR the force-debug line is removed, and it is logged as a new issue.

disable-logging makes the binary smaller, from 123 MB to 121 MB.

Signed-off-by: Yi <[email protected]>
Signed-off-by: Yi <[email protected]>
ghzlatarev previously approved these changes Jul 22, 2022
@flame4 flame4 requested a review from stechu July 22, 2022 11:12
@flame4 flame4 dismissed stechu’s stale review July 22, 2022 11:17

Already changed.

@Garandor Garandor added the DO-NOT-MERGE Labels a PR that should not be merged label Jul 25, 2022
@ghzlatarev (Contributor) commented Jul 26, 2022

> A conceptual question here - please DO NOT MERGE until after discussion:
>
> Is it worth making non-consensus nodes (full nodes handling RPC, etc.) spend >2x more computation time executing block import in WASM over running native, just to save 2 MB of binary size?
>
> Have you measured the CPU busy % of a full node keeping up with the chain before and after? While it has no impact on benchmarks, it may have an impact on other node services like RPC servers. It's difficult to measure, because it means we have to simulate full-ish blocks to generate enough load for this to matter, but I bet that RPC performance will go down (either in round-trip timing or in the capacity of parallel connections handled per second) after this change. How much is an open question.
>
> Please convince me that this is a good idea with reasons other than "statemine did it".
>
> Code complexity is obviously not it, since this PR saves 14 lines. I get the code complexity point, but that can't be all.

RPC is part of the client; this change refers to the runtime. Also recall that in all of our production nodes the client is already only communicating with a wasm-executed runtime. So there should not be any performance loss/gain from this change.

Following the link chain you can find the broader discussion: paritytech/polkadot-sdk#62. The main problem with a native runtime (imo) is that there's a chance the network can run runtimes compiled with different compiler versions, which was partly the reason for a major stall on Polkadot already.

@flame4 I'd put some of these more important discussion issues and linked PRs in the description so they can be in plain view of reviewers.

@Garandor (Contributor) commented Jul 26, 2022

> RPC is part of the client; this change refers to the runtime. Also recall that in all of our production nodes the client is already only communicating with a wasm-executed runtime. So there should not be any performance loss/gain from this change.
> Following the link chain you can find the broader discussion: paritytech/polkadot-sdk#62. The main problem with a native runtime (imo) is that there's a chance the network can run runtimes compiled with different compiler versions, which was partly the reason for a major stall on Polkadot already.

The runtime executor is part of the client as well. Check the diffs on this PR.
I remember Rob is running our nodes with the wasm execution CLI flag, yes.
That doesn't mean he must continue to do so - at least not for the non-collating nodes.
I refreshed my memory by re-reading this, but it mainly talks about the stalling problems you referenced.
That's about block production, i.e. only relevant for collator nodes.
In the freak case that a compiler problem occurs, losing our full nodes and having their RPC connectivity freeze on some block while the chain continues to produce blocks is not nearly as bad as stalling the chain.

The argument here is: what's wrong with the CLI flag?

The second argument is: if this performance difference is negligible, this whole discussion doesn't matter.
It would be nice to see some performance numbers while the chain is processing large amounts of transactions before/after this change @flame4

@ghzlatarev (Contributor) commented Jul 26, 2022

> RPC is part of the client; this change refers to the runtime. Also recall that in all of our production nodes the client is already only communicating with a wasm-executed runtime. So there should not be any performance loss/gain from this change.
> Following the link chain you can find the broader discussion: paritytech/polkadot-sdk#62. The main problem with a native runtime (imo) is that there's a chance the network can run runtimes compiled with different compiler versions, which was partly the reason for a major stall on Polkadot already.
>
> The runtime executor is part of the client as well. Check the diffs on this PR. I remember Rob is running our nodes with the wasm execution CLI flag, yes. That doesn't mean he must continue to do so - at least not for the non-collating nodes. I refreshed my memory by re-reading this, but it mainly talks about the stalling problems you referenced. That's about block production, i.e. only relevant for collator nodes. In the freak case that a compiler problem occurs, losing our full nodes and having their RPC connectivity freeze on some block while the chain continues to produce blocks is not nearly as bad as stalling the chain.
>
> The argument here is: what's wrong with the CLI flag?
>
> The second argument is: if this performance difference is negligible, this whole discussion doesn't matter. It would be nice to see some performance numbers before/after this change @flame4

> Following the link chain you can find the broader discussion: paritytech/polkadot-sdk#62. The main problem with a native runtime (imo) is that there's a chance the network can run runtimes compiled with different compiler versions, which was partly the reason for a major stall on Polkadot already.
> @flame4 I'd put some of these more important discussion issues and linked PRs in the description so they can be in plain view of reviewers.

Not sure I'm following. But 1) we will continue running our runtime in WASM, and 2) the client will always be native.
Again, this whole change is about how the runtime is executed.

@Garandor (Contributor) commented:

> 1. we will continue running our runtime in WASM

No argument there for collator nodes. My point is that for full nodes, forcing WASM over native runtime is wasteful.

@ghzlatarev (Contributor) commented Jul 26, 2022

> 1. we will continue running our runtime in WASM
>
> No argument there for collator nodes. My point is that for full nodes, forcing WASM over native runtime is wasteful.

I guess I'm still not following, but can you ask your question in the upstream discussion? Perhaps they can answer.

@Garandor (Contributor) commented:

I'll do that. In the meantime - CC @stechu for a second opinion

@ghzlatarev (Contributor) commented Jul 26, 2022

> I'll do that. In the meantime - CC @stechu for a second opinion

Actually I think I get what you mean, but we need to measure the system as a whole to see whether RPC nodes are the bottleneck. That's definitely out of scope for this PR. And at the end of the day you can always spawn more RPC nodes if that's the bottleneck.
As mentioned earlier, this change will have no performance impact on any of our wasm-executed nodes.
EDIT: correction, our nodes are not running with the wasm flag yet.

@ghzlatarev (Contributor) commented:

> I'll do that. In the meantime - CC @stechu for a second opinion

> Actually I think I get what you mean, but we need to measure the system as a whole to see whether RPC nodes are the bottleneck. That's definitely out of scope for this PR. And at the end of the day you can always spawn more RPC nodes if that's the bottleneck. As mentioned earlier, this change will have no performance impact on any of our wasm-executed nodes.

Probably it would be best to compile a list of all the benefits in one place.

@Garandor (Contributor) commented:

My concern was answered here: paritytech/polkadot-sdk#62

We're good to merge this :)

@Garandor Garandor removed the DO-NOT-MERGE Labels a PR that should not be merged label Jul 27, 2022
@Garandor (Contributor) left a comment

Actually, we're not:

  • Please add A-, C- and L- labels :)

@stechu (Collaborator) commented Jul 28, 2022

@Garandor @ghzlatarev @flame4 for the RPC performance discussion, I think it really depends on the kind of RPC we are doing. To make it simple, let's imagine two extreme scenarios:

  1. RPC A, where almost all the execution happens in the runtime
  2. RPC B, where almost all the execution happens in the non-wasm client part

Clearly, switching to wasm-only execution affects A a lot and has very minimal impact on B. As a general design principle, we should never design something like A. That's why @Dengjianping is working on indexers. We may revisit this after adding PVM, though.
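To make the two extremes concrete, here is a rough sketch of what A and B look like from the client's perspective. The trait bounds are the standard Substrate client APIs; custom_runtime_query is a hypothetical name used only for illustration.

```rust
use sp_api::ProvideRuntimeApi;
use sp_blockchain::HeaderBackend;
use sp_runtime::{generic::BlockId, traits::Block as BlockT};

// Scenario A: the RPC handler dispatches into a runtime API, so the heavy lifting
// runs inside the executor and is paid at wasm speed once native execution is removed.
fn rpc_a<B: BlockT, C: ProvideRuntimeApi<B> + HeaderBackend<B>>(client: &C) {
    let at = BlockId::Hash(client.info().best_hash);
    let _runtime_api = client.runtime_api();
    // _runtime_api.custom_runtime_query(&at, ...) would execute inside the wasm runtime.
    let _ = at;
}

// Scenario B: the RPC handler only reads client-side data (headers, database state,
// off-chain indexes), so the choice of runtime executor does not affect its latency.
fn rpc_b<B: BlockT, C: HeaderBackend<B>>(client: &C) {
    let best = client.info().best_hash;
    let _header = client.header(BlockId::Hash(best));
}
```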

@flame4 flame4 added C-enhancement Category: An issue proposing an enhancement or a PR with one A-runtime Area: Issues and PRs related to Runtimes P-medium Priority: Medium L-added Log: Issues and PRs related to addition labels Jul 29, 2022
@flame4 flame4 requested a review from Garandor July 29, 2022 07:47
@flame4 (Contributor, Author) commented Jul 29, 2022

> Actually, we're not:
>
>   • Please add A-, C- and L- labels :)

Already added. Please review.

@Garandor Garandor added L-changed Log: Issues and PRs related to changes and removed L-added Log: Issues and PRs related to addition labels Jul 29, 2022
@Dengjianping Dengjianping merged commit 66a5000 into manta Aug 1, 2022
@Dengjianping Dengjianping deleted the Yi/remove-native-runtime branch August 1, 2022 03:34
Labels
A-runtime (Area: Issues and PRs related to Runtimes) · C-enhancement (Category: An issue proposing an enhancement or a PR with one) · L-changed (Log: Issues and PRs related to changes) · P-medium (Priority: Medium)
Development

Successfully merging this pull request may close these issues.

thread 'ImportQueue' panicked at 'Storage root must match that calculated.'
5 participants