Support debug trace RPC APIs #768

Closed


a-moreira

@a-moreira a-moreira commented Jul 11, 2022

@a-moreira a-moreira requested a review from sorpaas as a code owner July 11, 2022 15:33
@cla-bot-2021

cla-bot-2021 bot commented Jul 11, 2022

User @a-moreira, please sign the CLA here.

@tgmichel
Contributor

I think it is worth noting that this code can only work with runtime overrides. Let me (try to) explain.

Environmental in the runtime

To enable tracing, you need to apply extrinsics using environmental (https://docs.rs/environmental/1.1.3/environmental/), the crate the evm uses to emit Events that ultimately reach the host function handling those tracing messages. environmental creates a global reference to an Event listener we coded, and the EVM is then executed with access to that global reference. That gives you a way to listen, from within your runtime, to whatever Event the EVM emits.

For obvious reasons, you cannot enable environmental or emit events in your on-chain runtime, unless you want your regular transactions to be orders of magnitude slower than a regular EVM execution. So if you are a LIVE chain and you want to use the Debug and Trace namespaces, you need dedicated tracing nodes that point to a full set of runtime overrides, following your on-chain runtime version history religiously but compiled with the provided environmental feature flags. Otherwise it just won't work.

Environmental in the client

In addition to that, there is what we internally call a "double environmental hop" here, because we need a way to work around the limited WASM memory and avoid overflowing it:

  • environmental is first used to listen to EVM events from the runtime.
  • It is also used from the client to proxy those events from the runtime to a host function.

So we don't handle any trace in WASM and proxy everything back to the client instead.

The way the tracing works is very dependent on what I just explained.
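The scoped-listener idea described above can be sketched without the environmental crate itself. This is a std-only stand-in, not the crate's API: the `Event` enum, `using`, `emit` and `execute` are hypothetical names, and a `thread_local!` plays the role of the global reference. It models a single hop only; in the setup described above, the runtime-side listener would forward each event through a host function to a second listener of the same kind on the client.

```rust
use std::cell::RefCell;

/// Events the EVM would emit while executing (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Step { pc: usize, opcode: &'static str },
    Exit { success: bool },
}

thread_local! {
    // Stand-in for the scoped global reference: `None` means that no
    // listener is installed, so emitting an event is a no-op.
    static LISTENER: RefCell<Option<Vec<Event>>> = RefCell::new(None);
}

/// Run `f` with a listener installed, then return everything it captured.
pub fn using<R>(f: impl FnOnce() -> R) -> (R, Vec<Event>) {
    LISTENER.with(|l| *l.borrow_mut() = Some(Vec::new()));
    let result = f();
    let events = LISTENER.with(|l| l.borrow_mut().take()).unwrap_or_default();
    (result, events)
}

/// Called from "inside the EVM"; does nothing unless a listener is installed.
pub fn emit(event: Event) {
    LISTENER.with(|l| {
        if let Some(buffer) = l.borrow_mut().as_mut() {
            buffer.push(event);
        }
    });
}

/// Pretend EVM execution that reports each step it takes.
pub fn execute() -> bool {
    emit(Event::Step { pc: 0, opcode: "PUSH1" });
    emit(Event::Step { pc: 2, opcode: "STOP" });
    emit(Event::Exit { success: true });
    true
}
```

Calling `using(execute)` returns the execution result together with the captured events, while calling `execute()` on its own emits nothing. In this toy model the untraced path pays only for a check per event; the real cost discussed above comes from compiling the emission into the on-chain runtime at all.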

Example

How a single transaction is traced:

  • Client receives request with txhash 0xA.
  • Get all extrinsics in 0xA's block.
  • Overlay changes on 0xA's parent block up to 0xA's index.
  • Apply 0xA using environmental.
  • EVM executes 0xA and emits each event back to the runtime, and it does so in the same intermediate state in which the original transaction was executed.
  • The runtime proxies those events to the client using a set of host functions.
  • Because the overarching trace is, again, using environmental, we can store those events on the client's memory.
  • After the EVM has exited, build a response in the client with every Event that was captured.
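The steps above can be sketched with a toy model. Everything here is a hypothetical simplification rather than the PR's code: state is a single integer, an extrinsic just adds a delta to it, and the "listener" is a plain `Vec` passed by reference instead of an environmental scope.

```rust
/// Hypothetical, simplified extrinsic: applying it adds `delta` to the state.
#[derive(Debug, Clone, PartialEq)]
pub struct Extrinsic {
    pub hash: u64,
    pub delta: i64,
}

/// What the listener records for the traced extrinsic.
#[derive(Debug, Clone, PartialEq)]
pub struct TraceEvent {
    pub state_before: i64,
    pub state_after: i64,
}

/// Apply an extrinsic; if a listener is supplied, report what happened.
fn apply(state: &mut i64, xt: &Extrinsic, listener: Option<&mut Vec<TraceEvent>>) {
    let before = *state;
    *state += xt.delta;
    if let Some(events) = listener {
        events.push(TraceEvent { state_before: before, state_after: *state });
    }
}

/// Replay the block on top of the parent state up to the target's index,
/// then apply only the target with the listener attached.
pub fn trace_transaction(
    parent_state: i64,
    block: &[Extrinsic],
    target_hash: u64,
) -> Option<Vec<TraceEvent>> {
    // Locate the target inside its block.
    let index = block.iter().position(|xt| xt.hash == target_hash)?;

    // The overlay is a local copy, so the replay never touches real state.
    let mut overlay = parent_state;
    for xt in &block[..index] {
        apply(&mut overlay, xt, None);
    }

    // Only the target is executed with tracing enabled.
    let mut events = Vec::new();
    apply(&mut overlay, &block[index], Some(&mut events));
    Some(events)
}
```

The point of the sketch is the ordering: the preceding extrinsics run untraced to rebuild the intermediate state, and extrinsics after the target are never applied.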

@sorpaas
Member

sorpaas commented Jul 11, 2022

I think it is worth noting that this code can only work with runtime overrides.

The problem can be solved if environmental is only enabled in std (so it only happens in native instead of wasm). Then, do the tracing through a direct function call to the pallet, instead of going through the runtime API that invokes wasm.

@tgmichel
Contributor

tgmichel commented Jul 11, 2022

If you use native, you are always replaying the requested transactions over the same STF (state transition function), which will most likely differ from the one the original transaction was executed over.

Edit: maybe I should rephrase "can only work with runtime overrides" -> "is probably useless in production without runtime overrides".

@sorpaas
Member

sorpaas commented Jul 11, 2022

Yes, but the EVM config can be customized by simply passing a new Config struct. Because it also follows the old Ethereum construct, new versions always support older variants. This means that if one wants more accurate debug tracing, it's always possible with some extra care.

My arguments are basically:

  • Minor differences in debug tracing between native / wasm don't matter, because this never goes into consensus.
  • If it matters, it can later be fixed on the node.
  • This would be more practical than using environmental in wasm. Nodes can generally have debugging enabled or disabled dynamically (by enabling / disabling native environmental).

@sorpaas
Member

sorpaas commented Jul 11, 2022

To get accurate EVM execution, we can also do the following:

  • Expose a runtime API that returns the current Config used in EVM.
  • Use that in EVM, and run tracing only in native.

@tgmichel
Contributor

But it's not just the EVM-related transactions, it's all the Frame extrinsics that might precede the one you want to trace in that block.

For example, imagine you changed how you split the fees in the most recent runtime, and now the Author takes 60% of the fees instead of 10%. If the (old) evm transaction you want to trace uses that address's balance in any way, contract execution might take another code path and produce a completely different trace result.
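A small numeric sketch of that divergence, with made-up values: the fee amounts, the 500-unit threshold and the function names are all hypothetical, and the "contract" is just a branch on the author's balance.

```rust
/// Share of a fee credited to the block author, given a percentage.
pub fn author_cut(fee: u128, percent: u128) -> u128 {
    fee * percent / 100
}

/// A toy "contract" whose code path depends on the author's balance.
pub fn contract_path(author_balance: u128) -> &'static str {
    if author_balance >= 500 {
        "payout"
    } else {
        "revert"
    }
}

/// Replay the fees of the preceding extrinsics under a given fee split,
/// then run the contract against the resulting author balance.
pub fn replay(fees: &[u128], percent: u128) -> &'static str {
    let balance: u128 = fees.iter().map(|fee| author_cut(*fee, percent)).sum();
    contract_path(balance)
}
```

With fees of 400 and 600, the 10% split leaves the author with 100 and the contract reverts, while the 60% split leaves 600 and the contract pays out. Replaying an old block over the newer split would therefore trace a code path that the original transaction never took.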

@sorpaas
Member

sorpaas commented Jul 11, 2022

For example, imagine you changed how you split the fees in the most recent runtime, and now the Author takes 60% of the fees instead of 10%. If the (old) evm transaction you want to trace uses that address's balance in any way, contract execution might take another code path and produce a completely different trace result.

All the pre-conditions can be executed in native without tracing though. It's only the final EVM transaction that requires tracing, as well as the block initialization. We simply need to build a storage overlay by executing them (Substrate's block initialization code in RPC calls is already an example of this). The subsequent prerequisite EVM transactions can be handled either by calling transact individually or through a new batch call that we provide.

@tgmichel
Contributor

All the pre-conditions can be executed in native without tracing though

You meant in wasm? It could be worth experimenting with if:

  • We have a reasonable way of identifying which runtime version needs which evm::Config.
  • Then we can replay the block in wasm up to the desired tx index.
  • Then trace using Native. Is it possible to conditionally use the RuntimeApi with one or the other execution strategy? (btw wasn't native execution in the works for being deprecated?).

Again, I need to think more about this, but I'm pretty sure there will be side effects. For example, off the top of my head: if min_gas_price is fixed and has changed between runtime versions since the txn was processed, and the contract uses the BASEFEE opcode, then the trace result will likely be different (the same goes for BlockGasLimit, or any value set through a Get associated type for that matter).
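One way to picture "identifying which runtime version needs which evm::Config" is a lookup keyed by the first spec version that used each configuration. This is a minimal sketch under assumptions: `EvmFork` is a hypothetical stand-in for an `evm::Config` value, `ConfigRegistry` is an invented type, and the spec version numbers are made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the EVM configuration a runtime was built with.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EvmFork {
    Istanbul,
    Berlin,
    London,
}

/// Maps the first runtime spec version that used a fork to that fork.
pub struct ConfigRegistry {
    by_spec_version: BTreeMap<u32, EvmFork>,
}

impl ConfigRegistry {
    pub fn new(entries: &[(u32, EvmFork)]) -> Self {
        Self { by_spec_version: entries.iter().copied().collect() }
    }

    /// Return the latest entry whose starting spec version is not greater
    /// than the requested one, or `None` if the version predates them all.
    pub fn config_for(&self, spec_version: u32) -> Option<EvmFork> {
        self.by_spec_version
            .range(..=spec_version)
            .next_back()
            .map(|(_, fork)| *fork)
    }
}
```

A lookup like this covers only the fork configuration. It does nothing for the side effects listed above, such as a changed min_gas_price or BlockGasLimit, which would need their own per-version overrides.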

@sorpaas
Member

sorpaas commented Jul 11, 2022

You meant in wasm?

Yeah sorry. Meant to say wasm.

@crystalin
Collaborator

Yes, from my experience, Native definitely provides different output than WASM in substrate. However it might not be the case in EVM which has different requirements in terms of op codes.

@boundless-forest
Collaborator

btw wasn't native execution in the works for being deprecated?

According to paritytech/polkadot-sdk#62, the tracing feature discussed above will be broken once the native runtime execution has been abandoned?

@tgmichel
Contributor

tgmichel commented Jul 12, 2022

@sorpaas I've been thinking about what you suggested and I like the idea, as it would remove the more complex requirements. But for it to work, we need to check all of the points below (please add anything I missed):

  • We can leverage Native execution in the long run (it does not seem to be the case, thank you @AsceticBear for linking the PR).
  • Have a way to switch executors between RuntimeApi calls.
  • Have the right Api to retrieve the evm::Config for a given height.
  • Have a comprehensive list of associated type values that might affect the Backend trait implementation for SubstrateStackState, and be able to selectively override them at a given height in the runtime.

If we cannot satisfy all of the above points, we will likely run into problems when tracing, or end up with an implementation that covers only some of the cases. From our experience, tracing mismatches - even small ones - cause a lot of problems for deployments, partners and, in turn, Dev-rel teams. That is why we traded infra and setup complexity - all the WASM tracing shenanigans - for a better integration.

(Side note: implementations like OnUnbalanced used in OnChargeEVMTransaction will also cause problems in traced EVM transactions if we use the built-in runtime, as they might change between versions. Edit: not true, as that happens after the EVM execution.)

@sorpaas
Member

sorpaas commented Jul 12, 2022

The rationale for the deprecation of the native runtime is to avoid compliance errors between native and wasm (so wasm becomes the single source of truth). This has been historically troublesome even for Polkadot mainnet. Those are all more or less in the consensus layer (there are some overlaps in the workers, but that doesn't matter much for this), and that is the layer the deprecation is concerned with. The code of the native runtime will never be deprecated, because it can't be -- it is used to build the wasm runtime. You'd always be able to use the native runtime as a library, which is what we do here.

If non-compliance bugs should be absolutely minimized, then this is indeed a concern, because we just run into the same problem as in the original rationale above. However, here we don't need to treat it as strictly as the consensus layer. Plus, deep state non-compliance should be extremely rare as long as the same evm version is used. If non-compliance only happens at the surface level, then it's always easy to debug and fix.

An alternative that avoids the hacky runtime override is to store a runtime hash of the specially built tracing runtime. Then, in the node, we keep a registry of all past tracing runtime builds, and enable the debug API only if the on-chain tracing runtime hash is in the registry. This may be a good compromise. However, note that it does not avoid the non-compliance problem, because we're still using different code paths for tracing and non-tracing.
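The registry idea could look roughly like the sketch below. All names are hypothetical (`RuntimeHash`, `TracingRegistry`, `debug_api_enabled`), and how the hash gets stored on chain or read by the node is left out, since the thread does not specify it.

```rust
use std::collections::HashSet;

/// Hash identifying a tracing runtime build (hypothetical 32-byte value).
pub type RuntimeHash = [u8; 32];

/// Node-side registry of every tracing runtime build the node ships with.
pub struct TracingRegistry {
    known_builds: HashSet<RuntimeHash>,
}

impl TracingRegistry {
    pub fn new(builds: impl IntoIterator<Item = RuntimeHash>) -> Self {
        Self { known_builds: builds.into_iter().collect() }
    }

    /// The debug API is served only when the tracing runtime hash stored
    /// on chain matches one of the builds known to this node.
    pub fn debug_api_enabled(&self, on_chain_tracing_hash: &RuntimeHash) -> bool {
        self.known_builds.contains(on_chain_tracing_hash)
    }
}
```

A node built this way refuses to trace a block whose tracing build it does not ship, rather than answering with a possibly mismatched trace.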

@tgmichel
Contributor

The rationale for the deprecation of the native runtime is to avoid compliance errors between native and wasm (so wasm becomes the single source of truth). This has been historically troublesome even for Polkadot mainnet. Those are all more or less in the consensus layer (there are some overlaps in the workers, but that doesn't matter much for this), and that is the layer the deprecation is concerned with. The code of the native runtime will never be deprecated, because it can't be -- it is used to build the wasm runtime. You'd always be able to use the native runtime as a library, which is what we do here.

I see, yeah makes sense, thanks for explaining.

bugs should be absolutely minimized

From our experience running the tracing in production this last year, this is the case. Even the smallest mismatches are a huge headache, especially because they cascade into more serious mismatches in big nested transactions (internal calls). In the end, it boils down to this: we cannot anticipate the side effects of even the smallest mismatch.

Plus, deep state non-compliance should be extremely rare as long as the same evm version is used

I agree, but it is the edge cases that concern me, as they ultimately render the whole service "unreliable". Additionally, because they are rare and hard to reproduce, they tend to slip through testnet and show up in production.

An alternative that avoids the hacky runtime override is to store a runtime hash of the specially built tracing runtime. Then, in the node, we keep a registry of all past tracing runtime builds, and enable the debug API only if the on-chain tracing runtime hash is in the registry. This may be a good compromise. However, note that it does not avoid the non-compliance problem, because we're still using different code paths for tracing and non-tracing.

I'm not sure I understand this part. Are you suggesting that we map the modified/tracing runtime wasm in offchain storage and use it instead of configuring a runtime overrides path through the cli?

@fiexer

fiexer commented Aug 19, 2022

Are there any updates available?

@andabak

andabak commented Sep 15, 2022

Hi @tgmichel @a-moreira - are there any updates regarding this PR, or timeline estimates for merge? Thanks!

@tgmichel
Contributor

Are there any updates available?

Hi @tgmichel @a-moreira - are there any updates regarding this PR, or timeline estimates for merge? Thanks!

Unfortunately no. In order to move the tracing code here we need a fully compliant solution, otherwise external projects relying on it will 100% malfunction.

Consider the following: we don't use runtime overrides, and we use purely Native, somehow conditionally loading the EVM configuration depending on the block we are tracing.

  • What about precompiles? Precompile code is part of the runtime and often needs to be revisited. How can it be managed without overrides?
  • Regarding what @sorpaas mentioned about using the Native executor for Frame extrinsics and selectively switching to wasm execution for the tracing-target Ethereum extrinsic: this is not clear to me. How does it work? Can I create a single overlaid state using multiple executors?

7 participants