Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occassional Invalid tx status from RPC although transaction gets included in tx-pool #149

Open
valentinfernandez1 opened this issue Jul 5, 2024 · 4 comments

Comments

@valentinfernandez1
Copy link
Collaborator

valentinfernandez1 commented Jul 5, 2024

It has been detected from continuous testing that spurious Invalid statuses are emitted for transactions that eventually succeed and get committed into a block. Here's some details provided by the mythical team:

  • This seems to happen to transactions randomly when the node is under moderate load.
  • The transaction subscription events when the issue happens are always Ready -> Broadcast -> Invalid
  • No further updates come through the subscription. However, around 50 seconds later we find the transaction in a finalized block, succeeded.
  • Seems to happen to small transfer transactions as well as "heavy" ones with lots of nested proxy/batch calls.

This pattern is present in both traditional and custom pool implementations of the node

@michalkucharczyk
Copy link

This should not happen with fork-aware txpool implementation. The invalid event is sent when the transaction is invalid for all the blocks at tips of the forks, or is invalid for finalized block.
I never saw this pattern using saga-polkadot-util or while making my internal tests.

@valentinfernandez1
Copy link
Collaborator Author

After testing from Mythical, they discovered that this issue can also occur where a transaction is signaled as Invalid and not included in a block, leading to problems when trying to track the state of execution of a transaction. Here's a quote from @niksaak that provides more detail:

With the current node behavior, we are not able to automatically distinguish between a transaction signalling Invalid that was dropped from the global mempool and a transaction that was not and will get committed eventually.
This means that we can not, in fact, treat Invalid events as non-terminal — which is what we assumed previously — as that would lead to some operations depending on these transactions getting stuck waiting forever.

While we are aware that fork-aware txpool should resolve the issue it is critical to have an alternative solution in the meantime while the new transaction pool is under development and an estimated release date for the transaction pool so we can adjust the production release plan accordingly

cc @michalkucharczyk @bkchr

@valentinfernandez1
Copy link
Collaborator Author

valentinfernandez1 commented Jul 11, 2024

This is the original message from the Mythical team

We have just now detected the first case where a transaction with status Invalid did not get into a finalized block. With the current node behaviour, we are not able to automatically distinguish between a transaction signalling Invalid that was dropped from the global mempool and a transaction that was not and will get committed eventually.

This means that we can not, in fact, treat Invalid events as non-terminal — which is what we assumed previously — as that would lead to some operations depending on these transactions getting stuck waiting forever.

I withhold judgment on whether we can go to prod with this, but it looks like it will be very uncertain and painful to live with. This happened during normal flow testing with TPS much lower than what we currently have on prod. In any case, the above makes it a super-high priority issue to fix.

We are ready to provide any info we can to assist with fixing it.

@niksaak feel free to post any relevant updates here.

@michalkucharczyk
Copy link

michalkucharczyk commented Jul 13, 2024

Maybe we could give the new transaction pool a try to see how it behaves and to check how stable it is and also to see what problems (if any) are still hidden there? I am still polishing it and code needs to be reviewed, but I think it is a good moment to start gathering more feedback and executing more testing scenarios. I am quite satisfied with test results I have performed so far.

In case anyone is interested it was backported here: #162

Just add --pool-type=fork-aware args to enable it, and -ltxpool=info (or debug) to get some logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants