Stress Test for Actor Placement #145

Open
RyanLettieri opened this issue Aug 26, 2022 · 0 comments


Introduction

Actor resiliency and stability are crucial aspects of Dapr. When a service hosting an actor is restarted or brought down, invocations of that actor can fail. Tests should be created to ensure that this problem does not occur and, if it does occur, that the number of failed actor invocations is kept to a minimum.

Along with actors failing to be invoked after restarts, there is a second problem. When a cluster or app container is restarted, double activation (the same actor ID being activated on more than one host at the same time) can occur. This consumes unnecessary resources and breaks the guarantee that only a single instance of an actor is active at any time.

The proposed tests will cover actors failing to be invoked after application restarts, as well as multiple instances of the same actor being activated.

Environment

  • Up to two clients
  • Up to two service applications each containing a single actor
  • A third actor that is hosted by both services
  • Across all 3 scenarios, the services will be identical, so that if one service is taken down, requests can be routed to the remaining service.
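To illustrate why identical services allow requests to fail over, here is a minimal sketch of how an actor ID could be mapped to one of the currently healthy services. It uses rendezvous (highest-random-weight) hashing as a simplified stand-in; it is not Dapr's placement service algorithm, and the names `place`, `_weight`, and `service-a`/`service-b` are hypothetical.

```python
import hashlib
from typing import List, Optional


def _weight(host: str, actor_id: str) -> int:
    # Stable per-(host, actor) weight; hashlib is used rather than hash() so the
    # result does not change between interpreter runs.
    digest = hashlib.sha256(f"{host}/{actor_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(actor_id: str, healthy_hosts: List[str]) -> Optional[str]:
    """Rendezvous hashing: the healthy host with the highest weight owns the actor."""
    if not healthy_hosts:
        return None  # no host can activate the actor, so invocations must fail
    return max(healthy_hosts, key=lambda host: _weight(host, actor_id))


# Example: when the owning service goes down, actor3 moves to the survivor.
hosts = ["service-a", "service-b"]
owner = place("actor3", hosts)
survivor = next(h for h in hosts if h != owner)
assert place("actor3", [survivor]) == survivor
```

With this scheme, removing a host only moves the actors it owned; actors on the remaining hosts keep their placement, which matches what the scenarios below expect when one service is taken down.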

Test Scenarios

  • Three separate scenarios will be executed in this test suite.

  • Scenario One: 1 client and 1 service application containing a single actor

  • Scenario Two: 2 clients and 2 service applications, each containing a single actor

  • Scenario Three: 2 clients and 2 service applications, each containing a single actor, plus a third actor (actor3) that is hosted by both services:

    • To start this scenario, both clients will invoke actor3. Once both clients have successfully invoked the actor, the service hosting that actor will be taken down.
    • Once that service is taken down, the clients should still be able to invoke actor3, since it can be hosted by either service.
    • After actor3 has been invoked successfully, the remaining service will be taken down. Attempts to invoke actor3 should then fail, since both services are down.
    • Next, the first service that was taken down will be brought back up, and both clients are expected to be able to invoke actor3 again.
    • Finally, the first service will be taken down and the second service brought back up. Both clients are again expected to be able to invoke actor3.
    • Note that each invocation step above is not a single call: the actor will be invoked multiple times, and every call is expected to succeed.
    • Additional testing will include a resiliency policy that covers retries. This is needed to achieve 100% success when invoking the actor, since there will be windows during restarts when one or both services are unreachable.
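The Scenario Three sequence can be sketched as a toy, in-process simulation. Everything here is hypothetical: `ToyCluster` stands in for the two services, routing simply picks the first healthy service, and `invoke_with_retry` is a constant-backoff loop loosely modelled on a retry resiliency policy, not Dapr's resiliency API.

```python
import time
from typing import Callable, Dict


class ServiceUnavailable(Exception):
    """Raised when no service is up to host actor3."""


class ToyCluster:
    """Hypothetical stand-in for two services that can both host actor3."""

    def __init__(self) -> None:
        self.up: Dict[str, bool] = {"service-a": True, "service-b": True}

    def invoke_actor3(self) -> str:
        # Route to the first healthy service (alphabetical order), or fail if none.
        healthy = sorted(name for name, ok in self.up.items() if ok)
        if not healthy:
            raise ServiceUnavailable("no service available for actor3")
        return healthy[0]


def invoke_with_retry(call: Callable[[], str], max_retries: int, delay_s: float) -> str:
    """Constant-backoff retry, loosely modelled on a retry resiliency policy."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ServiceUnavailable:
            if attempt == max_retries:
                raise
            time.sleep(delay_s)
    raise RuntimeError("max_retries must be >= 0")


def run_scenario_three(clients: int = 2, calls_per_client: int = 10) -> Dict[str, bool]:
    """Walk through the Scenario Three steps; True means every call in a step succeeded."""
    cluster = ToyCluster()
    results: Dict[str, bool] = {}

    def step(label: str) -> None:
        all_ok = True
        for _client in range(clients):
            for _ in range(calls_per_client):  # repeated calls, not a single one
                try:
                    invoke_with_retry(cluster.invoke_actor3, max_retries=3, delay_s=0.0)
                except ServiceUnavailable:
                    all_ok = False
        results[label] = all_ok

    step("both services up")
    first = cluster.invoke_actor3()  # the service currently hosting actor3
    second = next(name for name in cluster.up if name != first)
    cluster.up[first] = False
    step("first service down")
    cluster.up[second] = False
    step("both services down")
    cluster.up[first] = True
    step("first service back up")
    cluster.up[first] = False
    cluster.up[second] = True
    step("only second service up")
    return results
```

Because service transitions in this toy model are instantaneous, retries only come into play in the both-services-down step, where they are expected to be exhausted. In a real cluster, the retry window is what would cover the restart gaps described above.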

Failure Detection

  • Check for double activation
    • Fail the test if the number of active actors before the shutdown is not equal to the number of active actors after the restarts
  • Ensure the client can reach the actor and have it perform the requested activity
    • Fail the test if, after a service restart, the client cannot communicate with the actor or cannot complete the requested activity
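Both double-activation checks can be expressed over snapshots of which actor IDs are active on each host. This is a sketch: the `Snapshot` shape and the function names are assumptions, and how the snapshots are collected from a running cluster is left open.

```python
from collections import Counter
from typing import Dict, List, Set

Snapshot = Dict[str, Set[str]]  # host name -> IDs of actors active on that host


def total_instances(snapshot: Snapshot) -> int:
    """Count active actor instances across all hosts (a duplicated actor counts twice)."""
    return sum(len(actor_ids) for actor_ids in snapshot.values())


def find_double_activations(snapshot: Snapshot) -> List[str]:
    """Return actor IDs that are active on more than one host at once."""
    counts = Counter(a for actor_ids in snapshot.values() for a in actor_ids)
    return sorted(actor_id for actor_id, n in counts.items() if n > 1)


def check_no_double_activation(before: Snapshot, after: Snapshot) -> None:
    """Raise AssertionError if an actor is duplicated or the instance count changed."""
    duplicated = find_double_activations(after)
    if duplicated:
        raise AssertionError(f"double activation detected: {duplicated}")
    n_before, n_after = total_instances(before), total_instances(after)
    if n_before != n_after:
        raise AssertionError(f"active actors before ({n_before}) != after ({n_after})")


# Example: actor3 moving from service-a to service-b is fine.
before = {"service-a": {"actor1", "actor3"}, "service-b": {"actor2"}}
after_ok = {"service-a": {"actor1"}, "service-b": {"actor2", "actor3"}}
check_no_double_activation(before, after_ok)  # no exception
```

Counting instances rather than distinct IDs is what makes the before/after comparison sensitive to double activation: an actor active on two hosts inflates the count by one.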

Definition of Success

  • No double activation occurs
  • Jobs being run by the actors resume execution when an actor comes back up
  • All N invocation calls from the clients to the actors succeed
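The success criteria can be folded into a single evaluation step. The `CallResult` record and the `evaluate_run` function are hypothetical, and using a progress counter as a proxy for "jobs continue execution" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CallResult:
    """Outcome of one client-to-actor invocation (hypothetical record shape)."""
    client: str
    actor_id: str
    ok: bool


def evaluate_run(calls: Sequence[CallResult], expected_calls: int,
                 double_activations: Sequence[str],
                 progress_before: int, progress_after: int) -> List[str]:
    """Return the violated success criteria; an empty list means the run passed."""
    failures: List[str] = []
    if double_activations:
        failures.append(f"double activation: {sorted(double_activations)}")
    # Assumed proxy: a resumed job makes progress beyond its pre-shutdown value.
    if progress_after <= progress_before:
        failures.append("actor job did not resume after recovery")
    failed = [c for c in calls if not c.ok]
    # Missing calls count as failures too: all N calls must be made and succeed.
    if len(calls) != expected_calls or failed:
        failures.append(f"{len(calls) - len(failed)}/{expected_calls} invocations succeeded")
    return failures
```

Returning a list of violations rather than stopping at the first one makes it easier to see every criterion a failing run missed.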