Apply short circuit evaluation regarding generic attributes when querying #4422

Open
teyyubismayil opened this issue Dec 6, 2024 · 3 comments

Comments

@teyyubismayil
Contributor

teyyubismayil commented Dec 6, 2024

Is your feature request related to a problem? Please describe.
When my query includes a generic attribute, its performance drops sharply, even if that condition does not affect the query result.
For example, the following query executes in 5 seconds and, in my case, returns an empty result:
{resource.cluster="some_cluster" && resource.namespace="some_namespace" && resource.service.name="some_service_name" && status=error}
But the following query takes 16 seconds, even though the conditions on non-generic attributes alone already produce an empty result:
{resource.cluster="some_cluster" && resource.namespace="some_namespace" && resource.service.name="some_service_name" && status=error && event.exception.type="some_exception"}

As a note, the following query times out after 30 seconds:
{resource.cluster="some_cluster" && resource.namespace="some_namespace" && resource.service.name="some_service_name" && status=error && event.exception.type=~".+"}

Describe the solution you'd like
I think it should be possible to do some kind of short-circuit evaluation, so that conditions on non-generic attributes are checked first and generic attributes are checked only if all of those match.
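
To illustrate the proposal above, here is a minimal Go sketch of cheap-first, short-circuit evaluation. It is not Tempo code: `Span`, `Condition`, and `MatchAll` are hypothetical names, and the split into "dedicated" and "generic" maps is only an assumed stand-in for how attributes are stored.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified stand-in for a stored span.
type Span struct {
	Dedicated map[string]string // attributes assumed cheap to read
	Generic   map[string]string // attributes assumed expensive to read
}

// Condition is a hypothetical equality predicate on one attribute.
type Condition struct {
	Key     string
	Value   string
	Generic bool // true if the attribute lives in generic storage
}

// Matches reports whether the span satisfies this single condition.
func (c Condition) Matches(s Span) bool {
	if c.Generic {
		return s.Generic[c.Key] == c.Value
	}
	return s.Dedicated[c.Key] == c.Value
}

// MatchAll evaluates &&ed conditions with non-generic ones first and stops
// at the first failure. It also returns how many generic lookups were made.
func MatchAll(s Span, conds []Condition) (bool, int) {
	// Copy so the caller's slice order is left untouched.
	ordered := append([]Condition(nil), conds...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return !ordered[i].Generic && ordered[j].Generic
	})

	genericLookups := 0
	for _, c := range ordered {
		if c.Generic {
			genericLookups++
		}
		if !c.Matches(s) {
			return false, genericLookups // short circuit
		}
	}
	return true, genericLookups
}

func main() {
	s := Span{
		Dedicated: map[string]string{"resource.service.name": "other_service"},
		Generic:   map[string]string{"event.exception.type": "some_exception"},
	}
	conds := []Condition{
		{Key: "event.exception.type", Value: "some_exception", Generic: true},
		{Key: "resource.service.name", Value: "some_service_name"},
	}
	ok, lookups := MatchAll(s, conds)
	fmt.Println(ok, lookups)
}
```

In this sketch the generic attribute is never read when a non-generic condition already fails, which is the behavior being requested.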

Describe alternatives you've considered
I have considered moving event.exception.type to a dedicated column, but currently there is no such feature.

Additional context
Parquet v4 is used.

@joe-elliott
Member

When Tempo attempts to execute a TraceQL query it can either do that entirely in the fetch layer or it is required to pass data into the query engine for evaluation. All of the queries you have posted are "fast" queries that can be done entirely in the fetch layer because:

  1. All of your attributes are fully scoped
  2. They are all a bunch of &&ed conditions in a single spanset

If the query is a "slow" query and falls back to the engine we do short circuit conditions and attempt to reorder them for performance. These improvements are coming in 2.7.

In the fetch layer for "fast" queries we have a more basic attempt at this in the JoinIterator. In this case we swap one of the iterators to the top that we hope is better at filtering data and use it to drive the others. Ideally, I think we'd reuse the branch prediction code from the engine to record the "cost" of the sub iterators and reorder them all appropriately.
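
As a rough illustration of the "record the cost and reorder" idea, here is a small Go sketch. It is not Tempo's JoinIterator or its engine's branch prediction code: `Predicate`, `Filter`, `Reorder`, and `Order` are hypothetical names, and the observed rejection rate is used only as a crude proxy for cost.

```go
package main

import (
	"fmt"
	"sort"
)

// Predicate is a hypothetical stand-in for one sub-iterator's filter.
type Predicate struct {
	Name     string
	Test     func(row int) bool
	seen     int // rows this predicate was asked about
	rejected int // rows this predicate filtered out
}

// rejectRate is the observed fraction of rows this predicate rejected.
func (p *Predicate) rejectRate() float64 {
	if p.seen == 0 {
		return 0
	}
	return float64(p.rejected) / float64(p.seen)
}

// Filter evaluates &&ed predicates in order, stopping at the first failure.
type Filter struct {
	preds []*Predicate
}

// Match runs the predicates in their current order and records statistics.
func (f *Filter) Match(row int) bool {
	for _, p := range f.preds {
		p.seen++
		if !p.Test(row) {
			p.rejected++
			return false
		}
	}
	return true
}

// Reorder moves predicates with the highest observed rejection rate to the
// front, so the best filters drive the rest.
func (f *Filter) Reorder() {
	sort.SliceStable(f.preds, func(i, j int) bool {
		return f.preds[i].rejectRate() > f.preds[j].rejectRate()
	})
}

// Order returns the predicate names in their current evaluation order.
func (f *Filter) Order() []string {
	names := make([]string, 0, len(f.preds))
	for _, p := range f.preds {
		names = append(names, p.Name)
	}
	return names
}

func main() {
	f := &Filter{preds: []*Predicate{
		{Name: "always", Test: func(row int) bool { return row >= 0 }},
		{Name: "rare", Test: func(row int) bool { return row%100 == 0 }},
	}}
	for row := 0; row < 1000; row++ {
		f.Match(row)
	}
	f.Reorder()
	fmt.Println(f.Order())
}
```

Because evaluation short-circuits, later predicates only see rows that passed earlier ones, so these statistics are conditional on the current order. A real implementation would likely also weigh how expensive each sub-iterator is to read.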

Your event exception type condition is even more complex because the iteration is happening at a deeper level than the resource conditions. I'd have to dump the iterators for that query for a deeper understanding of what it did. My expectation is the same as yours: adding the event.exception.type condition should not have a large impact on that query's response time.

We do have some regex perf improvements in 2.7 which should help somewhat, but I know that's not the focus of this question.

@joe-elliott
Member

Anecdotally I just ran these 3 queries and they all took roughly the same amount of time and returned no results.

{ resource.service.name = "foo" }
{ resource.service.name = "foo" && span.bar = "baz" }
{ resource.service.name = "foo" && span.bar = "baz" && event.exception.type =~ "bat" }

@teyyubismayil
Contributor Author

Thanks for the detailed answer, @joe-elliott.

Our traces mostly have event.exception, and in event.exception we have a stacktrace field that is pretty long, so I think that is why you did not notice a difference in query response times in your cluster.
