Right now, Sightglass uses a single threshold, based on a confidence interval computed via the Behrens-Fisher procedure, to determine whether a sampled statistic shifted between configurations.
The result is a binary answer: either "changed" (e.g., the benchmark got 5% faster) or "not changed". However, the latter answer can also appear if we simply don't have enough data points to establish statistical significance, or if the system is too noisy.
This "false negative" is somewhat dangerous: we could make a change, see that it is performance-neutral according to Sightglass, and accept it, when in fact we just didn't turn the knobs up high enough.
Ideally, Sightglass should provide a third output of "unsure" if the measurements aren't precise enough to prove either "changed" or "not changed" to the desired confidence.
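The three-way decision could be sketched roughly as follows. This is a hypothetical illustration, not Sightglass's actual code: the function name `classify_shift`, the 5% relative threshold, and the use of a plain normal approximation to the Behrens-Fisher interval are all assumptions made for the example.

```python
import math

def classify_shift(a, b, rel_threshold=0.05, z=1.96):
    """Compare two samples of a benchmark statistic and return one of
    "changed", "not changed", or "unsure".

    Sketch only: uses a Welch-style standard error with a normal critical
    value (z) as a stand-in for a proper Behrens-Fisher interval.
    """
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    # Welch standard error of the difference in means.
    se = math.sqrt(var_a / n_a + var_b / n_b)
    diff = mean_b - mean_a
    lo, hi = diff - z * se, diff + z * se
    # Express the "meaningful change" threshold in absolute units
    # relative to the baseline mean.
    margin = rel_threshold * mean_a
    if lo > margin or hi < -margin:
        return "changed"       # interval lies entirely outside +/- margin
    if -margin < lo and hi < margin:
        return "not changed"   # interval lies entirely inside +/- margin
    return "unsure"            # interval straddles the margin: need more data
```

The key point is the third branch: when the confidence interval overlaps the threshold boundary, the data is consistent with both "changed" and "not changed", so the honest answer is "unsure" rather than silently reporting no change.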