[python-package] simplify eval result printing #6749
Conversation
python-package/lightgbm/callback.py
Outdated
return f"{value[0]}'s {value[1]}: {value[2]:g}" | ||
else: | ||
raise ValueError("Wrong metric value") | ||
eval_name, metric_name, eval_result, *_ = value |
What do you think of using similar names to the ones here
LightGBM/python-package/lightgbm/basic.py
Line 5215 in 53e0ddf
ret.append((data_name, eval_name, val, is_higher_better))
I find the first one a bit confusing, since it's the name of the validation set. Maybe something like
dataset_name, metric_name, metric_value, *_ = value
?
You're totally right... I think `eval_name` and `eval_result` are confusing. I love your suggested names, and agree we should standardize on those (here and in future PRs like this).
updated in 6b63377
I'll do more of this renaming in future PRs.
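For reference, a minimal sketch of what the renamed unpacking might look like inside `_format_eval_result()`; the f-string follows the diff above, and ignoring trailing elements via `*_` is the pattern under discussion rather than the exact merged code:

```python
# hedged sketch, not the exact merged code: unpack with the clearer names
# suggested above and ignore any trailing elements (e.g. is_higher_better, stdv)
dataset_name, metric_name, metric_value, *_ = value
return f"{dataset_name}'s {metric_name}: {metric_value:g}"
```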
        else:
            return f"{value[0]}'s {value[1]}: {value[2]:g}"
    else:
        raise ValueError("Wrong metric value")
We're losing this error. Not sure if it can happen, but it may be worth adding a check at the start like:

if len(value) not in [4, 5]:
    raise ValueError("Wrong metric value")
Honestly, I don't think this is a very helpful error message. "Wrong metric value" doesn't really describe the problem here, and you'll end up walking the stacktrace to find this point in the code anyway.
Originally in this PR, I was thinking we could just keep this simple and let Python's "too many values to unpack" or "not enough values to unpack" tuple-unpacking errors convey that information. But thinking about it more... removing this exception means that you could now implement a custom metric function that returns tuples with more than 5 elements, and LightGBM would silently accept it. I think it's valuable to continue preventing that, to reserve the 6th, 7th, etc. elements for future LightGBM-internal purposes.
I'll put back an exception here using logic like you suggested... but changing the message to something a bit more helpful.
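Something along these lines (a sketch only; the exact message wording is an assumption, not the committed code):

```python
# sketch of the length check with a more descriptive message; the message text
# is an assumption and may not match what was ultimately merged
if len(value) not in (4, 5):
    raise ValueError(
        "Wrong number of elements in eval results tuple: "
        f"expected 4 or 5, got {len(value)}"
    )
```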
Alright, so I tried to add an error message here, and then write a test to check that it's raised... and I couldn't find a public code path that'd allow a tuple with too few or too many elements to get to this point.
Here's what I tried:
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000, n_features=3)

def constant_metric(preds, train_data):
    # returns 4 elements (should be 3)
    return ("error", 0.0, False, "too-much")

dtrain = lgb.Dataset(X, label=y).construct()
dvalid = dtrain.create_valid(X, label=y)

bst = lgb.train(
    params={"objective": "regression"},
    train_set=dtrain,
    feval=[constant_metric],
    valid_sets=[dvalid],
    valid_names=["valid0"],
)
That gets stopped earlier:
[LightGBM] [Info] Total Bins 765
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 3
[LightGBM] [Info] Start training from score -0.471765
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jlamb/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py", line 329, in train
evaluation_result_list.extend(booster.eval_valid(feval))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jlamb/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 4445, in eval_valid
return [
^
File "/Users/jlamb/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 4448, in <listcomp>
for item in self.__inner_eval(self.name_valid_sets[i - 1], i, feval)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jlamb/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 5214, in __inner_eval
eval_name, val, is_higher_better = feval_ret
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 3)
Here:
LightGBM/python-package/lightgbm/basic.py
Lines 5209 to 5215 in b33a12e
feval_ret = eval_function(self.__inner_predict(data_idx), cur_data)
if isinstance(feval_ret, list):
    for eval_name, val, is_higher_better in feval_ret:
        ret.append((data_name, eval_name, val, is_higher_better))
else:
    eval_name, val, is_higher_better = feval_ret
    ret.append((data_name, eval_name, val, is_higher_better))
So I think that maybe this error message we're talking about was effectively unreachable.
And I don't think we should add a custom, slightly more informative error message there in `Booster.__inner_eval()`:

- that code will raise a `ValueError` if anything other than exactly a 3-item tuple (or a list of such tuples) is returned... so there's no need to add more protection against tuples of other sizes
- that part of the code is already kind of complicated
- that part of the code runs on every iteration, so adding an `if len(...): raise` would add a bit of extra overhead to every iteration
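For contrast, a custom metric that passes through `__inner_eval()` cleanly returns exactly the 3-element `(eval_name, eval_result, is_higher_better)` tuple that code unpacks; this illustrative version mirrors the `constant_metric` above but with the correct shape:

```python
# a well-formed custom metric: exactly (eval_name, eval_result, is_higher_better)
def constant_metric_ok(preds, train_data):
    return ("error", 0.0, False)
```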
LGTM! Nice refactoring!
Thanks! I'll merge this and then continue with #6748 soon.
Proposes a small refactor of `_format_eval_result()`, used to populate log messages like this:

LightGBM/python-package/lightgbm/callback.py
Lines 99 to 100 in 53e0ddf

This is a first step in the broader refactoring proposed in #6748 ... see that issue for more details.
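As a rough illustration of the shape this refactor moves toward, reconstructed from the diff fragments quoted above (the names follow the rename discussed in review, and the handling of a 5th element for cv-style results is an assumption, not the exact merged code):

```python
# hedged sketch of a simplified _format_eval_result(); details are assumptions
def _format_eval_result(value, show_stdv: bool = True) -> str:
    """Format one eval result tuple for logging."""
    if len(value) not in (4, 5):
        raise ValueError(f"Expected 4 or 5 elements in eval result tuple, got {len(value)}")
    dataset_name, metric_name, metric_value, *_ = value
    msg = f"{dataset_name}'s {metric_name}: {metric_value:g}"
    # assumption: 5-element results (e.g. from cv) carry the stdv as the 5th element
    if len(value) == 5 and show_stdv:
        msg += f" + {value[4]:g}"
    return msg
```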