Ensembling over layers #259
base: main
Conversation
elk/metrics/eval.py
Outdated
@@ -41,6 +41,73 @@ def to_dict(self, prefix: str = "") -> dict[str, float]:
        return {**auroc_dict, **cal_acc_dict, **acc_dict, **cal_dict}


def calc_auroc(y_logits, y_true, ensembling, num_classes):
Add type annotations here.
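For example, something along these lines (the tensor shapes and the return type here are my guesses; adjust to whatever the function actually takes):

from torch import Tensor

def calc_auroc(
    y_logits: Tensor,  # assumed shape: (n_examples, n_variants, n_classes)
    y_true: Tensor,  # assumed shape: (n_examples,)
    ensembling: "PromptEnsembling",  # the Enum added in this PR; import path omitted
    num_classes: int,
) -> Tensor:  # or whatever result type the other metric helpers use
    ...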
Tests run forever on my machine. Need to check what is wrong there.
Mainly, just fix the handling of the multi-dataset case:
❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1
y_logits_collection.append(y_logits)


# get logits and ground_truth from middle to last layer
middle_index = len(layer_outputs) // 2
In some ways I think we should allow the layers over which we ensemble to be configurable, e.g. sometimes the last layers perform worse.
Yeah, it makes sense to make it configurable. However, I'm curious: how would you decide which layers to pick?
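One possible shape for this, purely as a sketch (LayerEnsembleSpan, its fields, and the defaults are invented here, not existing config options):

from dataclasses import dataclass

@dataclass
class LayerEnsembleSpan:
    """Hypothetical config for which layers get ensembled."""

    layer_start: int | None = None  # None -> default to the middle layer
    layer_end: int | None = None  # None -> include everything up to the last layer

def select_layer_outputs(layer_outputs: list, span: LayerEnsembleSpan) -> list:
    start = span.layer_start if span.layer_start is not None else len(layer_outputs) // 2
    end = span.layer_end if span.layer_end is not None else len(layer_outputs)
    return layer_outputs[start:end]

That keeps the current mid-to-last behaviour as the default while letting someone drop the final layers if they turn out to hurt.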
middle_index = len(layer_outputs) // 2
y_logits_stacked = torch.stack(y_logits_collection[middle_index:])
# layer prompt_ensembling of the stacked logits
y_logits_stacked_mean = torch.mean(y_logits_stacked, dim=0)
It seems like the ensembling is done by taking the mean over layers rather than concatenating. This isn't super clear from the comments/docstrings, and it's hard to tell from reading the code because the shapes aren't commented.
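A couple of shape comments would already make the intent explicit; here is a self-contained toy version of what the code appears to do, with the layout I am assuming:

import torch

# toy stand-in for the per-layer logits collected above, each with the assumed
# shape (n_examples, n_variants, n_classes)
y_logits_collection = [torch.randn(10, 2, 2) for _ in range(12)]
middle_index = len(y_logits_collection) // 2

# stack the middle-to-last layers: (n_layers_used, n_examples, n_variants, n_classes)
y_logits_stacked = torch.stack(y_logits_collection[middle_index:])

# ensemble over layers by averaging the logits along dim=0 (the layer axis),
# which brings the shape back to (n_examples, n_variants, n_classes)
y_logits_stacked_mean = torch.mean(y_logits_stacked, dim=0)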
from enum import Enum


class PromptEnsembling(Enum):
I think it's fine
elk/training/train.py
Outdated
@@ -53,7 +54,7 @@ def apply_to_layer(
        layer: int,
        devices: list[str],
        world_size: int,
-    ) -> dict[str, pd.DataFrame]:
+    ) -> tuple[dict[str, pd.DataFrame], list[dict]]:
Same comment here regarding the return type.
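If the earlier point was about avoiding an anonymous tuple, one lightweight option is a named container (just a sketch; LayerApplyResult and its field names are invented):

from typing import NamedTuple

import pandas as pd

class LayerApplyResult(NamedTuple):
    """Hypothetical named return type for apply_to_layer."""

    df_dict: dict[str, pd.DataFrame]  # per-dataset result tables
    layer_output: list[dict]  # raw outputs kept around for layer ensembling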
elk/run.py
Outdated
    try:
-       for df_dict in tqdm(mapper(func, layers), total=len(layers)):
-           for k, v in df_dict.items():
+       for df_dict, layer_output in tqdm(
This doesn't write all the appropriate lines for:
❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1
There should be evaluation results for both imdb and amazon_polarity in layer_ensembling_results.csv.
sorting, remove comment
my fixes for layer ensembling
Force-pushed from f3319c1 to 64e762a: Ensembling from mid to last layer