Tensorflow Variable Aggregation testing in Keras 3.7.0 #20601
Comments
Follow-up comment. Here's the code for my own MAE metric. If I'm doing something wrong, please let me know.
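(The metric code itself is not shown above; the following is only a minimal, hypothetical sketch of the kind of subclassed Keras 3 MAE metric being described, assuming `keras.metrics.Metric.add_variable` with an explicit `aggregation="sum"` — the aggregation value this issue is about. The class name and details are assumptions, not the commenter's original code.)

```python
import keras
from keras import ops


class MeanAbsoluteErrorMetric(keras.metrics.Metric):
    """Hypothetical stand-in for the custom MAE metric described above."""

    def __init__(self, name="mae", **kwargs):
        super().__init__(name=name, **kwargs)
        # Running totals; under tf.distribute these are the metric variables
        # whose aggregation/synchronization behavior is under discussion.
        self.total = self.add_variable(
            shape=(), initializer="zeros", aggregation="sum", name="total"
        )
        self.count = self.add_variable(
            shape=(), initializer="zeros", aggregation="sum", name="count"
        )

    def update_state(self, y_true, y_pred, sample_weight=None):
        error = ops.sum(ops.abs(y_true - y_pred))
        self.total.assign_add(ops.cast(error, self.total.dtype))
        self.count.assign_add(ops.cast(ops.size(y_true), self.count.dtype))

    def result(self):
        return self.total / ops.maximum(self.count, 1.0)
```

Note that this sketch ignores `sample_weight` for brevity.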
First of all, I'm not very familiar with `tf.distribute`. It seems that the current codebase is missing the `synchronization` argument when creating variables. I can try adding it, but I have no idea if it will resolve the issue.
Hi, James. I understand your frustration. I am frustrated, too. To be totally honest, I am pretty sure my "waiting for Keras 3" is going to get me fired or something, since I had a fully functioning prototype precipitation-nowcasting GAN in Keras 2 / TF 2.15 that I now can't train properly, because of the broken legacy TensorFlow support and because I insisted on jumping feet-first into Keras 3. My mistake.

Furthermore, it looks like Keras doesn't support subclassed model instances with multiple component models as part of your JAX interface, rendering Keras totally useless for my use case. My team will now be back-porting our system to tf-keras before moving to PyTorch entirely.

It is unfortunate that, while you are trying to fix and modify things, you don't have a stock multi-GPU testing environment (other than a Google Colab) where you can test these obviously back-breaking changes for those of us trying to migrate from Keras 2 to Keras 3. Parallel training for TensorFlow is utterly broken in Keras 3, regardless of whether or not it's

It isn't difficult to see that Google is going to sunset TensorFlow. That much is clear. JAX is the future for Google. But you all had to have understood that the majority of your user base were TensorFlow users, right? I suppose my request here is that you make it clear that, for certain use cases from K2 (including but not limited to the ones described above), one shouldn't use Keras 3 for the time being.
I thought it was only relevant for MultiWorkerMirroredStrategy, which we don't support in Keras 3. But I am not a tf.distribute expert and I don't know for sure. The fact that the value needs to be different for TPU only, in a way that isn't handled automatically by TF, is weird. I think we can add it -- when we create a tf.Variable, we can do:

```python
if tf.distribute.has_strategy():
    strategy = tf.distribute.get_strategy()
    if is_tpu_strategy(strategy):
        synchronization = tf.VariableSynchronization.ON_WRITE
    else:
        synchronization = tf.VariableSynchronization.AUTO
```

then pass `synchronization` when creating the variable.
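For concreteness, here is a self-contained sketch of that idea, assuming `is_tpu_strategy` is implemented as a plain isinstance check; the helper names are assumptions and not existing Keras code:

```python
import tensorflow as tf


def is_tpu_strategy(strategy):
    # Assumed helper: detect the TPU strategy by type.
    return isinstance(strategy, tf.distribute.TPUStrategy)


def make_tf_variable(initial_value, **kwargs):
    """Create a tf.Variable, choosing the synchronization mode from the
    currently active tf.distribute strategy as suggested above."""
    synchronization = tf.VariableSynchronization.AUTO
    if tf.distribute.has_strategy():
        strategy = tf.distribute.get_strategy()
        if is_tpu_strategy(strategy):
            synchronization = tf.VariableSynchronization.ON_WRITE
    return tf.Variable(
        initial_value,
        synchronization=synchronization,
        **kwargs,
    )
```

For a metric accumulator one could then call, e.g., `make_tf_variable(0.0, aggregation=tf.VariableAggregation.SUM)`, since `tf.Variable` also accepts an `aggregation` argument.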
Forking this from: #20568
Specifically tagging @james77777778.
tl;dr: It's `SUM`. The recommendation was:
I can make the following report for you:
If it matters at all, this was using a logical split of an RTX A6000 Ada for testing.
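For anyone trying to reproduce this on a single card, the following is a sketch of that kind of logical split (the memory limits and device count here are arbitrary assumptions, not the reporter's exact setup):

```python
import tensorflow as tf

# Split one physical GPU into two logical devices so that
# tf.distribute.MirroredStrategy can be exercised on a single card.
gpus = tf.config.list_physical_devices("GPU")
tf.config.set_logical_device_configuration(
    gpus[0],
    [
        tf.config.LogicalDeviceConfiguration(memory_limit=8192),
        tf.config.LogicalDeviceConfiguration(memory_limit=8192),
    ],
)

logical_gpus = tf.config.list_logical_devices("GPU")
strategy = tf.distribute.MirroredStrategy(
    devices=[d.name for d in logical_gpus]
)
print("Replicas in sync:", strategy.num_replicas_in_sync)
```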