Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't convert time data to timedelta by default #940

Closed
wants to merge 2 commits into from

Conversation

ocefpaf
Copy link
Contributor

@ocefpaf ocefpaf commented Aug 4, 2016

I don't really like this PR... Too much change for a simple thing. I may try again soon.

Closes #843

@ocefpaf ocefpaf force-pushed the no_default_timedelta_conv branch 3 times, most recently from 4e052f5 to 98ed32e Compare August 4, 2016 13:15
@ocefpaf
Copy link
Contributor Author

ocefpaf commented Aug 4, 2016

This is ready for review.

Here is an example of this PR in action showing how plotting and working with periods gets easier with floats instead of timedeltas.

BTW, in light of #939 (comment), I wonder if decode_timedeltas and decode_datetimes (or even the original decode_times) are needed at all. (Just defending myself as I don't really want a new keyword argument but a better default for time data in general 😬)

Maybe I am being thick and I don't know enough use cases data but I cannot see why someone might want to convert time data (which most of the time represent periods) to timedeltas. That breaks from the original data units and forces the user to compute it back and convert the dtype too before for plotting, etc.

Regarding time coordinate itself I understand that xarray, due to the pandas nature of the index, needs to fail to convert time when the calendar is not supported by pandas. Maybe, instead of throwing an error and asking the users to use the option decode_(date)times=False in case of failure, xarray could issue an warning and return only the "numbers" as if decode_(date)times were set to False.

I understand, and agree most of the time, that raising erros is better than issuing warnings, and creating an ambiguity in the returns. So maybe this one is harder to change than the former.

@shoyer
Copy link
Member

shoyer commented Aug 4, 2016

Decoding time units into timedelta64 is useful if you want to be able to use them for arithmetic with datetimes. I agree that this is not always the most useful currently because, e.g., plotting doesn't work very well, and time deltas in nanoseconds are difficult to interpret. Still I'm somewhat reluctant to change the default here.

Either way, we will definitely need to support the original decode_times keyword argument for now (with a deprecation warning) so there is a smooth transition.

@ocefpaf
Copy link
Contributor Author

ocefpaf commented Aug 4, 2016

Decoding time units into timedelta64 is useful if you want to be able to use them for arithmetic with datetimes.

I get that but my question is how often do users perform do such operations? Again I am biased b/c with my data I never want to do that as it does not make sense. And, when it does make sense, I believe that the price of post conversion is worth the advantages of converting by default.

Still I'm somewhat reluctant to change the default here.

If you want decode_timedeltas=True as the default I am not sure it is worth the extra keyword then. My goal with this PR is to avoid extra steps. If users need to set it to True it is not too different from doing data.values / 1e9. If all we get is to change one step for another I prefer to not make the open_dataset more complex and leave it as is.

Feel free to close this. I don't have strong feelings about what xarray should do by default. I just want to make it more convenient for my uses cases.

@shoyer
Copy link
Member

shoyer commented Aug 4, 2016

It might be worth querying the mailing list for more opinions here.

If the main issue is plotting, you could try fixing that upstream, too! pandas-dev/pandas#8711

@ocefpaf
Copy link
Contributor Author

ocefpaf commented Aug 4, 2016

If the main issue is plotting, you could try fixing that upstream, too! pandas-dev/pandas#8711

Indeed! That makes sense to matter what is decided here. Thanks for pointing that out. (Not sure if I am up to the challenge though.)

It might be worth querying the mailing list for more opinions here.

Done!

@ocefpaf
Copy link
Contributor Author

ocefpaf commented Aug 11, 2016

@shoyer the more I think about this the more I don't like the addition of extra keywords. Even though I would like this behavior to be the default one I really do not like the complexity I added here.

Closing this...

@ocefpaf ocefpaf closed this Aug 11, 2016
@ocefpaf ocefpaf deleted the no_default_timedelta_conv branch August 11, 2016 16:15
@ocefpaf ocefpaf mentioned this pull request May 4, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants