BLD/DISC: Decreasing build/distribution size #52654
Comments
Yeah, the only part of dateutil which I think we should be using is in |
1 is done already (for cibuildwheel), and has been for a couple years now (with the old build system).
There are three things from dateutil we use: parser, tz, relativedelta. I pretty much agree with Marco that we can/should move away from parser. IIUC tz is going to be subsumed by zoneinfo eventually anyway (i.e. the private attributes we rely on will go away), so getting rid of our special support for that makes sense. We use relativedelta in a few places (some of which are unnecessary or broken, xref #52569); I haven't given much thought to how we could avoid that.
Always a good conversation, but I don't think we should change the way we think about C / Cython and fused types to account for this.
It seems that the tests make up about a third of the distributed package size, so that might be worth reconsidering.
I can have a look at removing tests (moving them to a separate pandas-tests package). This is probably going to cause some friction for developers, though (I will comment more on the other issue soon). Re points 7 & 8, it is worth noting that pytz and dateutil hardly take up any space, as they are pure Python (both are < 1MB). I would not worry about those too much. One thing that might be interesting to try is PGO/LTO on our C extensions. While I don't think any other projects are doing this, I think Python itself is built with PGO/LTO, and there is an issue in one of the Python repos suggesting using BOLT on module .so libraries. One thing to note, though, is that it would dramatically increase the compile time, and maybe OOM the GHA runners used to build our wheels. There's also the question of what to use as profiling data for PGO. (@WillAyd Do you think this is worth pursuing?)
Looks interesting. A little out of my wheelhouse, but if you have time/interest I say go for it. PGO looks particularly interesting, though I guess we'd have to decide how best to train the program for optimization.
On the pytz one, we would also need to roll a replacement for pytz.AmbiguousTimeError.
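As a rough idea of what such a replacement could look like, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the exception's base class (`ValueError`), the helper names `is_ambiguous` and `localize_strict`, and the choice to detect ambiguity through the PEP 495 `fold` attribute. It is not how pandas or pytz implement this.

```python
from datetime import datetime, tzinfo


class AmbiguousTimeError(ValueError):
    """Hypothetical stand-in for pytz.AmbiguousTimeError."""


def is_ambiguous(naive: datetime, tz: tzinfo) -> bool:
    """Return True if the wall time occurs twice in tz.

    Assumes tz honours PEP 495 `fold` (zoneinfo does). During a
    "fall back" transition the first occurrence (fold=0) has the larger
    UTC offset; in a "spring forward" gap the order is reversed, so a
    strict comparison separates the two cases.
    """
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    return first > second


def localize_strict(naive: datetime, tz: tzinfo) -> datetime:
    """Attach tz to a naive datetime, raising on ambiguous wall times."""
    if is_ambiguous(naive, tz):
        raise AmbiguousTimeError(f"{naive} is ambiguous in {tz}")
    return naive.replace(tzinfo=tz)
```

With `zoneinfo.ZoneInfo("America/New_York")`, a wall time inside the repeated hour of a November DST change would be expected to raise, while ordinary times pass through. Whether downstream code that currently catches the pytz exception should keep working is a separate compatibility question.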
This came up indirectly in #52509 and I think merits some brainstorming. In no particular order:
Circa 2018 there was discussion of stripping some (debug?) symbols from our C files. No idea if that went anywhere. cc @WillAyd
In the last couple years we have improved perf in some groupby reductions by using fused types in libgroupby to support more dtypes directly without casts. I think this significantly increased the size of libgroupby. We did something similar in libalgos and libhashtable. I think avoiding the casting is worth it, but we should acknowledge the tradeoffs.
Some stuff in _libs could plausibly live outside of cython without a ton of downside. ops_dispatch and reduction come to mind, though these are both quite small. More could move if we learn to live with circular dependencies.
This would be a PITA, but we could distribute some dtype-specific stuff separately e.g.
pip install pandas[sparse] pandas[interval] pandas[period]
and potentially see some big savings that way. This would really be a PITA, but would make a big dent.
IIUC moving Cython code back to plain C might get some mileage. cc @WillAyd again? This wo
Avoid the numpy dependency. (grep finds 1105 "import numpy"s in pandas/, some of them in e.g. doctests, and 33 "cimport numpy"s.)
Avoid pytz dependency (xref DEPR: deprecate pytz support #46463 coming up shortly once we drop py38)
Avoid dateutil dependency
There was a discussion [citation needed] of distributing pandas without the tests. I guess that was a "no".
related DEV: reduce the size of the dev environment.yml #49998
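The "import numpy" / "cimport numpy" counts in the list above came from grep; a rough Python equivalent is sketched below in case someone wants to re-run the tally and keep the two categories apart. The function name `count_numpy_imports` and the set of file suffixes are assumptions made for this sketch, and its counts will not necessarily match the grep numbers (a plain substring grep for "import numpy" also matches "cimport numpy" lines).

```python
import re
from pathlib import Path

# "import numpy" not preceded by a letter/underscore, so "cimport numpy"
# is excluded from this pattern and counted separately below.
IMPORT_RE = re.compile(r"(?<![A-Za-z_])import numpy\b")
CIMPORT_RE = re.compile(r"\bcimport numpy\b")


def count_numpy_imports(root, suffixes=(".py", ".pyx", ".pxd", ".pxi")):
    """Count lines with `import numpy` and `cimport numpy` under root.

    Doctest lines (">>> import numpy as np") are counted too, as in the
    grep-based tally. A line is counted at most once.
    """
    counts = {"import numpy": 0, "cimport numpy": 0}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            if CIMPORT_RE.search(line):
                counts["cimport numpy"] += 1
            elif IMPORT_RE.search(line):
                counts["import numpy"] += 1
    return counts
```

Templated files with other suffixes are skipped by the default `suffixes`; pass a different tuple to include them.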