
[ci] [python-package] [dask] dask tests failing: unexpected tuple return value somewhere #6739

Open
jameslamb opened this issue Dec 8, 2024 · 4 comments

Comments

@jameslamb
Collaborator

Description

Most Dask tests are failing, with errors like this:

Exception: "TypeError('Data must be one of: numpy arrays, pandas dataframes, sparse matrices (from scipy). Got tuple.')"

The stack trace indicates it's coming from here:

    raise TypeError(
        f"Data must be one of: numpy arrays, pandas dataframes, sparse matrices (from scipy). Got {type(seq[0]).__name__}."
    )
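That check dispatches on the type of the first element of `seq`. The following is a minimal sketch of that kind of dispatch, assuming the helper concatenates a list of data partitions row-wise. `concat_parts` and `Part` are hypothetical names, not LightGBM's actual code. It shows why a partition that arrives as a tuple produces exactly this message:

```python
from typing import List, Union

import numpy as np
import pandas as pd
import scipy.sparse as ss

# Hypothetical alias for "one partition of training data".
Part = Union[np.ndarray, pd.DataFrame, pd.Series, ss.spmatrix]


def concat_parts(seq: List[Part]) -> Part:
    """Concatenate partitions row-wise, dispatching on the first part's type.

    Hypothetical sketch: anything that is not a numpy array, pandas object,
    or scipy sparse matrix (for example, a tuple) is rejected.
    """
    first = seq[0]
    if isinstance(first, np.ndarray):
        return np.concatenate(seq, axis=0)
    if isinstance(first, (pd.DataFrame, pd.Series)):
        return pd.concat(seq, axis=0)
    if ss.issparse(first):
        return ss.vstack(seq, format="csr")
    raise TypeError(
        "Data must be one of: numpy arrays, pandas dataframes, sparse matrices "
        f"(from scipy). Got {type(first).__name__}."
    )


# Well-formed parts concatenate as expected.
print(concat_parts([np.ones((2, 3)), np.zeros((1, 3))]).shape)  # (3, 3)

# A part that arrives as a tuple is rejected with the same message.
try:
    concat_parts([(np.ones((2, 3)), np.ones(2))])
except TypeError as err:
    print(err)  # ... Got tuple.
```

Because only `seq[0]` is inspected, the error names the type of the first part. "Got tuple." therefore points to something upstream returning a tuple where a single array or frame was expected.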

Reproducible example

This is happening on all Dask test jobs across multiple Python versions, on master and multiple PRs.

Example build: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=17337&view=logs&j=275189f9-c769-596a-7ef9-49fb48a9ab70&t=3a9e7a4a-04e6-52a0-67ea-6e8f6cfda74f

It's not happening on ALL CI jobs, though.

Environment info

It looks like the failing jobs are getting these versions of the relevant libraries:

    dask-2024.12.0             |     pyhd8ed1ab_1           7 KB  conda-forge
    dask-core-2024.12.0        |     pyhd8ed1ab_1         884 KB  conda-forge
    dask-expr-1.1.20           |     pyhd8ed1ab_0         182 KB  conda-forge
    distributed-2024.12.0      |     pyhd8ed1ab_1         784 KB  conda-forge
...
    numpy-2.1.3                |  py312h58c1407_0         8.0 MB  conda-forge
...
    pandas-2.2.3               |  py312hf9745cd_1        14.7 MB  conda-forge
...
    scipy-1.14.1               |  py312h62794b6_1        16.8 MB  conda-forge
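To compare a failing job with a passing one, it may help to print the versions Python actually resolves at runtime. This is a minimal sketch using only the standard library's `importlib.metadata`. The package list is an assumption that mirrors the listing above, using Python distribution names rather than conda package names:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of distributions worth comparing between jobs.
PACKAGES = ["dask", "distributed", "dask-expr", "numpy", "pandas", "scipy"]


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12} {ver or 'not installed'}")
```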

Additional Comments

N/A

@jmoralez
Collaborator

jmoralez commented Dec 9, 2024

I've pinned down the issue; it seems to be a bug in dask. Do you know if we should open an issue in https://github.com/dask/dask or in https://github.com/dask/distributed? It seems to be related to Client.compute on dicts with persisted objects, so I'd lean towards distributed. Do you agree?

@jameslamb
Collaborator Author

jameslamb commented Dec 9, 2024

Amazing, thank you!!! Yeah, I think if it's about the Client, distributed is a good place to report it.

This type of uncertainty and fragmentation of conversation is part of why Dask maintainers are debating merging those repos (and dask-expr as well): dask/community#402

@jmoralez
Collaborator

jmoralez commented Dec 9, 2024

Thanks! Opened dask/distributed#8959. Should we pin dask<2024.12 in the meantime?

@jameslamb
Collaborator Author

Yeah I think so.
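Under PEP 440 version matching, the proposed pin `dask<2024.12` excludes `2024.12.0` and anything later, while allowing earlier releases. Here is a quick check with the `packaging` library, assuming it is installed. The version strings are only examples and are not claims about which dask releases exist:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The pin discussed in this thread, expressed as a PEP 440 specifier.
DASK_PIN = SpecifierSet("<2024.12")


def allowed(dask_version: str) -> bool:
    """Return True if the given dask version satisfies the pin."""
    return Version(dask_version) in DASK_PIN


for v in ["2024.11.2", "2024.12.0", "2024.12.1"]:
    print(v, allowed(v))  # True, then False, then False
```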
