Towards more efficient schema methods for row-based tables #127

Open
ablaom opened this issue May 31, 2021 · 1 comment


ablaom commented May 31, 2021

The issue has been raised that schema, applied to a table, currently has to concretely materialize each column as a way of extracting its scitype, which is inefficient for row-based tables. The reason for the current implementation is that, in general, the column element scitype cannot be inferred from the column element machine type.

Here are some details so that someone interested can explore a workaround, which I think is certainly possible.

At present (and this might change) the only time the scitype of an array A cannot be determined from the machine type is if

eltype(A) <: CategoricalArrays.CategoricalValue    

This is because the scitype depends on: (i) whether or not the pool is ordered, and (ii) the number of levels. Neither of these is in the machine type; both must be extracted from an instance. However, it is safe to assume that all elements have the same scitype, because it is very unusual for an array to have inhomogeneous pools (the CategoricalPool contains the order/levels information). Indeed, CategoricalArrays goes to great lengths to ensure that creating such arrays is difficult. Under this assumption, one can therefore compute the scitype of A by looking at just the first element (which, for Tables, means looking at just the first row).
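
As a rough illustration of the first-row idea, here is a minimal sketch rather than a proposed implementation. The helper name categorical_scitypes_from_first_row is hypothetical and not part of ScientificTypes.jl. It assumes each categorical column has a single shared pool and that the first row has no missing entries in those columns.

```julia
using Tables
using CategoricalArrays
using ScientificTypes

# Hypothetical helper (not part of ScientificTypes.jl): return the scitype of
# every categorical entry found in the first row of a table, without
# materializing any column.
# Assumptions: all elements of a categorical column share one pool, and the
# first row holds no `missing` in those columns.
function categorical_scitypes_from_first_row(X)
    row = first(Tables.rows(X))
    result = Dict{Symbol,Type}()
    for name in Tables.columnnames(row)
        value = Tables.getcolumn(row, name)
        if value isa CategoricalValue
            # order and number of levels are read from the value's pool
            result[name] = scitype(value)
        end
    end
    return result
end

# A small row table with one unordered and one ordered categorical column
color = categorical(["red", "blue", "red"])
grade = categorical(["lo", "hi", "lo"], ordered=true)
X = Tables.rowtable((color=color, grade=grade, n=[1, 2, 3]))

first_row_scitypes = categorical_scitypes_from_first_row(X)
```

Columns that are not categorical are skipped here, since their element scitype can be determined from the machine type. A column whose machine type allows missing values would need extra handling, because the first row alone does not reveal whether the scitype should include Missing.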

cc @OkonSamuel

@ablaom ablaom transferred this issue from JuliaAI/MLJScientificTypes.jl Jun 21, 2021

ablaom commented Jul 7, 2021

@OkonSamuel
