The issue has been raised that `schema`, applied to a table, currently has to concretely manifest each column as a way of extracting its scitype, which is inefficient for row-based tables. The reason for the current implementation is that, in general, the column element scitype cannot be inferred from the column element machine type.
Here are some details so that someone interested can explore a workaround, which I think is certainly possible.
At present (and this might change), the only case in which the scitype of an array `A` cannot be determined from the machine type is when
eltype(A) <: CategoricalArrays.CategoricalValue
This is because the scitype depends on: (i) whether the pool is ordered, and (ii) the number of levels. Neither of these is encoded in the machine type; both must be extracted from an instance. However, it is safe to assume that all elements have the same scitype, because it is very unusual for an array to have inhomogeneous pools (the `CategoricalPool` contains the order/levels information). Indeed, CategoricalArrays goes to great lengths to make creating such arrays difficult. Under this assumption, one can compute the scitype of `A` by looking at just the first element (which, for Tables, means looking at just the first row).
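As a rough illustration of the idea (a sketch only, not a proposed implementation of `schema`), the snippet below compares the scitype of the first element of a categorical array with the element scitype of the whole array. It assumes the default ScientificTypes convention, and the example data and the row table `rows` are made up for illustration.

```julia
using CategoricalArrays, ScientificTypes

# Made-up example data: an ordered categorical vector with three levels.
A = categorical(["a", "b", "a", "c"], ordered=true)

# The machine type says only "categorical"; it carries neither the
# ordering nor the number of levels.
@show eltype(A) <: CategoricalValue

# Assuming a homogeneous pool, the first element determines the
# element scitype of the whole array.
@show scitype(first(A))   # scitype of one element
@show elscitype(A)        # element scitype computed from the array

# The same idea for a row table (here a vector of named tuples):
# inspect only the first row rather than materializing the column.
rows = [(x = 1.0, y = v) for v in A]
@show scitype(first(rows).y)
```

Note that this shortcut is only needed for the categorical case; for other element types the scitype can be read off the machine type, as described above.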
cc @OkonSamuel