Consider a single default convention #180
Comments
@juliohm I am not sure if there are people using different conventions apart from the default one.
using ScientificTypes # By this you opt into using the `DefaultConvention` (which I believe you already do)
ScientificTypes.scitype(::YourDefinedType) = ...
I believe this addresses your concerns? |
I still believe that it is a valid question to ask. Do we really want to support multiple conventions? Is there a real need or just a feature that is there because we can? I will try to use the suggested implementation in CoDa.jl to see if it works. |
I've ended up using ScientificTypesBase.jl with my convention, but just in order to avoid the heavy dependencies of ScientificTypes.jl. It'd be great if a lighter version was available. |
Davi, can you comment on how your convention differs from the default convention? Did you copy and paste it like I did in the past, or did you actually have to modify something?
|
Pretty much a copy. But I didn't need all the types. Here is the whole thing:
struct JPlotsConvention <: ScientificTypesBase.Convention end
scitype(::Integer, ::JPlotsConvention) = Count
scitype(::AbstractString, ::JPlotsConvention) = Multiclass
scitype(::AbstractChar, ::JPlotsConvention) = Multiclass
scitype(::AbstractFloat, ::JPlotsConvention) = Continuous
function coerce(y::AbstractArray{T}, T2::Type{<:Union{Missing, Continuous}}
) where T <: Union{Missing, Real}
return float(y)
end
function coerce(y::T, T2::Type{<:Union{Missing, Continuous}}
) where T <: Union{Missing, Real}
return float(y)
end
function coerce(y::AbstractArray{T}, T2::Type{<:Union{Missing, Count}}
) where T <: Union{Missing, Real}
return convert.(Int,y)
end
function coerce(y::T, T2::Type{<:Union{Missing, Count}}
) where T <: Union{Missing, Real}
return convert(Int,y)
end
function coerce(y::AbstractArray{T}, T2::Type{<:Union{Missing, Multiclass}}
) where T <: Union{Missing, Real}
return string.(y)
end
function coerce(y::T, T2::Type{<:Union{Missing, Multiclass}}
) where T <: Union{Missing, Real}
return string(y)
end
|
Yes, the same here. I really believe that the default convention should actually be a universal convention. There is value in standardizing it across the ecosystem.
|
@davibarreira Thanks for chiming in here. It would be good to know which dependencies of ScientificTypes you don't need in your use case. Please check those you don't need.
Possibly we can move some of the ColorTypes might be too but is super light-weight already, depending only on FixedPointNumbers. StatisticalTraits is super light-weight, depending only on ScientificTypesBase, which is already needed. |
At the moment, my package does not use
I'm developing a plotting package, but there is still a lot to be done. I'm trying to avoid heavy dependencies to keep the precompilation time reasonable. Some of the packages I list as unused might end up being used in another "helper" package, like StatsPlots is to Plots. Here is what I'm not using:
As you've pointed out, some of these packages are quite light, and I might end up using them (e.g. StatisticalTraits), but at the moment, they are not part of my dependencies. |
@ablaom I am bumping into this issue again in another package now where I simply want to make use of the My general recommendation remains: I wish we could erase this idea of multiple competing conventions in favor of a well-thought single convention that everyone uses across different ecosystems. |
The use case that most people have is the following:
I understand that the implementation of I would love to erase this "default convention" idea and use a single convention everywhere, and would also be happy to move all methods of Please let me know if that can be done here in JuliaAI, otherwise we will have to roll our own scientific types stack moving forward. |
Thanks for the comments, @juliohm . I am currently on holiday and not so active in this space just now. I am not opposed to a move to abandon multiple conventions. I have also run into problems with the status quo. Specifically, the problem is that the I agree that overloading Some thoughts moving forward (ST = ScientificTypes.jl, STBase = ScientificTypesBase.jl):
Mmm. I'm not sure about that. Generally, I don't expect much functionality from a Base |
I think we can do better without package extensions. We just need a single package SciTypesBase.jl that defines the function I will try to work on it next week. Copy/paste the whole ScientificTypesBase.jl and ScientificTypes.jl code into a single module and clean it up to reduce to a single convention that everyone can use safely across ecosystems. |
I'm having some second thoughts about including Tables.jl as a dep of the STB pkg. As @OkonSamuel has pointed out, we have this dependency: MLJModelInterface -> ScientificTypesBase. This makes MLJModelInterface super light. If we add Tables.jl as a dependency, then this will upset a lot of providers of 3rd party MLJ model interfaces. For example, a package like EvoTrees.jl does not want Tables in its deps. Indeed, this was the original motivation for splitting ST into two packages. The status quo has Tables.jl in ST instead of STB, but this has one cost: The Note that the only part of STB needed by MLJModelInterface are the scientific types themselves, and the |
In view of the constraints just mentioned, and after consulting with @OkonSamuel, here is a rough proposal of what we could do. Basically, we are keeping the basic structure of the two packages but getting rid of alternative conventions, which means extending scitypes requires only the super lightweight dependency
What ScientificTypesBase.jl (super lightweight) provides:
abstract type Tabularity end
struct Tabular <: Tabularity end
struct NonTabular <: Tabularity end
scitype(X, ::NonTabular) = base_scitype(X)
base_scitype(::Missing) = Missing
base_scitype(::Nothing) = Nothing
base_scitype(::AbstractArray) = ...
base_scitype(::Tuple) = ...
for some type
What ScientificTypes.jl (medium weight) provides:
scitype(X) = ScientificTypesBase.scitype(X, tabularity(X))
coerce(X, options...) = ScientificTypesBase.coerce(X, tabularity(X), options...)
coerce!(X, options...) = ScientificTypesBase.coerce(X, tabularity(X), options...)
where
tabularity(X) = Tables.istable(X) ? Tabular() : NonTabular()
or some complication of that to deal with some corner cases (see existing code).
And, in an extension module with weak dep DataFrames.jl, an extension of
What 3rd party packages do to extend scitype:
They only need
For example, they do If it's conceptually cleaner, they could just extend |
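The trait-based dispatch in the proposal above can be sketched in a few lines. This is only an illustration under assumptions, not the actual package code: `STBaseSketch` and `STSketch` are hypothetical stand-ins for the two packages, the scientific types are simplified, and `istable` is a toy replacement for `Tables.istable` that treats a NamedTuple of vectors as a table so the sketch runs without any dependencies.

```julia
module STBaseSketch  # hypothetical stand-in for ScientificTypesBase.jl (no table dependency)

abstract type Tabularity end
struct Tabular <: Tabularity end
struct NonTabular <: Tabularity end

# Simplified stand-ins for the scientific types.
abstract type Continuous end
abstract type Count end
abstract type Unknown end
abstract type Table{K} end

# A single convention for built-in types.
base_scitype(::Any) = Unknown
base_scitype(::AbstractFloat) = Continuous
base_scitype(::Integer) = Count
base_scitype(::Missing) = Missing
base_scitype(::Nothing) = Nothing
base_scitype(A::AbstractArray) =
    AbstractArray{Union{map(base_scitype, A)...}, ndims(A)}

# The non-tabular method needs no table interface.
scitype(X, ::NonTabular) = base_scitype(X)

end

module STSketch  # hypothetical stand-in for ScientificTypes.jl (knows about tables)

import ..STBaseSketch
using ..STBaseSketch: Tabular, NonTabular, Table, base_scitype

# Toy replacement for `Tables.istable`: a NamedTuple of vectors is a table.
istable(X) = X isa NamedTuple && all(col -> col isa AbstractVector, values(X))

tabularity(X) = istable(X) ? Tabular() : NonTabular()

# The tabular method is added here, where the table interface is available.
function STBaseSketch.scitype(X, ::Tabular)
    column_scitypes = map(base_scitype, values(X))
    return Table{Union{column_scitypes...}}
end

# User-facing one-argument form.
scitype(X) = STBaseSketch.scitype(X, tabularity(X))

end
```

With these definitions, `STSketch.scitype(1.5)` goes through the `NonTabular` method, while `STSketch.scitype((a = [1.0, 2.0], b = [1, 2]))` goes through the `Tabular` one. The base module never has to know what a table is.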
@OkonSamuel may have some time to work on this. Given the constraints we have explained, are you happy with this proposal? |
Hi @ablaom, sorry for not replying earlier. I was traveling and forgot to address the proposal. Unfortunately the proposed changes are overly complex for the applications we have in mind and don't seem to address the original issue I raised, which is the inability to use the At first glance, I don't get the Tabularity traits in the proposal and think that they are unnecessary. I feel that the design could be much simpler and easier to use with a core package that implements methods for built-in Julia types and Tables.jl (no need to dispatch on any specific table type). If @OkonSamuel has time to improve the ecosystem here, that would be great anyway. We will come back to this issue of scientific types in the future, and will probably try a different approach to it. |
@juliohm Thanks for that response. I am happy for a counterproposal, but it should address the issue that we do not want Tables.jl to be a dependency of packages that only want to import the scientific types themselves (e.g., MLJModelInterface.jl). I cannot find anything in your discussion addressing this issue, which is key for us. |
Hi @ablaom, trying to catch up with multiple tasks here... Coming back to the issue with scitypes: why can't we have a SciTypesBase.jl package that defines the functions and implementations for built-in Julia types, and that gets rid of this concept of convention? That way 3rd-party packages can load the same well-thought convention for built-in Julia types and extend the convention with custom types. The new SciTypes.jl package could then add Tables.jl + SciTypesBase.jl and define fallbacks for tables that use these core function names. The whole story doesn't need to involve 3rd-party packages. Package developers can depend on SciTypesBase.jl, which will have definitions for built-in types, and end-users can load SciTypes.jl for actual workflows with tabular data. Let me know if you have a similar development path in mind. We could certainly speed up this cleanup as it is becoming more and more pressing in some downstream applications. |
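A rough sketch of the convention-free layout described in the comment above, for illustration only. `SciTypesBaseSketch`, `ThirdPartySketch`, the `Celsius` type and the simplified scientific types are all hypothetical names, not an existing package or API.

```julia
module SciTypesBaseSketch  # hypothetical core package: built-in types only, no conventions

# Simplified stand-ins for the scientific types.
abstract type Continuous end
abstract type Count end
abstract type Textual end
abstract type Unknown end

# One function, one set of methods, no convention argument.
scitype(::Any) = Unknown
scitype(::AbstractFloat) = Continuous
scitype(::Integer) = Count
scitype(::AbstractString) = Textual

end

module ThirdPartySketch  # hypothetical downstream package with its own data type

import ..SciTypesBaseSketch

struct Celsius
    value::Float64
end

# Extending the shared function only requires the lightweight core package.
SciTypesBaseSketch.scitype(::Celsius) = SciTypesBaseSketch.Continuous

end
```

In this layout every package that calls `SciTypesBaseSketch.scitype` sees the same answer for built-in types, and a custom type such as `Celsius` is covered by a single extra method.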
We migrated our stack to DataScienceTraits.jl where the issues above are properly solved: |
Re-opening, as I still think reverting to a single convention makes sense, and DataScienceTraits.jl cannot currently replace ScientificTypes.jl. For example, there is no container support (scitypes for arrays and tables, type coercion for containers, etc). Also, it has an incompatible way of handling the distinction between ordered and unordered categoricals (there is only a single |
We do have scitype of arrays, but our definition is different. We don't think that knowing Vector{Continuous} helps with dispatch; we just need to know the
Yes, we opted to only use |
I am genuinely interested in learning about the use cases of custom conventions. Do you have practical use cases where users needed to customize the default convention? What is the value of having multiple conventions? Does it actually help the community converge into something? My impression is that everyone is using DefaultConvention and we are maintaining code that is unnecessarily complex with support for multiple conventions. If a package A relies on DefaultConvention and package B relies on CustomConvention, what happens? Do we really want to support such use cases?
I would appreciate it if you could clarify these questions. I can help with JuliaAI/ScientificTypesBase.jl#21 after this is sorted out. My personal opinion is that we should not waste time modeling multiple conventions and should ask the community to adhere to a single convention for scientific types. It has tremendous benefits in more complex pipelines.
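To make the package A versus package B question concrete, here is a toy model of what can happen. It is a simplified sketch and not the actual ScientificTypes.jl code: `ConventionsSketch`, `PackageA`, `PackageB` and `label` are hypothetical, and `CustomConvention` is an invented convention that treats every real number as continuous.

```julia
module ConventionsSketch  # toy model of convention-based dispatch

abstract type Convention end
struct DefaultConvention <: Convention end
struct CustomConvention <: Convention end  # hypothetical competing convention

abstract type Continuous end
abstract type Count end

scitype(::Integer, ::DefaultConvention) = Count
scitype(::AbstractFloat, ::DefaultConvention) = Continuous
scitype(::Real, ::CustomConvention) = Continuous  # integers become continuous here

end

module PackageA  # hypothetical package relying on the default convention
using ..ConventionsSketch: scitype, DefaultConvention
label(x) = scitype(x, DefaultConvention())
end

module PackageB  # hypothetical package relying on the custom convention
using ..ConventionsSketch: scitype, CustomConvention
label(x) = scitype(x, CustomConvention())
end
```

In this toy model `PackageA.label(3)` is `Count` while `PackageB.label(3)` is `Continuous`, so the same value is interpreted differently depending on which package touches it in a pipeline.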