Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize building and casting of EnsoMultiValue #11924

Draft
wants to merge 23 commits into
base: develop
Choose a base branch
from

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Dec 19, 2024

Pull Request Description

This PR will address #11846 by introducing an internal MultiType replacing Type[], but guaranteeing quick (via ==) comparisons necessary to built various inline caches.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 19, 2024
@JaroslavTulach JaroslavTulach self-assigned this Dec 19, 2024
@JaroslavTulach JaroslavTulach marked this pull request as draft December 19, 2024 16:07
@JaroslavTulach JaroslavTulach changed the title Optimize building can casting of EnsoMultiValue Optimize building and casting of EnsoMultiValue Dec 19, 2024
@JaroslavTulach JaroslavTulach linked an issue Dec 20, 2024 that may be closed by this pull request
go 0 0


make_vector type n =
Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dedicated benchmark for EnsoMultiValue instances based on the idea of ArrayProxy one. Currently the more complicated benchmarks refuse to compile and the compiler bails out. The initial results are:

sbt:enso> runtime-benchmarks/benchOnly MultiValueBenchmarks
Benchmark                                                 Mode  Cnt    Score    Error  Units
MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  214.330 ±  4.179  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  219.803 ± 11.872  ms/op
MultiValueBenchmarks.sumOverFloat1                        avgt    5    0.079 ±  0.006  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  219.525 ±  6.393  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  203.788 ±  9.843  ms/op
MultiValueBenchmarks.sumOverInteger0                      avgt    5    0.074 ±  0.001  ms/op

After 630ec62 the results are better:

MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  30.109 ± 0.661  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  26.988 ± 0.446  ms/op
MultiValueBenchmarks.sumOverFloat1                        avgt    5   0.078 ± 0.003  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  27.821 ± 0.856  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  27.961 ± 0.263  ms/op
MultiValueBenchmarks.sumOverInteger0                      avgt    5   0.078 ± 0.002  ms/op

and there are no bailouts. Time to really speed things up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* The <b>base benchmark</b> for this suite. Measures how much it takes to access an Atom in a
* Vector, read {@code re:Float} field out of it and sum all of them together.
*/
@Benchmark
Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the base benchmark. The results after 4dacf53 are:

# the base one
MultiValueBenchmarks.sumOverComplexBaseBenchmark0         avgt    5  0.139 ± 0.013  ms/op

# these two are supposed to be faster
MultiValueBenchmarks.sumOverInteger1                      avgt    5  0.065 ± 0.003  ms/op
MultiValueBenchmarks.sumOverFloat2                        avgt    5  0.073 ± 0.002  ms/op

# these should catch up with sumOverComplexBaseBenchmark0 one day
MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  8.580 ± 0.326  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  9.118 ± 0.483  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  8.110 ± 0.160  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  9.393 ± 0.648  ms/op

still 60 times slower than it should be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current problem is in reorderOnly branch:

reorderOnly

It seems to always allocate new arrays. Rather it should treat EnsoMultiType as compilation constants have everything ready from previous run.

if (reorderOnly) {
var copyTypes = allTypesWith.executeAllTypes(dispatch, mv.extra);
if (i == 0 && dispatch.typesLength() == 1) {
return newNode.newValue(copyTypes, 1, mv.values);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if seems to speed sumOverFloatComplexRecastedToFloat4 up twice:

[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  1.071 ± 0.074  ms/op
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  2.322 ± 0.086  ms/op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Benchmark EnsoMultiValue and speed it up
1 participant