Optimize building and casting of EnsoMultiValue #11924
base: develop
go 0 0
make_vector type n =
A dedicated benchmark for `EnsoMultiValue` instances, based on the idea of the ArrayProxy one. Currently the more complicated benchmarks refuse to compile and the compiler bails out. The initial results are:
sbt:enso> runtime-benchmarks/benchOnly MultiValueBenchmarks
Benchmark Mode Cnt Score Error Units
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 214.330 ± 4.179 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 219.803 ± 11.872 ms/op
MultiValueBenchmarks.sumOverFloat1 avgt 5 0.079 ± 0.006 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 219.525 ± 6.393 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 203.788 ± 9.843 ms/op
MultiValueBenchmarks.sumOverInteger0 avgt 5 0.074 ± 0.001 ms/op
After 630ec62 the results are better:
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 30.109 ± 0.661 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 26.988 ± 0.446 ms/op
MultiValueBenchmarks.sumOverFloat1 avgt 5 0.078 ± 0.003 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 27.821 ± 0.856 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 27.961 ± 0.263 ms/op
MultiValueBenchmarks.sumOverInteger0 avgt 5 0.078 ± 0.002 ms/op
and there are no bailouts. Time to really speed things up.
Some speedup achieved...
…nsoMultiValue instance
 * The <b>base benchmark</b> for this suite. Measures how much it takes to access an Atom in a
 * Vector, read {@code re:Float} field out of it and sum all of them together.
 */
@Benchmark
This is now the base benchmark. The results after 4dacf53 are:
# the base one
MultiValueBenchmarks.sumOverComplexBaseBenchmark0 avgt 5 0.139 ± 0.013 ms/op
# these two are supposed to be faster
MultiValueBenchmarks.sumOverInteger1 avgt 5 0.065 ± 0.003 ms/op
MultiValueBenchmarks.sumOverFloat2 avgt 5 0.073 ± 0.002 ms/op
# these should catch up with sumOverComplexBaseBenchmark0 one day
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 8.580 ± 0.326 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 9.118 ± 0.483 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 8.110 ± 0.160 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 9.393 ± 0.648 ms/op
still 60 times slower than it should be.
if (reorderOnly) {
  var copyTypes = allTypesWith.executeAllTypes(dispatch, mv.extra);
  if (i == 0 && dispatch.typesLength() == 1) {
    return newNode.newValue(copyTypes, 1, mv.values);
This `if` seems to make `sumOverFloatComplexRecastedToFloat4` twice as fast:
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 1.071 ± 0.074 ms/op
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 2.322 ± 0.086 ms/op
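As a rough illustration of why such a check can help, here is a minimal Java sketch of the same idea: when the requested type is already in the first position, the cast can share the existing values array instead of building a reordered copy. All names here (`CastFastPathSketch`, `Multi`, `castTo`) are hypothetical and the data model is heavily simplified. This is not the Enso implementation, which works with Truffle nodes and `Type` objects rather than strings, and which still computes a new type array on this path.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from the Enso code base. */
final class CastFastPathSketch {
  /** Simplified multi value: parallel arrays of type names and their values. */
  static final class Multi {
    final String[] types;
    final int dispatchTypes; // how many leading types are visible for dispatch
    final Object[] values;

    Multi(String[] types, int dispatchTypes, Object[] values) {
      this.types = types;
      this.dispatchTypes = dispatchTypes;
      this.values = values;
    }
  }

  /** Returns a multi value whose only dispatch type is the requested one. */
  static Multi castTo(Multi mv, String requested) {
    int i = Arrays.asList(mv.types).indexOf(requested);
    if (i < 0) {
      throw new IllegalArgumentException("No such type: " + requested);
    }
    if (i == 0) {
      // Fast path: the requested type is already first, so the existing
      // arrays are shared and nothing is copied.
      return new Multi(mv.types, 1, mv.values);
    }
    // Slow path: copy both arrays, moving entry i to the front.
    int n = mv.types.length;
    String[] types = new String[n];
    Object[] values = new Object[n];
    types[0] = mv.types[i];
    values[0] = mv.values[i];
    for (int j = 0, k = 1; j < n; j++) {
      if (j != i) {
        types[k] = mv.types[j];
        values[k] = mv.values[j];
        k++;
      }
    }
    return new Multi(types, 1, values);
  }

  public static void main(String[] args) {
    Multi mv = new Multi(new String[] {"Float", "Complex"}, 2, new Object[] {1.5, "1.5+0i"});
    // Reports whether each cast shared the original values array.
    System.out.println(castTo(mv, "Float").values == mv.values);
    System.out.println(castTo(mv, "Complex").values == mv.values);
  }
}
```

In this sketch the fast path avoids two array allocations and a loop per cast, which is the kind of per-element work a benchmark summing over a large vector would amplify.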
Pull Request Description
This PR will address #11846 by introducing an internal `MultiType` replacing `Type[]`, but guaranteeing quick (via `==`) comparisons necessary to build various inline caches.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,