
What breaks ABI? #3961

Open
g0djan opened this issue Dec 16, 2024 · 3 comments

g0djan (Contributor) commented Dec 16, 2024

Hi, I have a POC of a feature that lets us pause the WASM VM with a signal interruption, write the function indexes of the stack frames to a file, and resume.

To make it async-signal-safe (the signal is delivered asynchronously with pthread_kill from another native thread) I need an atomic copy of the variable exec_env->wasm_stack->top (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/common/wasm_exec_env.h#L85). Atomic basically means that updating the value of the variable can't be interrupted by the signal mid-write, so the pointer always holds a valid value.

I'm considering 3 options for how to store it, but @loganek and I were discussing that it might break the ABI:

  1. Make exec_env->wasm_stack->top atomic; I made a proof of concept of that (see the sketch after this list). The idea is to check that sizeof uint8 and sizeof _Atomic(uint8) are the same and, if so, set a compilation flag (which needs to be added) to true; otherwise don't enable the feature.
  2. Write the value to the end of the stack, which is usually not used by anyone. In my opinion this is a bit harder to access from wamrc, but it probably shouldn't break the ABI for anything.
  3. Just add a field to exec_env, but I believe that will break the ABI, right?
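
A minimal sketch of option 1, assuming C11 atomics; the WASM_ENABLE_ASYNC_STACK_SNAPSHOT flag and the WASMStackSketch struct are illustrative placeholders, not the actual WAMR definitions:

```c
/* Sketch of option 1 (illustrative names, not the actual WAMR code):
 * enable the feature only if an _Atomic pointer has the same size as a
 * plain pointer, so the struct layout (and thus the ABI) is unchanged. */
#include <stdatomic.h>
#include <stdint.h>

#ifdef WASM_ENABLE_ASYNC_STACK_SNAPSHOT /* hypothetical new compile flag */
typedef _Atomic(uint8_t *) wasm_stack_top_t;
_Static_assert(sizeof(_Atomic(uint8_t *)) == sizeof(uint8_t *),
               "atomic pointer must not change the struct layout");
#else
typedef uint8_t *wasm_stack_top_t;
#endif

typedef struct WASMStackSketch {
    uint8_t *bottom;
    uint8_t *boundary;
    wasm_stack_top_t top; /* updated by the runtime, read by the signal handler */
} WASMStackSketch;

static inline void stack_set_top(WASMStackSketch *stack, uint8_t *p)
{
#ifdef WASM_ENABLE_ASYNC_STACK_SNAPSHOT
    /* relaxed ordering is enough: the handler runs on the same thread
       that performs the store (pthread_kill targets that thread) */
    atomic_store_explicit(&stack->top, p, memory_order_relaxed);
#else
    stack->top = p;
#endif
}
```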

I would appreciate advice on that @wenyongh @yamt.

Context: we need this feature to report stack traces when WASM/AOT is stuck somewhere (e.g. a deadlock, an infinite loop, a long operation). Sometimes the app unfreezes, so we would like to avoid crashing or terminating the VM just to get those stack traces.

yamt (Collaborator) commented Dec 17, 2024

  • Have you considered using the existing check_suspend_flags mechanism used by wasm_cluster_suspend_thread, the interpreter debug engine, wasm_runtime_terminate, etc.?
  • _Atomic is for threading and quite expensive. For signals it's overkill in general (see the sketch below).
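
A minimal illustration of that cost point, assuming the signal handler runs on the thread it interrupts (as it does with pthread_kill); the names here are made up for the example:

```c
/* Illustration only (not WAMR code): for data touched by a signal handler
 * on the same thread, volatile sig_atomic_t or a relaxed lock-free atomic
 * is enough; a default _Atomic access implies seq_cst ordering, which is
 * what makes it comparatively expensive. */
#include <signal.h>
#include <stdatomic.h>

static volatile sig_atomic_t dump_requested; /* classic async-signal-safe flag */
static _Atomic int progress;                 /* lock-free atomic read by the handler */

static void on_signal(int signo)
{
    (void)signo;
    dump_requested = 1;
    (void)atomic_load_explicit(&progress, memory_order_relaxed);
}

static void hot_path(void)
{
    /* relaxed store: typically compiles to a plain store, no fence */
    atomic_store_explicit(&progress, 42, memory_order_relaxed);
    /* the default would be atomic_store(&progress, 42);  (seq_cst) */
}

int main(void)
{
    signal(SIGUSR1, on_signal); /* pthread_kill(target_thread, SIGUSR1) would trigger it */
    hot_path();
    return (int)dump_requested;
}
```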

g0djan (Contributor, Author) commented Dec 17, 2024

Hi, thanks for the quick reply.

Have you considered using the existing check_suspend_flags mechanism used by wasm_cluster_suspend_thread, the interpreter debug engine, wasm_runtime_terminate, etc.?

We did consider wasm_runtime_terminate, and I had an internal POC with it as well. But we chose to proceed with the signal-interruption approach because:

  1. It allows us to avoid unnecessary app terminations.
  2. I ran into a problem with AOT: it would clear the frames in AOT-compiled code before I could access any of them to report (I managed to work around it, though).

We didn't really consider wasm_cluster_suspend_thread; I need to read the code. One complication might be calling it from another native thread. Another is that we probably need a response telling us we can start reading frames, and I'm not sure that can be done if the app is stuck.

_Atomic is for threading and quite expensive
Yes, we realise that. However, today I compared parallel-wasm-gzip compiled to AOT with and without my changes, benchmarking compression of a 1 GB file with the benchmark measurement tool hyperfine (https://github.com/sharkdp/hyperfine).

There were no clear signs of any degradation. (Update: I think I got some on a low-end device, I'll share later.)

I ran the 1 GB compression with each version for 10 iterations with hyperfine. I tried 4 times in total: twice the version without my changes was better, and twice the version with my changes was better.

Here are 2 examples of the output:

godjan@ud6d5de19674f5f:~/github/wasm-parallel-gzip$ hyperfine --warmup 2 'make benchmark.wamr.nopatch' 'make benchmark.wamr'
Benchmark 1: make benchmark.wamr.nopatch
 Time (mean ± σ): 19.877 s ± 2.828 s [User: 34.389 s, System: 1.771 s]
 Range (min … max): 17.379 s … 23.536 s 10 runs

Benchmark 2: make benchmark.wamr
 Time (mean ± σ): 20.840 s ± 2.867 s [User: 36.400 s, System: 1.808 s]
 Range (min … max): 17.319 s … 24.151 s 10 runs

Summary
 'make benchmark.wamr.nopatch' ran
 1.05 ± 0.21 times faster than 'make benchmark.wamr'

godjan@ud6d5de19674f5f:~/github/wasm-parallel-gzip$ hyperfine --warmup 2 'make benchmark.wamr' 'make benchmark.wamr.nopatch'
Benchmark 1: make benchmark.wamr
 Time (mean ± σ): 20.768 s ± 3.547 s [User: 36.232 s, System: 1.751 s]
 Range (min … max): 17.748 s … 27.061 s 10 runs

Benchmark 2: make benchmark.wamr.nopatch
 Time (mean ± σ): 23.088 s ± 3.942 s [User: 39.985 s, System: 1.963 s]
 Range (min … max): 17.667 s … 29.005 s 10 runs

Summary
 'make benchmark.wamr' ran
 1.11 ± 0.27 times faster than 'make benchmark.wamr.nopatch'

I also compared it with our app on a slow device (ARM Cortex-A9, dual core, 1 GHz, from 2014) and there was also no clear winner: sometimes the version with my changes was faster, sometimes the one without. I was profiling an iteration of the loop that covers everything.

I checked that the AOT binaries I ran were indeed different, and their intermediate representation differed exactly in the atomic store/load operations.

for signals it's overkill in general.

From https://en.cppreference.com/w/cpp/utility/program/signal I found it to be the only non-UB way to access the data. In practice, reading a regular pointer in a signal handler never caused me any problems in my testing.
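
To make that concrete, here is a rough sketch of the handler-side read under that rule (the wasm_stack_top variable is a stand-in for exec_env->wasm_stack->top, assumed to be _Atomic when the feature is enabled):

```c
/* Rough sketch, not the actual WAMR code: per cppreference, a signal handler
 * may only access lock-free atomic objects (and volatile sig_atomic_t)
 * without UB, so the stack top is read through a lock-free atomic load. */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic(uint8_t *) wasm_stack_top; /* stand-in for exec_env->wasm_stack->top */

#if ATOMIC_POINTER_LOCK_FREE == 2 /* pointers are always lock-free on this target */
static void dump_handler(int signo)
{
    (void)signo;
    uint8_t *top = atomic_load_explicit(&wasm_stack_top, memory_order_relaxed);
    /* ... walk the frames below `top` and write the function indexes to a
       file, using only async-signal-safe calls such as write(2) ... */
    (void)top;
}
#endif
```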

g0djan (Contributor, Author) commented Dec 23, 2024

@yamt

I tried out the existing wasm_cluster_suspend_thread mechanism.

I encountered 2 problems with it:

  1. Currently wasm_cluster_suspend_thread doesn't wait for the thread to stop, and there's no existing mechanism to check for it. For reporting stack traces this led to thread sanitizer warnings. It's also mentioned in Support threads(cluster) suspension/resume #2319.
  2. If I call sleep(x) and wasm_cluster_suspend_thread is then called, the thread doesn't actually get suspended until the sleep finishes.

I believe this branch should solve the first problem, as it adds waiting for thread suspension. But it doesn't fix the second problem: I don't see additional CHECK_SUSPEND_FLAGS added to the code, and currently execution has to go through all the code in this block before the flags are checked. If I cause a deadlock in wasm code, the suspension never happens after wasm_cluster_suspend_thread is called.
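
For reference, the kind of suspend/acknowledge handshake I mean for problem 1 looks roughly like this (all names are hypothetical; this is not the actual WAMR thread-manager code):

```c
/* Hypothetical handshake sketch, not the actual WAMR thread-manager code:
 * the requester raises a suspend flag and blocks until the target thread
 * acknowledges it at its next flag-check point; only then is it safe to
 * read that thread's frames. A thread stuck in sleep() or a wasm-level
 * deadlock never reaches the check point, which is exactly problem 2. */
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool suspend_requested;
    bool suspended; /* acknowledgement from the target thread */
} suspend_ctl_t;

static suspend_ctl_t g_ctl = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false
};

/* called by the requester (another native thread) */
static void request_suspend_and_wait(suspend_ctl_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->suspend_requested = true;
    while (!c->suspended) /* wait for the acknowledgement */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* called by the requester once it has finished reading frames */
static void request_resume(suspend_ctl_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->suspend_requested = false;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* called by the wasm thread at its CHECK_SUSPEND_FLAGS-style check points */
static void check_suspend_point(suspend_ctl_t *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->suspend_requested) {
        c->suspended = true;
        pthread_cond_broadcast(&c->cond); /* tell the requester we are parked */
        while (c->suspend_requested)      /* stay parked until resumed */
            pthread_cond_wait(&c->cond, &c->lock);
        c->suspended = false;
    }
    pthread_mutex_unlock(&c->lock);
}
```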
