Option to avoid per-function allocation of fast interpreter #3921

sasq64 · 2024-11-21T10:38:29Z

Feature

An option to make the fast interpreter allocate less memory.

Right now, the fast interpreter allocates close to 1000 bytes of memory for each function in the WebAssembly module, which is a lot of RAM for an embedded environment.

I would like to be able to avoid this, even if it means letting the code run slower.

Switching to the slow interpreter is not a good fix for me, because the fast interpreter can run from read-only (flash) memory while the slow interpreter cannot: with the slow interpreter, the wasm binary must be copied into RAM.

Benefit

Reducing RAM usage.

Implementation

The big culprit is in wasm_loader_ctx_init() where the two structs BranchBlock and Const constitute allocations of 192 + 704 = 896 bytes of memory per function.
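To put that figure in perspective, here is a rough back-of-the-envelope sketch in C. It takes the 192 + 704 byte sizes reported above at face value and simply multiplies by the function count. The helper name is hypothetical, and the sketch says nothing about how long WAMR keeps these allocations alive.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes as reported in this issue; not re-measured here. */
#define BRANCH_BLOCK_ALLOC_BYTES 192u
#define CONST_ALLOC_BYTES        704u

/* Hypothetical helper: total bytes allocated if every function
 * triggers both initial allocations exactly once. */
static size_t loader_ctx_alloc_bytes(size_t function_count)
{
    size_t per_function = BRANCH_BLOCK_ALLOC_BYTES + CONST_ALLOC_BYTES;

    /* Guard against overflow for absurdly large counts. */
    assert(function_count <= (size_t)-1 / per_function);
    return function_count * per_function;
}
```

For a module with 200 functions, loader_ctx_alloc_bytes(200) gives 179200 bytes, which is 175 KiB.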

Alternatives

An alternative could be to let the slow interpreter work from read-only memory, which would avoid the RAM needed to hold a copy of the binary.


lum1n0us · 2024-11-26T04:30:00Z

In my understanding, both the slow and fast interpreters are, by design, capable of executing Wasm opcodes from read-only flash memory. This is managed by the flag is_load_from_file_buf in the load_from_section() function parameters.

At least, the fast interpreter uses more RAM than the slow one to store the processed Wasm code.

If the slow interpreter is unable to execute a .wasm file from read-only flash, I believe this is a problem and we should definitely investigate it.

sasq64 · 2024-11-27T08:46:18Z

No. That undocumented flag is always set to true, and even if you change it to false in the WAMR source code, wasm_loader_prepare_bytecode() still tries to rewrite opcodes for the slow interpreter.

sasq64 · 2024-11-27T08:59:32Z

It's hard to understand exactly what happens when the binary is loaded, but it seems like the fast interpreter produces new opcodes optimized for speed, while the slow interpreter patches the existing opcodes.

So the fast interpreter works with the wasm binary in read-only memory because it doesn't need to patch existing opcodes; instead it produces new ones in RAM.
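A minimal sketch in C of the difference I mean, under stated assumptions: the opcode values and function names below are invented for illustration and are not WAMR's. It only shows why patching needs a writable module while translating does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes, invented for this sketch; not real Wasm or WAMR values. */
enum { OP_GENERIC_GET = 0x10, OP_FAST_GET = 0x90 };

/* Strategy A (in-place patching): the loader overwrites bytes of the
 * module itself, so `code` must live in writable memory. */
static void patch_in_place(uint8_t *code, size_t len)
{
    assert(code != NULL);
    for (size_t i = 0; i < len; i++) {
        if (code[i] == OP_GENERIC_GET)
            code[i] = OP_FAST_GET;
    }
}

/* Strategy B (translation): the module is only read (note the const),
 * and the rewritten code goes to a separate buffer in RAM. */
static void translate_to_ram(const uint8_t *code, size_t len, uint8_t *out)
{
    assert(code != NULL && out != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = (code[i] == OP_GENERIC_GET) ? OP_FAST_GET : code[i];
}
```

With strategy B the input can be a const array placed in flash, at the cost of the extra output buffer. Strategy A needs no second buffer but cannot be pointed at read-only memory.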

lum1n0us · 2024-11-27T11:09:16Z

"wasm_loader_prepare_bytecode() still tries to rewrite opcodes for the slow interpreter."

It doesn't seem to match our design; we'll take a closer look and keep you updated.

lum1n0us · 2024-11-27T13:07:31Z

@wenyongh @xujuntwt95329 @loganek @TianlongLiang @yamt

I've noticed that there are two functions that will write back to the binary file, which does not comply with the read-only flash requirement.

wasm_loader_prepare_bytecode() will rewrite when it processes opcodes such as global.get, global.set, local.get, local.set, select, and so on.
wasm_const_str_list_insert() will also rewrite, but this can be controlled by the is_load_from_file_buf flag.
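To illustrate why a flag of this kind can decide whether the buffer gets written to, here is a minimal sketch in C. It shows a general technique (terminating a length-prefixed name in place versus copying it out) and is an assumption for illustration, not WAMR's actual code. get_name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: `p` points at a 1-byte length followed by that
 * many name bytes, with no NUL terminator. Returns a C string.
 * - reuse_buffer == true: shift the name over its length byte and
 *   terminate it in place. No allocation, but the buffer is WRITTEN.
 * - reuse_buffer == false: copy the name to the heap; the buffer is
 *   only read. The caller must free() the result. */
static char *get_name(uint8_t *p, bool reuse_buffer)
{
    assert(p != NULL);
    uint8_t len = p[0];

    if (reuse_buffer) {
        memmove(p, p + 1, len); /* overwrites the length byte */
        p[len] = '\0';
        return (char *)p;
    }

    char *copy = malloc((size_t)len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, p + 1, len);
    copy[len] = '\0';
    return copy;
}
```

Only the reuse_buffer == false path is safe when the module sits in read-only flash; the other path trades that safety for zero extra RAM.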

To address this issue, we have two options:

We could allow the classic interpreter to run without rewriting the binary content.
Alternatively, we could try to reduce the memory usage of the fast interpreter.

(As the author mentioned above.)

I used to think that not altering the binary content was one of our design principles. It appears that it's not, but I still believe this principle is important for certain scenarios, like streaming and this particular embedded hardware. I'm inclined to choose the first option even though it's slower (we all know the classic interpreter is the slowest and it won't make it much worse).

Please share your thoughts.

wenyongh · 2024-12-01T07:36:54Z

I agree: let's first enable the classic interpreter to run without modifying the binary. For the fast interpreter, I guess there is little room to reduce the size of the pre-compiled code.
