
Option to avoid per-function allocation of fast interpreter #3921

Open
sasq64 opened this issue Nov 21, 2024 · 6 comments

Comments

@sasq64

sasq64 commented Nov 21, 2024

Feature

Option to make the fast interpreter not allocate so much memory.

Right now, the fast interpreter allocates close to 1000 bytes of memory for each function in the WebAssembly module, which is a lot of RAM for an embedded environment.

I would like to be able to avoid this, even if it means the code runs slower.

This matters because the fast interpreter can run from read-only (flash) memory, while the slow interpreter cannot, so with the slow interpreter the wasm binary must be copied into RAM.

Benefit

Reducing RAM usage.

Implementation

The big culprit is in wasm_loader_ctx_init(), where the two structs BranchBlock and Const account for allocations of 192 + 704 = 896 bytes of memory per function.
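A minimal sketch of the arithmetic behind this figure, assuming (as the issue states) that the 192 + 704 bytes are paid once for every function in the module. The helper names are hypothetical; the only inputs taken from the issue are the two struct sizes.

```c
#include <assert.h>

/* Per-function sizes reported in this issue for the loader context
   set up in wasm_loader_ctx_init(). */
#define BRANCH_BLOCK_BYTES 192u
#define CONST_BYTES        704u

/* Compile-time check of the sum quoted in the issue (C11 static_assert). */
static_assert(BRANCH_BLOCK_BYTES + CONST_BYTES == 896u,
              "BranchBlock + Const should total 896 bytes");

/* Bytes attributed to a single function (hypothetical helper). */
static unsigned long per_function_bytes(void)
{
    return (unsigned long)BRANCH_BLOCK_BYTES + CONST_BYTES;
}

/* Estimated total for a module, assuming the cost is paid once per
   function (hypothetical helper for illustration only). */
static unsigned long module_bytes(unsigned long num_functions)
{
    return num_functions * per_function_bytes();
}
```

Under that assumption, a module with 500 functions would need 500 × 896 = 448,000 bytes for these structures alone, so the overhead scales directly with the number of functions.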

Alternatives

An alternative could be to let the slow interpreter run from read-only memory, avoiding the RAM needed to copy the binary.

@lum1n0us
Collaborator

In my understanding, both the slow and fast interpreters are designed to execute Wasm opcodes from read-only flash memory. This is controlled by the flag is_load_from_file_buf in the parameters of the load_from_section() function.

That said, the fast interpreter does use more RAM than the slow one, because it stores the processed Wasm code.

If the slow interpreter is unable to execute a .wasm file from read-only flash, I believe this is a problem and we should definitely investigate it.

@sasq64
Author

sasq64 commented Nov 27, 2024

No. That undocumented flag is always set to true, and even if you change it to false in the WAMR source code, wasm_load_prepare_bytecode() still tries to rewrite opcodes for the slow interpreter.

@sasq64
Author

sasq64 commented Nov 27, 2024

It's hard to understand exactly what happens when the binary is loaded, but it seems that the fast interpreter produces new opcodes optimized for speed, while the slow interpreter patches the existing opcodes.

So the fast interpreter works with the wasm binary in read-only memory because it doesn't need to patch existing opcodes; it produces new ones in RAM instead.
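The distinction described above can be illustrated with a toy sketch. It assumes a simplified bytecode in which every byte is an opcode with no immediates (real Wasm uses LEB128 immediates), and OP_LOCAL_GET_FAST is a hypothetical specialized opcode, not a WAMR constant. The first function patches its input, so the input must be writable RAM; the second only reads its input and writes the result to a new RAM buffer, so the input can stay in flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define OP_LOCAL_GET      0x20 /* Wasm local.get opcode */
#define OP_LOCAL_GET_FAST 0xF0 /* hypothetical specialized opcode */

/* Classic-style: rewrite opcodes in place; `code` must be writable. */
static void patch_in_place(uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (code[i] == OP_LOCAL_GET)
            code[i] = OP_LOCAL_GET_FAST;
    }
}

/* Fast-style: read-only input, translated copy in newly allocated RAM.
   The caller frees the result; returns NULL on allocation failure. */
static uint8_t *translate_to_ram(const uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    uint8_t *out = malloc(len > 0 ? len : 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (code[i] == OP_LOCAL_GET) ? OP_LOCAL_GET_FAST : code[i];
    return out;
}
```

With translate_to_ram, the input is never modified and the RAM cost is the size of the translated output, which mirrors the trade-off described for the fast interpreter.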

@lum1n0us
Collaborator

wasm_load_prepare_bytecode() still tries to rewrite opcodes for the slow interpreter.

It doesn't seem to match our design; we'll take a closer look and keep you updated.

@lum1n0us
Collaborator

@wenyongh @xujuntwt95329 @loganek @TianlongLiang @yamt

I've noticed that there are two functions that will write back to the binary file, which does not comply with the read-only flash requirement.

  • wasm_loader_prepare_bytecode() rewrites the bytecode when it processes opcodes such as global.get, global.set, local.get, local.set, select, and so on.
  • wasm_const_str_list_insert() also rewrites it, but this can be controlled by the is_load_from_file_buffer flag.
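The two behaviors of a string helper like wasm_const_str_list_insert() can be sketched as follows. This is a hypothetical helper, not WAMR's code, and WAMR's actual details may differ: it assumes the string is preceded in the same buffer by at least one byte of its length prefix. With in-place rewriting enabled, it shifts the string back over that byte and appends a terminating NUL, which requires writable memory; otherwise it allocates a copy and leaves the input untouched, which works from flash but costs RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a NUL-terminated string for the `len` bytes at `str`.
   rewrite_in_place == true: reuse the byte just before `str` (the last
   byte of its length prefix) and terminate the string inside the input
   buffer, which must really be writable despite the const qualifier.
   rewrite_in_place == false: allocate a copy; the input is not modified.
   Returns NULL only if the allocation fails. */
static char *make_c_string(const uint8_t *str, size_t len,
                           bool rewrite_in_place)
{
    assert(str != NULL);
    if (rewrite_in_place) {
        char *c_str = (char *)(str - 1); /* casts away const: buffer is RAM */
        memmove(c_str, str, len);        /* shift string one byte back */
        c_str[len] = '\0';               /* terminator where last char was */
        return c_str;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, str, len);
    copy[len] = '\0';
    return copy;
}
```

The in-place branch saves an allocation per string but writes into the loaded binary, which is exactly what a read-only flash image cannot allow.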

To address this issue, we have two options:

  • We could allow the classic interpreter to run without rewriting the binary content.
  • Alternatively, we could try to reduce the memory usage of the fast interpreter.

(As the author mentioned above.)

I used to think that not altering the binary content was one of our design principles. It appears that it isn't, but I still believe this principle is important in certain scenarios, such as streaming and this particular embedded hardware. I'm inclined to choose the first option even though it's slower (we all know the classic interpreter is the slowest, so this won't make it much worse).
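To illustrate why the first option costs some speed: if load-time rewriting of global.get is used to specialize it by the global's type (an assumption here, not confirmed in this thread), a non-rewriting interpreter has to look the type up every time the instruction executes. The structs and field names below are hypothetical, and the sketch assumes the global index fits in a single LEB128 byte; it only shows that the bytecode can stay const at the cost of an extra lookup per execution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VAL_I32, VAL_I64 } ValType; /* simplified value types */

typedef struct {
    ValType type;
    uint64_t value; /* storage wide enough for either type */
} Global;

typedef struct {
    const Global *globals;
    size_t global_count;
} Module;

/* Execute a global.get (opcode 0x23) at `ip` in read-only bytecode.
   Nothing is written back to the bytecode. Stores the result in *out
   and returns a pointer to the next instruction. */
static const uint8_t *exec_global_get(const Module *m, const uint8_t *ip,
                                      uint64_t *out)
{
    uint8_t idx = ip[1]; /* assumes a single-byte LEB128 index (< 128) */
    assert(idx < m->global_count);
    const Global *g = &m->globals[idx];
    /* The type check runs on every execution instead of once at load. */
    if (g->type == VAL_I64)
        *out = g->value;
    else
        *out = (uint32_t)g->value;
    return ip + 2;
}
```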

Please share your thoughts.

@wenyongh
Contributor

wenyongh commented Dec 1, 2024

I agree with first enabling the classic interpreter to run without modifying the binary. For the fast interpreter, I guess there is little room to reduce the size of the pre-compiled code.

Development

No branches or pull requests

3 participants