Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node:fs APIs are not actually using file descriptors #23012

Open
mash-graz opened this issue Mar 21, 2024 · 5 comments
Open

node:fs APIs are not actually using file descriptors #23012

mash-graz opened this issue Mar 21, 2024 · 5 comments
Labels
bug Something isn't working correctly node API Related to various "node:*" modules APIs node compat

Comments

@mash-graz
Copy link
Contributor

mash-graz commented Mar 21, 2024

Version: upstream (2f7b966)

deno seems to show a significant different behavior when it comes to accessibility of resources between main instance and workers. In node and bun you can open a file, send the file descriptor resp. RID to the worker and access its content in this other context. Using deno, you will get just an BadResource Error. The received RID isn't present in the resource table of the client.

Here a small demonstration script:

import { isMainThread, parentPort, Worker } from "node:worker_threads";
import fs from "node:fs";


if (isMainThread) { // --- code running on the main instance ---
 
  // open a File in the contex if the Main Thread
  const fd = fs.openSync(import.meta.filename);

  globalThis.Deno && console.log("ResourceTable (Main):\n", Deno.resources());

  // run this script also as 'node:worker_thread'
  const w = new Worker(import.meta.filename);

  // send file descriptor to the worker side
  console.log(`Send open file RID: ${fd}`);
  w.postMessage(fd)

} else { // --- code running on the worker side ---

  // receive file descriptor and try to read file content
  parentPort.on("message", (msg) => {
    const received_fd = msg;
    console.log(`Received RID:  ${received_fd}`);

    globalThis.Deno && console.log("ResourceTable (Worker):\n", Deno.resources());

    fs.read(received_fd, (err, br, buf) => {
      if (err) {
        console.log(`ERROR: ${err}`)
      } else {
        console.log(`READ: ${br} bytes`)
      }

      // cleanup and exit worker
      globalThis.Deno ? close() : parentPort.unref();
    });
  });
}

the output using deno:

❯ ../deno/target/debug/deno run -A cross_realm_fd_reuse.mjs
ResourceTable (Main):
 { "0": "stdin", "1": "stdout", "2": "stderr", "3": "fsFile" }
Send open file RID: 3
Received RID:  3
ResourceTable (Worker):
 { "0": "stdin", "1": "stdout", "2": "stderr" }
ERROR: BadResource: Bad resource ID

on node:

❯ node cross_realm_fd_reuse.mjs
Send open file RID: 17
Received RID:  17
READ: 1133 bytes

and bun:

❯ /tmp/bun cross_realm_fd_reuse.mjs
Send open file RID: 12
Received RID:  12
READ: 1133 bytes

Although I understand denos radical different answer in this case very well, because it helps to prevent all sort of thread related troubles, it's still a significant different behavior resp. incompatibility.

The deno_core docu states about Resources:

Resources are not thread-safe - they can only be accessed from the thread that the JsRuntime lives on.

I'm not sure if this explanation inevitable applies to node:worker_threads, therefore I'm asking here, how we should work around this incompatibility?

Otherwise I would have to write rather complex adaptations for an existing software to bypass this differences in behavior (bytecodealliance/jco#400).

@mash-graz mash-graz changed the title node:worker_threads don't share resources with main Instance node:worker_threads don't share resources with main instance Mar 21, 2024
@bartlomieju
Copy link
Member

This is a bug in fs.open() and not in node:worker_threads. Most polyfills in fs are currently wrong as they don't work on actual file descriptors but Deno's own "resources". This is outright wrong and needs to be fixed.

@bartlomieju bartlomieju changed the title node:worker_threads don't share resources with main instance node:fs APIs are not actually using file descriptors Mar 21, 2024
@bartlomieju bartlomieju added bug Something isn't working correctly node compat node API Related to various "node:*" modules APIs labels Mar 21, 2024
@mash-graz
Copy link
Contributor Author

mash-graz commented Mar 21, 2024

I'm not sure, if I'm able to understand your explanation.

Right now all operations in /ext/node/polyfills/_fs/... just translate the calling options to finally call their counterparts in ext/fs. And this functions use the resource table as an additional indirection instead of real file descriptors. The file descriptors are present in this table, but they can't be easily shared between mainThread and worker.

And if I got it right, you argue, that the /ext/node file operations shouldn't take this indirection over the resource table, but simply use and return plain file descriptor numbers to achieve better compatibility.

@bartlomieju
Copy link
Member

The file descriptors are present in this table, but they can't be easily shared between mainThread and worker.

No, they are not. Resource ID is not a File Descriptor. Even though they are both number, the rid does not correspond to the actual value of FD of the OS.

And if I got it right, you argue, that the /ext/node file operations shouldn't take this indirection over the resource table, but simply use and return plain file descriptor numbers to achieve better compatibility.

We might still use resource table to properly clean up opened files, but we first need to fix the APIs to return proper OS file descriptors instead of resource ids. This is gonna be a big refactor that will take about two weeks and needs to be handled by the core team.

@mash-graz
Copy link
Contributor Author

mash-graz commented Mar 22, 2024

No, they are not. Resource ID is not a File Descriptor. Even though they are both number, the rid does not correspond to the actual value of FD of the OS.

I never thought, that Resource IDs in deno are identical to file descriptors.
But we can already get the file descriptors from an entry in the Resource table simply by:

#[op2(fast)]
pub fn op_fs_get_fd(
  state: &mut OpState,
  #[smi] rid: ResourceId,
) -> Result<i32, AnyError>{
  let file = FileResource::get_file(state, rid)?;
  let fd = match file.backing_fd() {
    Some(fd) => fd,
    None => return Err(generic_error("fd not found")),
  };
  Ok(fd)
}

That's why I'm rather confused about your explanation, that file descriptors aren't accessible via this RIDs.

The question, how these handlers or entire resource entries should be shared/replicated/transferred between threads/workers without fear, looks much more difficile to me.

Nevertheless, I really understand that this topic is indeed a rather complex one and it has to be handled very carefully by those, who are really familiar with its design and security implications.

@bartlomieju
Copy link
Member

A bunch of tests were disabled in #25285 for these APIs as they showed incorrect behavior using resource IDs instead of file descriptors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly node API Related to various "node:*" modules APIs node compat
Projects
None yet
Development

No branches or pull requests

2 participants