-
-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Need public opt-out API for output routing from threads #1289
Comments
I think perhaps a public API to set the global parent as one or more ContextVars (ideally one); @contextmanager
def parent_header_setter(parent):
token = parent_context_var.set(parent)
try:
yield
finally:
parent_context_var.reset(token)
sys.stdout.parent_header = sys.stdout.parent_header which looks weird, but it stores the current resolved value (which may come from up to 3 different places) in a contextvar, so it overrides the global and the thread-local lookup, so it's a one-time set of the global default, but a persistent set of the thread-local value. It would be nice to have the possibility to do this also without overriding the global, but that can only currently be done with the private: sys.stdout._parent_header.set(sys.stdout.parent_header) I think we should:
because while #1186 fixed OutputWidget capturing print statements from threads, it actually doesn't capture |
Great write up, thank you!
Not sure about this point ;)
Do you have something specific in mind? I guess we would not want users to play with |
Not more specific than what I outlined above. Perhaps a public method on Kernel to set the thread-local header (possibly even with a context manager API), with the capabilities:
This could be e.g. Kernel.set_thread_parent(parent, channel="shell") # or set_parent(parent, thread=True)
Kernel.get_thread_parent(channel="shell") # always gets the resolved thread parent? Or only the override? where e.g. Kernel.set_thread_parent(None) could clear the thread-local parent, or an explicit separate kernel.set_thread_parent(kernel.get_parent()) could set the thread-local parent to the global parent (re-pinning to the current cell). If we switched to opt-in instead of opt-out, this would be the call to restore the current behavior. I'm not sure if I think this would have the benefit of solving the problems addressed by #1186 consistently for all outputs, instead of just stream output, which would be more consistent. As it is now, stream output and display output are not routed to the same cell because this logic is on OutStream instead of Kernel. Whatever we come up with, we should make sure that: print('stdout')
display('displaypub') consistently go to the same place, which is not currently the case, I think. I don't necessarily think we need to change the default (I do still think we should). But I think we should at least make sure that the default behavior does not affect threads created by ThreadPoolExecutor, since that results in pretty clearly incorrect behavior, I think, and selecting whichever we make not the default should be relatively clear and public. |
We need an easier opt-out of the behavior introduced in #1186, because most output I'm seeing come from threads is really confusing or even lost, where the previous behavior of sending all thread output to the current cell would be more appropriate. Either a kernel-level opt-out or a per-thread opt-out or both should be required, and we should perhaps reconsider the default behavior.
Two examples that I don't think are that unusual, demonstrating where the current behavior is incorrect and currently impossible to avoid with public APIs:
ipyparallel
For example, IPython Parallel runs its IO in a background thread, and sometimes this produces output (e.g. when streaming output from engines). It produces output in response to direct, blocking action in the main thread, but the output is produced in a long-running background thread. Sending this output to the cell that created the Client is not desirable. Producing the output in the main thread is not particularly feasible either, because it's used in e.g. streaming output in user code:
In all situations, the right thing to do for output produced by this thread is to go to the current cell.
The same is true for some log output, e.g. stopping clusters - which again produces output in a background thread as a direct result of synchronized action taken on the main thread, where placing the output near the action taken that triggered the output is less surprising than the far disconnected initial launch of the thread.
ThreadPoolExecutor
Another, fully standard library situation is ThreadPoolExecutor, where outputs from all tasks will go to the initial thread-spawning Cell. It's not just the cell that creates the pool, since spawning threads may actually be deferred until the first task submission requiring the thread, producing this very surprising output order:
which may be related to the joblib issue in #402 which is closed, but appears to actually be unresolved.
I think there is an assumption in #1186 that threads, once spawned, are long running and do not interact with the main thread. This holds for some examples of "fire and forget" type threads, but is definitely not true in general, and I'm not even sure it's true more often than not.
We at least need a way for packages/libraries to indicate that a thread producing output shouldn't be routed to the originating cell.
If we want to be really fiddly and try to guess the right thing to do (as we are doing now with threads, where the guess is often incorrect, if clear and predictable), we could assume that threaded output should go to the current cell if the current cell is blocking while the output is produced. This would definitely do the wrong thing sometimes in the cases where a background thread should route to the originating cell. That seems to be the rare exception, however.
While the async routing is nice and I suspect more robust, I think the guess for threads is more often incorrect than correct, so I think perhaps it should be made opt-in instead of opt-out. At the very least, I think we should make sure that the current thread output routing does not apply to threads created by ThreadPoolExecutor or similar.
Looking at jupyter-widgets/ipywidgets#2358 which uses OutputWidget, allowing OutputWidget to set the thread-local parent header in a sticky way would still solve the motivating issue, even if default print statements were routed to the latest cell, which I think is probably the better default behavior. I think #1186 gave us what we need to allow
with OutputWIdget()
to work in a way that persists for the current context (viastream._parent_header.set
).I don't actually understand how
with output_widget
captures output anymore, since apparently settingself.msg_id
is all it does, but presumablywith output_widget
should set the parent_header in a way that's persistent for the context within the current thread and not overridden by concurrent executions in the main thread.What do you think is the right path forward, @krassowski?
The text was updated successfully, but these errors were encountered: