-
-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async file in multipart file upload #3339
base: master
Are you sure you want to change the base?
Conversation
httpx/_types.py
Outdated
@@ -93,7 +93,7 @@ | |||
|
|||
RequestData = Mapping[str, Any] | |||
|
|||
FileContent = Union[IO[bytes], bytes, str] | |||
FileContent = Union[IO[bytes], bytes, str, AsyncIterable[bytes], AsyncIterable[str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an async equivalent to typing.IO
?
Note that only support byte streams, not text streams.
(Ie. files must be opened in binary mode, or else we raise an exception.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an async equivalent to typing.IO?
I don't think so. Since all the asyncio files I know are implemented in 3rd party libraries, there is no need for typing.IO
asynchronous alternative in stdlib, I guess.
Note that only support byte streams, not text streams.
But it kinda works in that case. I'm using yield to_bytes(achunk)
in arender_data
to convert str
chunk to bytes. I've updated the test to work with the file in text mode also. Not sure if this is correct aproach =/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I think the failure case is this?...
- Read file size, used for Content-Length header.
- Open a file on a system using a non utf-8 encoding.
- Write to bytes using utf-8.
- Oops, uploaded data stream didn't match the Content-Length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've updated the code and added a runtime checks for bytes in arender_data
and updated tests accordingly.
Oh, thanks @tsimoshka 😊 |
BTW, It looks like |
if not isinstance(self.file, AsyncIterable): | ||
for chunk in self.render_data(): | ||
yield chunk | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want block loop for non-AsynIterable types?
if not isinstance(self.file, AsyncIterable): | |
for chunk in self.render_data(): | |
yield chunk | |
return | |
if not isinstance(self.file, AsyncIterable): | |
raise TypeError("Invalid type for file. Only AsyncIterable is supported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but won't it break IO[bytes] usage with AsyncClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async generator objects should always be typed as AsyncGenerators so they can be used with contextlib.aclosing
httpx/_multipart.py
Outdated
if not isinstance(achunk, bytes): | ||
raise TypeError( | ||
"Multipart file uploads must be opened in binary mode," | ||
" not text mode." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be outside of for loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to check the type of file in FileField.__init__
. But I think that it's impossible to check the type of the chunks in AsyncIterable at runtime without iterating. Besides, strictly speaking, nothing stops me in python from doing something like:
import asyncio
async def aiter():
yield b"bytes"
yield "some str"
async def main():
result = []
async for chunk in aiter():
result.append(chunk)
print(result)
asyncio.run(main())
I know that this is a very strange example and it's unlikely that someone will do something like that. So I can check only the first chunk in a loop to ensure it is of the correct type. Something like:
checked = False
async for achunk in self.file:
if not checked and not isinstance(achunk, bytes):
raise TypeError(
"Multipart file uploads must be opened in binary mode,"
" not text mode."
)
checked = True
yield achunk
But I'm not sure whether it'll significantly impact the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this work?
file_aiter = self.file.__aiter__()
try:
achunk = await aiter.__anext__()
except StopAsyncIteration:
return
if not isinstance(achunk, bytes):
raise TypeError(
"Multipart file uploads must be opened in binary mode,"
" not text mode."
)
yield achunk
async for achunk in file_aiter:
yield achunk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would work, but it wouldn't catch the mixed string/bytes iterators I mentioned here. However, I've modified the code as you suggested, since I agree that handling mixed iterators is an edge/unlikely case in this situation..
Hello! This looks great, any chance it can be merged soon? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to explicitly aclose async generators. contextlib.aclosing was added in 3.10 for this purpose, you'll need to copy in the code to support 3.9 or conditionally use contextlib2.aclosing
if not isinstance(self.file, AsyncIterable): | ||
for chunk in self.render_data(): | ||
yield chunk | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async generator objects should always be typed as AsyncGenerators so they can be used with contextlib.aclosing
httpx/_multipart.py
Outdated
def render(self) -> typing.Iterator[bytes]: | ||
yield self.render_headers() | ||
yield from self.render_data() | ||
|
||
async def arender(self) -> typing.AsyncIterator[bytes]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use AsyncGenerator type
httpx/_multipart.py
Outdated
def render(self) -> typing.Iterator[bytes]: | ||
yield self.render_headers() | ||
yield from self.render_data() | ||
|
||
async def arender(self) -> typing.AsyncIterator[bytes]: | ||
yield self.render_headers() | ||
async for chunk in self.arender_data(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use async with contextlib.aclosing
httpx/_multipart.py
Outdated
for field in self.fields: | ||
yield b"--%s\r\n" % self.boundary | ||
if isinstance(field, FileField): | ||
async for chunk in field.arender(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async with contextlib.aclosing
Hey! Thanks for reviewing this. I won't do this for the following reasons:
|
I'm not talking about using aclosing on the files, just the async generators. The AsyncIterator from the file doesn't need to be closed |
7595f86
to
7984e22
Compare
7984e22
to
a6ea8b4
Compare
Sorry for the delay. I've added aclosing as requested. To support Python versions prior to 3.10, I've restored the previously removed httpx/_compat.py file, though I'm not certain this is the best approach. |
Summary
Trying to implement support for async files (any AsyncIterable[bytes | str]) multipart file upload. Partially fixes #1620
Checklist