-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align File operations to the 'symlink' metaphor of data links to files #11825
Comments
Btw. not sure if I will have enough time to do it as part of this ticket or create a separate one, but since we are now even more 'mixing' file systems (e.g. Currently some operations yield a type error, or worse - undefined behaviour (i.e. hard to say what happens) if they get a file instance from a different filesystem. Let's align this at some point:
|
Not for this PR, but I am curious about the choices about which operations follow the data link and which don't -- I can see uses for either one. In particular, I can definitely see users sometimes wanting |
@jdunkerley initially didn't want to have The thing with e.g. But I agree that especially for files it seems good to be able to check both if the link but also the target file exists and distinguish the two non-existence scenarios. |
@radeusgd That makes sense about datalinks pointing to non-files. (It could mean "does the table exist"; errors about connection failures are different, and apply both to tables and cloud filesystems.) I also can agree that we don't want to bother the user with a |
Radosław Waśko reports a new STANDUP for yesterday (2024-12-16): Progress: Working on making is_regular_file and is_directory follow symlinks on all backends, updating tests and docs. It should be finished by 2024-12-19. Next Day: Next day I will be working on the same task. Fix errors in Jaroslav's PR, then work on / operator in datalinks |
Radosław Waśko reports a new STANDUP for yesterday (2024-12-17): Progress: Fixing test failures in dataflow propagation PR - it took some time. Reviews. It should be finished by 2024-12-19. Next Day: Next day I will be working on the same task. Work on / operator in datalinks |
Radosław Waśko reports a new STANDUP for yesterday (2024-12-18): Progress: Implemented crossing datalinks on Next Day: Next day I will be working on the same task. Make sure it is also consistent with |
Radosław Waśko reports a new STANDUP for yesterday (2024-12-19): Progress: Fixing edge cases on S3 - some more advanced heuristics needed there as the notion of directory is less clearly defined there so the distinctions that are easy elsewhere are more complicated. It should be finished by 2024-12-20. Next Day: Next day I will be working on the same task. Wrap up the PR |
Radosław Waśko reports a new STANDUP for today (2024-12-20): Progress: Added cache (TTL 30s) to S3.is_data_link as it is costly and it was queried relatively often - it was making the data link tests 2x slower. We could cache more in the future. Fixing some edge cases/test failures. Addressing code review. It should be finished by 2024-12-20. Next Day: Next day I will be working on the same task. Perhaps do some work on Type inferrence - update the PR to latest develop, resolve conflicts, address review. |
Radosław Waśko reports a new STANDUP for today (2024-12-30): Progress: Fixing unexpected test failures in Base_Internal_Tests, fixed regression in File.new on relative paths. Testing product on AoC challenges, reported some findings. It should be finished by 2025-01-07. Next Day: Next day I will be working on the same task. Fix one more failure in the CI (S3 this time), work on types - get the PR up to date and apply code review comments. |
Radosław Waśko reports a new STANDUP for the provided date (2024-12-27): Progress: Fixing unexpected test failures in Base_Internal_Tests, fixed regression in File.new on relative paths. Testing product on AoC challenges, reported some findings. It should be finished by 2025-01-07. Next Day: Next day I will be working on the same task. Fix one more failure in the CI (S3 this time), work on types - get the PR up to date and apply code review comments. |
Radosław Waśko reports a new STANDUP for yesterday (2024-12-30): Progress: Fixed the S3 failure on CI, now the engine/stdlib tests are green on the PR. Put up 2 small PRs: fix Snowflake CI, and fix for Integer.parse widget. Reporting bugs from AoC testing. Updated Types PR and beginning on addressing code review. It should be finished by 2025-01-07. Next Day: Next day I will be working on the #9812 task. Continue addressing code review comments on Types PR, implement some requested restructure. |
Implementing #11802 calls for aligning other methods on files to the 'symlink metaphor'.
The idea is following:
is_directory
andis_regular_file
follow the data link and check the target file. If the data link points to a non-file object, they both returnFalse
.size
follows the data link and returns the size of the target file. If the data link does not point to a file, it throws an error.exists
does not follow the data link - it checks whether the link itself exists, regardless of target.creation_time
andlast_modified_time
do not follow the data link - they return the modified/created timestamps of the link.posix_permissions
andis_writable
(local file only) do not follow the data links; they are not implemented on other file systems./
andresolve
allow to 'cross' data links if the data link points to a directory (this is the immediate consequence ofData.list
following the data link). This means that the return type of/
is no longer the same type as input (e.g.File./
may returnS3_File
because of data links). For now we do not ascribe checked return type, later we may want to use&
types with type classes to make this better.The text was updated successfully, but these errors were encountered: