Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align File operations to the 'symlink' metaphor of data links to files #11825

Open
radeusgd opened this issue Dec 10, 2024 · 12 comments · May be fixed by #11926
Open

Align File operations to the 'symlink' metaphor of data links to files #11825

radeusgd opened this issue Dec 10, 2024 · 12 comments · May be fixed by #11926
Assignees
Labels
-libs Libraries: New libraries to be implemented l-cloud-integration Enso Cloud integration work

Comments

@radeusgd
Copy link
Member

Implementing #11802 calls for aligning other methods on files to the 'symlink metaphor'.

The idea is following:

  • is_directory and is_regular_file follow the data link and check the target file. If the data link points to a non-file object, they both return False.
  • size follows the data link and returns the size of the target file. If the data link does not point to a file, it throws an error.
  • exists does not follow the data link - it checks whether the link itself exists, regardless of target.
  • creation_time and last_modified_time do not follow the data link - they return the modified/created timestamps of the link.
  • posix_permissions and is_writable (local file only) do not follow the data links; they are not implemented on other file systems.
  • / and resolve allow to 'cross' data links if the data link points to a directory (this is the immediate consequence of Data.list following the data link). This means that the return type of / is no longer the same type as input (e.g. File./ may return S3_File because of data links). For now we do not ascribe checked return type, later we may want to use & types with type classes to make this better.
@radeusgd radeusgd self-assigned this Dec 10, 2024
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-cloud-integration Enso Cloud integration work labels Dec 10, 2024
@radeusgd
Copy link
Member Author

Btw. not sure if I will have enough time to do it as part of this ticket or create a separate one, but since we are now even more 'mixing' file systems (e.g. / may return a new file on completely another file system), we should also 'reinforce' other methods to allow values from other filesystems.

Currently some operations yield a type error, or worse - undefined behaviour (i.e. hard to say what happens) if they get a file instance from a different filesystem. Let's align this at some point:

  • is_descendant_of / starts_with - should simply return False if filesystems differ. Even if a file can be reached by a datalink, it's not descendant from path perspective.
  • relativize and join - should raise an error if filesystems differ - it makes no sense to relativize or join on different file systems.
  • perhaps we should add Data Link section to documentation of copy_to and move_to.

@radeusgd radeusgd moved this from ❓New to 🔧 Implementation in Issues Board Dec 10, 2024
@GregoryTravis
Copy link
Contributor

Not for this PR, but I am curious about the choices about which operations follow the data link and which don't -- I can see uses for either one. In particular, I can definitely see users sometimes wanting exists to follow links. Perhaps we will want a follow_link flag for all of them -- but with differing defaults.

@radeusgd
Copy link
Member Author

@jdunkerley initially didn't want to have follow_links option. I agree with him about the point that that would obfuscate the API and make it more complicated. I suspect many users don't want to think about the symlinks/datalinks too much and exposing that complexity may scare them.

The thing with e.g. exists following links is that it is not well defined for data links that point to non-files. I.e. if I have a data link pointing to a Database connection, what does exist mean? Is it "can I connect to the endpoint"? We can make it work but it is indeed making it a bit more vague.

But I agree that especially for files it seems good to be able to check both if the link but also the target file exists and distinguish the two non-existence scenarios.

@GregoryTravis
Copy link
Contributor

@radeusgd That makes sense about datalinks pointing to non-files. (It could mean "does the table exist"; errors about connection failures are different, and apply both to tables and cloud filesystems.)

I also can agree that we don't want to bother the user with a follow_links argument to every method that might involve a data link. Instead, we might have to implement as_data_link_itself (which is a bad name) on these values. The value itself contains the referent, but if you want to access the data link as a data link, then you would have to take an extra step. This would be rarely used by end users, so it would make sense to require a special method call.

@enso-bot
Copy link

enso-bot bot commented Dec 17, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-12-16):

Progress: Working on making is_regular_file and is_directory follow symlinks on all backends, updating tests and docs. It should be finished by 2024-12-19.

Next Day: Next day I will be working on the same task. Fix errors in Jaroslav's PR, then work on / operator in datalinks

@enso-bot
Copy link

enso-bot bot commented Dec 18, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-12-17):

Progress: Fixing test failures in dataflow propagation PR - it took some time. Reviews. It should be finished by 2024-12-19.

Next Day: Next day I will be working on the same task. Work on / operator in datalinks

@enso-bot
Copy link

enso-bot bot commented Dec 19, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-12-18):

Progress: Implemented crossing datalinks on /. It should be finished by 2024-12-19.

Next Day: Next day I will be working on the same task. Make sure it is also consistent with File.new

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Dec 20, 2024
@enso-bot
Copy link

enso-bot bot commented Dec 20, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-12-19):

Progress: Fixing edge cases on S3 - some more advanced heuristics needed there as the notion of directory is less clearly defined there so the distinctions that are easy elsewhere are more complicated. It should be finished by 2024-12-20.

Next Day: Next day I will be working on the same task. Wrap up the PR

@enso-bot
Copy link

enso-bot bot commented Dec 20, 2024

Radosław Waśko reports a new STANDUP for today (2024-12-20):

Progress: Added cache (TTL 30s) to S3.is_data_link as it is costly and it was queried relatively often - it was making the data link tests 2x slower. We could cache more in the future. Fixing some edge cases/test failures. Addressing code review. It should be finished by 2024-12-20.

Next Day: Next day I will be working on the same task. Perhaps do some work on Type inferrence - update the PR to latest develop, resolve conflicts, address review.

@enso-bot
Copy link

enso-bot bot commented Dec 30, 2024

Radosław Waśko reports a new STANDUP for today (2024-12-30):

Progress: Fixing unexpected test failures in Base_Internal_Tests, fixed regression in File.new on relative paths. Testing product on AoC challenges, reported some findings. It should be finished by 2025-01-07.

Next Day: Next day I will be working on the same task. Fix one more failure in the CI (S3 this time), work on types - get the PR up to date and apply code review comments.

@enso-bot
Copy link

enso-bot bot commented Dec 30, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-12-27):

Progress: Fixing unexpected test failures in Base_Internal_Tests, fixed regression in File.new on relative paths. Testing product on AoC challenges, reported some findings. It should be finished by 2025-01-07.

Next Day: Next day I will be working on the same task. Fix one more failure in the CI (S3 this time), work on types - get the PR up to date and apply code review comments.

@enso-bot
Copy link

enso-bot bot commented Dec 31, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-12-30):

Progress: Fixed the S3 failure on CI, now the engine/stdlib tests are green on the PR. Put up 2 small PRs: fix Snowflake CI, and fix for Integer.parse widget. Reporting bugs from AoC testing. Updated Types PR and beginning on addressing code review. It should be finished by 2025-01-07.

Next Day: Next day I will be working on the #9812 task. Continue addressing code review comments on Types PR, implement some requested restructure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-cloud-integration Enso Cloud integration work
Projects
Status: 👁️ Code review
Development

Successfully merging a pull request may close this issue.

2 participants