On-chain Retrieval Expectations #862

Draft
wants to merge 2 commits into
base: master

willscott
Contributor

Discussion: #861
Rendered version: here

This FIP proposes a set of retrieval SLA ‘tiers’. These tiers identify the ‘category’ of data - if it is fully offline, meant for low-volume-retrieval archival usage, or for higher bandwidth activity. The consensus part of this FIP is a proposal to encode the retrieval SLA tier as part of a deal proposal. Standardizing retrieval expectations in this way allows storage providers to apply appropriate policies and pricing to deals.

FIPS/fip-00xx.md (outdated, resolved)
Co-authored-by: Jiaying Wang <[email protected]>
Member

@anorth left a comment

Overall I think that this topic should be an FRC, but we can take that to the discussion threads.

I've attempted part of an editorial review on the assumption that it really needs a core protocol FIP, but overall this is missing quite a bit of substance in the specification. When that is firmed up, we'll be better able to describe what ought to be covered in the "considerations" sections.

The consensus part of this FIP is a proposal to encode the retrieval SLA tier as part of a deal proposal.
Standardizing retrieval expectations in this way allows storage providers to apply appropriate policies and pricing to deals.

## Change Motivation
Member

This section does not describe the motivation for any consensus change here.

Member

+1 to above.


* `Label` - part of a `DealProposal` on chain.

* `VerifiedDeal` - part of `DealProposal` on chain
Member

Can you expand on how these touch the issue? FastRetrieval isn't part of consensus protocol AFAIK so is a good example of FRC-type data here. Label is opaque at the consensus level, but can support conventions being built on it in FRCs. VerifiedDeal is an instruction to the built-in market actor to act as an operator for the client's datacap and make a verified allocation concurrent with publishing a deal, but I don't see the relevance to this proposal.


```go
RetrievalTier abi.RetrievalTier
```
Member

This change implies a bunch more changes. The DealProposal is part of built-in market actor API and state.

  • Changing the API (the schema of the struct it will accept) is disruptive. From the text below, I think you're suggesting that there be a new DealProposal2 type with a new field that can be used in the API for a new PublishStorageDeals2 method, in order to preserve compatibility.
  • Changing the state schema requires a migration of the ~50M deals in state (or an alternate design for multiple collections, or something).
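
To make the shape of the API change concrete, here is a minimal sketch of what a versioned proposal type might look like. Everything in it is an assumption for illustration: the `RetrievalTier` values, the `TierUnspecified` default, the abridged `DealProposal` stand-in, and the `Upgrade` helper are hypothetical and are not taken from the draft or from the built-in market actor. Only the two tiers named in the draft are listed.

```go
package main

import "fmt"

// RetrievalTier is a hypothetical enum for the proposed field.
// Only the tiers named in the draft are listed here.
type RetrievalTier uint64

const (
	TierUnspecified RetrievalTier = iota // assumed default for deals without a tier
	TierOffline
	TierArchival
)

// DealProposal is an abridged stand-in for the existing on-chain type.
type DealProposal struct {
	Label        string
	VerifiedDeal bool
}

// DealProposal2 is the hypothetical new API type: the old schema plus the
// new field, accepted only by a new method so existing callers are unaffected.
type DealProposal2 struct {
	DealProposal
	RetrievalTier RetrievalTier
}

// Upgrade maps an old proposal into the new schema using the assumed default
// tier, as a compatibility shim or state migration might.
func Upgrade(p DealProposal) DealProposal2 {
	return DealProposal2{DealProposal: p, RetrievalTier: TierUnspecified}
}

func main() {
	old := DealProposal{Label: "example", VerifiedDeal: true}
	fmt.Printf("%+v\n", Upgrade(old))
}
```

A state migration would have to apply something like `Upgrade` to every stored deal, which is where the cost noted above comes from.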

### Deal flow change (in conjunction with direct data onboarding)

In the network with Direct Data Onboarding (FIP 0076), there will not always be a `DealProposal` for a deal.
In this case, where no market deal is present, the `RetrievalTier` is encoded as an extension of the `PieceActivationManifest` from the storage provider.
Member

This is also quite under-specified.

The whole design intent behind a piece activation manifest is that the consensus layer need know nothing about markets and deals etc. There is an opaque payload field in order to pass information through to a market contract but, like deal label, it's completely up to market contracts to interpret. I can't tell if you mean to specify some format of data in that payload, or put a field into the consensus-level schema. If it's the latter, I will object (but can take it to the discussion topic).

The negotiation of storage deals may contain a client request for which Retrieval Tier to specify, but that process in Direct Data Onboarding occurs off chain.
What inclusion of this field in the `PieceActivationManifest` allows is that the chain may react and that contracts can be written to monitor and respond to these advertisements.

### FVM extensions
Member

This section doesn't have much to do with the FVM, and should probably just be folded into the main actor change specification.

## Backwards Compatibility
<!--All FIPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The FIP must explain how the author proposes to deal with these incompatibilities. FIP submissions without a sufficient backwards compatibility treatise may be rejected outright.-->

The added field, both in and out of a direct data onboarding network, is optional. As such, any current deal, and any future deal not specifying the added field will take on the specified default behavior.
Member

Adding an optional field is not a backwards compatible change to actor APIs. As noted above, this probably needs additional methods and a state migration.
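
A rough illustration of why an "optional" field can still break callers, assuming the API uses a positional (tuple-style) encoding whose decoder checks the field count strictly. JSON arrays stand in for the real wire format here, and `decodeV1` is a hypothetical decoder, not actor code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeV1 mimics a strict positional decoder that expects exactly two
// fields: a label and a verified flag.
func decodeV1(raw []byte) (label string, verified bool, err error) {
	var tuple []json.RawMessage
	if err = json.Unmarshal(raw, &tuple); err != nil {
		return "", false, err
	}
	if len(tuple) != 2 {
		return "", false, fmt.Errorf("wrong number of fields: got %d, want 2", len(tuple))
	}
	if err = json.Unmarshal(tuple[0], &label); err != nil {
		return "", false, err
	}
	if err = json.Unmarshal(tuple[1], &verified); err != nil {
		return "", false, err
	}
	return label, verified, nil
}

func main() {
	v1 := []byte(`["example", true]`)    // old schema
	v2 := []byte(`["example", true, 1]`) // old schema plus a trailing tier field

	_, _, err := decodeV1(v1)
	fmt.Println("v1 payload:", err)
	_, _, err = decodeV1(v2)
	fmt.Println("v2 payload:", err)
}
```

Under that assumption, an old decoder rejects the extended payload and a new decoder rejects the old one, so the two schemas need separate methods or an explicit migration.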

## Simple Summary
<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.-->

Today, a mish-mash of heuristics have emerged to attempt to identify client intentions about the retrievability of data stored in filecoin.
Contributor

Suggested change
Today, a mish-mash of heuristics have emerged to attempt to identify client intentions about the retrievability of data stored in filecoin.
Today, a mish-mash of heuristics have emerged to attempt to identify client intentions about the retrievability of data stored on Filecoin.

Nit update to be more aligned with what appears to be convention.

<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.-->

Today, a mish-mash of heuristics have emerged to attempt to identify client intentions about the retrievability of data stored in filecoin.
This FIP proposes a simple on-chain mechanism to allow clients to express those expectations at deal time.
Contributor

Suggested change
This FIP proposes a simple on-chain mechanism to allow clients to express those expectations at deal time.
This FIP proposes a simple on-chain mechanism which allows clients to express those expectations at deal time.

<!--A short (~200 word) description of the technical issue being addressed.-->

This FIP defines a set of retrieval SLA ‘tiers’.
These tiers identify the ‘category’ of data - if it is fully offline, meant for low-volume-retrieval archival usage, or for more regular retrieval.
Contributor

@kaitlin-beegle Nov 28, 2023

@willscott during your presentation at LabWeek, I believe you articulated five specific tiers. Have these been reduced to three?

If not, please be specific in listing exactly which tiers/retrieval types are being specified. The abstract should be brief, but also explicit.

The consensus part of this FIP is a proposal to encode the retrieval SLA tier as part of a deal proposal.
Standardizing retrieval expectations in this way allows storage providers to apply appropriate policies and pricing to deals.

## Change Motivation
Contributor

Please elaborate on how encoding these SLAs at the protocol level will affect Filecoin functionality in the future, and/or address concrete issues in the present.

Why should retrieval expectations become policy at the protocol level, as opposed to incorporated as more arbitrary (i.e., optional) deal metadata? Why should SLA tiers be a required component of deal-making for all network participants?

### Gameability of tiers

Reporting that content is available at a higher tier does not directly help clients or storage providers.
In fact, it primarily raises costs as the sum of all deal retrievals will lead to a direct calculation of expected network bandwidth provisioning of an SP.
Contributor

Question: where does the 'direct calculation of expected network bandwidth provisioning' take place? Is this a business hypothetical, or a concrete part of the deal-making process/fee calculation that I'm unaware of?

More frequently retrieved data of course has higher bandwidth requirements during retrieval, but does this anticipated bandwidth use actually get passed on to the SP or deal client during deal making? Is a higher gas fee (or similar) charged to deals that are likely to require greater network bandwidth?

@kaitlin-beegle
Contributor

> Overall I think that this topic should be an FRC, but we can take that to the discussion threads.
>
> I've attempted part of an editorial review on the assumption that it really needs a core protocol FIP, but overall this is missing quite a bit of substance in the specification. When that is firmed up, we'll be better able to describe what ought to be covered in the "considerations" sections.

FWIW, this was also my comment to @willscott after seeing his team's presentation during LabWeek. I've asked him to add more information about why these flags should be a network prerogative, rather than a standard.

Contributor

@kaitlin-beegle left a comment

There seems to be interest amongst FIP Editors to be more diligent about merging earlier stage FIP drafts. For that reason, I'm happy to approve this for now.

However, there are clearly a lot of sections that have yet to be fleshed out. I also think there is a big question still to be answered about why this change ought to be encoded within the protocol, and what future functionality or product changes will be enabled by doing so.


### Overview

### Tiers
Member

Is the Overview section still a TODO, or should Tiers, Deal flow change, and so on be level-4 headings under it?

This proposal should be thought of as two distinct sub-components: A concretization of the expectations that clients have about the retrievability of data into a defined set of SLAs, and a mechanism to express those SLAs on-chain.

## Abstract
<!--A short (~200 word) description of the technical issue being addressed.-->
Member

Is this change specific to f05 market deals, or intended to set a standard for future user programmed storage markets?

The proposed tiers are:

1. *Offline* - Data that is transferred via physical media. This data is not expected to be retrieved over the network.
2. *Archival* - Data that is expected to be retrieved infrequently. This data is only expected to need enough network bandwidth to be able to ensure replication. The most common data replication policies on filecoin today are 5x and 10x copy replication, to ensure that data remains even when some copies are unavailable. In order for contracts or users to ‘heal’ data replication policies in the face of failures, they would potentially need (n-1) copies of the data to be re-replicated.
Member

For my own understanding: where does 5x/10x come from? Fil+?

Member

I don't think the replication factor would have a direct or indirect relationship to FIL+, but rather to external expectations and potentially observed fault and retrievability odds. It's probably just where clients/providers landed, like web3.storage picking "minimum 5 deals".


A new method `GetDealRetrievalTier` would be added as with other deal property accessors.

## Design Rationale
Member

@jennijuju Dec 1, 2023

First, this retrieval expectation cannot be enforced on-chain, just like FastRetrieval. There have been many discussions arguing that having FastRetrieval in the protocol sets the wrong expectation and that it should be removed, so I'm not sure that adding another explicit field of the same kind is a good idea.

Second, have you considered proposing an FRC for setting such a "tag" in the label field instead? I'm trying to guess the motivation of the FIP here, as it's currently under-specified, and it seems like one of the main goals is to allow data clients to signal retrieval expectations. If so, I think adding it to the label is worth considering in order to avoid migrations. However, if the goal is also to provide retrieval guarantees, then this FIP needs more work on that.
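
As a rough sketch of what such a label convention could look like, assuming a string label and a made-up `retrieval-tier=` key with `;` as a separator. None of this is defined by any existing FRC; `WithTier` and `TierFromLabel` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// tierKey is a hypothetical tag prefix, not defined by any FRC.
const tierKey = "retrieval-tier="

// WithTier appends a tier tag to an existing label, using ";" as an
// assumed separator between the original label and the tag.
func WithTier(label, tier string) string {
	if label == "" {
		return tierKey + tier
	}
	return label + ";" + tierKey + tier
}

// TierFromLabel extracts the tier tag if one is present. Labels without
// the tag are reported as untagged rather than treated as an error.
func TierFromLabel(label string) (string, bool) {
	for _, part := range strings.Split(label, ";") {
		if strings.HasPrefix(part, tierKey) {
			return strings.TrimPrefix(part, tierKey), true
		}
	}
	return "", false
}

func main() {
	label := WithTier("bafy-example", "archival")
	tier, ok := TierFromLabel(label)
	fmt.Println(label, tier, ok)
}
```

Because the label stays opaque at the consensus level, a convention like this would need no new actor methods or state migration, but it would also carry no enforcement.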

## Design Rationale

We cannot fully predict what tiers will be optimal for each provider, but concrete tiers will reduce the design space, which will reduce uncertainty of other components and provide reasonable service tier recommendations for providers to offer.
We expect further tier definitions to emerge over time, but that their inclusion into the registry introduced here will serve as an efficiency mechanism, rather than one needed for enabling new types of service.
Member

To this point in particular: I'm curious whether the tiers should be defined at the consensus protocol level, since that implies any change to them requires a network upgrade.

Member

@jsoares left a comment

I don't have much to add to the comments already provided by other reviewers but I, too, question why this needs to be handled at the consensus level, given the added tiers are neither specific enough nor self-enforceable. As it stands, I don't see solid reasoning for this path articulated in either the current draft or the related discussion, and it seems like the type of thing that would cleanly fit an FRC.

@willscott marked this pull request as draft January 23, 2024 10:52
5 participants