Skip to content

MSC4222: Adding state_after to sync v2 #4222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Oct 29, 2024

This is to allow clients to opt-in to a change of the sync v2 API that allows them to correctly track the state of the room.

Rendered

Implementations:

FCP tickyboxes

@erikjohnston erikjohnston force-pushed the erikj/sync_v2_state_after branch 3 times, most recently from 36f8310 to 87fb070 Compare October 29, 2024 15:08
@erikjohnston erikjohnston force-pushed the erikj/sync_v2_state_after branch from 87fb070 to 55e34c9 Compare October 29, 2024 15:12
@erikjohnston erikjohnston marked this pull request as ready for review October 29, 2024 15:12
@turt2live turt2live added proposal A matrix spec change proposal client-server Client-Server API kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. hacktoberfest-accepted labels Oct 29, 2024
Comment on lines +151 to +155
Clients will not be able to tell when a state change happened within the timeline. This was used by some clients to
render e.g. display names of users at the time they sent the message (rather than their current display name), though
e.g. Element clients have moved away from this UX. This behavior can be replicated in the same way that clients dealt
with messages received via pagination (i.e. calling `/messages`), by walking the timeline backwards and inspecting the
`unsigned.prev_state` field. While this can lead to incorrect results, this is no worse than the previous situation.
Copy link

@toger5 toger5 Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of usage of the timeline section should be not impacted at all.
We still send all state events (even if they might be wrong) in the timeline section so clients using those state event to render things in the timeline should be not impacted at all.

Suggested change
Clients will not be able to tell when a state change happened within the timeline. This was used by some clients to
render e.g. display names of users at the time they sent the message (rather than their current display name), though
e.g. Element clients have moved away from this UX. This behavior can be replicated in the same way that clients dealt
with messages received via pagination (i.e. calling `/messages`), by walking the timeline backwards and inspecting the
`unsigned.prev_state` field. While this can lead to incorrect results, this is no worse than the previous situation.
As before clients will not be able to tell when a state change happened within the timeline. This was used by some clients to
render e.g. display names of users at the time they sent the message (rather than their current display name), though
e.g. Element clients have moved away from this UX.
This MSC allows to compute the current state correctly. It does not fix the behavior for the state history in the timeline. So clients using such an approach still will face the same issues they had before this MSC.
A proper solution for this usecase would be to paginate back state using the `replaces_state` id. With this one can find all actual state event ids. This list of id's can be used in combination with the timeline history to filter for invalid and valid state changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I wanted to make clear was that clients will need to change how they calculate the "state at each event", and that that behaviour is really considered best-effort at best.

> Both responses are the same, but the client **MUST NOT** update its state with the event.


## Potential issues
Copy link
Contributor

@Gnuxie Gnuxie Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format of returned state in required_state is a list of events. This does now allow the server to indicate if a "state reset" has happened which removed an entry from the state entirely (rather than it being replaced with another event).

Does this issue from MSC4186: Simplified Sliding Sync apply to this MSC too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, that is true. I guess there is an open question as to whether we want to try and fix that here or not 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any fix we do to communicate deleted state in SSS can easily enough be backported to v2 should it be needed. I see no reason to block the MSC on this point. On the contrary, this MSC massively reduces desync issues compared to v2 without state_after because we are now tracking diffs against the current state and not diffs against the last timeline event. For such a small change, this MSC provides immense value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ok but is this easy or hard to fix? If it isn't easy, then will it be harder to fix when the proposal is spec? I don't mind a compromise as long as we're not hurting ourselves later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends how you classify easy or hard :D

It's technically very easy, as you just need to have some representation of "deleted" events: we've tossed around content: {} and content: null and a few others like having a new key in the response JSON. It's socially difficult however as it's prone to bikeshedding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are roughly two proposals:

  1. Add deleted state into the list as a object with type, state_key and content: null (or whatever) keys, and nothing else. Since the state is deleted we don't have a proper event and so don't have any other keys.
  2. Add a separate field for deleted state, with the same format as above.

Both are trivial server side, the issue is what is cleanest for clients. AIUI the tradeoffs are:

  1. Mixing deleted state in with the other state means that clients essentially need to handle multiple datatypes in the same list.
  2. Having a separate field will make it easy for client implementations to miss/forget implementing the extra field during implementation. From my experience, its fairly common for people not to closely read the docs when initially writing clients, and so I think this will happen more often than you might think.

Neither are particularly hard blockers, though I'm very much coming at this from a server side perspective. I have a mild preference for the first option, as it a) semantically feels more correct, and b) reduces the risk a bit of clients forgetting to implement the extra field.

Copy link
Contributor

@Gnuxie Gnuxie Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend to show if a state event in current state has been redacted? How would that work?

What about killing two birds in one stone and repeat the event with empty content. Since then it doesn't matter if a client is naive and unaware of the behavior, if they naively update their model then they'll end up with the right state.

Edit: Ok i forgot protected fields exist smh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Mixing deleted state in with the other state means that clients essentially need to handle multiple datatypes in the same list.
  1. Having a separate field will make it easy for client implementations to miss/forget implementing the extra field during implementation. From my experience, its fairly common for people not to closely read the docs when initially writing clients, and so I think this will happen more often than you might think.

I mean if you really want to avoid both of these ideally you could use a new type for state deltas with a property for the event and a property for the change type. In Draupnir land we call them: Introduced (for unseen state key type pair), Superseded, Redacted, and I guess you'd also need Removed1.

Not sure if it's practical or possible for you to change that now though.

Footnotes

  1. It's actually a bit more complicated because we try to not keep completely redacted state events around as an optimization and try to tell people exactly how they have changed but eh it's not necessary to copy that https://github.com/Gnuxie/matrix-protection-suite/blob/main/src/StateTracking/StateChangeType.ts#L13-L64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any fix we do to communicate deleted state in SSS can easily enough be backported to v2 should it be needed. I see no reason to block the MSC on this point. On the contrary, this MSC massively reduces desync issues compared to v2 without state_after because we are now tracking diffs against the current state and not diffs against the last timeline event. For such a small change, this MSC provides immense value.

+1; I'm strongly in favour of not blocking this MSC for years until someone has time to wrangle this last nit. I don't think that landing it as-is will make it materially more difficult to fix later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be way down on the lower end of priorities but this isn't a nit. And it shouldn't be taking years to sort issues out like this. If it is, then that's a more urgent issue to solve because then things are entirely dysfunctional.

updated with the contents of `state_after`.

Clients can tell if the server supports this change by whether it returns a `state` or `state_after` section in the
response.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state isn't currently required in sync responses, but I guess state_after will be required in all joined rooms after this change? Should probably make it more explicit that servers MUST return state_after with an empty events array inside

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
response.
response. The `state_after` property is **required** by servers that support this change.

@turt2live
Copy link
Member

A team member has request SCT review on this MSC for the following points:

  • To understand it doesn't handle where state gets deleted, and if so how to fix, if we indeed want to.
  • And depending on that, finalise how it should look spec-wise.

@richvdh richvdh removed the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Apr 22, 2025
`org.matrix.msc4222.use_state_after`) query param.

When enabled, the Homeserver will **omit** the `state` section in the room response sections. This is replaced by
`state_after` (the unstable field name is `org.matrix.msc4222.use_state_after`), which will include all state changes between the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`state_after` (the unstable field name is `org.matrix.msc4222.use_state_after`), which will include all state changes between the
`state_after` (the unstable field name is `org.matrix.msc4222.state_after`), which will include all state changes between the

I think?

Comment on lines +3 to +14
The current sync v2 API does not differentiate between state events in the timeline and updates to state, and so can
cause the client's view of the current state of the room to diverge from the actual state of the room. This is
particularly problematic for use-cases that rely on state being consistent between different clients.

This behavior stems from the fact that the clients update their view of the current state with state events that appear
in the timeline. To handle gappy syncs, the `state` section includes state events that are from *before* the start of
the timeline, and so are replaced by any matching state events in the timeline. This provides little opportunity for the
server to ensure that the clients come to the correct conclusion about the current state of the room.

In [MSC4186 - Simplified Sliding Sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) this problem is
solved by the equivalent `required_state` section including all state changes between the previous sync and the end of
the current sync, and clients do not update their view of state based on entries in the timeline.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this gave a bit more explanation about why /v3/sync's current behaviour is insufficient, or at least linked to some of the issues that come about as a result of the current implementation. It is likely that not all readers will be familiar with the problems.

@@ -0,0 +1,193 @@
# MSC4222: Adding `state_after` to sync v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really not v2, unless you happened to be here in 2015.

Suggested change
# MSC4222: Adding `state_after` to sync v2
# MSC4222: Adding `state_after` to `/sync`

@@ -0,0 +1,193 @@
# MSC4222: Adding `state_after` to sync v2

The current sync v2 API does not differentiate between state events in the timeline and updates to state, and so can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The current sync v2 API does not differentiate between state events in the timeline and updates to state, and so can
The current [`/sync`](https://spec.matrix.org/v1.14/client-server-api/#get_matrixclientv3sync) API does not differentiate between state events in the timeline and updates to state, and so can

This is basically the same as how state is returned in [MSC4186 - Simplified Sliding
Sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186).

State events that appear in the timeline section **MUST NOT** update the current state. The current state **MUST** only be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
State events that appear in the timeline section **MUST NOT** update the current state. The current state **MUST** only be
Clients **MUST NOT** update their view of the current state based on events that appear in the timeline section.
The current state view **MUST** only be

updated with the contents of `state_after`.

Clients can tell if the server supports this change by whether it returns a `state` or `state_after` section in the
response.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
response.
response. The `state_after` property is **required** by servers that support this change.

Since the current state of the room does not include the new state event, it's excluded from the `state_after` section.

> [!IMPORTANT]
> Both responses are the same, but the client **MUST NOT** update its state with the event.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Both responses are the same, but the client **MUST NOT** update its state with the event.
> Both responses are the same, other than the change of `state` to `state_after`, but the client **MUST NOT** update its state with the event.

There are a number of options for encoding the same information in different ways, for example the response could
include both the `state` and a `state_delta` section, where `state_delta` would be any changes that needed to be applied
to the client calculated state to correct it. However, since
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) is likely to replace the sync v2 API, we may as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) is likely to replace the sync v2 API, we may as
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186) is likely to replace the current `/sync` API, we may as

> Both responses are the same, but the client **MUST NOT** update its state with the event.


## Potential issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any fix we do to communicate deleted state in SSS can easily enough be backported to v2 should it be needed. I see no reason to block the MSC on this point. On the contrary, this MSC massively reduces desync issues compared to v2 without state_after because we are now tracking diffs against the current state and not diffs against the last timeline event. For such a small change, this MSC provides immense value.

+1; I'm strongly in favour of not blocking this MSC for years until someone has time to wrangle this last nit. I don't think that landing it as-is will make it materially more difficult to fix later.

@richvdh
Copy link
Member

richvdh commented Apr 22, 2025

MSCs proposed for Final Comment Period (FCP) should meet the requirements outlined in the checklist prior to being accepted into the spec. This checklist is a bit long, but aims to reduce the number of follow-on MSCs after a feature lands.

SCT members: please check off things you check for, and raise a concern against FCP if the checklist is incomplete. If an item doesn't apply, prefer to check it rather than remove it. Unchecking items is encouraged where applicable.

MSC authors, feel free to ask in a thread on your PR or in the #matrix-spec:matrix.org room for clarification of any of these points.

  • Are appropriate implementation(s) specified in the MSC’s PR description?
  • [n/a] Are all MSCs that this MSC depends on already accepted?
  • [n/a] For each new endpoint that is introduced:
    • Have authentication requirements been specified?
    • Have rate-limiting requirements been specified?
    • Have guest access requirements been specified?
    • Are error responses specified?
      • Does each error case have a specified errcode (e.g. M_FORBIDDEN) and HTTP status code?
        • If a new errcode is introduced, is it clear that it is new?
  • [n/a] Will the MSC require a new room version, and if so, has that been made clear?
    • Is the reason for a new room version clearly stated? For example, modifying the set of redacted fields changes how event IDs are calculated, thus requiring a new room version.
  • Are backwards-compatibility concerns appropriately addressed?
  • Are the endpoint conventions honoured?
    • Do HTTP endpoints use_underscores_like_this?
    • Will the endpoint return unbounded data? If so, has pagination been considered?
    • If the endpoint utilises pagination, is it consistent with the appendices?
  • An introduction exists and clearly outlines the problem being solved. Ideally, the first paragraph should be understandable by a non-technical audience.
  • All outstanding threads are resolved
    • All feedback is incorporated into the proposal text itself, either as a fix or noted as an alternative
  • While the exact sections do not need to be present, the details implied by the proposal template are covered. Namely:
    • Introduction
    • Proposal text
    • Potential issues
    • Alternatives
    • Dependencies
  • Stable identifiers are used throughout the proposal, except for the unstable prefix section
    • Unstable prefixes consider the awkward accepted-but-not-merged state
    • Chosen unstable prefixes do not pollute any global namespace (use “org.matrix.mscXXXX”, not “org.matrix”).
  • Changes have applicable Sign Off from all authors/editors/contributors
  • There is a dedicated "Security Considerations" section which detail any possible attacks/vulnerabilities this proposal may introduce, even if this is "None.". See RFC3552 for things to think about, but in particular pay attention to the OWASP Top Ten.

@richvdh
Copy link
Member

richvdh commented Apr 22, 2025

There are a couple of open threads here, but I don't think they need to block us from starting to gather FCP ticks.

@mscbot fcp merge

@mscbot
Copy link
Collaborator

mscbot commented Apr 22, 2025

Team member @richvdh has proposed to merge this. The next step is review by the rest of the tagged people:

Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for information about what commands tagged team members can give me.

@mscbot mscbot added proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. disposition-merge labels Apr 22, 2025
@richvdh richvdh moved this from Needs idea feedback / initial review to Ready for FCP ticks in Spec Core Team Backlog Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API disposition-merge hacktoberfest-accepted kind:core MSC which is critical to the protocol's success proposal A matrix spec change proposal proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period.
Projects
Status: Ready for FCP ticks
Development

Successfully merging this pull request may close these issues.

10 participants