Is a backslashed space still whitespace? #245
Comments
Interestingly, currently in the online playground, attributes on the escaped non-breaking space change the behavior:
This might be an inconsistency in the current parser or a specification issue on how the inlines nest?
Having looked closely at the current parser, my understanding of the current behavior is that emphasis and similar marks look for preceding or subsequent whitespace in the raw source text, not in the AST or any semantic representation, so here adding attributes makes the character before I guess specifying a rule about raw source whitespace is as legitimate as a rule about semantic whitespace, but even as a basic user I would like to be told which one it is (just as I think it was useful to spell out that only ASCII whitespace counts, not the whole Unicode class).
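To make the raw-source reading concrete, here is a minimal Python sketch. It is not the reference parser's code: the function names `can_open_raw` and `can_close_raw` are hypothetical, and the exact set of characters treated as ASCII whitespace is an assumption. It only illustrates the idea that a delimiter's neighbors are inspected in the source text, before any escape is interpreted.

```python
# Assumption: "ASCII whitespace" means space, tab, line feed and carriage return.
ASCII_WHITESPACE = frozenset(" \t\n\r")


def can_open_raw(source: str, i: int) -> bool:
    """True if the delimiter at source[i] is directly followed by a
    character that is not ASCII whitespace in the raw source."""
    return i + 1 < len(source) and source[i + 1] not in ASCII_WHITESPACE


def can_close_raw(source: str, i: int) -> bool:
    """True if the delimiter at source[i] is directly preceded by a
    character that is not ASCII whitespace in the raw source."""
    return i > 0 and source[i - 1] not in ASCII_WHITESPACE


escaped = "_a\\ _"      # underscore, a, backslash, space, underscore
literal = "_a\u00a0_"   # underscore, a, U+00A0, underscore

# The raw character before the last "_" is a plain space, so it cannot close.
print(can_close_raw(escaped, len(escaped) - 1))  # False
# U+00A0 is not ASCII whitespace, so the last "_" can close.
print(can_close_raw(literal, len(literal) - 1))  # True
```

Under a rule based on semantic whitespace, both inputs would presumably be treated the same, since the escaped space and U+00A0 denote the same non-breaking space.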
Correct. Do you want to make a targeted suggestion about where this should be reflected in the documentation?
My specification-reading skill is a bit weird, so you might want other opinions, but as a user I think I would be satisfied with the following additions:
The emphases mark the additions; I don't think any emphasis would be needed in the documentation itself. However, these would be the first occurrences of the words "source text", and I haven't found any established vocabulary to distinguish between source text, semantic interpretation, and "formatted output". As a parser-writer I would also welcome an update to the example box below that paragraph, showing that
As long as there is no standard way to insert characters by reference (e.g. a symbol looking like a Unicode codepoint in
Hello,
Sorry to bother you again. As might be obvious by now, I'm implementing a new djot parser and trying to match existing behavior. Here is something that surprises me as a user (and is somewhat difficult to fit into my parser architecture, but that's my problem):
As a user, I would have expected "\ " (a backslashed space) and U+00A0 to be interchangeable, and not be considered as whitespace as far as syntax goes. Am I in a minority here? Is it worth a specification update?