Following issues #180, #25, and a few others, I'd like to make character sanitization more robust.
I've previously tried to have the code do something like the following:

    import (
        "bytes"
        "fmt"
    )

    // sanitizeXML returns xmlData with every illegal XML character escaped.
    func sanitizeXML(xmlData string) string {
        var buffer bytes.Buffer
        for _, r := range xmlData {
            if isLegalXMLChar(r) {
                buffer.WriteRune(r)
            } else {
                // Replace illegal characters with their XML character reference.
                // You can also skip writing illegal characters by commenting the next line.
                fmt.Fprintf(&buffer, "&#x%X;", r)
            }
        }
        return buffer.String()
    }

    // isLegalXMLChar reports whether r falls in the ranges XML 1.0 allows.
    func isLegalXMLChar(r rune) bool {
        return r == 0x9 || r == 0xA || r == 0xD ||
            (r >= 0x20 && r <= 0xD7FF) ||
            (r >= 0xE000 && r <= 0xFFFD) ||
            (r >= 0x10000 && r <= 0x10FFFF)
    }

However, an old issue, #21, indicated that sanitizing these characters broke the parsing of non-UTF-8 feeds.
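A likely contributor, stated here as an assumption rather than a confirmed diagnosis of #21: ranging over a Go string decodes it as UTF-8, and every byte that is not valid UTF-8 comes out as U+FFFD. Since U+FFFD is inside the legal range, it is written back as the three bytes EF BF BD, so the original bytes are lost before any charset conversion can run. A minimal sketch, where `roundTrip` is a hypothetical helper that mimics the loop above for input with no illegal characters:

```go
package main

import (
	"bytes"
	"fmt"
)

// roundTrip ranges over the string and writes every rune back unchanged,
// which is what the sanitizer does for characters it considers legal.
func roundTrip(s string) string {
	var buf bytes.Buffer
	for _, r := range s {
		buf.WriteRune(r)
	}
	return buf.String()
}

func main() {
	latin1 := "caf\xe9" // "café" encoded as ISO-8859-1, not valid UTF-8
	fmt.Printf("% x\n", latin1)            // 63 61 66 e9
	fmt.Printf("% x\n", roundTrip(latin1)) // 63 61 66 ef bf bd
}
```

The 0xE9 byte has become U+FFFD, so a charset decoder that runs afterwards can no longer recover the "é".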
If anyone has any suggestions for how to accommodate both requirements:
- Stripping illegal characters from feeds to prevent the XML parser from throwing an error
- Allowing the parsing of non-UTF-8 feeds
It would be much appreciated!
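One possible direction, offered as an untested sketch rather than a fix: work on bytes instead of runes, drop only the illegal characters that decode cleanly as UTF-8, and copy every undecodable byte through untouched so that a later charset conversion still sees the original data. The name `sanitizeBytes` is hypothetical, and the approach assumes an ASCII-compatible encoding (it would not suit UTF-16, for example). It drops characters instead of escaping them, because XML 1.0 also requires character references to point at legal characters.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// isLegalXMLChar reports whether r falls in the ranges XML 1.0 allows.
func isLegalXMLChar(r rune) bool {
	return r == 0x9 || r == 0xA || r == 0xD ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// sanitizeBytes (hypothetical name) removes illegal XML characters that are
// well-formed UTF-8 and preserves bytes that are not valid UTF-8.
func sanitizeBytes(data []byte) []byte {
	var buf bytes.Buffer
	for len(data) > 0 {
		r, size := utf8.DecodeRune(data)
		switch {
		case r == utf8.RuneError && size == 1:
			// Not valid UTF-8: keep the raw byte for a later charset decoder.
			buf.WriteByte(data[0])
		case isLegalXMLChar(r):
			buf.Write(data[:size])
		}
		// Well-formed but illegal runes match neither case and are dropped.
		data = data[size:]
	}
	return buf.Bytes()
}

func main() {
	in := []byte("caf\xe9\x00 ok\x0b") // Latin-1 "é", a NUL, and a vertical tab
	fmt.Printf("%q\n", sanitizeBytes(in)) // "caf\xe9 ok"
}
```

A caveat with this sketch: in a single-byte encoding, a run of high bytes can happen to form a well-formed UTF-8 sequence for an illegal character such as U+FFFE and would then be dropped by mistake. Decoding the feed to UTF-8 first and sanitizing afterwards would avoid that, at the cost of needing to know the encoding up front.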