Dealing with invalid utf-8 #121
You can fetch the feed with your own code instead, and read the response using this auto decoder.
I have used the following workaround: https://github.com/kisielk/gorge/blob/master/util/util.go, which strips non-UTF-8 characters from the stream.
Yes, this could work for any content, but it removes the bad bytes rather than decoding them. Anyway, if you are satisfied with this, no problem :)
@musabgultekin Yeah, I don't think the problem is that the site serves the wrong encoding. I think it's really just badly encoded UTF-8 where only some characters are broken. The content still looks good after the bad characters are removed.
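One way to check the theory that the feed is mostly valid UTF-8 with only a few broken characters is to scan the raw bytes and list where the invalid ones are. A sketch, assuming the response body has already been read into memory; badByte and findInvalidUTF8 are hypothetical names. Lines are counted by '\n', so the reported line numbers should roughly line up with the "line 93" in the XML error.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// badByte (hypothetical) records where an invalid UTF-8 byte was found.
type badByte struct {
	Line   int  // 1-based line number, counted by '\n'
	Offset int  // byte offset from the start of the input
	Value  byte // the offending byte
}

// findInvalidUTF8 reports every byte that does not begin a valid UTF-8 sequence.
func findInvalidUTF8(data []byte) []badByte {
	var bad []badByte
	line := 1
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		if r == utf8.RuneError && size == 1 {
			bad = append(bad, badByte{Line: line, Offset: i, Value: data[i]})
		} else if r == '\n' {
			line++
		}
		i += size
	}
	return bad
}

func main() {
	sample := []byte("<title>ok</title>\n<title>caf\xe9</title>\n")
	for _, b := range findInvalidUTF8(sample) {
		fmt.Printf("line %d, offset %d: 0x%02X\n", b.Line, b.Offset, b.Value)
	}
	// prints: line 2, offset 28: 0xE9
}
```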
I dug into this issue and found that it is caused by the encoding/xml package. The package checks whether characters are in the XML character range and, if not, returns that error. I copy
Expected behavior
Filter out non-UTF-8 characters automatically, or allow users to opt in to this behavior.
Actual behavior
The error "XML syntax error on line 93: invalid UTF-8" is produced, and the feed cannot be processed.
Steps to reproduce the behavior
It seems to happen only when I fetch the feed from https://ain.ua/feed using f.ParseURL. When I open a locally saved file with f.Parse, it works.
ain.zip