Hi. First, a bit of background: Guitar Pro 3-5 files used a binary format that did not change much over time, apart from being extended with new content. Some official documentation for version 4 was once published: http://dguitar.sourceforge.net/GP4format.html (I guess Arobas contributed it to the public at some point). It also documents the data types.

In its old days, Guitar Pro used the default system encoding for non-Unicode programs. This setting is part of the Windows regional settings. As a result, if you share GP5 files across machines with different settings, the information about the right encoding is lost.

Regarding the handling in alphaTab

As you correctly found, alphaTab lets you specify the encoding used for string decoding for exactly this reason. But detecting the "right" encoding is not easy and is probably only possible via heuristics, which come at a significant cost during decoding. We could take the first proper string read and try to guess the encoding. Defining the right heuristics is hard:
A starting point could be to look around the web for known mechanisms for detecting encodings. I have heard that Notepad++ might do a good job. I also found https://github.com/aadsm/jschardet, which seems to be used in VSCode.

I could imagine adding a feature to alphaTab that lets you integrate libraries like that, e.g. a callback that lets you decode the strings from the raw bytes on your own, or something that just dynamically detects the encoding based on the first string read. For the second approach, the callback would need to return a …

Adapting the logic of a more open library like https://github.com/Ousret/charset_normalizer or https://github.com/PyYoshi/cChardet on a small scale would also be a possibility.

Long story short:
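To make the "guess from the raw bytes" idea a bit more concrete, here is a minimal sketch of the simplest possible heuristic: try strict UTF-8 first, then a caller-supplied list of legacy encodings, and keep the first one that decodes without errors. The `decodeWithFallback` helper and the `DecodedString` type are hypothetical names for illustration and are not part of alphaTab; the sketch only relies on the standard `TextDecoder` and assumes the runtime supports the `gbk` and `big5` labels.

```typescript
// Hypothetical helper, not an alphaTab API: guess the encoding of a raw string field.
interface DecodedString {
  text: string;
  encoding: string;
}

function decodeWithFallback(bytes: Uint8Array, candidates: string[]): DecodedString {
  // Strict UTF-8 goes first because legacy multi-byte text rarely forms valid UTF-8.
  for (const encoding of ["utf-8", ...candidates]) {
    try {
      // fatal: true makes decode() throw on invalid bytes instead of inserting U+FFFD.
      const text = new TextDecoder(encoding, { fatal: true }).decode(bytes);
      return { text, encoding };
    } catch {
      // Invalid bytes for this encoding (or an unsupported label): try the next one.
    }
  }
  // Nothing decoded cleanly: fall back to lenient UTF-8 with replacement characters.
  return { text: new TextDecoder("utf-8").decode(bytes), encoding: "utf-8" };
}

// Example: the bytes of "中文" as written by a system using GBK.
const gbkBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);
const guess = decodeWithFallback(gbkBytes, ["gbk", "big5"]);
console.log(guess.encoding, guess.text);
```

This only rules out encodings in which the bytes are invalid. Many byte sequences are valid in several legacy encodings at once (GBK bytes often also decode as Big5, and single-byte code pages accept nearly anything), so the order of the candidates decides the result. A statistical detector such as jschardet would be needed to do better.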
Hi there, I plan to use AlphaTab to render `.gp*` files uploaded by users. I soon noticed that many old files are not in Unicode, so their titles and comments are rendered as gibberish. I was able to get some of the files right by manually enumerating possible encodings (such as "gbk", "big5") in the `importer.encoding` setting.

So a workaround for the issue could be to ask users to select an encoding before upload. However, I don't think an average internet user today would understand what "encoding" is. (It seems to be a pretty old problem that was already fixed by UTF-8?)
So I'm looking for a more automatic solution. My thoughts:
… `.gp*` file that AlphaTab can read and use? I tried the official Guitar Pro software on Mac and unfortunately found the same problem, so I'm afraid this is something missing from the old gp file format.

Thoughts?