File encoding issue: files get many diffs just after saving #7580
Replies: 18 comments
-
Without a file to reproduce this with (I can't reproduce it with a new file), this is unactionable unless you can provide an existing file where it is reproducible.
-
I understand, that's why I'm trying to add as much info as I can. When I open the file in vanilla Notepad, the encoding appears to be ANSI; after saving the file with UTF-8 encoding, Helix can read it normally. But in Helix I can't change the encoding to ANSI when trying. How can I read ANSI-encoded files correctly with Helix?
-
We probably don't detect windows-1252 correctly. There is no way to manually set the encoding while reading a file with Helix; it's always autodetected. Once the file has been read into memory it's always converted to UTF-8, so at that point the encoding information is already lost when you change the encoding with. I guess the autodetection fails somehow here. However, saving a file containing "también" as windows-1252 and then opening it with Helix works just fine, so I would need a reproducible example to test.
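To make the "encoding information is already lost" point concrete, here is a small sketch using Python's standard codecs (an illustration only, not Helix's Rust code) of what happens when windows-1252 bytes are decoded as UTF-8 and then written back. It assumes the decoder replaces undecodable bytes with U+FFFD, which matches the `Funci�n` symptom reported in this thread.

```python
# Sketch: windows-1252 bytes misread as UTF-8, then saved back as UTF-8.
original = "Función también".encode("cp1252")  # 'ó' -> 0xF3, 'é' -> 0xE9

# 0xF3 and 0xE9 start multi-byte UTF-8 sequences, but the following byte
# ('n') is not a continuation byte, so each one is replaced by U+FFFD.
misread = original.decode("utf-8", errors="replace")
# misread == "Funci\ufffdn tambi\ufffdn"

# Saving writes the in-memory text as UTF-8: every U+FFFD becomes EF BF BD,
# and the original 0xF3 / 0xE9 bytes can no longer be recovered.
resaved = misread.encode("utf-8")

# Decoding with the encoding the file was actually written in is lossless.
correct = original.decode("cp1252")
```

Because `resaved` differs from `original` at every accented character, a file with many such words produces many diffs on save, even though nothing was edited.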
-
I think following up with
-
ANSI encoding is the issue, not Windows-1252. You should try saving the file as ANSI and then opening it with Helix.
-
There is no such thing as ANSI encoding. "ANSI" is a colloquial name that usually refers to windows-1252 or windows-1254.
-
Thanks for correcting me. Windows Notepad didn't provide any details beyond `ANSI` regarding the file encoding.
I will check whether I can provide more details about the text causing the issue.
Regards,
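As a small illustration of why "ANSI" is not a precise label: the same bytes can decode differently under windows-1252 and windows-1254, the two code pages named above. This sketch uses Python's built-in `cp1252` and `cp1254` codecs; the sample bytes are arbitrary and chosen only for illustration.

```python
# Sketch: one byte sequence, two "ANSI" code pages.
sample = bytes([0xF3, 0xF0, 0xFD])  # arbitrary bytes chosen for illustration

western = sample.decode("cp1252")  # windows-1252 (Western European)
turkish = sample.decode("cp1254")  # windows-1254 (Turkish)

# 0xF3 is 'ó' in both code pages, so Spanish text such as "Función" reads
# the same either way, but 0xF0 and 0xFD map to different letters.
spanish_roundtrip = "Función".encode("cp1252").decode("cp1254")
```

Since both interpretations are valid byte-for-byte, a detector has to guess between such code pages from the content, which is why this kind of detection is heuristic.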
-
Doesn't help. Same corrupted characters.
-
Could you try to produce a minimal file that contains no confidential information and that you can upload? At this point we are just guessing. Without a reproduction case there isn't much we can do here.
-
@pascalkuthe I tried to share the file, but here is what happened. The file is a .c file, which can't be attached on GitHub. I tried zipping it, but the file extracted from the archive doesn't seem to reproduce the issue. I ran xxd on both files, the original and the renamed one, and the content seems to match. What do you suggest for sharing the file untouched by Windows?
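For checking whether two copies of a file are really byte-identical (what was done here with `xxd`), a small Python helper can report the first differing offset directly. These are hypothetical helpers for illustration, not part of Helix or any tool mentioned in this thread.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte where a and b differ, or None."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other: they differ where the shorter ends.
        return min(len(a), len(b))
    return None


def compare_files(left: Path, right: Path) -> Optional[int]:
    """Compare two files as raw bytes (no decoding, no newline translation)."""
    return first_difference(left.read_bytes(), right.read_bytes())
```

For example, `compare_files(Path("original.c"), Path("copy.c"))` (placeholder names) returns `None` when the two files are identical.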
-
You can upload it somewhere else, Google Drive or whatever. Renaming a file should, however, never change its contents. How did you change the file extension? You should just do and reloading that file while forcing utf-8 (
What exactly do you mean by this? Does VSCode also display these weird characters, or do you mean that creating a file and then opening it with Helix causes this issue? It's possible that VSCode is saving as UTF-16 by default, while those other editors definitely won't.
-
Glad you could reproduce it. These characters are what I get when attempting to read the file in Helix.
I used nvim, :save newfile.txt.
The file can be read normally in Vim, Neovim and Emacs. But VSCode behaves like Helix: it reads those awkward characters and writes them back when saving, which corrupts the file and generates many diffs.
Yes, exactly. I stopped using VSCode long ago since it was causing this issue with many files.
Could be, since I think it autodetects the encoding, and I think it was windows-1252. Not really sure though.
-
Regarding "I tried to copy the file and paste it. I have no mv utility on Windows, and I didn't try to rename the file": this is not the same thing; try renaming the file in Windows Explorer. Nvim defaults to UTF-8 for new files. Your problem is that your files are encoded as UTF-16BE; I have no idea why. Basically everybody uses UTF-8 (or maybe some variant). Even worse, you are using UTF-16 files without a BOM. Both VSCode and Helix use encoding detection that is compliant with web standards (the encoding detection used in Helix is more or less the one used in Firefox, and VSCode is based on Chromium). UTF-16 is only detected if the file has a proper BOM; there is no heuristic autodetection for it. Vim and Emacs have that kind of autodetection for historical reasons, but it's a pain to implement, and considering that UTF-16 is basically dead as a format for plain-text files (and when it is used, it normally has a BOM), it's not worth implementing. I would advise you to convert all your files to UTF-8 or add a BOM. You can also manually set the encoding with `:encoding utf-16be` and then `:reload` your file.
-
I will consider your recommendations and report back in 2 days. Thanks.
Regards,
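The point that UTF-16 is only recognized when the file starts with a BOM can be sketched as a check of the first bytes, similar to the "BOM sniff" step of the WHATWG Encoding Standard. This is a Python illustration under that assumption, not Helix's actual detection code; `sniff_bom` is a hypothetical name.

```python
import codecs
from typing import Optional, Tuple


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a BOM, else None."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return ("utf-8", 3)
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return ("utf-16-be", 2)
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return ("utf-16-le", 2)
    return None  # no BOM: a heuristic detector (not shown) has to guess


text = "Función".encode("utf-16-be")
without_bom = sniff_bom(text)                      # None
with_bom = sniff_bom(codecs.BOM_UTF16_BE + text)   # ("utf-16-be", 2)
```

Adding a BOM, as suggested in the reply above, amounts to prepending `codecs.BOM_UTF16_BE` to the existing UTF-16BE bytes, after which this first check identifies the encoding unambiguously.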
-
Renaming the file by simply pressing F2 in Windows didn't reproduce the issue; I can read the characters normally after renaming.
That resulted in Chinese-like characters.
This solves the issue; the characters can be read normally now. But I need a solution from inside Helix.
-
I tried windows-1252 and then reloading, instead of utf-16be. It displays the characters normally now and doesn't generate diffs when saving. Since you said earlier that there's no way to set the encoding manually before reading files, is there a way to automate setting the encoding to windows-1252 and reloading the file? Some sort of hook, like Vim's BufRead? @pascalkuthe Thanks for your support.
-
No, there is no way to set the encoding manually, and there are no plans to add that. It's a niche use case that isn't worth spending effort on. Autodetection should work; if you can provide a reproducible case where it doesn't, we can look into fixing that, since that doesn't complicate the codebase.
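Since there is no hook for this, one workaround outside the editor, in line with the earlier advice to convert the files to UTF-8, is a one-off re-encoding script. The Python sketch below assumes the files are windows-1252 (as reported above) and leaves files that already decode as UTF-8 untouched; `reencode_to_utf8` and `convert_file` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> Optional[bytes]:
    """Return raw re-encoded as UTF-8, or None if it is already valid UTF-8."""
    try:
        raw.decode("utf-8")
        return None  # valid UTF-8 (including plain ASCII): nothing to do
    except UnicodeDecodeError:
        pass
    # Strict decode: raises UnicodeDecodeError if the assumed encoding is wrong.
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(path: Path, source_encoding: str = "cp1252") -> bool:
    """Rewrite path as UTF-8 in place; return True if the file was changed."""
    converted = reencode_to_utf8(path.read_bytes(), source_encoding)
    if converted is None:
        return False
    path.write_bytes(converted)  # write bytes so CRLF line endings are preserved
    return True
```

Usage might look like `for p in Path("src").rglob("*.c"): convert_file(p)`, where `src` is a placeholder. The conversion itself shows up once as a diff in version control, and a windows-1252 file that happens to be valid UTF-8 would be skipped.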
-
@pascalkuthe Here's the file. Changing its name seems to change the encoding, while changing only the extension preserves the issue. Also, here's a screenshot showing the difference between Helix on the left and Neovim on the right.
-
Summary
When reading a file that contains Western European characters such as 'Función', Helix reads it as 'Funci�n'. It also overwrites them when saving, which generates many diffs, since there are many more words like those. Vim/Neovim/Emacs don't cause this issue, but VSCode does.
Reverting the file with hx doesn't help, since hx can't write the right characters. Once the file has been broken by Helix, I have to revert it with version control, or, if I have the same file open in nvim, use nvim to undo the hx save.
All editors' encoding is set to UTF-8. I can't provide the file since it isn't my property.
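The clue in the reproduction steps, that `tambi�n` is really `tambi&#65533;n`, identifies the character: 65533 is 0xFFFD, the Unicode REPLACEMENT CHARACTER, which decoders substitute for bytes they cannot decode. That is consistent with the bytes having been replaced when the file was read, which would also explain why saving from hx cannot restore them. A quick check with Python's standard library:

```python
import html
import unicodedata

# The numeric character reference quoted in the reproduction steps.
decoded = html.unescape("tambi&#65533;n")
suspect = decoded[5]  # the character between "tambi" and "n"

codepoint = f"U+{ord(suspect):04X}"   # "U+FFFD"
name = unicodedata.name(suspect)      # "REPLACEMENT CHARACTER"
```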
Reproduction Steps
I tried this:
hx, then pick a file to work with.
I expected this to happen:
When saving the file with no changes made, the file shouldn't contain any diffs.
Instead, this happened:
Since Helix can't read the characters correctly, it produces many diffs.
For example, the word `también` (it means "also" in Spanish) is converted to `tambi�n`, while `Función` is read by Helix as `Funci�n`.
The word `tambi�n` is actually `tambi&#65533;n` (without spaces; it otherwise gets rendered). Maybe that could be a clue.
Helix log
~/.cache/helix/helix.log
Platform
Windows
Terminal Emulator
Windows Terminal
Helix Version
helix-term 23.03 (3cf0372) Blaž Hrastnik A post-modern text editor.