
MBCS encoding in Python 2 (up to NVDA 2019.2)

 

Hello everyone,
I hope this is the right place to ask this kind of question.
I have been wondering about the MBCS encoding that is used quite often in
NVDA's Python 2 code. An example from the main systrayList add-on file:
"_addonDir = os.path.join(os.path.dirname(__file__), "..", "..")
if sys.version_info.major == 2:
	_addonDir = _addonDir.decode("mbcs")"
It seems to me that this encoding actually works only in Python 2 (and
not 3). I have searched the web for this but was unable to find a good
explanation of why it should be used when getting the proper directory.
Can someone explain what this encoding is and why it is not necessary in
Python 3?
Thanks.
Sincerely,
Paulius Leveris

 

Hey Paulius,


Here is a pretty good article about Unicode in Python: https://www.b-list.org/weblog/2017/sep/05/how-python-does-unicode/ . It doesn't say anything about mbcs, though, which is your actual question.


In short, mbcs is the encoding used in Python 2 to deal with system file paths on Windows. It is not one fixed encoding; it corresponds to the system default (ANSI) code page. For example, on my Dutch system, encoding € as mbcs results in \x80, the byte representation of € in the cp1252 encoding. If I enabled the beta support for UTF-8 in the Windows 10 region settings, mbcs would behave very much like UTF-8, if not identically. The mbcs codec is still available in Python 3.
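
To make the € example concrete, here is a minimal sketch for Python 3. It uses the explicit cp1252 codec so that it runs on any platform; the mbcs codec itself only exists on Windows, so that call is guarded, and its result depends on the machine's code page. The name EURO is just an illustrative constant.

```python
import sys

EURO = "\u20ac"  # the euro sign

# cp1252 (the usual Western European Windows code page) uses one byte
cp1252_bytes = EURO.encode("cp1252")
print(cp1252_bytes)  # b'\x80'

# UTF-8 needs three bytes for the same character
utf8_bytes = EURO.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x82\xac'

# "mbcs" is only registered on Windows; what it produces depends on the
# system code page, so no expected output is given here.
if sys.platform == "win32":
    print(EURO.encode("mbcs", errors="replace"))
```

The same text therefore maps to different bytes depending on which encoding is in effect, which is why byte-string paths have to be decoded with the matching one.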


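In Python 3, you can simply pass ordinary Python 3 strings (str) to functions that deal with paths, so you no longer have to make sure that paths are in the right encoding. A Python 3 str is already Unicode and has no decode method, which is why the add-on only decodes under Python 2.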
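
As a rough sketch of what the add-on's version check achieves, the hypothetical helper below (to_text_path is not an NVDA function) decodes a path only when it is a byte string. It assumes sys.getfilesystemencoding() is the right codec; on Python 2 under Windows that reports "mbcs". The add-on builds its path from os.path.dirname(__file__); os.getcwd() is used here only to keep the example self-contained.

```python
import os
import sys


def to_text_path(path):
    """Return path as text, decoding only if it is a byte string.

    Hypothetical helper for illustration. On Python 2 a plain str is
    bytes, so it gets decoded; on Python 3 a str passes through as-is.
    """
    if isinstance(path, bytes):
        encoding = sys.getfilesystemencoding() or "utf-8"
        return path.decode(encoding)
    return path


# Stand-in for os.path.join(os.path.dirname(__file__), "..", "..")
addon_dir = to_text_path(os.path.join(os.getcwd(), "..", ".."))
print(type(addon_dir).__name__)
```

Checking the type rather than the interpreter version means the same line works on both Python 2 and Python 3 without a sys.version_info test.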


Regards,

Leonard




 

Hello,
Thanks for the detailed answer.
Sincerely,
Paulius Leveris
