Python comes with a number of built-in codecs, implemented either as C
functions or with dictionaries as mapping tables. The following table
lists the codecs by name, together with a few common aliases and the
languages for which the encoding is likely to be used. Neither the list
of aliases nor the list of languages is meant to be exhaustive. Note
that spelling alternatives that differ only in case, or that use a
hyphen instead of an underscore, are also valid aliases.
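The alias rule can be checked with a short sketch. It assumes a Python 3 interpreter, which is newer than the release this text describes, and the spellings tried are illustrative choices rather than a complete list. codecs.lookup() returns a record whose name attribute is the canonical name used by the registry.

```python
import codecs

# Spellings of UTF-8 that differ only in case, or in hyphen versus
# underscore, plus two of its aliases.
spellings = ("utf_8", "UTF-8", "Utf_8", "utf8", "U8")

# Map each spelling to the canonical name reported by the codec registry.
resolved = {s: codecs.lookup(s).name for s in spellings}
for spelling, canonical in resolved.items():
    print(f"{spelling:6} -> {canonical}")

# A hyphenated alias resolves to the same codec as the plain name.
same_codec = codecs.lookup("us-ascii").name == codecs.lookup("ascii").name
print("us-ascii is ascii:", same_codec)
```

If every spelling resolves to one canonical name, the registry treats them as the same codec.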
Many of the character sets support the same languages. They vary in
individual characters (e.g. whether the EURO SIGN is supported or
not), and in the assignment of characters to code positions. For the
European languages in particular, the following variants typically
exist:
* an ISO 8859 codeset
* a Microsoft Windows code page, which is typically derived from
  an 8859 codeset, but replaces control characters with additional
  graphic characters
* an IBM EBCDIC code page
* an IBM PC code page, which is ASCII compatible
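The differences between these variants can be made visible with a small sketch, assuming Python 3. The helper can_encode() is hypothetical, and the codec picked for each family is only one illustrative representative. For each codec it prints the code position of one accented letter, the code position of the letter A (which shows ASCII compatibility, or the lack of it for EBCDIC), and whether the EURO SIGN is available.

```python
def can_encode(char, encoding):
    """Return True if the character set contains the character."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
euro = "\u20ac"      # EURO SIGN

# One codec per family described above; the choice is illustrative.
families = {
    "ISO 8859": "iso8859_15",
    "Windows code page": "cp1250",
    "IBM EBCDIC code page": "cp500",
    "IBM PC code page": "cp437",
}

for family, codec in families.items():
    print(f"{family:22} {codec:11} "
          f"e-acute=0x{e_acute.encode(codec).hex()} "
          f"A=0x{'A'.encode(codec).hex()} "
          f"euro={can_encode(euro, codec)}")
```

All four codecs cover the same accented letter, but not at the same code position, and only some of them include the EURO SIGN.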
Codec         Aliases                             Languages
ascii         646, us-ascii                       English
cp037         IBM037, IBM039                      English
cp424         EBCDIC-CP-HE, IBM424                Hebrew
cp437         437, IBM437                         English
cp500         EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500  Western Europe
cp737                                             Greek
cp775         IBM775                              Baltic languages
cp850         850, IBM850                         Western Europe
cp852         852, IBM852                         Central and Eastern Europe
cp855         855, IBM855                         Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856                                             Hebrew
cp857         857, IBM857                         Turkish
cp860         860, IBM860                         Portuguese
cp861         861, CP-IS, IBM861                  Icelandic
cp862         862, IBM862                         Hebrew
cp863         863, IBM863                         Canadian
cp864         IBM864                              Arabic
cp865         865, IBM865                         Danish, Norwegian
cp869         869, CP-GR, IBM869                  Greek
cp874                                             Thai
cp875                                             Greek
cp1006                                            Urdu
cp1026        ibm1026                             Turkish
cp1140        ibm1140                             Western Europe
cp1250        windows-1250                        Central and Eastern Europe
cp1251        windows-1251                        Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6     iso-8859-6, arabic                  Arabic
iso8859_7     iso-8859-7, greek, greek8           Greek
iso8859_8     iso-8859-8, hebrew                  Hebrew
iso8859_9     iso-8859-9, latin5, L5              Turkish
iso8859_10    iso-8859-10, latin6, L6             Nordic languages
iso8859_13    iso-8859-13                         Baltic languages
iso8859_14    iso-8859-14, latin8, L8             Celtic languages
iso8859_15    iso-8859-15                         Western Europe
koi8_r                                            Russian
koi8_u                                            Ukrainian
mac_cyrillic  maccyrillic                         Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek     macgreek                            Greek
mac_iceland   maciceland                          Icelandic
mac_latin2    maclatin2, maccentraleurope         Central and Eastern Europe
mac_roman     macroman                            Western Europe
mac_turkish   macturkish                          Turkish
utf_16        U16, utf16                          all languages
utf_16_be     UTF-16BE                            all languages (BMP only)
utf_16_le     UTF-16LE                            all languages (BMP only)
utf_7         U7                                  all languages
utf_8         U8, UTF, utf8                       all languages
A number of codecs are specific to Python, so their codec names have
no meaning outside Python. Some of them don't convert from Unicode
strings to byte strings, but instead rely on the fact that the Python
codecs machinery can treat any bijective function of one argument as
an encoding.
For the codecs listed below, the result in the "encoding" direction
is always a byte string. The result of the "decoding" direction is
listed as the operand type in the table.
Codec         Aliases          Operand type  Purpose
base64_codec  base64, base-64  byte string   Convert operand to MIME base64
hex_codec     hex              byte string   Convert operand to hexadecimal representation, with two digits per byte
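The two codecs in this table can be exercised with the sketch below. It assumes Python 3, where byte strings have no encode() method, so these codecs are reached through codecs.encode() and codecs.decode() instead. The sample data is arbitrary.

```python
import codecs

data = b"hello"

# "Encoding" direction: the result is a byte string in both cases.
b64 = codecs.encode(data, "base64_codec")
hexed = codecs.encode(data, "hex_codec")
print(b64)    # MIME base64, including a trailing newline
print(hexed)  # two hexadecimal digits per input byte

# "Decoding" direction: the operand type from the table (byte string)
# is what comes back, and the round trip restores the original data.
assert codecs.decode(b64, "base64_codec") == data
assert codecs.decode(hexed, "hex_codec") == data
```

The round trip in both directions is what lets the codecs machinery treat these transformations as encodings, even though no Unicode string is involved.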