Predefined values for some standard encodings.
Libxml does not translate UTF-8 or ISO Latin X input beforehand.
It also supports UTF-16 (LE and BE) by default.
Anything else has to be translated to UTF-8 before being
given to the parser itself. The BOM for UTF-16 and the encoding
declaration are examined, and a converter is looked up at that
point. If none is found, the parser stops there, as required by the XML REC.
Converters can be registered by the user with xmlRegisterCharEncodingHandler,
but the current form does not allow stateful transcoding (a serious
problem, agreed!). If iconv has been found, it is used
automatically and allows stateful transcoding; the simplest approach is then
to make sure iconv is enabled and to provide iconv libraries for the
encodings needed.
xmlCharEncodingInputFunc ()
int (*xmlCharEncodingInputFunc) (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of chars in the original encoding and try to convert
it to a UTF-8 block of chars out.
out :
a pointer to an array of bytes to store the UTF-8 result
outlen :
the length of out
in :
a pointer to an array of chars in the original encoding
inlen :
the length of in
Returns :
the number of bytes written, or -1 for lack of space, or -2
if the transcoding failed.
The value of inlen after return is the number of octets consumed
if the return value is positive, else unpredictable.
The value of outlen after return is the number of octets produced.
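The contract above can be illustrated with a minimal sketch of a converter that has the same parameter shape. The function name sketch_ascii_to_utf8 and the choice of 7-bit ASCII as the source encoding are assumptions made for illustration; this is not code from libxml.

```c
#include <stddef.h>

/* Hypothetical converter with the xmlCharEncodingInputFunc shape.
 * Source encoding: 7-bit ASCII, which is already valid UTF-8.
 * Returns the number of bytes written, -1 for lack of space,
 * -2 if a byte cannot be transcoded. */
int sketch_ascii_to_utf8(unsigned char *out, int *outlen,
                         unsigned char *in, int *inlen)
{
    int avail, n, i;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -2;

    avail = *outlen;
    n = *inlen;
    for (i = 0; i < n; i++) {
        if (in[i] > 0x7F) {          /* not a 7-bit ASCII byte */
            *inlen = i;
            *outlen = i;
            return -2;
        }
        if (i >= avail) {            /* output buffer is full */
            *inlen = i;
            *outlen = i;
            return -1;
        }
        out[i] = in[i];
    }
    *inlen = n;                      /* octets consumed */
    *outlen = n;                     /* octets produced */
    return n;
}
```

A caller sets *inlen and *outlen to the buffer sizes before the call and reads them back afterwards to learn how much was consumed and produced.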
xmlCharEncodingOutputFunc ()
int (*xmlCharEncodingOutputFunc) (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of UTF-8 chars in and try to convert it to another
encoding.
Note: a first call, designed to produce heading information, is made with
in = NULL. If the encoder is stateful, this call should also initialize its state.
out :
a pointer to an array of bytes to store the result
outlen :
the length of out
in :
a pointer to an array of UTF-8 chars
inlen :
the length of in
Returns :
the number of bytes written, or -1 for lack of space, or -2
if the transcoding failed.
The value of inlen after return is the number of octets consumed
if the return value is positive, else unpredictable.
The value of outlen after return is the number of octets produced.
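The heading call with in = NULL can be sketched as follows. The function name sketch_utf8_to_utf16le is hypothetical, and the sketch accepts only the ASCII subset of UTF-8 to stay short; it is an illustration of the calling convention, not the library's UTF-16 encoder.

```c
#include <stddef.h>

/* Hypothetical output converter: UTF-8 (ASCII subset only) to UTF-16LE.
 * A call with in == NULL writes the byte order mark FF FE and nothing else. */
int sketch_utf8_to_utf16le(unsigned char *out, int *outlen,
                           unsigned char *in, int *inlen)
{
    int avail, n, i, written = 0;

    if (out == NULL || outlen == NULL || inlen == NULL)
        return -2;
    avail = *outlen;

    if (in == NULL) {
        /* Heading call: no input is consumed. */
        *inlen = 0;
        if (avail < 2) {
            *outlen = 0;
            return -1;
        }
        out[0] = 0xFF;
        out[1] = 0xFE;
        *outlen = 2;
        return 2;
    }

    n = *inlen;
    for (i = 0; i < n; i++) {
        if (in[i] > 0x7F) {          /* multi-byte UTF-8 is out of scope here */
            *inlen = i;
            *outlen = written;
            return -2;
        }
        if (written + 2 > avail) {   /* no room for one more 16-bit unit */
            *inlen = i;
            *outlen = written;
            return -1;
        }
        out[written++] = in[i];      /* low byte first (little endian) */
        out[written++] = 0x00;
    }
    *inlen = n;
    *outlen = written;
    return written;
}
```

A stateful encoder would reset its internal state in the same in == NULL branch before any real data is converted.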
Initialize the char encoding support; it registers the default
encodings supported.
NOTE: while public, this function usually does not need to be called
in normal processing.
xmlCleanupCharEncodingHandlers ()
void xmlCleanupCharEncodingHandlers (void);
Clean up the memory allocated for the char encoding support; it
unregisters all the encoding handlers and the aliases.
Compare the string to the known encoding schemes. Note
that the comparison is case-insensitive, in accordance with section
[XML] 4.3.3 Character Encoding in Entities.
name :
the encoding name as parsed, in UTF-8 format (ASCII actually)
Returns :
one of the XML_CHAR_ENCODING_... values or XML_CHAR_ENCODING_NONE
if not recognized.
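A case-insensitive lookup of this kind might look like the sketch below. The enum sketch_encoding, its values, the name table, and both function names are hypothetical stand-ins; they are not the library's XML_CHAR_ENCODING_... constants or its actual list of names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result values, standing in for the real enumeration. */
typedef enum {
    SKETCH_ENC_NONE = 0,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_UTF16,
    SKETCH_ENC_LATIN1
} sketch_encoding;

/* ASCII case-insensitive string equality. */
static int sketch_name_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map an encoding name to a value, or SKETCH_ENC_NONE if unknown. */
sketch_encoding sketch_parse_encoding(const char *name)
{
    static const struct {
        const char *name;
        sketch_encoding enc;
    } table[] = {
        { "UTF-8",      SKETCH_ENC_UTF8 },
        { "UTF8",       SKETCH_ENC_UTF8 },
        { "UTF-16",     SKETCH_ENC_UTF16 },
        { "ISO-8859-1", SKETCH_ENC_LATIN1 },
        { "ISO-LATIN-1", SKETCH_ENC_LATIN1 }
    };
    size_t i;

    if (name == NULL)
        return SKETCH_ENC_NONE;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (sketch_name_equal(name, table[i].name))
            return table[i].enc;
    }
    return SKETCH_ENC_NONE;
}
```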
The "canonical" name for XML encoding.
Cf. http://www.w3.org/TR/REC-xmlcharencoding
Section 4.3.3 Character Encoding in Entities
enc :
the encoding
Returns :
the canonical name for the given encoding
xmlDetectCharEncoding ()
xmlCharEncoding xmlDetectCharEncoding (unsigned char *in,
int len);
Guess the encoding of the entity from the first bytes of the entity content,
according to the non-normative appendix F of the XML 1.0 recommendation.
in :
a pointer to the first bytes of the XML entity, must be at least
4 bytes long.
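The kind of first-bytes test that appendix F describes can be sketched as below. The enum sketch_detected and the function names are hypothetical, and the set of patterns is a subset chosen for illustration (UCS-4 byte order marks, for instance, are not handled); the library's own routine may check a different set.

```c
#include <stddef.h>

/* Hypothetical result values for this sketch. */
typedef enum {
    SKETCH_DETECT_NONE = 0,
    SKETCH_DETECT_UTF8,
    SKETCH_DETECT_UTF16LE,
    SKETCH_DETECT_UTF16BE,
    SKETCH_DETECT_UCS4LE,
    SKETCH_DETECT_UCS4BE,
    SKETCH_DETECT_EBCDIC
} sketch_detected;

/* True if the first four bytes at p are exactly a, b, c, d. */
static int sketch_match4(const unsigned char *p, unsigned char a,
                         unsigned char b, unsigned char c, unsigned char d)
{
    return p[0] == a && p[1] == b && p[2] == c && p[3] == d;
}

sketch_detected sketch_detect_encoding(const unsigned char *in, int len)
{
    if (in == NULL || len < 4)
        return SKETCH_DETECT_NONE;

    /* "<" encoded as a 32-bit unit. */
    if (sketch_match4(in, 0x00, 0x00, 0x00, 0x3C)) return SKETCH_DETECT_UCS4BE;
    if (sketch_match4(in, 0x3C, 0x00, 0x00, 0x00)) return SKETCH_DETECT_UCS4LE;
    /* "<?xm" in EBCDIC and in ASCII-compatible encodings. */
    if (sketch_match4(in, 0x4C, 0x6F, 0xA7, 0x94)) return SKETCH_DETECT_EBCDIC;
    if (sketch_match4(in, 0x3C, 0x3F, 0x78, 0x6D)) return SKETCH_DETECT_UTF8;
    /* "<?" in UTF-16 without a byte order mark. */
    if (sketch_match4(in, 0x00, 0x3C, 0x00, 0x3F)) return SKETCH_DETECT_UTF16BE;
    if (sketch_match4(in, 0x3C, 0x00, 0x3F, 0x00)) return SKETCH_DETECT_UTF16LE;
    /* UTF-8 byte order mark. */
    if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return SKETCH_DETECT_UTF8;
    /* UTF-16 byte order marks. */
    if (in[0] == 0xFE && in[1] == 0xFF) return SKETCH_DETECT_UTF16BE;
    if (in[0] == 0xFF && in[1] == 0xFE) return SKETCH_DETECT_UTF16LE;

    return SKETCH_DETECT_NONE;
}
```

The four-byte patterns are tested before the shorter byte order marks so that a longer match is never shadowed by a shorter one.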
Generic front-end for the encoding handler output function.
A first call with in == NULL has to be made first to initiate the
output, in case a non-stateless encoding needs to initialize its
state or the output (like the BOM in UTF-16).
In case of UTF-8 sequence conversion errors for the given encoder,
the content is automatically remapped to a CharRef sequence.
handler :
char encoding transformation data structure
out :
an xmlBuffer for the output.
in :
an xmlBuffer for the input
Returns :
the number of bytes written on success, or
-1 for a general error, or
-2 if the transcoding fails (because *in is not a valid UTF-8 string or
the result of the transformation cannot fit into the target encoding).
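The CharRef remapping mentioned above can be pictured with a small sketch. The function name sketch_put_latin1_or_charref is hypothetical, ISO Latin 1 is chosen as the target encoding for illustration, and the decimal &#NNN; form is an assumption of this sketch rather than a statement about the library's output.

```c
#include <stdio.h>

/* Hypothetical helper: write one code point for an ISO Latin 1 target.
 * Code points up to 0xFF are written as a single byte; anything larger
 * is written as a decimal character reference such as "&#8364;".
 * Returns the number of bytes written, or -1 if out is too small.
 * Note: snprintf also stores a terminating NUL, so a character
 * reference of n bytes needs outlen >= n + 1. */
int sketch_put_latin1_or_charref(unsigned int cp, char *out, int outlen)
{
    int n;

    if (out == NULL || outlen <= 0)
        return -1;

    if (cp <= 0xFF) {
        out[0] = (char)cp;
        return 1;
    }

    n = snprintf(out, (size_t)outlen, "&#%u;", cp);
    if (n < 0 || n >= outlen)
        return -1;
    return n;
}
```

With this approach a character such as U+20AC, which has no ISO Latin 1 byte, still survives in the output as a reference that an XML parser resolves back to the same character.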
Generic front-end for the encoding handler input function
handler :
char encoding transformation data structure
out :
an xmlBuffer for the output.
in :
an xmlBuffer for the input
Returns :
the number of bytes written on success, or
-1 for a general error, or
-2 if the transcoding fails (because *in is not a valid UTF-8 string or
the result of the transformation cannot fit into the target encoding).
Front-end for the encoding handler input function that handles only
the very first line, i.e. it limits itself to 45 chars.
handler :
char encoding transformation data structure
out :
an xmlBuffer for the output.
in :
an xmlBuffer for the input
Returns :
the number of bytes written on success, or
-1 for a general error, or
-2 if the transcoding fails (because *in is not a valid UTF-8 string or
the result of the transformation cannot fit into the target encoding).
Generic front-end for encoding handler close function
handler :
char encoding transformation data structure
Returns :
0 if success, or -1 in case of error
UTF8Toisolat1 ()
int UTF8Toisolat1 (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1
block of chars out.
out :
a pointer to an array of bytes to store the result
outlen :
the length of out
in :
a pointer to an array of UTF-8 chars
inlen :
the length of in
Returns :
0 if success, -2 if the transcoding fails, or -1 otherwise
The value of inlen after return is the number of octets consumed
if the call succeeds, else unpredictable.
The value of outlen after return is the number of octets produced.
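A conversion of this kind can be sketched as follows, using the return convention documented above (0 on success, -2 on transcoding failure, -1 otherwise). The function name sketch_utf8_to_latin1 is hypothetical and the code is an illustration, not the library's implementation.

```c
#include <stddef.h>

/* Hypothetical UTF-8 to ISO Latin 1 converter.
 * Only U+0000..U+00FF can be represented, so the only valid
 * two-byte lead bytes are 0xC2 and 0xC3. */
int sketch_utf8_to_latin1(unsigned char *out, int *outlen,
                          unsigned char *in, int *inlen)
{
    int i = 0, o = 0, n, avail;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;
    n = *inlen;
    avail = *outlen;

    while (i < n) {
        unsigned int c = in[i];
        unsigned int cp;
        int used;

        if (c < 0x80) {
            cp = c;
            used = 1;
        } else if (c == 0xC2 || c == 0xC3) {
            if (i + 1 >= n)
                break;               /* incomplete sequence: leave it unconsumed */
            if ((in[i + 1] & 0xC0) != 0x80) {
                *inlen = i;
                *outlen = o;
                return -2;           /* malformed continuation byte */
            }
            cp = ((c & 0x1F) << 6) | (in[i + 1] & 0x3Fu);
            used = 2;
        } else {
            *inlen = i;
            *outlen = o;
            return -2;               /* malformed, or not representable */
        }
        if (o >= avail) {
            *inlen = i;
            *outlen = o;
            return -1;               /* output buffer is full */
        }
        out[o++] = (unsigned char)cp;
        i += used;
    }
    *inlen = i;                      /* octets consumed */
    *outlen = o;                     /* octets produced */
    return 0;
}
```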
isolat1ToUTF8 ()
int isolat1ToUTF8 (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8
block of chars out.
out :
a pointer to an array of bytes to store the result
outlen :
the length of out
in :
a pointer to an array of ISO Latin 1 chars
inlen :
the length of in
Returns :
0 if success, or -1 otherwise
The value of inlen after return is the number of octets consumed
if the call succeeds, else unpredictable.
The value of outlen after return is the number of octets produced.
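The reverse direction is simpler because every ISO Latin 1 byte maps to a code point below U+0100, which needs at most two UTF-8 bytes. The sketch below follows the return convention documented above (0 on success, -1 otherwise); the function name sketch_latin1_to_utf8 is hypothetical and the code is illustrative only.

```c
#include <stddef.h>

/* Hypothetical ISO Latin 1 to UTF-8 converter. */
int sketch_latin1_to_utf8(unsigned char *out, int *outlen,
                          unsigned char *in, int *inlen)
{
    int i, o = 0, n, avail;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;
    n = *inlen;
    avail = *outlen;

    for (i = 0; i < n; i++) {
        unsigned char c = in[i];
        int need = (c < 0x80) ? 1 : 2;

        if (o + need > avail) {      /* output buffer is full */
            *inlen = i;
            *outlen = o;
            return -1;
        }
        if (c < 0x80) {
            out[o++] = c;            /* ASCII is copied unchanged */
        } else {
            /* Two-byte form: 110000xx 10xxxxxx */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *inlen = n;                      /* octets consumed */
    *outlen = o;                     /* octets produced */
    return 0;
}
```

An output buffer of twice the input length is always large enough for this conversion.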
xmlCheckUTF8 ()
int xmlCheckUTF8 (unsigned char *utf);
Checks whether utf is valid UTF-8. utf is assumed to be
null-terminated. This function is not super-strict, as it
allows longer UTF-8 sequences than necessary. Note that Java is
capable of producing these sequences if provoked. Also note that this
routine checks for the 4-byte maximum size, but does not check for
the 0x10ffff maximum value.
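A check with the same looseness can be sketched as below. The function name sketch_check_utf8 is hypothetical, and the return convention (1 for valid, 0 otherwise) is an assumption of this sketch, since the text above does not state one.

```c
#include <stddef.h>

/* Hypothetical non-strict UTF-8 check on a NUL-terminated string.
 * It verifies lead bytes and continuation bytes, rejects sequences
 * longer than 4 bytes, but accepts overlong forms and values above
 * 0x10FFFF, matching the looseness described above.
 * Returns 1 if the string passes, 0 otherwise. */
int sketch_check_utf8(const unsigned char *utf)
{
    size_t i = 0;

    if (utf == NULL)
        return 0;

    while (utf[i] != 0) {
        unsigned char c = utf[i];
        int extra, k;

        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return 0;                /* stray continuation or 5+ byte lead */

        /* A NUL terminator is not a continuation byte, so a truncated
         * sequence fails here before any read past the end. */
        for (k = 1; k <= extra; k++) {
            if ((utf[i + k] & 0xC0) != 0x80)
                return 0;
        }
        i += (size_t)extra + 1;
    }
    return 1;
}
```

The overlong two-byte form C0 80 passes this check, which is the kind of sequence the note about Java refers to.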