OS4Depot.net 20130227_1013

 Readme data for Root » Library » unilibdev.lha

Description: Unicode code point/UTF-8 support lib
Install: unilibdev.lha
Size: 281kb 5
Version: 5.14beta
Date: 22 Jul 08
Author: Joerg van de Loo
Submitter: Steven Solie
Email: ssolie/telus net
Category: library
License: Other
Distribute: yes
Min OS Version: 4.0
Foreword:
---------
With the ongoing development of MorphOS and AmigaOS4, and with UTF-8 no
longer being treated as a stepchild there, I do hope that they will render
this library useless (no, I'm not kidding).
That means you should first check whether your OS already supports a given
task, and fall back on the functions provided by Uni library only if it
does not.
Unfortunately, I lack information about how far Unicode support has
progressed in MorphOS 2.0 and in the upcoming OS4.1.


Introduction:
-------------
Uni library is a support library for Unicode code points in the range from
0 to 1'114'109 (0x10FFFD) - thus not limited to the Basic Multilingual
Plane (range 0 to 65'535).
You can determine code point attributes (UPPERCASE_LETTER,
LOWERCASE_LETTER, TITLECASE_LETTER etc.) and you can change these
attributes for a code point (mapping the code point to its counterpart).
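
To illustrate what "not limited to the Basic Multilingual Plane" means,
here is a minimal standalone sketch. It is not code from Uni library and
does not use its API; the names CP_MAX, cp_in_bmp() and cp_plane() are
made up for this example. It assumes the upper bound U+10FFFF defined by
the Unicode Standard, which is two code points above the range stated
above.

```c
#include <assert.h>
#include <stdint.h>

/* Highest code point defined by Unicode (assumption of this sketch). */
#define CP_MAX 0x10FFFFu

/* Returns 1 if cp lies in the Basic Multilingual Plane (0..65535). */
int cp_in_bmp(uint32_t cp)
{
    return cp <= 0xFFFFu;
}

/* Returns the plane number (0..16); each plane holds 65536 code points. */
unsigned cp_plane(uint32_t cp)
{
    assert(cp <= CP_MAX);
    return (unsigned)(cp >> 16);
}
```

For example, cp_plane(0x20AC) yields 0 (the Euro sign is in the BMP),
while cp_plane(0x1F600) yields 1.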

Because I could not find a shared library with support functions for UTF-8
strings, I built them into Uni library as well, for example: UTF8StrCmp().
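
Why dedicated UTF-8 functions are needed can be shown with a small
standalone sketch. It is not taken from Uni library and does not call it;
utf8_count_code_points() is a hypothetical helper that assumes
well-formed input. It counts code points instead of bytes by skipping
continuation bytes (bit pattern 10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>

/* Counts the code points in a NUL-terminated UTF-8 string.
   Every byte that is not a continuation byte starts a new code point.
   Assumes the input is well-formed UTF-8 (no validation is done). */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0u) != 0x80u)
            ++n;
    }
    return n;
}
```

For the Euro sign, stored as the three bytes E2 82 AC, strlen() reports 3
while this helper reports 1; a byte-oriented string function cannot know
where a character ends.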

Furthermore, transcoding of strings from one format to another is also
implemented, for example through UTF16ToUTF8().
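
What transcoding from UTF-16 to UTF-8 involves for a single character can
be sketched as follows. This is a generic illustration of the encoding
rules, not the implementation or the API of Uni library; utf16_combine()
and utf8_encode() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combines a UTF-16 surrogate pair into one code point.
   hi must be in 0xD800..0xDBFF, lo in 0xDC00..0xDFFF. */
uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800u && hi <= 0xDBFFu);
    assert(lo >= 0xDC00u && lo <= 0xDFFFu);
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                    + ((uint32_t)lo - 0xDC00u);
}

/* Encodes one code point as UTF-8 into out (room for 4 bytes needed).
   Returns the number of bytes written, or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp < 0x10000u) {
        if (cp >= 0xD800u && cp <= 0xDFFFu)
            return 0;               /* lone surrogate: not a character */
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;                       /* beyond the Unicode range */
}
```

The surrogate pair D83D DE00 combines to code point 0x1F600, which in
turn is encoded as the four bytes F0 9F 98 80.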


Thus, it's a shared library for three tasks:
    Determining code point attributes / mapping code points.
    Handling of UTF-8 multibyte sequences.
    Transcoding strings.

The enclosed documentation is written in HTML, and I spent a lot of time
clarifying some misleading terms that are frequently used by people who do
not fully understand what Unicode and its related terms stand for. Okay,
I'm not an expert myself; however, please read the documentation I
provided before you study the API of this library. It will be to your
benefit.


Changes:
--------
This new version of Uni library was upgraded to adopt the character
encoding scheme of the Unicode Standard, Version 5.1.0, as published by
the Unicode Consortium, as far as my limited implementation can support it.

In addition, this new version fixes a bug that surfaced when a
UTF-32/UTF-16 string was transcoded to UTF-8: the UTF-8 string buffer had
to be at least four bytes bigger than required (ouch...).
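
How large a UTF-8 buffer really has to be can be computed up front. The
following standalone sketch shows one way to do that for a
zero-terminated UTF-32 string. It is an illustration only, not the fixed
code from Uni library; utf8_len_for_cp() and utf8_buffer_size_for_utf32()
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 bytes one code point needs (1..4), 0 if invalid. */
size_t utf8_len_for_cp(uint32_t cp)
{
    if (cp < 0x80u)
        return 1;
    if (cp < 0x800u)
        return 2;
    if (cp >= 0xD800u && cp <= 0xDFFFu)
        return 0;                   /* surrogates are not characters */
    if (cp < 0x10000u)
        return 3;
    if (cp <= 0x10FFFFu)
        return 4;
    return 0;                       /* beyond the Unicode range */
}

/* Exact size in bytes, including the terminating NUL, of the UTF-8
   buffer needed for a zero-terminated UTF-32 string.
   Returns 0 if the string contains an invalid code point. */
size_t utf8_buffer_size_for_utf32(const uint32_t *s)
{
    size_t total = 1;               /* room for the NUL terminator */

    assert(s != NULL);
    for (; *s != 0; ++s) {
        size_t n = utf8_len_for_cp(*s);
        if (n == 0)
            return 0;
        total += n;
    }
    return total;
}
```

For the four code points 0x41, 0xE4, 0x20AC and 0x1F600 the result is
1 + 2 + 3 + 4 bytes plus one for the terminator, that is 11 bytes, with
no extra slack required.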

UniCodeChart() supports 32 new code charts, which brings the total to 201
supported code charts.


Notes on transcoding singlebyte character encoding schemes:
-----------------------------------------------------------
I'll release an additional archive (UniLibSupp - already used by a 3rd
party) that should make it easier for you to transcode strings by using
IANA-IDs, which are also used by the operating system's Locale library
from version 50 onwards (MorphOS, AmigaOS4).

Functions:
----------
The API provides these functions:

        Code Point Attribute Information

UniIsAlpha()
UniIsAttr()
UniIsCon()
UniIsDigit()
UniIsLower()
UniIsNSM()
UniIsPrint()
UniIsPunct()
UniIsSpace()
UniIsTitle()
UniIsUpper()

UniToLower()
UniToTitle()
UniToUpper()

UniCodeChart()

        UTF-8 String Information

UTF8IsLegal()
UTF8LegalStart()
UTF8NextChar()
UTF8PrevChar()
UTF8CharAtIndex()

UTF8StrInfo()
UTF8StrLen()
UTF8StrOfSize()
UTF8StrVisibleLen()

        UTF-8 String Comparison / Modifiers

UTF8StrCat()
UTF8StrCmp()
UTF8StrCmpI()
UTF8StrCpy()
UTF8StrFind()
UTF8StrMatch()
UTF8StrNCat()
UTF8StrNCmp()
UTF8StrNCmpI()
UTF8StrNCpy()
UTF8StrPaste()
UTF8StrReplace()
UTF8StrTerminate()
UTF8StrToken()
UTF8StrToLower()
UTF8StrToTitle()
UTF8StrToUpper()

        Miscellaneous (Wide Char) String Functions

UTF16StrLen()
UTF32StrLen()

UTF16CharAsUTF8Len()
UTF32CharAsUTF8Len()

        Transcodings

LatinToUTF8()
UTF8ToLatin()

UTF16ToUTF8()
UTF32ToUTF8()

UTF8ToUTF16()
UTF8ToUTF16Char()
UniResultIsSurrogate()
UTF8ToUTF32()
UTF8ToUTF32Char()

        Encodings

UniCheckEncoding()
UniBomHasSize()
UniSwitchEncoding()




