Unit CastleUnicode

Description

Unicode utilities.

Uses

Overview

Classes, Interfaces, Objects and Records

Name Description
Class TUnicodeCharList  

Functions and Procedures

function UTF8CharacterLength(p: PChar): integer;
function UTF8Length(const s: string): PtrInt; overload;
function UTF8Length(p: PChar; ByteCount: PtrInt): PtrInt; overload;
function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;
function UTF8SEnding(const S: String; const StartCharIndex: PtrInt): String;
function UTF8CharacterToUnicode(p: PChar; out CharLen: integer): TUnicodeChar;
function UnicodeToUTF8(CodePoint: TUnicodeChar): string;
function UnicodeToUTF8Inline(CodePoint: TUnicodeChar; Buf: PChar): integer;
function UTF8ToHtmlEntities(const S: String): String;

Types

TUnicodeChar = Cardinal;

Description

Functions and Procedures

function UTF8CharacterLength(p: PChar): integer;
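The entry above lists only the signature. Assuming UTF8CharacterLength returns the number of bytes in the UTF-8 sequence that starts at p, that number can be read from the high bits of the first (lead) byte. The following is a minimal standalone sketch of that rule, not the CastleUnicode implementation. LeadByteLength and SketchCharacterLength are hypothetical names, and returning 0 for nil and 1 for an invalid lead byte are assumptions of this sketch.

```pascal
{$mode objfpc}{$H+}
program LeadByteLengthDemo;

{ Hypothetical helper, not part of CastleUnicode: number of bytes in the
  UTF-8 sequence that starts with lead byte B, judged from its high bits.
  Invalid lead bytes (continuation bytes 10xxxxxx, or $F8..$FF) count as 1
  so that a caller can always make progress (an assumption). }
function LeadByteLength(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1          // 0xxxxxxx: ASCII
  else if (B and $E0) = $C0 then
    Result := 2          // 110xxxxx
  else if (B and $F0) = $E0 then
    Result := 3          // 1110xxxx
  else if (B and $F8) = $F0 then
    Result := 4          // 11110xxx
  else
    Result := 1;         // invalid lead byte: treat as a single byte
end;

{ Hypothetical wrapper with a PChar parameter; returning 0 for nil is an
  assumption made for this sketch. }
function SketchCharacterLength(P: PChar): Integer;
begin
  if P = nil then
    Exit(0);
  Result := LeadByteLength(Byte(P^));
end;

var
  S: string;
begin
  S := 'a'#$C3#$B3#$E2#$82#$AC; // "a", "ó" (U+00F3), "€" (U+20AC)
  Writeln(SketchCharacterLength(PChar(S)));      // 1
  Writeln(SketchCharacterLength(PChar(S) + 1));  // 2
  Writeln(SketchCharacterLength(PChar(S) + 3));  // 3
end.
```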
 
function UTF8Length(const s: string): PtrInt; overload;
 
function UTF8Length(p: PChar; ByteCount: PtrInt): PtrInt; overload;
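A sketch of how a UTF-8 character count can be computed, assuming UTF8Length counts code points: every character has exactly one byte that is not a continuation byte (10xxxxxx), so counting those bytes gives the character count for valid UTF-8. SketchUTF8Length is a hypothetical name whose two overloads mirror the two signatures above; it is not the library's code, and its result for malformed input may differ from UTF8Length.

```pascal
{$mode objfpc}{$H+}
program UTF8LengthDemo;

{ Hypothetical sketch, not the CastleUnicode implementation:
  count characters in ByteCount bytes of UTF-8 at P by counting every
  byte that is NOT a continuation byte (continuation bytes are 10xxxxxx).
  For valid UTF-8 this equals the number of code points. }
function SketchUTF8Length(P: PChar; ByteCount: PtrInt): PtrInt; overload;
var
  I: PtrInt;
begin
  Result := 0;
  for I := 0 to ByteCount - 1 do
    if (Byte(P[I]) and $C0) <> $80 then
      Inc(Result);
end;

{ String overload: scans all bytes of S. }
function SketchUTF8Length(const S: string): PtrInt; overload;
begin
  Result := SketchUTF8Length(PChar(S), Length(S));
end;

begin
  Writeln(Length('a'#$C3#$B3#$E2#$82#$AC));            // 6 bytes
  Writeln(SketchUTF8Length('a'#$C3#$B3#$E2#$82#$AC));  // 3 characters
end.
```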
 
function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
 
function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;
 
function UTF8SEnding(const S: String; const StartCharIndex: PtrInt): String;
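UTF8CharStart, UTF8Copy and UTF8SEnding all need to map a character index to a byte position. The sketch below shows one way to do that by skipping continuation bytes. It is a hypothetical standalone illustration, not the CastleUnicode code: CharBytePos, SketchUTF8Copy and SketchUTF8SEnding are invented names, they work on byte positions in a string rather than on a PChar, and they assume 1-based character indexes like Pascal's Copy. Whether the library's CharIndex and StartCharIndex parameters are 0-based or 1-based is not stated here, so check the unit's source before relying on either convention.

```pascal
{$mode objfpc}{$H+}
program UTF8CopyDemo;

{ Hypothetical helper: 1-based byte position in S where the character with
  1-based index CharIndex starts; Length(S) + 1 if S has fewer characters.
  Assumes valid UTF-8 and skips continuation bytes (10xxxxxx). }
function CharBytePos(const S: string; CharIndex: PtrInt): PtrInt;
var
  Seen: PtrInt;
begin
  Result := 1;
  Seen := 1;
  while (Result <= Length(S)) and (Seen < CharIndex) do
  begin
    Inc(Result);
    // skip the continuation bytes of the current character
    while (Result <= Length(S)) and ((Byte(S[Result]) and $C0) = $80) do
      Inc(Result);
    Inc(Seen);
  end;
end;

{ Hypothetical: copy CharCount characters starting at character StartCharIndex. }
function SketchUTF8Copy(const S: string; StartCharIndex, CharCount: PtrInt): string;
var
  StartPos, EndPos: PtrInt;
begin
  StartPos := CharBytePos(S, StartCharIndex);
  EndPos := CharBytePos(S, StartCharIndex + CharCount);
  Result := Copy(S, StartPos, EndPos - StartPos);
end;

{ Hypothetical: everything from character StartCharIndex to the end. }
function SketchUTF8SEnding(const S: string; StartCharIndex: PtrInt): string;
begin
  Result := Copy(S, CharBytePos(S, StartCharIndex), MaxInt);
end;

var
  S: string;
begin
  S := 'a'#$C3#$B3#$E2#$82#$AC;  // "aó€"
  Writeln(CharBytePos(S, 3));                 // 4: "€" starts at byte 4
  Writeln(Length(SketchUTF8Copy(S, 2, 1)));   // 2: "ó" is two bytes
  Writeln(Length(SketchUTF8SEnding(S, 2)));   // 5: "ó€" is five bytes
end.
```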
 
function UTF8CharacterToUnicode(p: PChar; out CharLen: integer): TUnicodeChar;

Return the Unicode character (code point) pointed to by P. CharLen is set to the byte length of that character; it is 0 only when P is nil, otherwise it is always > 0.

Typical usage is to iterate over a UTF-8 string character by character, like this:

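uses CastleUnicode;

{ Illustrative procedure name; S is the UTF-8 string to iterate over. }
procedure ProcessUTF8String(const S: string);
var
  C: TUnicodeChar;
  TextPtr: PChar;
  CharLen: Integer;
begin
  TextPtr := PChar(S);
  C := UTF8CharacterToUnicode(TextPtr, CharLen);
  { C = 0 at the terminating #0 of S, which ends the loop.
    CharLen = 0 would mean TextPtr is nil. }
  while (C > 0) and (CharLen > 0) do
  begin
    Inc(TextPtr, CharLen); // move to the next character
    // here process C...
    C := UTF8CharacterToUnicode(TextPtr, CharLen);
  end;
end;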
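To show what UTF8CharacterToUnicode has to do for each character, here is a hedged standalone sketch of a UTF-8 decoder with the same calling convention (return the code point, report its byte length in CharLen, 0 for nil). SketchDecodeUTF8 is a hypothetical name and this is not the library's implementation. Treating a malformed or truncated sequence as a single byte is an assumption, and the sketch does not reject overlong encodings or surrogates. The main block reuses the iteration loop pattern from the description.

```pascal
{$mode objfpc}{$H+}
program DecodeUTF8Demo;

type
  TUnicodeChar = Cardinal; // same definition as in CastleUnicode

{ Hypothetical sketch, not the CastleUnicode implementation: decode the
  code point starting at P and return its byte length in CharLen.
  Nil gives CharLen = 0. A malformed sequence is returned as its first
  byte with CharLen = 1 (an assumption; the library may differ). }
function SketchDecodeUTF8(P: PChar; out CharLen: Integer): TUnicodeChar;
var
  B: Byte;
  I: Integer;
begin
  CharLen := 0;
  Result := 0;
  if P = nil then
    Exit;
  B := Byte(P[0]);
  if B < $80 then
  begin
    CharLen := 1;
    Exit(B);
  end
  else if (B and $E0) = $C0 then
  begin
    CharLen := 2;
    Result := B and $1F;
  end
  else if (B and $F0) = $E0 then
  begin
    CharLen := 3;
    Result := B and $0F;
  end
  else if (B and $F8) = $F0 then
  begin
    CharLen := 4;
    Result := B and $07;
  end
  else
  begin
    CharLen := 1; // stray continuation byte or invalid lead byte
    Exit(B);
  end;
  // fold in 6 payload bits from each continuation byte (10xxxxxx)
  for I := 1 to CharLen - 1 do
  begin
    if (Byte(P[I]) and $C0) <> $80 then
    begin
      CharLen := 1; // truncated sequence (e.g. hit the #0 terminator)
      Exit(B);
    end;
    Result := (Result shl 6) or (Byte(P[I]) and $3F);
  end;
end;

var
  S: string;
  TextPtr: PChar;
  C: TUnicodeChar;
  CharLen: Integer;
begin
  S := 'a'#$C3#$B3#$E2#$82#$AC; // "aó€"
  TextPtr := PChar(S);
  C := SketchDecodeUTF8(TextPtr, CharLen);
  while (C > 0) and (CharLen > 0) do
  begin
    Inc(TextPtr, CharLen);
    Writeln('U+', HexStr(Int64(C), 4)); // prints U+0061, U+00F3, U+20AC
    C := SketchDecodeUTF8(TextPtr, CharLen);
  end;
end.
```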

function UnicodeToUTF8(CodePoint: TUnicodeChar): string;

function UTF8CharacterToUnicode(const S: string): TUnicodeChar;

function UnicodeToUTF8Inline(CodePoint: TUnicodeChar; Buf: PChar): integer;
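UnicodeToUTF8 and UnicodeToUTF8Inline have no description here. Assuming UnicodeToUTF8Inline writes the encoded bytes of CodePoint to Buf and returns how many bytes it wrote, and UnicodeToUTF8 returns the same bytes as a string, the encoding step can be sketched as follows. SketchEncodeUTF8 and SketchCodePointToUTF8 are hypothetical names, not the library's implementation; the 4-byte buffer requirement and returning 0 for values above $10FFFF are assumptions of this sketch.

```pascal
{$mode objfpc}{$H+}
program EncodeUTF8Demo;

type
  TUnicodeChar = Cardinal; // same definition as in CastleUnicode

{ Hypothetical sketch, not the CastleUnicode implementation: write the UTF-8
  encoding of CodePoint to Buf (which must have room for 4 bytes) and return
  the number of bytes written. Values above $10FFFF write nothing and return
  0 (an assumption); surrogates are not rejected. }
function SketchEncodeUTF8(CodePoint: TUnicodeChar; Buf: PChar): Integer;
begin
  if CodePoint < $80 then
  begin
    Buf[0] := Chr(CodePoint);
    Result := 1;
  end
  else if CodePoint < $800 then
  begin
    Buf[0] := Chr($C0 or (CodePoint shr 6));
    Buf[1] := Chr($80 or (CodePoint and $3F));
    Result := 2;
  end
  else if CodePoint < $10000 then
  begin
    Buf[0] := Chr($E0 or (CodePoint shr 12));
    Buf[1] := Chr($80 or ((CodePoint shr 6) and $3F));
    Buf[2] := Chr($80 or (CodePoint and $3F));
    Result := 3;
  end
  else if CodePoint <= $10FFFF then
  begin
    Buf[0] := Chr($F0 or (CodePoint shr 18));
    Buf[1] := Chr($80 or ((CodePoint shr 12) and $3F));
    Buf[2] := Chr($80 or ((CodePoint shr 6) and $3F));
    Buf[3] := Chr($80 or (CodePoint and $3F));
    Result := 4;
  end
  else
    Result := 0;
end;

{ Hypothetical string-returning wrapper around SketchEncodeUTF8. }
function SketchCodePointToUTF8(CodePoint: TUnicodeChar): string;
var
  Buf: array[0..3] of Char;
  Len: Integer;
begin
  Len := SketchEncodeUTF8(CodePoint, @Buf[0]);
  SetString(Result, PChar(@Buf[0]), Len);
end;

begin
  Writeln(Length(SketchCodePointToUTF8($41)));    // 1
  Writeln(Length(SketchCodePointToUTF8($F3)));    // 2
  Writeln(Length(SketchCodePointToUTF8($20AC)));  // 3
  Writeln(Length(SketchCodePointToUTF8($1F600))); // 4
end.
```

Writing into a caller-supplied buffer avoids allocating a new string per character, which is presumably why an "Inline" variant exists alongside the string-returning one.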
 
function UTF8ToHtmlEntities(const S: String): String;

Convert all special Unicode characters in the given UTF-8 string to HTML entities. This is helpful to display a string containing arbitrary Unicode characters using only plain ASCII.

"Special" Unicode characters are those outside the safe ASCII range, i.e. outside the range from space up to (but not including) ASCII code 128. In the resulting string, each special character is encoded as an HTML entity that shows its Unicode code point in hex, like &#xNNNN; (see https://en.wikipedia.org/wiki/Unicode_and_HTML). The ampersand & is also converted to &amp; to prevent ambiguities.

Tip: You can look up a Unicode code point at e.g. https://codepoints.net/U+F3 (for ó). Just edit the code point at the end of this URL in your web browser's address bar.
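A minimal standalone sketch of the conversion described for UTF8ToHtmlEntities, assuming the safe range is codes 32 through 127 and that entities use at least four hex digits as in &#xNNNN;. SketchUTF8ToHtmlEntities and NextCodePoint are hypothetical names, not the CastleUnicode implementation, and NextCodePoint assumes valid UTF-8 input. Whether the library keeps code 127 (DEL) as-is, and how many hex digits it emits, may differ from this sketch.

```pascal
{$mode objfpc}{$H+}
program HtmlEntitiesDemo;

uses SysUtils; // IntToHex

{ Hypothetical helper, valid UTF-8 only (no error checking): return the code
  point starting at byte position I of S and advance I past it. }
function NextCodePoint(const S: string; var I: Integer): Cardinal;
var
  B: Byte;
  Extra: Integer;
begin
  B := Byte(S[I]);
  Inc(I);
  if B < $80 then
    Exit(B);
  if B >= $F0 then
  begin
    Result := B and $07;
    Extra := 3;
  end
  else if B >= $E0 then
  begin
    Result := B and $0F;
    Extra := 2;
  end
  else
  begin
    Result := B and $1F;
    Extra := 1;
  end;
  // append 6 payload bits from each continuation byte
  while (Extra > 0) and (I <= Length(S)) do
  begin
    Result := (Result shl 6) or (Byte(S[I]) and $3F);
    Inc(I);
    Dec(Extra);
  end;
end;

{ Hypothetical sketch: code points outside 32..127 become &#xNNNN; (at least
  4 uppercase hex digits, an assumption), and '&' becomes '&amp;'. }
function SketchUTF8ToHtmlEntities(const S: string): string;
var
  I: Integer;
  C: Cardinal;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    C := NextCodePoint(S, I);
    if (C < 32) or (C >= 128) then
      Result := Result + '&#x' + IntToHex(Int64(C), 4) + ';'
    else if C = Ord('&') then
      Result := Result + '&amp;'
    else
      Result := Result + Chr(C);
  end;
end;

begin
  Writeln(SketchUTF8ToHtmlEntities('caf'#$C3#$A9' & tea'));
  // prints: caf&#x00E9; &amp; tea
end.
```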

Types

TUnicodeChar = Cardinal;
 

Generated by PasDoc 0.16.0.