'From Squeak3.8alpha of ''17 July 2004'' [latest update: #6326] on 19 October 2004 at 10:44:18 pm'!
"Change Set:		m17nClassesComment1
Date:			19 October 2004
Author:			Yoshiki Ohshima

Adds comments to the classes in 'Multilingual-BaseClasses' and 'Multilingual-Encodings'. The method categories of those classes are also revised."!

!AbstractString commentStamp: 'yo 10/19/2004 22:36' prior: 0!
This class is the abstract superclass of the original String (which represents an array of 8-bit Characters) and MultiString (which represents an array of 32-bit MultiCharacters). As with LargeInteger and SmallInteger, the appropriate subclass is chosen for each string: whenever the system can determine that a String suffices, a String is used to represent the given string.

The methods of this class were copied from String. Most of them access the elements only through #at: and #at:put: and do not depend on the actual element type, so they work as long as the subclass provides proper accessors to the slots. Some methods of this class call #subclassResponsibility; others provide a default behavior that MultiString overrides. This probably deserves a clearer organization.
!

!EncodedCharSet commentStamp: 'yo 10/19/2004 19:08' prior: 0!
An abstract superclass of the classes that represent encoded character sets. In the old implementation, the charsets played a more important role. In the current implementation, however, the subclasses are kept only for backward compatibility.

Another source of confusion is the name of the "Latin1" class. It used to mean the Latin-1 (ISO-8859-1) character set, but now it primarily means the Western European languages that are covered by the characters in the Latin-1 character set.
!

!FilePath commentStamp: 'yo 10/19/2004 21:36' prior: 0!
This class absorbs the difference between the internal and external representations of a file path.
The idea is to keep the internal representation as much as possible; only when the path goes to a primitive is the encoded file path, i.e. the native platform representation, passed to the primitive. The converter used is obtained by "LanguageEnvironment defaultFileNameConverter".
!

!GB2312 commentStamp: 'yo 10/19/2004 19:52' prior: 0!
This class represents the domestic character encoding called GB 2312 used for simplified Chinese.
!

!JISX0208 commentStamp: 'yo 10/19/2004 19:52' prior: 0!
This class represents the domestic character encoding called JIS X 0208 used for Japanese.!

!KSX1001 commentStamp: 'yo 10/19/2004 19:53' prior: 0!
This class represents the domestic character encoding called KS X 1001 used for Korean.!

!Latin1 commentStamp: 'yo 10/19/2004 19:53' prior: 0!
This class represents the domestic character encoding called ISO-8859-1, also known as Latin-1, used for most of the Western European languages.!

!MultiCharacter commentStamp: 'yo 10/19/2004 22:28' prior: 0!
This class represents 32-bit wide characters. In practice, the value should not become negative, so only 30 bits are used (one further bit is taken by the SmallInteger tag).

The code point is based on Unicode. Since Unicode is a 21-bit wide character set, several bits are available for other information. As the Unicode Standard states, a Unicode code point doesn't carry language information. This is a problem for the so-called CJK languages (Chinese, Japanese, and Korean; often CJKV, including Vietnamese). Since the characters of those languages are unified and given the same code point, it is impossible to display a bare Unicode code point correctly in an inspector or similar tools. We use the extra available bits to identify the language. Since the old implementation used the bits to identify the character encoding, they are sometimes called the "encoding tag" or, neutrally, the "leading char", but the bits strictly denote the concept of language.
Other languages can have a language tag too, if you like. This will help to break the large default font (font set) into separately loadable chunks of fonts. However, it is left to the native speakers and writers of each language to decide how to define character equality, since the same Unicode code point may carry different language tags, and thus a simple #= comparison may return false.
!

!MultiString commentStamp: 'yo 10/19/2004 22:34' prior: 0!
This class represents an array of 32-bit wide characters.
!

!MultiSymbol commentStamp: 'yo 10/19/2004 22:42' prior: 0!
This class represents the symbols whose slots are MultiCharacters. The protocol is basically the same as that of Symbol, with a few exceptions. Some think that keeping the symbol tables separate from those of Symbol isn't a great idea. I kind of disagree but would like to see a better solution.
!

!UCSTable commentStamp: 'yo 10/19/2004 19:54' prior: 0!
This class represents the conversion tables between the domestic encodings and Unicode.
!

!Unicode commentStamp: 'yo 10/19/2004 20:44' prior: 0!
This class holds the entry points for the utility functions around characters.
!

!Unicode class reorganize!
('subencodings' isJapanese: isKorean: isSimplifiedChinese: isTraditionalChinese: isUnifiedKanji:)
('character classification' isDigit: isLetter: isLowercase: isUppercase:)
('class methods' charSetSize compoundTextFinalChar compoundTextSequence digitValue: generalCategory generalCategoryComment leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState: parseUnicodeDataFrom: ucsTable)
('accessing - displaying' isBreakableAt:in: printingDirection scanSelector)
('comments' blocks320Comment blocks320Comment2)
('instance creation' charFromUnicode: value:)
!

!UCSTable class reorganize!
('accessing - table' gb2312Table initialize initializeGB2312Table initializeJISX0208Table initializeKSX1001Table initializeLatin1Table jisx0208Table ksx1001Table latin1Table)
!

!MultiSymbol class reorganize!
('accessing' selectorsContaining: thatStarts:skipping:)
('class initialization' allMultiSymbolTablesDo: allMultiSymbolTablesDo:after: compactMultiSymbolTable compareTiming initialize)
('instance creation' intern: internCharacter: internLoadedSymbol: lookup: lookupForLoadedSymbol: newFrom: newFromStream: readFrom:)
('private' hasInterned:ifTrue: hasInternedALoadedSymbol:ifTrue: possibleSelectorsFor: rehash shutDown:)
!

!MultiSymbol reorganize!
('filter streaming' byteEncode:)
('copying' clone copy shallowCopy veryDeepCopyWith:)
('system primitives' flushCache)
('converting' asExplorerString asMultiSymbol asString asSymbol capitalized)
('accessing' at:put: precedence replaceFrom:to:with:startingAt:)
('testing' isInfix isKeyword isLiteral isOrientedFill isPvtSelector isSymbol isUnary)
('comparing' =)
('printing' storeOn:)
('private' errorNoModification species string:)
('Camp Smalltalk' sunitAsClass)
!

!MultiCharacter class reorganize!
('instance creation' allCharacters from: leadingChar:code: value:)
!

!MultiCharacter reorganize!
('converting' asCharacter asString asUnicode asUnicodeChar isoToSqueak squeakToIso)
('as yet unclassified' value:)
('testing' isUnicode)
('comparing' = hash)
('printing' hex)
!

!Latin1 class reorganize!
('class methods' charSetSize emitSequenceToResetStateIfNeededOn:forState: initialize leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState:)
('character classification' isLetter:)
('accessing - displaying' isBreakableAt:in: printingDirection scanSelector)
('private' nextPutRightHalfValue:toStream:withShiftSequenceIfNeededForTextConverterState:)
!

!KSX1001 class reorganize!
('class methods' charSetSize compoundTextSequence initialize leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState: ucsTable)
('character classification' isLetter:)
('accessing - displaying')
!

!JISX0208 class reorganize!
('class methods' charAtKuten: charSetSize compoundTextSequence initialize leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState: printingDirection stringFromKutenArray: ucsTable unicodeLeadingChar)
('character classification' isLetter:)
('accessing - displaying' isBreakableAt:in:)
!

!GB2312 class reorganize!
('class methods' charSetSize compoundTextSequence initialize isLetter: leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState: ucsTable)
('accessing - displaying')
!

!FilePath class reorganize!
('instance creation' pathName: pathName:isEncoded:)
!

!EncodedCharSet class reorganize!
('class methods' charFromUnicode: charSetSize charsetAt: digitValue: initialize leadingChar nextPutValue:toStream:withShiftSequenceIfNeededForTextConverterState: ucsTable)
('character classification' canBeGlobalVarInitial: canBeNonGlobalVarInitial: isDigit: isLetter: isLowercase: isUppercase:)
('accessing - displaying' isBreakableAt:in: printingDirection)
!