'From Squeak3.7alpha of ''11 September 2003'' [latest update: #5623] on 5 January 2004 at 6:09:45 pm'! "Change Set: CompressedSources Date: 20 November 2003 Author: Dan Ingalls md: fixed for KCP (to call some methods that where moved to SmalltalkImage). Changes for Version 1.1: Fixed a bug in #next, and added #fileID which is needed by OSProcess. This changeSet defines CompressedSourceStream which writes files as a series of GZipped segments that can be written and read sequentially like regular source files, and yet remain compressed by about a factor of four on the disk. Useful for developing with full sources on, eg, PDAs or Squeak PCs with limited file space available. Full sources for Squeak 3.6 fit in 3.5MB. It also defines Smalltalk compressSources, a method that will write a copy of the sources file in this format, with the extension 'stc' in place of 'sources'. Finally, it modifies the openSources:forImage: method so that it will look first for such a compressed copy, and open it if found."! ReadWriteStream subclass: #CompressedSourceStream instanceVariableNames: 'segmentFile segmentSize nSegments segmentTable segmentIndex dirty endOfFile' classVariableNames: '' poolDictionaries: '' category: 'System-Files'! !CompressedSourceStream commentStamp: 'di 11/3/2003 17:58' prior: 0! I implement a file format that compresses segment by segment to allow incremental writing and browsing. Note that the file can only be written at the end. Structure: segmentFile The actual compressed file. segmentSize This is the quantum of compression. The virtual file is sliced up into segments of this size. nSegments The maximum number of segments to which this file can be grown. endOfFile The user's endOfFile pointer. segmentTable When a file is open, this table holds the physical file positions of the compressed segments. segmentIndex Index of the most recently accessed segment. Inherited from ReadWriteStream... collection The segment buffer, uncompressed position This is the position *local* to the current segment buffer readLimit ReadLimit for the current buffer writeLimit WriteLimit for the current buffer Great care must be exercised to distinguish between the position relative to the segment buffer and the full file position (and, or course, the segment file position ;-). The implementation defaults to a buffer size of 20k, and a max file size of 34MB (conveniently chosen to be greater than the current 33MB limit of source code pointers). The format of the file is as follows: segmentSize 4 bytes nSegments 4 bytes endOfFile 4 bytes segmentTable 4 bytes * (nSegments+1) beginning of first compressed segment It is possible to override the default allocation by sending the message #segmentSize:nSegments: immediately after opening a new file for writing, as follows: bigFile _ (CompressedSourceStream on: (FileStream newFileNamed: 'biggy.stc')) segmentSize: 50000 maxSize: 200000000 The difference between segment table entries reveals the size of each compressed segment. When a file is being written, it may lack the final segment, but any flush, position:, or close will force a dirty segment to be written.! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/1/2003 22:36'! binary self error: 'Compressed source files are ascii to the user (though binary underneath)'! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/1/2003 22:36'! close self flush. segmentFile close! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/3/2003 17:54'! openOn: aFile "Open the receiver." segmentFile _ aFile. segmentFile binary. segmentFile size > 0 ifTrue: [self readHeaderInfo. "If file exists, then read the parameters"] ifFalse: [self segmentSize: 20000 maxSize: 34000000. "Otherwise write default values"]! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/3/2003 10:13'! openReadOnly segmentFile openReadOnly! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/5/2003 22:41'! readHeaderInfo | valid a b | segmentFile position: 0. segmentSize _ segmentFile nextNumber: 4. nSegments _ segmentFile nextNumber: 4. endOfFile _ segmentFile nextNumber: 4. segmentFile size < (nSegments+1 + 3 * 4) ifTrue: "Check for reasonable segment info" [self error: 'This file is not in valid compressed source format']. segmentTable _ (1 to: nSegments+1) collect: [:x | segmentFile nextNumber: 4]. segmentTable first ~= self firstSegmentLoc ifTrue: [self error: 'This file is not in valid compressed source format']. valid _ true. 1 to: nSegments do: "Check that segment offsets are ascending" [:i | a _ segmentTable at: i. b _ segmentTable at: i+1. (a = 0 and: [b ~= 0]) ifTrue: [valid _ false]. (a ~= 0 and: [b ~= 0]) ifTrue: [b <= a ifTrue: [valid _ false]]]. valid ifFalse: [self error: 'This file is not in valid compressed source format']. dirty _ false. self position: 0.! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/3/2003 10:09'! readOnlyCopy ^ self class on: segmentFile readOnlyCopy! ! !CompressedSourceStream methodsFor: 'open/close' stamp: 'di 11/2/2003 23:07'! test "FileDirectory default deleteFileNamed: 'test.stc'. (CompressedSourceStream on: (FileStream newFileNamed: 'test.stc')) fileOutChanges" "FileDirectory default deleteFileNamed: 'test2.stc'. ((CompressedSourceStream on: (FileStream newFileNamed: 'test2.stc')) segmentSize: 100 nSegments: 1000) fileOutChanges" "FileDirectory default deleteFileNamed: 'test3.st'. (FileStream newFileNamed: 'test3.st') fileOutChanges" "(CompressedSourceStream on: (FileStream oldFileNamed: 'test.stc')) contentsOfEntireFile" ! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/3/2003 00:41'! atEnd position >= readLimit ifFalse: [^ false]. "more in segment" ^ self position >= endOfFile "more in file"! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/1/2003 22:48'! contentsOfEntireFile | contents | self position: 0. contents _ self next: self size. self close. ^ contents! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/1/2003 19:50'! flush dirty ifTrue: ["Write buffer, compressed, to file, and also write the segment offset and eof" self writeSegment].! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/20/2003 12:03'! next position >= readLimit ifTrue: [^ (self next: 1) at: 1] ifFalse: [^ collection at: (position _ position + 1)]! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/2/2003 11:45'! next: n | str | n <= (readLimit - position) ifTrue: ["All characters are available in buffer" str _ collection copyFrom: position + 1 to: position + n. position _ position + n. ^ str]. "Read limit could be segment boundary or real end of file" (readLimit + self segmentOffset) = endOfFile ifTrue: ["Real end of file -- just return what's available" ^ self next: readLimit - position]. "Read rest of segment. Then (after positioning) read what remains" str _ self next: readLimit - position. self position: self position. ^ str , (self next: n - str size) ! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/2/2003 11:27'! nextPut: char "Slow, but we don't often write, and then not a lot" self nextPutAll: char asString. ^ char! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/2/2003 12:06'! nextPutAll: str | n nInSeg | n _ str size. n <= (writeLimit - position) ifTrue: ["All characters fit in buffer" collection replaceFrom: position + 1 to: position + n with: str. dirty _ true. position _ position + n. readLimit _ readLimit max: position. endOfFile _ endOfFile max: self position. ^ str]. "Write what fits in segment. Then (after positioning) write what remains" nInSeg _ writeLimit - position. nInSeg = 0 ifTrue: [self position: self position. self nextPutAll: str] ifFalse: [self nextPutAll: (str first: nInSeg). self position: self position. self nextPutAll: (str allButFirst: nInSeg)] ! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/2/2003 09:27'! position ^ position + self segmentOffset! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/2/2003 22:24'! position: newPosition | compressedBuffer newSegmentIndex | newPosition > endOfFile ifTrue: [self error: 'Attempt to position beyond the end of file']. newSegmentIndex _ (newPosition // segmentSize) + 1. newSegmentIndex ~= segmentIndex ifTrue: [self flush. segmentIndex _ newSegmentIndex. newSegmentIndex > nSegments ifTrue: [self error: 'file size limit exceeded']. segmentFile position: (segmentTable at: segmentIndex). (segmentTable at: segmentIndex+1) = 0 ifTrue: [newPosition ~= endOfFile ifTrue: [self error: 'Internal logic error']. collection size = segmentSize ifFalse: [self error: 'Internal logic error']. "just leave garbage beyond end of file"] ifFalse: [compressedBuffer _ segmentFile next: ((segmentTable at: segmentIndex+1) - (segmentTable at: segmentIndex)). collection _ (GZipReadStream on: compressedBuffer) upToEnd asString]. readLimit _ collection size min: endOfFile - self segmentOffset]. position _ newPosition \\ segmentSize. ! ! !CompressedSourceStream methodsFor: 'access' stamp: 'di 11/1/2003 11:41'! size ^ endOfFile ifNil: [0]! ! !CompressedSourceStream methodsFor: 'private' stamp: 'di 11/20/2003 12:45'! fileID "Only needed for OSProcess stuff" ^ segmentFile fileID ! ! !CompressedSourceStream methodsFor: 'private' stamp: 'di 11/2/2003 09:35'! firstSegmentLoc "First segment follows 3 header words and segment table" ^ (3 + nSegments+1) * 4! ! !CompressedSourceStream methodsFor: 'private' stamp: 'di 11/2/2003 09:24'! segmentOffset ^ segmentIndex - 1 * segmentSize! ! !CompressedSourceStream methodsFor: 'private' stamp: 'di 11/5/2003 22:41'! segmentSize: segSize maxSize: maxSize "Note that this method can be called after the initial open, provided that no writing has yet taken place. This is how to override the default segmentation." self size = 0 ifFalse: [self error: 'Cannot set parameters after the first write']. segmentFile position: 0. segmentFile nextNumber: 4 put: (segmentSize _ segSize). segmentFile nextNumber: 4 put: (nSegments _ maxSize // segSize + 2). segmentFile nextNumber: 4 put: (endOfFile _ 0). segmentTable _ Array new: nSegments+1 withAll: 0. segmentTable at: 1 put: self firstSegmentLoc. "Loc of first segment, always." segmentTable do: [:i | segmentFile nextNumber: 4 put: i]. segmentIndex _ 1. collection _ String new: segmentSize. writeLimit _ segmentSize. readLimit _ 0. position _ 0. endOfFile _ 0. self writeSegment. ! ! !CompressedSourceStream methodsFor: 'private' stamp: 'di 11/5/2003 22:42'! writeSegment "The current segment must be the last in the file." | compressedSegment | segmentFile position: (segmentTable at: segmentIndex). compressedSegment _ ByteArray streamContents: [:strm | (GZipWriteStream on: strm) nextPutAll: collection asByteArray; close]. segmentFile nextPutAll: compressedSegment. segmentTable at: segmentIndex + 1 put: segmentFile position. segmentFile position: 2 * 4. segmentFile nextNumber: 4 put: endOfFile. segmentFile position: (segmentIndex + 3) * 4. segmentFile nextNumber: 4 put: (segmentTable at: segmentIndex + 1). dirty _ false! ! !CompressedSourceStream class methodsFor: 'as yet unclassified' stamp: 'di 11/1/2003 22:58'! on: aFile ^ self basicNew openOn: aFile! ! !FileDirectory class methodsFor: 'system start up' stamp: 'md 1/5/2004 18:05'! openSources: fullSourcesName forImage: imageName "Initialize the default directory to the image directory and open the sources and changes files, if possible. Look for the changes file in image directory. Look for the system sources (or an alias to it) first in the VM directory, then in the image directory. Open the changes and sources files and install them in SourceFiles." | sources fd sourcesName | (fullSourcesName endsWith: 'sources') ifTrue: ["Look first for a sources file in compressed format." sources _ self openSources: (fullSourcesName allButLast: 7) , 'stc' forImage: imageName. sources ifNotNil: [^ CompressedSourceStream on: sources]]. sourcesName _ FileDirectory localNameFor: fullSourcesName. "look for the sources file or an alias to it in the VM's directory" fd _ FileDirectory on: SmalltalkImage current vmPath. (fd fileExists: sourcesName) ifTrue: [sources _ fd readOnlyFileNamed: sourcesName]. sources ifNotNil: [^ sources]. "look for the sources file or an alias to it in the image directory" fd _ FileDirectory on: (FileDirectory dirPathFor: imageName). (fd fileExists: sourcesName) ifTrue: [sources _ fd readOnlyFileNamed: sourcesName]. sources ifNotNil: [^ sources]. "look for the sources in the current directory" fd _ DefaultDirectory. (fd fileExists: sourcesName) ifTrue: [sources _ fd readOnlyFileNamed: sourcesName]. ^sources ! ! !SystemDictionary methodsFor: 'housekeeping' stamp: 'md 1/5/2004 18:05'! compressSources "Copy all the source file to a compressed file. Usually preceded by Smalltalk condenseSources." "The new file will be created in the default directory, and the code in openSources will try to open it if it is there, otherwise it will look for normal sources." "Smalltalk compressSources" | f cfName cf | f _ SourceFiles first. (SmalltalkImage current sourcesName endsWith: 'sources') ifTrue: [cfName _ (SmalltalkImage current sourcesName allButLast: 7) , 'stc'] ifFalse: [self error: 'Hey, I thought the sources name ended with ''.sources''.']. cf _ (CompressedSourceStream on: (FileStream newFileNamed: cfName)) segmentSize: 20000 maxSize: f size. "Copy the sources" 'Compressing Sources File...' displayProgressAt: Sensor cursorPoint from: 0 to: f size during: [:bar | f position: 0. [f atEnd] whileFalse: [cf nextPutAll: (f next: 20000). bar value: f position]]. cf close. self setMacFileInfoOn: cfName. self inform: 'You now have a compressed sources file!! Squeak will use it the next time you start.'! ! !CompressedSourceStream reorganize! ('open/close' binary close openOn: openReadOnly readHeaderInfo readOnlyCopy test) ('access' atEnd contentsOfEntireFile flush next next: nextPut: nextPutAll: position position: size) ('private' fileID firstSegmentLoc segmentOffset segmentSize:maxSize: writeSegment) !