'From Squeak3.8 of ''5 May 2005'' [latest update: #6665] on 9 March 2010 at 8:52:23 am'!
Object subclass: #HtmlParser
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Network-HTML-Parser'!
!HtmlParser commentStamp: '' prior: 0!
Parses a stream of HtmlTokens into an HtmlDocument. Every token becomes an entity of some sort in the resulting document; some tokens are kept only as comments, though.!


"-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "!

HtmlParser class
	instanceVariableNames: ''!

!HtmlParser class methodsFor: 'example' stamp: 'ls 6/27/1998 15:32'!
example1
	"HtmlParser example1"
	| input |
	input _ ReadStream on: ' The Gate of Chaos

Chaos

Into the Maelstrom

Direction is useless in the ever-changing Maelstrom. However, if you wander with purpose, you might be able to find....

Paths of Retreat

Several commonly travelled ways have left paths leading away from the maelstrom, too:


usage stats for this server
[blue ribbon campaign] [NOTscape] [Best Viewed With Any Browser] '.
	^HtmlParser parse: input! !


!HtmlParser class methodsFor: 'parsing' stamp: 'bolot 12/1/1999 02:46'!
parseTokens: tokenStream
	| entityStack document head token matchesAnything entity body |
	entityStack _ OrderedCollection new.

	"set up initial stack"
	document _ HtmlDocument new.
	entityStack add: document.

	head _ HtmlHead new.
	document addEntity: head.
	entityStack add: head.

	"go through the tokens, one by one"
	[ token _ tokenStream next. token = nil ] whileFalse: [
		(token isTag and: [ token isNegated ])
			ifTrue: [
				"a negated token"
				(token name ~= 'html' and: [ token name ~= 'body' ]) ifTrue: [
					"see if it matches anything in the stack"
					matchesAnything _ (entityStack
						detect: [ :e | e tagName = token name ]
						ifNone: [ nil ]) isNil not.
					matchesAnything ifTrue: [
						"pop the stack until we find the right one"
						[ entityStack last tagName ~= token name ]
							whileTrue: [ entityStack removeLast ].
						entityStack removeLast ] ] ]
			ifFalse: [
				"not a negated token. it makes its own entity"
				token isComment ifTrue: [
					entity _ HtmlCommentEntity new initializeWithText: token source ].
				token isText ifTrue: [
					entity _ HtmlTextEntity new text: token text.
					(((entityStack last shouldContain: entity) not)
						and: [ token source isAllSeparators ]) ifTrue: [
							"blank text may never cause the stack to back up"
							entity _ HtmlCommentEntity new initializeWithText: token source ] ].
				token isTag ifTrue: [
					entity _ token entityFor.
					entity = nil ifTrue: [
						entity _ HtmlCommentEntity new initializeWithText: token source ] ].
				(token name = 'body') ifTrue: [
					body ifNotNil: [ document removeEntity: body ].
					body _ HtmlBody new initialize: token.
					document addEntity: body.
					entityStack add: body ].

				entity = nil ifTrue: [ self error: 'could not deal with this token' ].

				entity isComment
					ifTrue: [
						"just stick it anywhere"
						entityStack last addEntity: entity ]
					ifFalse: [
						"only put it in something that is valid"
						[ entityStack last mayContain: entity ]
							whileFalse: [ entityStack removeLast ].

						"if we have left the head, create a body"
						(entityStack size < 2 and: [ body isNil ]) ifTrue: [
							body _ HtmlBody new.
							document addEntity: body.
							entityStack add: body ].

						"add the entity"
						entityStack last addEntity: entity.
						entityStack addLast: entity ] ] ].

	body == nil ifTrue: [
		"add an empty body"
		body _ HtmlBody new.
		document addEntity: body ].

	document parsingFinished.

	^document! !

!HtmlParser class methodsFor: 'parsing' stamp: 'ls 7/28/1998 02:02'!
parse: aStream
	^self parseTokens: (HtmlTokenizer on: aStream)! !
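

"A minimal usage sketch, not part of the original file-out. The method example2 below is a hypothetical addition: it feeds a small hand-written page through parse: and answers the resulting HtmlDocument. It relies only on selectors already used in this file (ReadStream on: and HtmlParser parse:), and its markup is illustrative only."!

!HtmlParser class methodsFor: 'example'!
example2
	"HtmlParser example2"
	"Hypothetical example: parse a tiny, well-formed page and answer the document."
	| input |
	input _ ReadStream on: '<html><head><title>Greeting</title></head><body><p>Hello, world.</p></body></html>'.
	^HtmlParser parse: input! !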