\SafeHTML

SafeHTML Parser

This parser strips potentially dangerous content from HTML:

  • opening tag without its closing tag
  • closing tag without its opening tag
  • any of these tags: "base", "basefont", "head", "html", "body", "applet", "object", "iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound", "link", "meta", "style", "title", "blink", "xml" etc.
  • any of these attributes: on*, data*, dynsrc
  • javascript:/vbscript:/about: etc. protocols
  • expression/behavior etc. in styles
  • any other active content

It also tries to convert the markup to valid XHTML, but htmltidy is a far better solution for this task.

Example:

$parser = new SafeHTML();
$result = $parser->parse($doc);

Summary

Methods

  • SafeHTML()
  • _writeAttrs()
  • _openHandler()
  • _closeHandler()
  • _closeTag()
  • _dataHandler()
  • _escapeHandler()
  • getXHTML()
  • clear()
  • parse()
  • repackUTF7()
  • repackUTF7Callback()
  • repackUTF7Back()

Properties

  • $_xhtml
  • $_counter
  • $_stack
  • $_dcCounter
  • $_dcStack
  • $_listScope
  • $_liStack
  • $_protoRegexps
  • $_cssRegexps
  • $singleTags
  • $deleteTags
  • $deleteTagsContent
  • $protocolFiltering
  • $blackProtocols
  • $whiteProtocols
  • $protocolAttributes
  • $cssKeywords
  • $noClose
  • $closeParagraph
  • $tableTags
  • $listTags
  • $attributes
  • $attributesNS

Constants

  • No constants found

Properties

$_xhtml

$_xhtml : string

Storage for resulting HTML output

Type

string

$_counter

$_counter : array

Array of counters for each tag

Type

array

$_stack

$_stack : array

Stack of unclosed tags

Type

array
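
The role of a stack of unclosed tags can be illustrated with a small standalone sketch. This is not SafeHTML's own code: the function name closeUnclosedTags() and the simplified event format (pairs of type and value) are assumptions made for illustration only. It drops closing tags that have no opening tag and closes whatever is still open at the end.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
// $events is a list of [type, value] pairs, type being 'open', 'close' or 'text'.
function closeUnclosedTags(array $events): string
{
    $stack = []; // names of tags that are currently open
    $out = '';
    foreach ($events as [$type, $value]) {
        if ($type === 'open') {
            $stack[] = $value;
            $out .= "<$value>";
        } elseif ($type === 'text') {
            $out .= $value;
        } elseif (in_array($value, $stack, true)) {
            // Close every tag opened after $value, then $value itself.
            do {
                $top = array_pop($stack);
                $out .= "</$top>";
            } while ($top !== $value);
        }
        // A closing tag that was never opened is silently dropped.
    }
    // Close whatever is still open at the end of the document.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}
```

For the event sequence open b, open i, text, close b, the sketch emits the missing closing tag for i before closing b.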

$_dcCounter

$_dcCounter : array

Array of counters for tags that must be deleted with all content

Type

array

$_dcStack

$_dcStack : array

Stack of unclosed tags that must be deleted with all content

Type

array

$_listScope

$_listScope : int

Stores level of list (ol/ul) nesting

Type

int

$_liStack

$_liStack : array

Stack of unclosed list tags

Type

array

$_protoRegexps

$_protoRegexps : array

Array of prepared regular expressions for matching protocols (schemes)

Type

array

$_cssRegexps

$_cssRegexps : array

Array of prepared regular expressions for matching CSS

Type

array

$singleTags

$singleTags : array

List of single tags ("<tag />")

Type

array

$deleteTags

$deleteTags : array

List of dangerous tags (such tags will be deleted)

Type

array

$deleteTagsContent

$deleteTagsContent : array

List of dangerous tags (such tags will be deleted, and all content inside these tags will also be removed)

Type

array
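
Deleting a tag together with its content can be done with a nesting counter, which is the kind of bookkeeping $_dcCounter and $_dcStack describe. The sketch below is a simplified standalone illustration, not the library's implementation; dropTagsWithContent() and its event format are hypothetical.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
// $events is a list of [type, value] pairs, type being 'open', 'close' or 'text'.
function dropTagsWithContent(array $events, array $deleteWithContent): string
{
    $depth = 0; // how many "delete with content" tags are currently open
    $out = '';
    foreach ($events as [$type, $value]) {
        $isTarget = in_array($value, $deleteWithContent, true);
        if ($type === 'open' && $isTarget) {
            $depth++;
            continue;
        }
        if ($type === 'close' && $isTarget) {
            if ($depth > 0) {
                $depth--;
            }
            continue;
        }
        if ($depth > 0) {
            continue; // inside a deleted tag: skip everything
        }
        if ($type === 'open') {
            $out .= "<$value>";
        } elseif ($type === 'close') {
            $out .= "</$value>";
        } else {
            $out .= $value;
        }
    }
    return $out;
}
```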

$protocolFiltering

$protocolFiltering : string

Type of protocol filtering ('white' or 'black')

Type

string

$blackProtocols

$blackProtocols : array

List of "dangerous" protocols (used for blacklist-filtering)

Type

array

$whiteProtocols

$whiteProtocols : array

List of "safe" protocols (used for whitelist-filtering)

Type

array
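
The difference between the two filtering modes can be sketched as follows. This is an illustration under assumptions, not SafeHTML's actual matching logic: isUrlAllowed() is a hypothetical function, and the protocol lists passed to it in the usage below are examples rather than the class defaults.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
function isUrlAllowed(string $url, string $mode, array $white, array $black): bool
{
    // Drop whitespace and control characters so "java\tscript:" cannot hide a scheme.
    $clean = preg_replace('/[\x00-\x20]+/', '', $url);
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $clean, $m)) {
        return true; // no scheme at all: treat as a relative URL
    }
    $scheme = strtolower($m[1]);
    if ($mode === 'white') {
        return in_array($scheme, $white, true); // only listed schemes pass
    }
    return !in_array($scheme, $black, true);    // everything but listed schemes passes
}
```

With a whitelist of http, https and mailto, an ftp: link is rejected; with a blacklist of javascript, vbscript and about, the same ftp: link is accepted. Whitelisting is the stricter choice because unknown schemes fail closed.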

$protocolAttributes

$protocolAttributes : array

List of attributes that can contain protocols

Type

array

$cssKeywords

$cssKeywords : array

List of dangerous CSS keywords

The whole style="" attribute is removed if the parser finds one of these keywords

Type

array
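
A keyword check of this kind might look like the sketch below. It is a simplified illustration, not the library's code: filterStyle() is a hypothetical name, and the keyword list in the usage is taken from the overview ("expression", "behavior") rather than from the actual default value of $cssKeywords.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
// Returns the style value unchanged, or null when the attribute should be dropped.
function filterStyle(string $style, array $keywords): ?string
{
    // Remove CSS comments so "expr/**/ession" cannot hide a keyword.
    $normalized = preg_replace('#/\*.*?\*/#s', '', $style);
    foreach ($keywords as $keyword) {
        if (stripos($normalized, $keyword) !== false) {
            return null; // one hit is enough to discard the whole attribute
        }
    }
    return $style;
}
```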

$noClose

$noClose : array

List of tags that may be left without a closing tag

Type

array

$closeParagraph

$closeParagraph : array

List of block-level tags that terminate a paragraph

An open paragraph is closed when one of these tags is opened

Type

array

$tableTags

$tableTags : array

List of table tags; any table tag found outside a table is removed

Type

array

$listTags

$listTags : array

List of list tags

Type

array

$attributes

$attributes : array

List of dangerous attributes

Type

array
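
The overview names on*, data* and dynsrc as dangerous attributes. A minimal sketch of such a filter, assuming prefix matching for "on" and "data" and exact matching for the rest, could look like this. removeDangerousAttributes() is a hypothetical function and does not reproduce how SafeHTML evaluates $attributes internally.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
function removeDangerousAttributes(array $attrs, array $exact = ['dynsrc']): array
{
    $safe = [];
    foreach ($attrs as $name => $value) {
        $lower = strtolower((string) $name);
        // Prefix rules: event handlers (on*) and data* attributes.
        if (strpos($lower, 'on') === 0 || strpos($lower, 'data') === 0) {
            continue;
        }
        // Exact-name rules such as dynsrc.
        if (in_array($lower, $exact, true)) {
            continue;
        }
        $safe[$lower] = $value;
    }
    return $safe;
}
```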

$attributesNS

$attributesNS : array

List of allowed "namespaced" attributes

Type

array

Methods

SafeHTML()

SafeHTML()

Constructs class

_writeAttrs()

_writeAttrs(array $attrs) : boolean

Handles the writing of attributes - called from $this->_openHandler()

Parameters

array $attrs

array of attributes $name => $value

Returns

boolean

_openHandler()

_openHandler(object $parser, string $name, array $attrs) : boolean

Opening tag handler - called from HTMLSax

Parameters

object $parser

HTML parser

string $name

tag name

array $attrs

tag attributes

Returns

boolean

_closeHandler()

_closeHandler(object $parser, string $name) : boolean

Closing tag handler - called from HTMLSax

Parameters

object $parser

HTML parser

string $name

tag name

Returns

boolean

_closeTag()

_closeTag(string $tag) : boolean

Closes tag

Parameters

string $tag

tag name

Returns

boolean

_dataHandler()

_dataHandler(object $parser, string $data) : boolean

Character data handler - called from HTMLSax

Parameters

object $parser

HTML parser

string $data

textual data

Returns

boolean

_escapeHandler()

_escapeHandler(object $parser, string $data) : boolean

Escape handler - called from HTMLSax

Parameters

object $parser

HTML parser

string $data

comments or other type of data

Returns

boolean

getXHTML()

getXHTML() : string

Returns the XHTML document

Returns

string —

Processed (X)HTML document

clear()

clear() : boolean

Clears current document data

Returns

boolean

parse()

parse(string $doc) : string

Main parsing function

Parameters

string $doc

HTML document for processing

Returns

string —

Processed (X)HTML document

repackUTF7()

repackUTF7(string $str) : string

UTF-7 decoding function

Parameters

string $str

HTML document in which the ASCII part of UTF-7 is recoded back to ASCII

Returns

string —

Decoded document
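
Recoding the ASCII part of UTF-7 matters because a sequence such as +ADw- is the UTF-7 form of "<" and could otherwise slip past tag filtering. The sketch below shows one way to do this; it is not the library's implementation. decodeUtf7AsciiRuns() is a hypothetical name, and the sketch assumes a simplified base64 alphabet (without "+") and leaves any run containing non-ASCII characters untouched.

```php
<?php
// Hypothetical helper, not part of SafeHTML.
function decodeUtf7AsciiRuns(string $str): string
{
    return preg_replace_callback(
        '/\+([A-Za-z0-9\/]+)-?/',
        function (array $m): string {
            // UTF-7 omits base64 padding, so restore it before decoding.
            $padded = $m[1] . str_repeat('=', (4 - strlen($m[1]) % 4) % 4);
            $bytes = base64_decode($padded, true);
            if ($bytes === false || strlen($bytes) < 2 || strlen($bytes) % 2 !== 0) {
                return $m[0]; // not a valid run of UTF-16BE code units
            }
            $ascii = '';
            for ($i = 0; $i < strlen($bytes); $i += 2) {
                $high = ord($bytes[$i]);
                $low = ord($bytes[$i + 1]);
                if ($high !== 0 || $low > 0x7F) {
                    return $m[0]; // non-ASCII character: keep the run as it was
                }
                $ascii .= chr($low);
            }
            return $ascii;
        },
        $str
    );
}
```

After this step, a document containing +ADw-script+AD4- exposes a plain script tag to the regular tag handlers.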

repackUTF7Callback()

repackUTF7Callback(string $str) : string

Additional UTF-7 decoding function

Parameters

string $str

String in which the ASCII part of UTF-7 is recoded back to ASCII

Returns

string —

Recoded string

repackUTF7Back()

repackUTF7Back(string $str) : string

Additional UTF-7 encoding function

Parameters

string $str

String in which the ASCII part of UTF-7 is recoded back to ASCII

Returns

string —

Recoded string