Property expression parsing
The following discussion of the experiments with alternate
property expression parsing is very much a work in progress,
and subject to sudden changes.
|
The parsing of property value expressions is handled by two
closely related classes: PropertyTokenizer and its
subclass, PropertyParser.
PropertyTokenizer, as the name suggests, handles
the tokenizing of the expression, handing tokens
back to its subclass,
PropertyParser. PropertyParser, in
turn, returns a PropertyValueList, a list of
PropertyValues.
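The tokenizer/parser relationship described above can be sketched as follows. This is only an illustration of the subclassing arrangement, not the FOP source: the class names SketchTokenizer and SketchParser, the fields currentToken and currentTokenValue, and the tiny grammar (integers joined by "+") are all assumptions made for the example. The three constant values match the token constants listed later in this document.

```java
// Hypothetical sketch of a tokenizer with a parser subclass.
class SketchTokenizer {
    // Values taken from the token list in this document.
    static final int EOF = 0, PLUS = 7, INTEGER = 15;

    protected int currentToken = EOF;    // type of the token just read
    protected String currentTokenValue;  // text of that token, if any

    private final String expr;
    private int pos = 0;

    SketchTokenizer(String expr) {
        this.expr = expr;
    }

    /** Reads the next token and records it in the protected fields. */
    protected void next() {
        while (pos < expr.length() && Character.isWhitespace(expr.charAt(pos))) {
            pos++;
        }
        if (pos >= expr.length()) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = expr.charAt(pos);
        if (c == '+') {
            pos++;
            currentToken = PLUS;
            currentTokenValue = "+";
            return;
        }
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < expr.length() && Character.isDigit(expr.charAt(pos))) {
                pos++;
            }
            currentToken = INTEGER;
            currentTokenValue = expr.substring(start, pos);
            return;
        }
        throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
    }
}

// The parser drives the inherited next() and inspects the inherited fields.
class SketchParser extends SketchTokenizer {
    SketchParser(String expr) {
        super(expr);
    }

    /** Parses INTEGER (PLUS INTEGER)* and returns the sum. */
    int parseSum() {
        next();
        int total = expectInteger();
        while (currentToken == PLUS) {
            next();
            total += expectInteger();
        }
        if (currentToken != EOF) {
            throw new IllegalArgumentException("Trailing input");
        }
        return total;
    }

    private int expectInteger() {
        if (currentToken != INTEGER) {
            throw new IllegalArgumentException("Integer expected");
        }
        int value = Integer.parseInt(currentTokenValue);
        next();
        return value;
    }
}
```

Because the parser inherits from the tokenizer, the token state is available to it as ordinary fields, with no separate token object passed between the two.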
The tokenizer and parser rely in turn on the datatype
definitions from the org.apache.fop.datatypes
package and on the static final int datatype
constants from PropertyConsts.
The data types currently defined in
org.apache.fop.datatypes include:
Numbers and lengths
|
Numeric
|
The fundamental numeric data type. Numerics of
various types are constructed by the classes listed
below.
|
|
Constructor classes for Numeric
|
| Angle
|
In degrees(deg), grads(grad) or
radians(rad)
|
| Ems
|
Relative length in ems
|
| Frequency
|
In hertz(Hz) or kilohertz(kHz)
|
| IntegerType
|
|
| Length
|
In centimetres(cm), millimetres(mm),
inches(in), points(pt), picas(pc) or pixels(px)
|
| Percentage
|
|
| Time
|
In seconds(s) or milliseconds(ms)
|
Strings
|
StringType
|
Base class for data types which result in a String.
|
| Literal
|
A subclass of StringType for literals which
exceed the constraints of an NCName.
|
| MimeType
|
A subclass of StringType for literals which
represent a mime type.
|
| UriType
|
A subclass of StringType for literals which
represent a URI, as specified by the argument to
url().
|
| NCName
|
A subclass of StringType for literals which
meet the constraints of an NCName.
|
|
| Country
|
An RFC 3066/ISO 3166 country code.
|
|
| Language
|
An RFC 3066/ISO 639 language code.
|
|
| Script
|
An ISO 15924 script code.
|
Enumerated types
|
EnumType
|
An integer representing one of the tokens in a set of
enumeration values.
|
| MappedEnumType
|
A subclass of EnumType. Maintains a
String with the value to which the associated
"raw" enumeration token maps. E.g., the
font-size enumeration value "medium" maps to
the String "12pt".
|
Colors
|
ColorType
|
Maintains a four-element array of float, derived from
the name of a standard colour, the name returned by a
call to system-color(), or an RGB
specification.
|
Fonts
|
FontFamilySet
|
Maintains an array of Strings containing a
prioritized list of possibly generic font family names.
|
Pseudo-types
|
A variety of pseudo-types have been defined as
convenience types for frequently appearing enumeration
token values, or for other special purposes.
|
Inherit
|
For values of inherit.
|
Auto
|
For values of auto.
|
None
|
For values of none.
|
Bool
|
For values of true/false.
|
FromNearestSpecified
|
Created to ensure that, when associated with
a shorthand, the from-nearest-specified-value()
core function is the sole component of the expression.
|
FromParent
|
Created to ensure that, when associated with
a shorthand, the from-parent()
core function is the sole component of the expression.
|
|
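To illustrate the EnumType and MappedEnumType entries in the list above, here is a minimal sketch. The class names, constructors and accessors are hypothetical, since the text does not show the real signatures. The only mapping taken from the text is "medium" to "12pt"; the token set and the mapping table are passed in by the caller so that nothing else is assumed.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for EnumType: an integer identifying one token of a set.
class SketchEnumType {
    private final int enumValue;

    /** Resolves a token against the set of tokens allowed for a property. */
    SketchEnumType(List<String> allowedTokens, String token) {
        int index = allowedTokens.indexOf(token);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown enumeration token: " + token);
        }
        this.enumValue = index;
    }

    int getEnumValue() {
        return enumValue;
    }
}

// Hypothetical stand-in for MappedEnumType: also keeps the String the token maps to.
class SketchMappedEnumType extends SketchEnumType {
    private final String mappedValue;

    SketchMappedEnumType(List<String> allowedTokens,
                         Map<String, String> mapping,
                         String token) {
        super(allowedTokens, token);
        String mapped = mapping.get(token);
        if (mapped == null) {
            throw new IllegalArgumentException("No mapping for token: " + token);
        }
        this.mappedValue = mapped;
    }

    String getMappedValue() {
        return mappedValue;
    }
}
```

With a token set containing "medium" and a mapping table holding "medium" to "12pt", getMappedValue() returns "12pt", while getEnumValue() returns the position of "medium" in the token set.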
The tokenizer returns one of the following token
values:
| | | |
static final int
EOF = 0
,NCNAME = 1
,MULTIPLY = 2
,LPAR = 3
,RPAR = 4
,LITERAL = 5
,FUNCTION_LPAR = 6
,PLUS = 7
,MINUS = 8
,MOD = 9
,DIV = 10
,COMMA = 11
,PERCENT = 12
,COLORSPEC = 13
,FLOAT = 14
,INTEGER = 15
,ABSOLUTE_LENGTH = 16
,RELATIVE_LENGTH = 17
,TIME = 18
,FREQ = 19
,ANGLE = 20
,INHERIT = 21
,AUTO = 22
,NONE = 23
,BOOL = 24
,URI = 25
,MIMETYPE = 26
// NO_UNIT is a transient token for internal use only. It is
// never set as the end result of parsing a token.
,NO_UNIT = 27
;
| | | | |
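The unit-bearing tokens in the list above could be told apart by a check on the unit suffix, roughly as sketched here. The class and method names are hypothetical. Treating every unit listed under Length as ABSOLUTE_LENGTH, and em as the only RELATIVE_LENGTH, is an assumption drawn from the datatype list, not something the text states. The sketch also shows one plausible reading of NO_UNIT as a transient value: it is resolved to INTEGER or FLOAT before a token is reported.

```java
// Hypothetical sketch: classify a numeric token by its unit suffix.
class SketchUnitClassifier {
    // Values taken from the token list in this document.
    static final int FLOAT = 14, INTEGER = 15, ABSOLUTE_LENGTH = 16,
            RELATIVE_LENGTH = 17, TIME = 18, FREQ = 19, ANGLE = 20, NO_UNIT = 27;

    private SketchUnitClassifier() { }

    /** Maps a unit suffix to a token type; an empty suffix yields NO_UNIT. */
    static int classifyUnit(String unit) {
        switch (unit) {
            case "":
                return NO_UNIT;
            case "cm": case "mm": case "in": case "pt": case "pc": case "px":
                return ABSOLUTE_LENGTH;   // assumption: all Length units are absolute
            case "em":
                return RELATIVE_LENGTH;
            case "s": case "ms":
                return TIME;
            case "Hz": case "kHz":
                return FREQ;
            case "deg": case "grad": case "rad":
                return ANGLE;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    /** Final token type for a number and its suffix; NO_UNIT never escapes. */
    static int tokenFor(String digits, String unit) {
        int token = classifyUnit(unit);
        if (token != NO_UNIT) {
            return token;
        }
        return digits.indexOf('.') >= 0 ? FLOAT : INTEGER;
    }
}
```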
Most of these tokens are self-explanatory, but a few need
further comment.
AUTO -
Because of its frequency of occurrence, and the fact that
it is always the initial value for any property
which supports it, AUTO has been promoted into a
pseudo-type with its own datatype class. Therefore, it is
also reported as a token.
NONE -
Similarly to AUTO, NONE has been promoted to a pseudo-type
because of its frequency.
BOOL -
There is a de facto boolean type buried in the
enumeration types for many of the properties. It has been
specified as a type in its own right in this code.
MIMETYPE -
The property content-type introduces this
complication. Its value can take one of two forms:
content-type:mime-type
namespace-prefix:prefix
(e.g. content-type="namespace-prefix:svg"). The
experimental code reduces these options to the payload
in each case: an NCName in the case of a
namespace prefix, and a MIMETYPE in the case of a
content-type specification. NCNames cannot
contain a "/", which a mime type always does, so the two
payloads remain distinguishable.
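The reduction of a content-type value to its payload might look like the following sketch. The class name, its fields and the parse method are hypothetical, and any value that matches neither of the two forms is simply rejected here, which is a simplification made for the example.

```java
// Hypothetical sketch: reduce a content-type property value to token plus payload.
class SketchContentType {
    // Values taken from the token list in this document.
    static final int NCNAME = 1, MIMETYPE = 26;

    final int token;       // NCNAME or MIMETYPE
    final String payload;  // the part after the "content-type:" or "namespace-prefix:" marker

    private SketchContentType(int token, String payload) {
        this.token = token;
        this.payload = payload;
    }

    static SketchContentType parse(String value) {
        final String mimeMarker = "content-type:";
        final String prefixMarker = "namespace-prefix:";
        if (value.startsWith(mimeMarker)) {
            return new SketchContentType(MIMETYPE, value.substring(mimeMarker.length()));
        }
        if (value.startsWith(prefixMarker)) {
            String prefix = value.substring(prefixMarker.length());
            if (prefix.indexOf('/') >= 0) {
                // An NCName cannot contain a "/".
                throw new IllegalArgumentException("Not an NCName: " + prefix);
            }
            return new SketchContentType(NCNAME, prefix);
        }
        throw new IllegalArgumentException("Unsupported content-type value: " + value);
    }
}
```

For the example value from the text, parse("namespace-prefix:svg") yields the NCNAME token with the payload "svg".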
|
The parser returns a PropertyValueList,
necessary because of the possibility that a list of
PropertyValue elements may be returned from the
expressions of some properties.
PropertyValueLists may contain
PropertyValues or other
PropertyValueLists. This latter provision is
necessitated by the peculiar case of
text-shadow, which may contain whitespace-separated
sublists of either two or three elements, separated from one
another by commas. To accommodate this peculiarity,
comma-separated elements are added to the top-level list, while
whitespace-separated values are always collected into
sublists to be added to the top-level list.
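The nesting rule just described can be sketched on plain strings. This is an illustration only: SketchListBuilder is a hypothetical name, Strings stand in for PropertyValues, nested Lists stand in for PropertyValueLists, and the naive split on commas means that function calls with comma-separated arguments are not handled. A comma-separated element with no internal whitespace is added directly to the top-level list, which is one reading of the rule above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the list nesting rule, using Strings for values.
class SketchListBuilder {
    private SketchListBuilder() { }

    /** Commas separate top-level elements; whitespace groups values into a sublist. */
    static List<Object> build(String expr) {
        List<Object> topLevel = new ArrayList<>();
        for (String element : expr.split(",")) {
            String trimmed = element.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("Empty list element in: " + expr);
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length == 1) {
                topLevel.add(parts[0]);  // single value, added as is
            } else {
                topLevel.add(new ArrayList<Object>(Arrays.asList(parts)));  // sublist
            }
        }
        return topLevel;
    }
}
```

For an expression such as "1pt 2pt, 3pt 4pt 5pt", the result is a top-level list of two sublists, the first with two elements and the second with three.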
Other special cases include the processing of the core
functions from-parent() and
from-nearest-specified-value() when these
function calls are assigned to a shorthand property, or used
with a shorthand property name as an argument. In these
cases, the function call must be the sole component of the
expression. The pseudo-type classes
FromParent and
FromNearestSpecified are generated in these
circumstances so that an exception will be thrown if they
are involved in expression evaluation with other
components. (See Rec. Section 5.10.4 Property Value
Functions.)
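One way a pseudo-type can enforce the sole-component rule is simply by not being a numeric value, so that any arithmetic involving it fails. The sketch below shows that idea under stated assumptions: every name in it is hypothetical, the exception is unchecked only to keep the example short, and addition stands in for expression evaluation in general.

```java
// Hypothetical sketch: a pseudo-type that cannot take part in arithmetic.
class SketchPropertyException extends RuntimeException {
    SketchPropertyException(String message) {
        super(message);
    }
}

interface SketchValue { }

final class SketchNumeric implements SketchValue {
    final double value;

    SketchNumeric(double value) {
        this.value = value;
    }
}

// Stands in for FromParent: it records the property name and nothing else.
final class SketchFromParent implements SketchValue {
    final String propertyName;

    SketchFromParent(String propertyName) {
        this.propertyName = propertyName;
    }
}

final class SketchEvaluator {
    private SketchEvaluator() { }

    /** Adds two values; throws if either operand is not numeric. */
    static SketchNumeric add(SketchValue left, SketchValue right) {
        return new SketchNumeric(numeric(left).value + numeric(right).value);
    }

    private static SketchNumeric numeric(SketchValue v) {
        if (v instanceof SketchNumeric) {
            return (SketchNumeric) v;
        }
        throw new SketchPropertyException(
                v.getClass().getSimpleName()
                + " must be the sole component of the expression");
    }
}
```

A SketchFromParent returned on its own passes through untouched; only an attempt to combine it with another component raises the exception.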
The experimental code is a simple extension of the existing
parser code, which itself borrowed heavily from James
Clark's XT processor.