Although Relax NG is an alternative to the W3C XML Schema standard (XSchema), it does use an important part of XSchema: its predefined set of standard datatypes.
This defines a plethora of types, including common ones like xsd:string, xsd:int, xsd:unsignedInt, and xsd:double (a 64-bit floating-point number).
You can supply additional parameters to these types by following the type name with a parameter set in this syntax:
xsd:datatype { pname="pvalue" ... }
The pname is the parameter name, and the pvalue is the corresponding value, written as a quoted string.
For example, this pattern would match any string whose length is between 7 and 25 characters:
xsd:string { minLength="7" maxLength="25" }
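To make the shape of a parameter set concrete, here is a small Python sketch that splits a datatype reference into its type name and a dictionary of parameters. The helper parse_datatype and its two regular expressions are hypothetical illustrations, not part of any Relax NG tool, and they handle only the simple forms shown in this section.

```python
import re

# Hypothetical helper for illustration only: it splits a reference such as
#   xsd:string { minLength="7" maxLength="25" }
# into the type name and a dict of parameter names and values.
DATATYPE = re.compile(r"\s*xsd:(\w+)\s*(?:\{(.*)\})?\s*", re.DOTALL)
# A parameter is name="value" or name='value'
PARAM = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_datatype(text):
    match = DATATYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an xsd: datatype reference: {text!r}")
    type_name, body = match.group(1), match.group(2) or ""
    # findall gives '' for whichever quote style was not used
    params = {name: dq or sq for name, dq, sq in PARAM.findall(body)}
    return type_name, params


print(parse_datatype('xsd:string { minLength="7" maxLength="25" }'))
# ('string', {'minLength': '7', 'maxLength': '25'})
```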
Here is a list of the basic datatypes in the
xsd: namespace. Many of the basic
types are derived from other
types. For example, the decimal
type includes numbers with decimal points, such as
"3.14". The
integer type is derived from
decimal, including only those
values without decimal points, such as "42".
To see what parameters are allowed for a given type, see
Section 6.2, “Parameters to the xsd: types”.
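Since every value of a derived type is also a value of its base type, it can help to picture the derivations as chains. The Python sketch below copies a selection of the "Derived from" notes in the entries that follow into a hypothetical lookup table, BASE_TYPE, and walks up the chain from a given type. It covers only some single-valued types; the list types IDREFS and NMTOKENS are left out.

```python
# Base type of each type, copied from the "Derived from" notes in the
# entries below. Hypothetical lookup table covering a selection of types.
BASE_TYPE = {
    "byte": "short", "short": "int", "int": "long", "long": "integer",
    "integer": "decimal",
    "nonPositiveInteger": "integer", "negativeInteger": "nonPositiveInteger",
    "nonNegativeInteger": "integer", "positiveInteger": "nonNegativeInteger",
    "unsignedLong": "nonNegativeInteger", "unsignedInt": "unsignedLong",
    "unsignedShort": "unsignedInt", "unsignedByte": "unsignedShort",
    "normalizedString": "string", "token": "normalizedString",
    "language": "token", "Name": "token", "NCName": "Name",
    "ID": "NCName", "IDREF": "NCName",
}


def ancestors(type_name):
    """Return the chain of base types above type_name, nearest first."""
    chain = []
    while type_name in BASE_TYPE:
        type_name = BASE_TYPE[type_name]
        chain.append(type_name)
    return chain


print(ancestors("byte"))  # ['short', 'int', 'long', 'integer', 'decimal']
```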
anyURI
The data must conform to the syntax of a Uniform
Resource Identifier
(URI), as defined in RFC
2396 as amended by RFC
2732. Example:
"http://www.nmt.edu/tcc/" is
the URI for the New Mexico Tech Computer Center's
index page.
base64Binary
Represents a sequence of binary octets (bytes) encoded according to RFC 2045, the standard defining the MIME types (look under “6.8 Base64 Content-Transfer-Encoding”).
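The base64Binary and hexBinary types both carry raw octets; they differ only in how those octets are written as text. As a quick illustration, this Python snippet takes the five octets from the hexBinary example further down and prints them in each form, using the standard base64 module (whose default alphabet is the one RFC 2045 uses).

```python
import base64

# The five octets from the hexBinary example ("0047dedbef")
octets = bytes.fromhex("0047dedbef")

print(len(octets))                               # 5
print(octets.hex())                              # 0047dedbef  (hexBinary form)
print(base64.b64encode(octets).decode("ascii"))  # AEfe2+8=    (base64Binary form)
```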
boolean
A Boolean true or false value. Representations of
true are "true" and "1"; false is denoted as "false" or
"0".
byte
A signed 8-bit integer in the range [-128, 127].
Derived from the short
datatype.
date
Represents a specific date. The syntax is the same
as that for the date part of dateTime, with an optional time
zone indicator. Example:
"1889-09-24".
dateTime
Represents a specific instant of time. It has the
form YYYY-MM-DDThh:mm:ss followed by an
optional time-zone suffix.
YYYY is the year,
MM is the month number,
DD is the day number,
hh the hour in 24-hour
format, mm the minute,
and ss the second (a
decimal point and fraction are allowed in the seconds
part).
The optional zone suffix is either
"Z" for Coordinated Universal
Time (UTC), or a time offset of the form
"±hh:mm", giving the local time's offset
from UTC in hours and minutes.
Example:
"2004-10-31T21:40:35.5-07:00"
is a time on Halloween 2004 in Mountain Standard
Time. The equivalent UTC would be
"2004-11-01T04:40:35.5Z".
decimal
Any base-10 fixed-point number. There must be at
least one digit to the left of the decimal point, and
a leading "+" or
"-" sign is allowed.
Examples: "42",
"-3.14159",
"+0.004".
double
A 64-bit (double-precision) binary floating-point number as specified
in the IEEE 754-1985 standard. The external form
is the same as the float
datatype.
duration
Represents a duration of time, as a composite of years, months, days, hours, minutes, and seconds. The syntax of a duration value has these parts:
If the duration is negative, it starts with
"-".
A capital "P" is always
included.
If the duration has a years part, the number of
years is next, followed by a capital
"Y".
If there is a months part, it is next, followed
by capital "M".
If there is a days part, it is next, followed
by capital "D".
If there are any hours, minutes, or seconds in
the duration, a capital
"T" comes next;
otherwise the duration ends here.
If there is an hours part, it is next, followed
by capital "H".
If there is a minutes part, it is next, followed
by capital "M".
If there is a seconds part, it is next, followed
by capital "S". You can
use a decimal point and fraction to specify
part of a second.
Missing parts are assumed to be zero. Examples:
"P1347Y" is a duration of 1347
Gregorian years; "P1Y2MT2H5.6S" is a
duration of one year, two months, two hours, and
5.6 seconds.
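The rules above translate fairly directly into a regular expression. Here is a Python sketch (parse_duration is a hypothetical helper) that accepts the forms described and returns each part, with missing parts set to zero. It is meant to illustrate the syntax, not to replace a schema validator.

```python
import re

# One optional group per part, in the order described above. The
# lookaheads reject a bare "P" and a "T" with no time part after it.
DURATION = re.compile(
    r"(?P<sign>-)?P(?=[0-9]|T[0-9])"
    r"(?:(?P<years>[0-9]+)Y)?"
    r"(?:(?P<months>[0-9]+)M)?"
    r"(?:(?P<days>[0-9]+)D)?"
    r"(?:T(?=[0-9])"
    r"(?:(?P<hours>[0-9]+)H)?"
    r"(?:(?P<minutes>[0-9]+)M)?"
    r"(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?"
    r")?"
)
FIELDS = ("years", "months", "days", "hours", "minutes", "seconds")


def parse_duration(text):
    """Return a dict of duration parts, with missing parts set to zero."""
    match = DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not an xsd:duration: {text!r}")
    sign = -1 if match.group("sign") else 1
    parts = {}
    for name in FIELDS:
        raw = match.group(name)
        if raw is None:
            parts[name] = 0
        elif name == "seconds":
            parts[name] = sign * float(raw)  # seconds may have a fraction
        else:
            parts[name] = sign * int(raw)
    return parts


print(parse_duration("P1Y2MT2H5.6S"))
# {'years': 1, 'months': 2, 'days': 0, 'hours': 2, 'minutes': 0, 'seconds': 5.6}
```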
float
A 32-bit (single-precision) binary floating-point number as specified
in the IEEE 754-1985 standard. Allowable values are
the same as in the decimal
type, optionally followed by an exponent, or one of
the special values "INF"
(positive infinity), "-INF"
(negative infinity), or "NaN"
(not a number).
The exponent starts with either
"e" or
"E", optionally followed by a
sign, and then one or more digits.
Example: "6.0235e-23".
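Here's a Python sketch of reading double and float values, including the three special values. The names parse_floating, NUMBER, and SPECIALS are hypothetical; the mantissa pattern follows the description above and may be stricter than a real schema validator. Single precision is imitated by packing the value into 4 bytes with the standard struct module, which shows how much precision a 32-bit float gives up.

```python
import math
import re
import struct

# Decimal mantissa with an optional exponent, as described above.
# Simplified: a schema validator may accept a few other spellings.
NUMBER = re.compile(r"[+-]?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?")
SPECIALS = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}


def parse_floating(text, bits=64):
    """Parse an xsd:double (bits=64) or xsd:float (bits=32) value."""
    if text in SPECIALS:
        value = SPECIALS[text]
    elif NUMBER.fullmatch(text):
        value = float(text)  # a Python float is an IEEE 754 double on typical platforms
    else:
        raise ValueError(f"not a float/double value: {text!r}")
    if bits == 32:
        # Round to the nearest single-precision value by packing it into
        # 4 bytes; struct raises OverflowError if it is out of range.
        value = struct.unpack("<f", struct.pack("<f", value))[0]
    return value


print(parse_floating("0.1"))           # 0.1
print(parse_floating("0.1", bits=32))  # 0.10000000149011612
```

Note that the special values are case-sensitive here, as in the lexical forms listed above: "inf" is rejected even though Python's own float() would accept it.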
gDay
A day of the month in the Gregorian calendar. The
syntax is
"---DD" where
DD is the day of the
month. Example: the 27th of each month would be
represented as "---27".
gMonth
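A month number in the Gregorian calendar. The
syntax is
"--MM", where
MM is the month number.
For example, "--06"
represents the month of June. (The first edition of the XML Schema datatypes specification gave the syntax as "--MM--"; a later erratum changed it to "--MM".)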
gMonthDay
A Gregorian month and day as
"--MM-DD". Example:
"--07-04" is the Fourth of
July.
gYear
A Gregorian year, specified as YYYY. Example: "1889".
gYearMonth
A Gregorian year and month. The syntax is
YYYY-MM. Example:
"1995-08" represents August
1995.
hexBinary
Represents a sequence of octets (bytes), each given
as two hexadecimal digits. Example:
"0047dedbef" is five octets.
ID
A unique identifier as in the
ID attribute type from the XML
standard.
Derived from the NCName
datatype.
IDREF, IDREFS
An IDREF value is a
reference to a unique identifier as defined under
attribute types in the XML
standard. An IDREFS
value is a space-separated sequence of such references.
IDREF is derived from the NCName
datatype; IDREFS is a list type whose items are IDREF values.
int
Represents a 32-bit signed integer in the range
[-2,147,483,648, 2,147,483,647]. Derived from the
long
datatype.
integer
Represents a signed integer. Values may begin with
an optional "+" or
"-" sign. Derived from the
decimal
datatype.
language
One of the standardized language codes defined in
RFC
1766. Example: "fj"
for Fijian. Derived from the token
type.
long
A signed 64-bit integer in the range
[-9,223,372,036,854,775,808, 9,223,372,036,854,775,807].
Derived from the integer
datatype.
Name
A name as defined in the XML
standard. The first character can be a
letter, underbar “_”, or colon “:”,
and the remaining characters may be letters,
underbars, digits, hyphen “-”, period “.”, or
colon “:”.
Derived from the token
datatype.
NCName
The local part of a qualified name. See the
NCName definition in the
document Namespaces in XML.
Derived from the Name
datatype.
negativeInteger
Represents an integer less than zero. Derived from
the nonPositiveInteger
datatype.
NMTOKEN, NMTOKENS
Any sequence of one or more name characters, as defined in the XML
standard: letters, digits, underbars “_”,
hyphens “-”, periods “.”, or colons “:”.
An NMTOKENS data value is a
space-separated sequence of
NMTOKEN values.
NMTOKEN is derived from the token datatype;
NMTOKENS is a list type whose items are NMTOKEN values.
nonNegativeInteger
An integer greater than or equal to zero. Derived
from the integer
datatype.
nonPositiveInteger
An integer less than or equal to zero. Derived
from the integer
datatype.
normalizedString
This datatype describes a “normalized” string, meaning that it cannot include newline (LF), return (CR), or tab (HT) characters.
Derived from the string
type.
positiveInteger
An extended-precision integer greater than zero.
Derived from the nonNegativeInteger
datatype.
QName
An XML qualified name, such as
"xsl:stylesheet".
short
A 16-bit signed integer in the range [-32,768,
32,767]. Derived from the int
datatype.
string
Any sequence of zero or more characters.
time
A moment of time that repeats every day. The
syntax is the same as that for dateTime,
omitting everything up to and including the
separator "T". Examples:
"00:00:00" is midnight, and
"13:04:00" is an hour and
four minutes after noon.
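Because dateTime values (and time values, which use the same clock syntax) may carry different zone suffixes, comparing them usually means converting to UTC first. Here's a minimal Python sketch using the standard datetime module that reproduces the Halloween example from the dateTime entry above. The helper to_utc is hypothetical, and it handles only four-digit years and values that include a zone suffix.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Convert a dateTime string that carries a zone suffix to UTC.

    Sketch only: it needs a four-digit year and a "Z" or "±hh:mm"
    suffix; dateTime values without a zone are not handled.
    """
    # %f accepts 1 to 6 fractional digits; %z accepts "Z" and "-07:00"
    # (Python 3.7 or later).
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z" if "." in value else "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(value, fmt).astimezone(timezone.utc)


print(to_utc("2004-10-31T21:40:35.5-07:00").isoformat())
# 2004-11-01T04:40:35.500000+00:00
```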
token
The values of this type represent tokenized
strings. They may not contain carriage return (CR), newline (LF), or tab
(HT) characters. They may not start or end with
whitespace. The only occurrences of whitespace
allowed inside the string are single spaces, never
multiple spaces together. Derived from normalizedString.
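The normalizedString and token rules are easy to state as predicates. In this Python sketch the function names are hypothetical, and the checks apply to a value exactly as written; in practice a schema processor typically collapses whitespace in token values before checking them, so the rules above describe the value after that step.

```python
def is_normalized_string(value):
    """True if value has no carriage return, newline, or tab characters."""
    return not any(ch in "\r\n\t" for ch in value)


def is_token(value):
    """True if value is also free of leading, trailing, or doubled spaces."""
    return (
        is_normalized_string(value)
        and not value.startswith(" ")
        and not value.endswith(" ")
        and "  " not in value
    )


print(is_token("Hello world"))   # True
print(is_token("Hello  world"))  # False: two spaces together
```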
unsignedByte
An unsigned 8-bit integer in the range [0,
255]. Derived from the unsignedShort
datatype.
unsignedInt
An unsigned 32-bit integer in the range [0,
4,294,967,295]. Derived from the unsignedLong
datatype.
unsignedLong
An unsigned 64-bit integer in the range [0, 18,446,744,073,709,551,615]. Derived
from the nonNegativeInteger
datatype.
unsignedShort
An unsigned 16-bit integer in the range [0,
65,535]. Derived from the unsignedInt
datatype.
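The fixed-size integer types differ only in their ranges. Here is a Python sketch that gathers those ranges into one table and checks a value against them. INT_RANGES and parse_int are hypothetical names; the bounds follow from the bit widths given above, with long and unsignedLong taken as the 64-bit types the XML Schema specification defines.

```python
import re

# Value ranges of the fixed-size integer types (long and unsignedLong
# use the 64-bit bounds from the XML Schema specification).
INT_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
}


def parse_int(type_name, text):
    """Parse text as one of the types above, enforcing its range."""
    # Optional sign followed by digits, as for xsd:integer
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        raise ValueError(f"not an integer: {text!r}")
    value = int(text)
    low, high = INT_RANGES[type_name]
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {type_name} range [{low}, {high}]")
    return value


print(parse_int("unsignedByte", "255"))  # 255
```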
You can apply parameters to your datatypes to further constrain what values are legal. Parameters come in four groups:
The length group.
The length parameter specifies
the exact length of valid values, no more and no less.
The minLength specifies the
minimum length, and the
maxLength parameter specifies
the maximum length.
For example, a value declared as
xsd:string { length="4" }
can have any characters in it, so long as there are exactly four of them. A value declared this way
xsd:token { minLength="5" maxLength="9" }
must be a token between 5 and 9 characters long. (The length parameters do not apply to numeric types such as xsd:integer.) To specify that a string cannot exceed 20 characters:
xsd:string { maxLength="20" }
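For strings, the three length parameters amount to simple comparisons on the character count. Here's a Python sketch (check_length is a hypothetical helper) mirroring the examples above. XML Schema measures string length in characters, not bytes, which is also what Python's len() counts for a str.

```python
def check_length(value, length=None, min_length=None, max_length=None):
    """Apply the length, minLength, and maxLength parameters to a string."""
    n = len(value)  # counts characters (code points), not bytes
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True


print(check_length("abcd", length=4))           # True
print(check_length("caf\u00e9", max_length=4))  # True: 4 characters, 5 bytes in UTF-8
```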
The range group. These attributes restrict the
values that a number can have. There are four:
minInclusive="N"
means the number must be greater than or equal to
N, and
minExclusive="N" means the
number must be greater than
N. Similarly,
maxInclusive="N" restricts
values to those less than or equal to
N, and
maxExclusive="N" means values must be
strictly less than
N.
For example, to specify a decimal number that is greater than 0.0 but less than or equal to 10.0:
xsd:decimal { minExclusive="0.0" maxInclusive="10.0" }
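Here's a minimal Python sketch of how the four range parameters combine, using the standard decimal module so that values such as "0.1" are compared exactly rather than as binary floating-point approximations. The helper in_range is hypothetical and handles only decimal-style numbers, not dates or durations.

```python
from decimal import Decimal


def in_range(value, min_inclusive=None, min_exclusive=None,
             max_inclusive=None, max_exclusive=None):
    """Apply the four range parameters to a decimal value given as text."""
    # Decimal compares values like "0.1" exactly, unlike binary floats
    x = Decimal(value)
    if min_inclusive is not None and x < Decimal(min_inclusive):
        return False
    if min_exclusive is not None and x <= Decimal(min_exclusive):
        return False
    if max_inclusive is not None and x > Decimal(max_inclusive):
        return False
    if max_exclusive is not None and x >= Decimal(max_exclusive):
        return False
    return True


# The xsd:decimal { minExclusive="0.0" maxInclusive="10.0" } example
print(in_range("10.0", min_exclusive="0.0", max_inclusive="10.0"))  # True
print(in_range("0.0", min_exclusive="0.0", max_inclusive="10.0"))   # False
```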
The digits group. The
totalDigits="N" attribute
limits the total number of digits in a number to
N. The
fractionDigits="N" attribute limits the
total number of digits after the decimal point to
N.
Here's an example. Suppose you want to restrict the values of a number to have no more than three digits after the decimal. You could do this with:
xsd:decimal { fractionDigits="3" }
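In XML Schema these two parameters constrain the value, not the way it happens to be written, so leading zeros and trailing zeros after the decimal point do not count. Here's a Python sketch of that counting with the standard decimal module; digit_counts and check_digits are hypothetical helpers.

```python
from decimal import Decimal


def digit_counts(text):
    """Return (total digits, fraction digits) for a decimal value."""
    # normalize() drops trailing zeros, so "1.500" is counted like "1.5";
    # this assumes the value fits the default 28-digit decimal context.
    _sign, digits, exponent = Decimal(text).normalize().as_tuple()
    if exponent >= 0:
        # e.g. "100" normalizes to 1E+2: one digit plus two zeros
        return len(digits) + exponent, 0
    fraction = -exponent
    return max(len(digits), fraction), fraction


def check_digits(text, total_digits=None, fraction_digits=None):
    total, fraction = digit_counts(text)
    return ((total_digits is None or total <= total_digits)
            and (fraction_digits is None or fraction <= fraction_digits))


print(check_digits("2.718", fraction_digits=3))   # True
print(check_digits("2.7182", fraction_digits=3))  # False
```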
The pattern="P" attribute allows you to restrict values using a regular expression syntax. See Section 6.3, “Regular expression syntax for xsd:”.
Here is a table showing the attribute groups that are allowed for each of the basic types.
| Type | pattern | length, minLength, maxLength | minInclusive, minExclusive, maxExclusive, maxInclusive | fractionDigits, totalDigits |
|---|---|---|---|---|
| anyURI | x | x | | |
| base64Binary | x | x | | |
| boolean | x | | | |
| byte | x | | x | x |
| date | x | | x | |
| dateTime | x | | x | |
| decimal | x | | x | x |
| double | x | | x | |
| duration | x | | x | |
| float | x | | x | |
| gDay | x | | x | |
| gMonth | x | | x | |
| gMonthDay | x | | x | |
| gYear | x | | x | |
| gYearMonth | x | | x | |
| hexBinary | x | x | | |
| ID | x | x | | |
| IDREF | x | x | | |
| IDREFS | x | x | | |
| int | x | | x | x |
| integer | x | | x | x |
| language | x | x | | |
| long | x | | x | x |
| Name | x | x | | |
| NCName | x | x | | |
| negativeInteger | x | | x | x |
| NMTOKEN | x | x | | |
| NMTOKENS | x | x | | |
| nonNegativeInteger | x | | x | x |
| nonPositiveInteger | x | | x | x |
| normalizedString | x | x | | |
| positiveInteger | x | | x | x |
| QName | x | x | | |
| short | x | | x | x |
| string | x | x | | |
| time | x | | x | |
| token | x | x | | |
| unsignedByte | x | | x | x |
| unsignedInt | x | | x | x |
| unsignedLong | x | | x | x |
| unsignedShort | x | | x | x |
The regular expression syntax is fairly similar to that
of Perl. One difference worth knowing: an xsd: pattern must always
match the entire value, as if it were anchored at both ends.
Refer to Appendix F
of the XML Schema Datatypes specification for a
complete definition of the regular expressions allowed in
the pattern parameter of any of
the xsd: datatypes.
If you are working with Unicode, you should read the full specification, as there are a number of advanced features, not discussed here, that are most useful in Unicode work.
Here is a summary of most of the commonly used features.
| Pattern | Meaning |
|---|---|
| `p\|q` | Either pattern p or pattern q. |
| `pq` | Pattern p followed by pattern q. |
| `p?` | Matches pattern p or nothing at all. You could think of it as saying “p occurs optionally here.” |
| `p*` | Matches zero or more occurrences of p. |
| `p+` | Matches one or more occurrences of p. |
| `p{n}` | Matches exactly n occurrences of pattern p. |
| `p{n,m}` | Matches at least n occurrences, but no more than m occurrences, of pattern p. |
| `p{n,}` | Matches n or more occurrences of pattern p. |
| `[...]` | Matches any single character from inside the square brackets. For example, the pattern “xsd:string { pattern='[abc]' }” matches a single a, b, or c. You can specify ranges of characters as “c1-c2”; for example, “[a-z]” matches any lowercase letter from a through z. |
| `[^...]` | Matches any single character except those enumerated inside the square brackets. For example, the regular expression “xsd:string { pattern='[^abc]' }” matches any single character except a, b, or c. |
| `(...)` | Parentheses may be used for grouping. For example, pattern “(ab)+” matches “ab”, “abab”, “ababab”, and so on. |
| `\r` | Matches the carriage return (ASCII CR) character. |
| `\n` | Matches the newline (ASCII LF) character. |
| `.` | Matches any character except newline or carriage return. |
| `\t` | Matches the tab (ASCII HT) character. |
| `\` | To match any of the following characters literally, precede it with a backslash: `\ \| . - ^ ? * + { } ( ) [ ]`. (The “-” and “^” characters are special only inside square brackets, so elsewhere they may also appear unescaped.) For example, `pattern='\[\*\]'` matches the string “[*]”. |
| `\s` | Matches a whitespace character: space, tab, newline, or carriage return. |
| `\S` | Matches any character except a whitespace character. |
| `\i` | Matches a name start character: a letter, “_”, or “:”. |
| `\I` | Matches any character except a name start character. |
| `\c` | Matches a name character: a name start character, a digit, hyphen “-”, or period “.”. |
| `\C` | Matches any character except a name character. |
| `\d` | Matches a decimal digit. For ASCII text this is the same as “[0-9]”, but it also matches decimal digits from other scripts. |
| `\D` | Matches any character except a decimal digit. |
Here's an example of a pattern for a U.S. Postal Service ZIP code:
xsd:string { pattern='[0-9]{5}(-[0-9]{4})?' }
That is, five digits, optionally followed by a hyphen and four more digits.
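A handy way to experiment with a simple pattern like this is Python's re module. Python's regular expressions are not identical to the xsd: flavor (for example, Python has no \i or \c escapes), but this particular pattern means the same thing in both. Because an xsd: pattern has to match the whole value, the sketch below (matches_zip is a hypothetical helper) uses re.fullmatch rather than re.search.

```python
import re

ZIP_PATTERN = "[0-9]{5}(-[0-9]{4})?"


def matches_zip(value):
    # XML Schema patterns must match the whole value, so use fullmatch;
    # re.search would also accept strings like "x87801".
    return re.fullmatch(ZIP_PATTERN, value) is not None


for candidate in ["87801", "87801-4796", "8780", "x87801"]:
    print(candidate, matches_zip(candidate))
```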