
6. The xsd: datatypes

Although Relax NG is an alternative to the XML Schema (XSchema) standard, it does use an important part of XML Schema: the predefined set of standard datatypes.

This set defines many types, including common ones like xsd:string, xsd:int, xsd:unsignedInt, and xsd:double (a 64-bit floating-point number).

You can supply additional parameters to these types by following the type name with a parameter set in this syntax:

    xsd:datatype { pname=pvalue ... }

The pname is the parameter name, and the pvalue the corresponding value.

For example, this pattern would match any string whose length is between 7 and 25 characters:

    xsd:string { minLength="7" maxLength="25" }
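
To make the meaning of the two length parameters concrete, here is a minimal Python sketch of the test a validator applies for the pattern above. The function name matches_length is hypothetical and not part of any Relax NG tool; the sketch assumes the value has already been extracted from the document as a string.

```python
def matches_length(value, min_length=7, max_length=25):
    """Return True if value has between min_length and max_length characters."""
    # len() on a Python str counts Unicode characters (code points),
    # not bytes, which is the unit the length parameters use for strings.
    return min_length <= len(value) <= max_length


print(matches_length("abcdefg"))   # seven characters
print(matches_length("abc"))       # too short
```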

6.1. The basic xsd: types

Here is a list of the basic datatypes in the xsd: namespace. Many of the basic types are derived from other types. For example, the decimal type includes numbers with decimal points, such as "3.14". The integer type is derived from decimal, including only those values without decimal points, such as "42". To see what parameters are allowed for a given type, see Section 6.2, “Parameters to the xsd: types”.

anyURI

The data must conform to the syntax of a Uniform Resource Identifier (URI), as defined in RFC 2396 as amended by RFC 2732. Example: "http://www.nmt.edu/tcc/" is the URI for the New Mexico Tech Computer Center's index page.

base64Binary

Represents a sequence of binary octets (bytes) encoded according to RFC 2045, the standard defining the MIME types (look under “6.8 Base64 Content-Transfer-Encoding”).

boolean

A Boolean true or false value. Representations of true are "true" and "1"; false is denoted as "false" or "0".

byte

A signed 8-bit integer in the range [-128, 127]. Derived from the short datatype.

date

Represents a specific date. The syntax is the same as that for the date part of dateTime, with an optional time zone indicator. Example: "1889-09-24".

dateTime

Represents a specific instant of time. It has the form YYYY-MM-DDThh:mm:ss followed by an optional time-zone suffix.

YYYY is the year, MM is the month number, DD is the day number, hh the hour in 24-hour format, mm the minute, and ss the second (a decimal and fraction are allowed for the seconds part).

The optional zone suffix is either "Z" for Coordinated Universal Time (UTC), or a time offset of the form "+hh:mm" or "-hh:mm", giving the difference between UTC and local time in hours and minutes.

Example: "2004-10-31T21:40:35.5-07:00" is a time on Halloween 2004 in Mountain Standard time. The equivalent UTC would be "2004-11-01T04:40:35.5Z".
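
The equivalence claimed in the example can be checked with a short Python sketch. This is an illustration under simplifying assumptions, not a complete dateTime parser: it ignores negative years, years with more than four digits, and special cases such as "24:00:00", and the name parse_xsd_datetime is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY-MM-DDThh:mm:ss, optional fraction, optional zone suffix.
DATETIME_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def parse_xsd_datetime(text):
    """Convert a dateTime string to a Python datetime (simplified)."""
    m = DATETIME_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a dateTime value: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Keep at most microsecond precision from the fractional seconds.
    micro = int((m.group(7) or ".")[1:7].ljust(6, "0"))
    zone = m.group(8)
    if zone is None:
        tz = None                      # no suffix: zone is unspecified
    elif zone == "Z":
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


local = parse_xsd_datetime("2004-10-31T21:40:35.5-07:00")
utc = parse_xsd_datetime("2004-11-01T04:40:35.5Z")
print(local == utc)
```

Python compares zone-aware datetime objects as instants, so the final comparison tests whether the two strings name the same moment.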

decimal

Any base-10 fixed-point number. At least one digit is required, a decimal point and fraction are optional, and a leading "+" or "-" sign is allowed. Examples: "42", "-3.14159", "+0.004".

double

A 64-bit floating-point decimal number as specified in the IEEE 754-1985 standard. The external form is the same as the float datatype.

duration

Represents a duration of time, as a composite of years, months, days, hours, minutes, and seconds. The syntax of a duration value has these parts:

  • If the duration is negative, it starts with "-".

  • A capital "P" is always included.

  • If the duration has a years part, the number of years is next, followed by a capital "Y".

  • If there is a months part, it is next, followed by capital "M".

  • If there is a days part, it is next, followed by capital "D".

  • If there are any hours, minutes, or seconds in the duration, a capital "T" comes next; otherwise the duration ends here.

  • If there is an hours part, it is next, followed by capital "H".

  • If there is a minutes part, it is next, followed by capital "M".

  • If there is a seconds part, it is next, followed by capital "S". You can use a decimal point and fraction to specify part of a second.

Missing parts are assumed to be zero. Examples: "P1347Y" is a duration of 1347 Gregorian years; "P1Y2MT2H5.6S" is a duration of one year, two months, two hours, and 5.6 seconds.
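
The rules above translate fairly directly into a regular expression. The following Python sketch is one possible reading of them, with a hypothetical function name parse_duration. It returns the parts as a dictionary, and it holds the seconds in a float, which is an assumption made for brevity.

```python
import re

# One optional group per part, in the fixed order Y, M, D, then T, H, M, S.
# The (?=.) lookaheads require at least one part after "P" and after "T".
DURATION_RE = re.compile(
    r"(-)?P(?=.)"
    r"(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?"
    r"(?:T(?=.)(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)


def parse_duration(text):
    """Split a duration string into its parts; missing parts become zero."""
    m = DURATION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a duration value: {text!r}")
    sign, years, months, days, hours, minutes, seconds = m.groups()
    return {
        "negative": sign == "-",
        "years": int(years or 0),
        "months": int(months or 0),
        "days": int(days or 0),
        "hours": int(hours or 0),
        "minutes": int(minutes or 0),
        "seconds": float(seconds or 0),
    }


print(parse_duration("P1Y2MT2H5.6S"))
```

Because the "M" for months can only appear before "T" and the "M" for minutes only after it, the two are never ambiguous.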

float

A 32-bit floating-point decimal number as specified in the IEEE 754-1985 standard. Allowable values are the same as in the decimal type, optionally followed by an exponent, or one of the special values "INF" (positive infinity), "-INF" (negative infinity), or "NaN" (not a number).

The exponent starts with either "e" or "E", optionally followed by a sign, and one or more digits.

Example: "6.0235e-23".

gDay

A day of the month in the Gregorian calendar. The syntax is "---DD" where DD is the day of the month. Example: the 27th of each month would be represented as "---27".

gMonth

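A month number in the Gregorian calendar. The syntax is "--MM", where MM is the month number. For example, "--06" represents the month of June. (The first edition of the XML Schema specification gave the form as "--MM--"; this was corrected in the second edition, so older tools may expect the longer form.)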

gMonthDay

A Gregorian month and day as "--MM-DD". Example: "--07-04" is the Fourth of July.

gYear

A Gregorian year, specified as YYYY. Example: "1889".

gYearMonth

A Gregorian year and month. The syntax is YYYY-MM. Example: "1995-08" represents August 1995.

hexBinary

Represents a sequence of octets (bytes), each given as two hexadecimal digits. Example: "0047dedbef" is five octets.
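
To compare the two binary encodings, here is a small Python sketch that decodes the hexBinary example and re-encodes the same octets in the form used by base64Binary. The helper names are hypothetical. The explicit pattern check is there because Python's bytes.fromhex() is more lenient than the hexBinary syntax (it skips whitespace between bytes).

```python
import base64
import re

# hexBinary: zero or more pairs of hexadecimal digits, nothing else.
HEX_BINARY_RE = re.compile(r"(?:[0-9a-fA-F]{2})*")


def decode_hex_binary(text):
    """Return the octets encoded by a hexBinary string."""
    if HEX_BINARY_RE.fullmatch(text) is None:
        raise ValueError(f"not a hexBinary value: {text!r}")
    return bytes.fromhex(text)


def to_base64_binary(data):
    """Return the base64 text for a sequence of octets."""
    return base64.b64encode(data).decode("ascii")


octets = decode_hex_binary("0047dedbef")
print(len(octets))                 # 5
print(to_base64_binary(octets))
```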

ID

A unique identifier as in the ID attribute type from the XML standard.

Derived from the NCName datatype.

IDREF, IDREFS

An IDREF value is a reference to a unique identifier as defined under attribute types in the XML standard. An IDREFS value is a space-separated sequence of such references.

IDREF is derived from the NCName datatype; IDREFS is a list of IDREF values.

int

Represents a 32-bit signed integer in the range [-2,147,483,648, 2,147,483,647]. Derived from the long datatype.

integer

Represents a signed integer. Values may begin with an optional "+" or "-" sign. Derived from the decimal datatype.

language

One of the standardized language codes defined in RFC 1766. Example: "fj" for Fijian. Derived from the token type.

long

A 64-bit signed integer in the range [-9,223,372,036,854,775,808, 9,223,372,036,854,775,807]. Derived from the integer datatype.

Name

A name as defined in the XML standard. The first character can be a letter, underbar “_”, or colon “:”, and the remaining characters may be letters, underbars, digits, hyphen “-”, period “.”, or colon “:”.

Derived from the token datatype.

NCName

The local part of a qualified name. See the NCName definition in the document Namespaces in XML.

Derived from the Name datatype.

negativeInteger

Represents an integer less than zero. Derived from the nonPositiveInteger datatype.

NMTOKEN, NMTOKENS

Any sequence of name characters, defined in the XML standard: letters, digits, underbars “_”, hyphen “-”, period “.”, or colon “:”.

An NMTOKENS data value is a space-separated sequence of NMTOKEN values.

NMTOKEN is derived from the token datatype; NMTOKENS is a list of NMTOKEN values.

nonNegativeInteger

An integer greater than or equal to zero. Derived from the integer datatype.

nonPositiveInteger

An integer less than or equal to zero. Derived from the integer datatype.

normalizedString

This datatype describes a “normalized” string, meaning that it cannot include newline (LF), return (CR), or tab (HT) characters.

Derived from the string type.

positiveInteger

An extended-precision integer greater than zero. Derived from the nonNegativeInteger datatype.

QName

An XML qualified name, such as "xsl:stylesheet".

short

A 16-bit signed integer in the range [-32,768, 32,767]. Derived from the int datatype.

string

Any sequence of zero or more characters.

time

A moment of time that repeats every day. The syntax is the same as that for dateTime, omitting everything up to and including the separator "T". Examples: "00:00:00" is midnight, and "13:04:00" is an hour and four minutes after noon.

token

The values of this type represent tokenized strings. They may not contain newline (LF), return (CR), or tab (HT) characters. They may not start or end with whitespace. The only occurrences of whitespace allowed inside the string are single spaces, never multiple spaces together. Derived from normalizedString.
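
The difference between normalizedString and token is easy to state as code. The Python sketch below uses hypothetical function names and tests whether a string is already in each form; it treats space, tab, LF, and CR as the whitespace characters, as XML does.

```python
import re


def is_normalized_string(text):
    """True if text contains no tab, newline, or carriage return."""
    return not any(ch in "\t\n\r" for ch in text)


def is_token(text):
    """True if text has no leading, trailing, or repeated whitespace."""
    # Collapse every run of XML whitespace to one space, then trim the ends.
    collapsed = re.sub(r"[ \t\n\r]+", " ", text).strip(" ")
    return text == collapsed


print(is_normalized_string("two  spaces"), is_token("two  spaces"))
print(is_normalized_string("one space"), is_token("one space"))
```

Every token is also a normalizedString, but not the other way around: a value with two adjacent spaces passes the first test and fails the second.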

unsignedByte

An unsigned 8-bit integer in the range [0, 255]. Derived from the unsignedShort datatype.

unsignedInt

An unsigned 32-bit integer in the range [0, 4,294,967,295]. Derived from the unsignedLong datatype.

unsignedLong

An unsigned 64-bit integer in the range [0, 18,446,744,073,709,551,615]. Derived from the nonNegativeInteger datatype.

unsignedShort

An unsigned 16-bit integer in the range [0, 65,535]. Derived from the unsignedInt datatype.
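
The integer types in this list differ only in their allowed range, which the following Python sketch collects in one table. The names BOUNDS and is_valid_integer are hypothetical. The bounds for long and unsignedLong assume the 64-bit ranges given in the XML Schema specification, and the handling of signs on the unsigned types is simplified.

```python
import re

INTEGER_RE = re.compile(r"[+-]?[0-9]+")

# (lowest, highest) allowed value; None means unbounded on that side.
BOUNDS = {
    "integer": (None, None),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
    "nonNegativeInteger": (0, None),
    "positiveInteger": (1, None),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "unsignedLong": (0, 2**64 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedByte": (0, 2**8 - 1),
}


def is_valid_integer(type_name, text):
    """True if text is an integer within the range of the named type."""
    if INTEGER_RE.fullmatch(text) is None:
        return False
    value = int(text)
    low, high = BOUNDS[type_name]
    return (low is None or value >= low) and (high is None or value <= high)


print(is_valid_integer("byte", "127"), is_valid_integer("byte", "128"))
```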

6.2. Parameters to the xsd: types

You can apply parameters to your datatypes to further constrain what values are legal. Parameters come in four groups:

  • The length group. The length parameter specifies the exact length of valid values, no more and no less. The minLength parameter specifies the minimum length, and the maxLength parameter specifies the maximum length.

    For example, a value declared as

        xsd:string { length="4" }

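    can have any characters in it, so long as there are exactly four of them. A value declared this way

        xsd:NMTOKEN { minLength="5" maxLength="9" }

    must be a name token of between 5 and 9 characters. (The length parameters do not apply to numeric types such as xsd:integer; use the digits group or a pattern to limit the size of a number.) To specify that a string cannot exceed 20 characters: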

        xsd:string { maxLength="20" }

  • The range group. These parameters restrict the values that a number can have. There are four: minInclusive="N" means the number must be greater than or equal to N, and minExclusive="N" means the number must be greater than N. Similarly, maxInclusive="N" restricts values to those less than or equal to N, and maxExclusive="N" means values must be strictly less than N.

    For example, to specify a decimal number that is greater than 0.0 but less than or equal to 10.0:

        xsd:decimal { minExclusive="0.0" maxInclusive="10.0" }

  • The digits group. The totalDigits="N" parameter limits the total number of digits in a number to N. The fractionDigits="N" parameter limits the number of digits after the decimal point to N.

    Here's an example. Suppose you want to restrict the values of a number to have no more than three digits after the decimal. You could do this with:

        xsd:decimal { fractionDigits="3" }

  • The pattern group. The pattern="P" parameter allows you to restrict values using a regular expression syntax. See Section 6.3, “Regular expression syntax for xsd:”.
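
The range and digits parameters shown above can be pictured as simple tests on a decimal value. The Python sketch below is a rough illustration with a hypothetical function name, check_decimal. It assumes the text is already a well-formed decimal, and it treats fractionDigits as a limit on the value rather than on the spelling, so trailing zeros are not counted.

```python
from decimal import Decimal


def check_decimal(text, min_exclusive=None, max_inclusive=None,
                  fraction_digits=None):
    """Apply minExclusive, maxInclusive and fractionDigits to a decimal."""
    value = Decimal(text)
    if min_exclusive is not None and not value > Decimal(min_exclusive):
        return False
    if max_inclusive is not None and not value <= Decimal(max_inclusive):
        return False
    if fraction_digits is not None:
        # After normalize() drops trailing zeros, the exponent is minus
        # the number of digits that remain after the decimal point.
        exponent = value.normalize().as_tuple().exponent
        if exponent < -fraction_digits:
            return False
    return True


# xsd:decimal { minExclusive="0.0" maxInclusive="10.0" }
print(check_decimal("10.0", min_exclusive="0.0", max_inclusive="10.0"))
print(check_decimal("0.0", min_exclusive="0.0", max_inclusive="10.0"))

# xsd:decimal { fractionDigits="3" }
print(check_decimal("3.142", fraction_digits=3))
print(check_decimal("3.14159", fraction_digits=3))
```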

Here is a table showing the attribute groups that are allowed for each of the basic types.

In the table, “length” stands for length, minLength, and maxLength; “range” stands for minInclusive, minExclusive, maxInclusive, and maxExclusive; and “digits” stands for fractionDigits and totalDigits.

    Type                  pattern  length  range  digits
    anyURI                x        x
    base64Binary          x        x
    boolean               x
    byte                  x                x      x
    date                  x                x
    dateTime              x                x
    decimal               x                x      x
    double                x                x
    duration              x                x
    float                 x                x
    gDay                  x                x
    gMonth                x                x
    gMonthDay             x                x
    gYear                 x                x
    gYearMonth            x                x
    hexBinary             x        x
    ID                    x        x
    IDREF                 x        x
    IDREFS                x        x
    int                   x                x      x
    integer               x                x      x
    language              x        x
    long                  x                x      x
    Name                  x        x
    NCName                x        x
    negativeInteger       x                x      x
    NMTOKEN               x        x
    NMTOKENS              x        x
    nonNegativeInteger    x                x      x
    nonPositiveInteger    x                x      x
    normalizedString      x        x
    positiveInteger       x                x      x
    QName                 x        x
    short                 x                x      x
    string                x        x
    time                  x                x
    token                 x        x
    unsignedByte          x                x      x
    unsignedInt           x                x      x
    unsignedLong          x                x      x
    unsignedShort         x                x      x

6.3. Regular expression syntax for xsd:

The regular expression syntax is fairly similar to that of Perl. Refer to Appendix F of the XML Schema Datatypes specification for a complete definition of the regular expressions allowed in the pattern parameter of any of the xsd: datatypes.

Note

If you are working with Unicode, you should read the full specification, as there are a number of advanced features, not discussed here, that are most useful in Unicode work.

Here is a summary of most of the commonly used features.

    p|q          Either pattern p or pattern q.
    pq           Pattern p followed by pattern q.
    p?           Matches pattern p or nothing at all. You could think of it as saying “p occurs optionally here.”
    p*           Matches zero or more occurrences of p.
    p+           Matches one or more occurrences of p.
    p{n}         Matches exactly n occurrences of pattern p.
    p{n,m}       Matches at least n occurrences, but no more than m occurrences, of pattern p.
    p{n,}        Matches n or more occurrences of pattern p.
    [c1c2...]    Matches any single character from inside the square brackets. For example, the pattern “xsd:string { pattern='[abc]' }” matches any of the characters a, b, or c. You can specify ranges of characters as “[c1-c2]”. For example, the pattern “[a-zA-Z]” matches any ASCII letter, lowercase or uppercase.
    [^c1c2...]   Matches any single character except those enumerated inside the square brackets. For example, the regular expression “xsd:string { pattern='[^abc]' }” matches any single character except a, b, or c.
    (p)          Parentheses may be used for grouping. For example, pattern “(ab)+” matches “ab”, “abab”, “ababab”, and so on.
    \r           Matches the carriage return (ASCII CR) character.
    \n           Matches the newline (ASCII LF) character.
    .            Matches any character except newline or carriage return.
    \t           Matches the tab (ASCII HT) character.
    \char        Matches the special character char literally. Any of the following characters must be escaped by preceding them with a backslash: “\ | . - ^ ? * + { } ( ) [ ]”. For example, “pattern='\[\*\]'” matches the string “[*]”.
    \s           Matches a whitespace character: space, tab, newline, or carriage return.
    \S           Matches any character except a whitespace character.
    \i           Matches a name start character: a letter, “_”, or “:”.
    \I           Matches any character except a name start character.
    \c           Matches a name character, that is, a name start character, a digit, “.”, or “-” (plus the combining and extender characters defined in the XML standard).
    \C           Matches any character except a name character.
    \d           Matches a decimal digit (for ASCII text, the same as “[0-9]”).
    \D           Matches any character except a decimal digit.

Here's an example of a pattern for a U.S. Postal Service ZIP code:

    xsd:string { pattern='[0-9]{5}(-[0-9]{4})?' }

That is, five digits, optionally followed by a hyphen and four more digits.
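
You can try out a pattern like this one before putting it in a schema. The Python sketch below uses Python's own re module as a stand-in, which is an assumption: the two regular expression dialects agree on everything this pattern uses, but Python does not support xsd: escapes such as \i and \c. The function name matches_pattern is hypothetical.

```python
import re

ZIP_PATTERN = r"[0-9]{5}(-[0-9]{4})?"


def matches_pattern(pattern, value):
    """True if the whole value matches the pattern."""
    # A pattern parameter must match the entire value, not just part of
    # it, so fullmatch() is the right call here rather than search().
    return re.fullmatch(pattern, value) is not None


for candidate in ("87801", "87801-4681", "8780", "87801-46"):
    print(candidate, matches_pattern(ZIP_PATTERN, candidate))
```

The last two candidates are rejected: one has too few digits, and the other has an incomplete four-digit suffix.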