
20. Aural stylesheets

Use the rules in this section to control the presentation of your material in audio form. Doing so improves accessibility for blind readers, and it may also be useful in other situations where your readers are actually listeners.

20.1. Spatial presentation: the azimuth property

These properties control the apparent location of the speaker in space. Any stereo rendering will be able to render the spatial position anywhere between the left and right channels. The standard also defines apparent positions behind the listener, as well as positions above and below the plane of the listener's ear, for more advanced rendering systems that can place the sounds in those positions.

The horizontal position property is called azimuth. A sound directly in front of the listener, equidistant between left and right channels, is the 0° reference; the right channel is azimuth 90°, and the left channel is azimuth 270° or -90°—these angles are equivalent. A source behind the listener is at azimuth 180°.

The values of the azimuth property may be any of:

  • The azimuth in degrees. See Section 6.6, “Specifying angles”.

  • One of the keywords in the diagram below, such as left-side for azimuth 270° or center-left behind for 200°.

  • leftwards to place the source 20° counterclockwise relative to the azimuth inherited from the parent element, that is, the parent's azimuth minus 20°.

  • rightwards to place the source 20° clockwise relative to the azimuth inherited from the parent element, that is, the parent's azimuth plus 20°.

This figure shows the various azimuth keywords and the equivalent angles.
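
As a minimal sketch of the azimuth values described above, the following rules place several kinds of content at different horizontal positions. The selectors and class names are hypothetical, chosen only for illustration, and the rules assume a user agent that implements the aural properties.

/* Hypothetical class names, for illustration only. */
p.narrator { azimuth: 0deg; }                /* directly in front of the listener */
p.heckler  { azimuth: left-side; }           /* same as 270deg */
p.echo     { azimuth: center-left behind; }  /* 200deg, behind the listener */
p.echo em  { azimuth: rightwards; }          /* inherited 200deg plus 20deg = 220deg */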

If the rendering agent has speakers above and below the listener, we can also specify the apparent vertical position of the source.

Set the elevation property to one of:

  • angle: The angle of the source relative to the horizontal; see Section 6.6, “Specifying angles”. For example, the declaration “elevation: -20deg;” would place the source twenty degrees below the horizon.

  • above: The source comes from the zenith, straight up.

  • below: The source comes from the nadir, straight down.

  • higher: The source is placed 10° higher than the elevation inherited from the parent element.

  • lower: The source is placed 10° lower than the elevation inherited from the parent element.
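
The following sketch shows how the elevation values might be combined. The keyword names (above, higher) follow the CSS2 aural stylesheet specification; the selectors are hypothetical and serve only as illustration.

/* Hypothetical selectors, for illustration only. */
blockquote.announcement { elevation: above; }   /* from the zenith, straight up */
p.footnote              { elevation: -20deg; }  /* 20 degrees below the horizon */
p.footnote em           { elevation: higher; }  /* inherited -20deg plus 10deg = -10deg */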

20.2. Voice properties

20.3. The volume property

This property controls the loudness of the aural presentation. Permissible values include:

  • number: A number from 0 to 100 selects the loudness. A value of 0 does not mean no sound at all: it means the softest audible level. A value of 100 is the loudest comfortable level. It is the responsibility of the user agent, not CSS, to determine what these numbers represent, taking into account the background noise level and the dynamic range of the audio system.

  • silent: Turn the sound off altogether.

  • x-soft: The softest audible level, the same as a value of 0.

  • soft: A relatively soft volume.

  • medium: A medium volume. This is the default value.

  • loud: A moderately loud volume.

  • x-loud: Extra-loud volume.
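
A short sketch of the volume property in use follows. The keyword names are those of the CSS2 aural stylesheet specification; the class names are hypothetical, and the exact loudness that each value produces is up to the user agent.

/* Hypothetical class names, for illustration only. */
body                 { volume: medium; }  /* the default */
p.aside              { volume: soft; }    /* a relatively soft volume */
p.warning            { volume: 90; }      /* near the loud end of the 0 to 100 scale */
span.stage-direction { volume: silent; }  /* no sound at all */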

20.4. The speak, speak-punctuation, and speak-numeral properties: spelling it out

The speak property has three options:

  • normal: Render the material in the normal, default way.

  • none: Don't speak this element, and don't take any time for it either.

    Note that other elements contained inside this element can override this property with a speak rule of their own. If you want to suppress rendering of contained elements, set the display property to “none”.

  • spell-out: Instead of speaking the words, spell them letter by letter. This may help people understand acronyms that are not easy to pronounce, such as PNAMBC (Pay No Attention to the Man Behind the Curtain).

The speak-punctuation property controls how punctuation marks are rendered:

  • code: Pronounce the name of the punctuation mark. For example, you might hear a voice synthesizer say “calm comma calm comma calm period.”

  • none: Render punctuation marks as pauses of appropriate lengths.

Finally, the speak-numeral property specifies how numbers are rendered. Values:

  • digits: Speak each digit individually, e.g., 869 would be rendered “eight six nine”.

  • continuous: Render the numeral in the customary way for the language, e.g., “eight hundred sixty-nine.”
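
The three properties in this section might be applied as in the sketch below. The keyword names (spell-out, none, code, digits) come from the CSS2 aural stylesheet specification; the selectors are hypothetical, chosen only to suggest where each setting could be useful.

/* Hypothetical selectors, for illustration only. */
abbr.initialism   { speak: spell-out; }         /* read letter by letter */
span.visual-only  { speak: none; }              /* skipped, and takes no time */
pre.code          { speak-punctuation: code; }  /* punctuation marks are named aloud */
span.phone-number { speak-numeral: digits; }    /* 869 becomes "eight six nine" */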

20.5. General voice qualities: voice-family, pitch, pitch-range, stress, and richness

The voice-family property selects a general voice type or specific personality. Values may include:

  • female: A generic female voice.

  • male: A generic male voice.

  • child: A generic child's voice.

  • A specific voice name: A given voice synthesizer may have any number of specific named voices, somewhat like type font families. Examples: gary owens, tallulah bankhead, bullwinkle, lounge singer.

You can provide a comma-separated list of voice names as the value of the voice-family property, and the agent will try them in the given order until it finds one that is actually available. Just as you can't assume any specific font will be available to any reader's browser, it's safest to provide one of the generic voices as a fail-safe at the end of the list. Example:

voice-family: madeline kahn, carol kane, female;

The pitch property controls whether a voice is low or high. You may specify the value as a frequency (see Section 6.8, “Frequencies”), or any of the values x-low (extra-low), low, medium (the default), high, or x-high for extra-high.

Use the pitch-range property to control how much the synthesized voice varies in pitch. The value of this property is a number from 0 to 100, the default being 50. A declaration “pitch-range: 0” would render the voice as a flat, uninflected monotone. Values above 50 produce progressively wider swings in pitch, for a more animated voice.

The stress property controls how much the voice varies in stress. The value is a number from 0 to 100. A zero value would suppress all stress variations. The default value is 50. Higher values add more variation in stress, as if the speaker were agitated.

Finally, the richness property is also a number from 0 to 100, defaulting to 50. Lower values produce a smoother voice; higher values produce a voice that carries better in a larger room.
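
To see how these voice qualities combine, consider the sketch below. The class names and the particular numbers are assumptions made for illustration; how a given setting actually sounds depends on the voice synthesizer.

/* Hypothetical class names and values, for illustration only. */
p.announcer { voice-family: male; pitch: low; pitch-range: 30; richness: 80; }
p.excited   { pitch: high; pitch-range: 80; stress: 75; }
p.robot     { pitch: 120Hz; pitch-range: 0; stress: 0; }  /* flat monotone, no stress variation */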

20.6. Timing properties: speech-rate, pause-before, pause-after, and pause

The speech-rate property controls how fast the agent speaks the words. Values may be:

  • number: The speech rate in words per minute.

  • x-slow: About 80 words per minute.

  • slow: About 120 wpm.

  • medium: The default speed, somewhere around 180 to 200 wpm.

  • fast: About 300 wpm.

  • x-fast: About 500 wpm.

  • faster: About 40 wpm faster than the speech-rate of the parent element.

  • slower: About 40 wpm slower than the speech-rate of the parent element.
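
The following sketch shows absolute, keyword, and relative speech rates together. The keyword names follow the CSS2 aural stylesheet specification; the selectors are hypothetical.

/* Hypothetical selectors, for illustration only. */
body           { speech-rate: medium; }  /* the default, roughly 180 to 200 wpm */
li.instruction { speech-rate: 140; }     /* 140 words per minute */
p.legal        { speech-rate: fast; }    /* about 300 wpm */
p.legal strong { speech-rate: slower; }  /* about 40 wpm below the inherited rate */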

To add a bit of extra silence before an element, set its pause-before property; to add some after it, set pause-after. Values allowed:

  • time: The syntax for specifying times is discussed in Section 6.7, “Times”.

  • percentage: A number followed by “%” specifies the duration as a percentage of the average time taken to speak one word at the current speech-rate (see Section 20.2, “Voice properties”).

For example, if the current speech rate is 180 words per minute, the average time per word is 1/3 of a second, so a pause-before value of “200%” would add a gap of about 2/3 of a second.

There is also a combination pause property that sets both the pause-before and pause-after properties. If pause is followed by one value, both properties are set to that value; if there are two values, pause-before gets the first value and pause-after the second.

For example, this rule inserts a 150-millisecond pause both before and after any h2 element:

h2 { pause: 150ms; }
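
As a further sketch, the rules below show the two-value form of pause and the percentage form of the individual properties. The selectors and durations are assumptions chosen for illustration.

/* Hypothetical selectors and durations, for illustration only. */
h1         { pause: 500ms 250ms; }                      /* pause-before 500ms, pause-after 250ms */
blockquote { pause-before: 200%; pause-after: 100%; }   /* two word-times before, one after */

At a speech rate of 180 words per minute, the blockquote rule would add about 2/3 of a second before the element and about 1/3 of a second after it.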

20.7. Element cues: cue-before, cue-after, and cue

You can play a sound clip before or after a selected element. The cue-before property plays the clip before the element, and cue-after plays the clip after the element. Values may be:

  • uri: To supply a sound file, specify the file's URI as the value.

  • none: Do not use a cue.

You can set both cues at once with the cue combination property. If you supply one value, that clip is played both before and after the element. If you supply two values, the first is played before the element and the second after it. For example:

p.warning { cue: url("reveille.wav") url("retreat.wav"); }

This rule would play reveille.wav before each <p class='warning'>…</p> element, and play retreat.wav after it.
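
The individual properties and the one-value form of cue can be sketched the same way. The file names chime.wav and tick.wav are hypothetical, as are the selectors.

/* Hypothetical file names and selectors, for illustration only. */
h1 { cue-before: url("chime.wav"); cue-after: none; }  /* a clip before, nothing after */
li { cue: url("tick.wav"); }                           /* the same clip before and after */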

20.8. Audio mixing: play-during

The play-during property causes a recording to be played in the background during the rendering of some element. Values may be:

  • uri: Retrieve the recording from the given URI. The value may be followed by either or both of two keywords:

      • mix: If this keyword is given, and the parent element has also specified a recording with the play-during property, mix the parent's recording with this element's recording. Without this keyword, the child element's recording would replace that of the parent.

      • repeat: If given, this keyword tells the agent to repeat the recording as many times as necessary to be heard behind the rendering of the element. The default is to play it only once. If the recording is longer than the rendering of the element, it is cut off when the element is finished rendering.

  • auto: If the parent element has a play-during recording, continue playing it back as this element is rendered.

  • none: Do not play a recording during this element.

  • inherit: If the parent element has a play-during recording, start it over again for the current element.
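
A minimal sketch of background audio for a nested structure follows. The keyword names (mix, repeat, auto, none) come from the CSS2 aural stylesheet specification; the file names and selectors are hypothetical.

/* Hypothetical file names and selectors, for illustration only. */
div.scene           { play-during: url("rain.wav") repeat; }  /* loop behind the whole scene */
div.scene p.thunder { play-during: url("thunder.wav") mix; }  /* played on top of the rain */
div.scene p.dialog  { play-during: auto; }                    /* rain continues uninterrupted */
div.scene p.quiet   { play-during: none; }                    /* no background recording */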