Use the rules in this section to control the presentation of your material in audio form. This improves accessibility for blind readers, and it may also be useful in other situations where your readers are actually listeners.
The azimuth and elevation properties control the apparent location of the speaker in space. Any stereo system can place a source anywhere between the left and right channels. The standard also defines apparent positions behind the listener, and positions above and below the plane of the listener's ears, for more advanced rendering systems that can place sounds in those positions.
The horizontal position property is called azimuth. A sound directly in front of the listener, equidistant between the left and right channels, is the 0° reference; the right channel is azimuth 90°, and the left channel is azimuth 270° or -90° (these angles are equivalent). A source behind the listener is at azimuth 180°.
The values of the azimuth property may be
any of:
An explicit angle, such as 45deg. See Section 6.6, “Specifying angles”.
One of the keywords in the diagram below, such as
left-side for azimuth 270° or
center-left behind for 200°.
leftwards to place the source 20° counterclockwise relative to the azimuth inherited from the parent element, that is, the parent's azimuth minus 20° (modulo 360°).
rightwards to place the source 20° clockwise relative to the azimuth inherited from the parent element, that is, the parent's azimuth plus 20° (modulo 360°).

This figure shows the various azimuth keywords and the equivalent angles.
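The rules below sketch how azimuth values might be combined in a style sheet. The class names are hypothetical, and the sketch assumes a user agent that implements the CSS2 aural properties; the comments give the angle that the CSS2 specification assigns to each keyword.

```css
/* Hypothetical class names; assumes a CSS2 aural user agent. */
.narrator { azimuth: center; }              /* 0deg: straight ahead */
.aside    { azimuth: far-right; }           /* 60deg */
.echo     { azimuth: center-left behind; }  /* 200deg */
.whisper  { azimuth: -90deg; }              /* same as 270deg, left-side */

/* leftwards subtracts 20deg from the inherited azimuth, modulo 360:
   inside .aside (60deg) this yields 40deg, the angle of "right". */
.aside em { azimuth: leftwards; }
```

Because azimuth is inherited, the em rule depends on where the element sits: the same declaration inside .narrator would give 340deg (center-left).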
If the rendering agent has speakers above and below the listener, you can also specify the apparent vertical position of the source.
Set the elevation property to one of:
angle
The angle of the source relative to the horizontal; see Section 6.6, “Specifying angles”. For example, the declaration “elevation: -20deg;” would place the source twenty degrees below the horizon.
above
The source comes from the zenith, straight up.
below
The source comes from the nadir, straight down.
higher
The source is placed 10° higher than the
elevation property of the parent
element.
lower
The source is placed 10° lower than the
elevation property of the parent
element.
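Here is a sketch of the elevation values in use. The selectors are hypothetical, and the rules only have an audible effect on a user agent with speakers above and below the listener.

```css
/* Hypothetical selectors; assumes speakers above and below the listener. */
.footnote   { elevation: -20deg; }  /* twenty degrees below the horizontal */
.heavenly   { elevation: above; }   /* the zenith, 90deg */
.heavenly q { elevation: lower; }   /* parent's 90deg minus 10deg: 80deg */
```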
The volume property controls the loudness of the aural presentation. Permissible values include:
integer
An integer from 0 to 100 selects the loudness. A value of 0 does not mean no sound at all: it means the softest audible level. A value of 100 is the loudest comfortable level. It is the responsibility of the user agent, not CSS, to determine what these numbers represent, taking into account the background noise level and the dynamic range of the audio system.
silent
Turn the sound off altogether.
x-soft
Extra-soft.
soft
A relatively soft volume.
medium
A medium volume. This is the default value.
loud
A moderately loud volume.
x-loud
Extra-loud volume.
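A minimal sketch of volume settings, assuming a CSS2 aural user agent; the selectors are hypothetical. In the CSS2 specification the keywords correspond to numbers: x-soft is 0, soft 25, medium 50, loud 75, and x-loud 100.

```css
/* Hypothetical selectors; assumes a CSS2 aural user agent. */
body         { volume: medium; }  /* the default, same as 50 */
.aside       { volume: soft; }    /* same as 25 */
.warning     { volume: 90; }      /* between loud (75) and x-loud (100) */
.hidden-note { volume: silent; }  /* no sound at all, unlike volume: 0 */
```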
The speak property has three options:
normal
Render the material in the normal, default way.
none
Don't speak this element, and don't take any time for it either.
Note that descendant elements can override this property with a speak rule of their own. To suppress rendering of an element together with everything it contains, set its display property to “none”.
spell-out
Instead of speaking the words, spell them letter by letter. This may help people understand acronyms that are not easy to pronounce, such as PNAMBC (Pay No Attention to the Man Behind the Curtain).
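The following sketch illustrates the override behavior described under none. The markup it assumes (a sidebar containing a summary, and abbreviations marked with a spelled class) is hypothetical.

```css
/* Hypothetical markup: a sidebar that should not be read aloud,
   except for one child the author still wants spoken. */
.sidebar          { speak: none; }
.sidebar .summary { speak: normal; }  /* overrides the inherited none */

/* To silence an element and everything inside it, use display instead. */
.print-only { display: none; }

abbr.spelled { speak: spell-out; }    /* e.g. PNAMBC read letter by letter */
```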
The speak-punctuation property controls
how punctuation marks are rendered:
code
Pronounce the name of the punctuation mark. For example, you might hear a voice synthesizer say “calm comma calm comma calm period.”
none
Render punctuation marks as pauses of appropriate lengths.
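For example, a style sheet might name punctuation aloud only in program listings, where every comma and semicolon matters. The listing class is a hypothetical name.

```css
/* Hypothetical class for source-code listings. */
pre.listing { speak-punctuation: code; }  /* "semicolon", "comma", ... */
p           { speak-punctuation: none; }  /* punctuation rendered as pauses */
```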
Finally, the speak-numeral property
specifies how numbers are rendered. Values:
digits
Speak each digit individually, e.g., 869 would be rendered “eight six nine”.
continuous
Render the numeral in the customary way for the language, e.g., “eight hundred sixty-nine.”
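As a sketch, telephone numbers usually read better digit by digit, while quantities read better as whole numerals. Both class names are hypothetical.

```css
/* Hypothetical classes; assumes a CSS2 aural user agent. */
.phone    { speak-numeral: digits; }      /* 869: "eight six nine" */
.quantity { speak-numeral: continuous; }  /* 869: "eight hundred sixty-nine" */
```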
The voice-family property selects a
general voice type or specific personality. Values may
include:
female
A generic female voice.
male
A generic male voice.
child
A generic child's voice.
personality
A given voice synthesizer may have any number of specific named voices, somewhat like font families. Examples: gary owens, tallulah bankhead, bullwinkle, lounge singer.
You can provide a comma-separated list of voice names as
the value of the voice-family property,
and the agent will try them in the given order until it
finds one that is actually available.
Just as you can't assume any specific font will be
available to any reader's browser, it's safest to provide
one of the generic voices as a fail-safe at the end of
the list. Example:
voice-family: madeline kahn, carol kane, female;
The pitch property controls whether a
voice is low or high. You may specify the value as a
frequency (see Section 6.8, “Frequencies”), or any of
the values x-low (extra-low), low, medium (the default), high, or x-high for extra-high.
Use the pitch-range property to control
how much the synthesized voice varies in pitch. The
value of this property is a number from 0 to 100, the
default being 50. A declaration “pitch-range: 0” would render the voice as a flat, uninflected monotone. Values above 50 produce more inflection, giving a more animated voice.
The stress property controls how much the
voice varies in stress. The value is a number from 0 to
100. A zero value would suppress all stress variations.
The default value is 50. Higher values add more
variation in stress, as if the speaker were agitated.
Finally, the richness property is also a
number from 0 to 100, defaulting to 50. Lower values
produce a smoother voice; higher values produce a voice
that carries better in a larger room.
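The voice properties above can be combined into a distinct speaking style. The sketch below uses hypothetical class names, and the values are chosen only to illustrate the ranges described; how a given synthesizer renders them is up to the user agent.

```css
/* Hypothetical class names; values chosen only for illustration. */
.robot {
  voice-family: male;
  pitch: low;
  pitch-range: 0;   /* flat, uninflected monotone */
  stress: 0;        /* no stress variation */
  richness: 30;     /* smoother than the default 50 */
}
.announcer {
  voice-family: female;
  pitch: 220Hz;     /* a frequency instead of a keyword */
  pitch-range: 80;  /* more inflection than the default 50 */
  richness: 80;     /* a voice that carries in a large room */
}
```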
The speech-rate property controls how fast
the agent speaks the words. Values may be:
integer
An integer specifies the speech rate in words per minute.
x-slow
About 80 words per minute.
slow
About 120 wpm.
medium
The default speed, somewhere around 180 to 200 wpm.
fast
About 300 wpm.
x-fast
About 500 wpm.
faster
About 40 wpm faster than the speech-rate of the parent element.
slower
About 40 wpm slower than the speech-rate of the parent element.
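A short sketch of relative speech rates, with hypothetical selectors. Because faster and slower are relative to the parent, the nested rule below depends on the rate set by its ancestor.

```css
/* Hypothetical selectors; assumes a CSS2 aural user agent. */
body               { speech-rate: medium; }  /* roughly 180 to 200 wpm */
.legal             { speech-rate: 150; }     /* 150 words per minute */
.legal .fine-print { speech-rate: faster; }  /* about 150 + 40 = 190 wpm */
```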
To add a bit of extra silence before an element, set its
pause-before property; to add some after
it, set pause-after. Values allowed:
time
The syntax for specifying times is discussed in Section 6.7, “Times”.
percentage
A number followed by “%” specifies the duration as a percentage of the average time taken to speak one word, which depends on the speech-rate property (see Section 20.2, “Voice properties”).
For example, if the current speech rate is 180 words
per minute, the average time per word is 1/3 of a
second, so a pause-before value of
“200%” would add a gap
of about 2/3 of a second.
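The arithmetic in that example can be written as a rule. This is a sketch with a hypothetical choice of element; it fixes the speech rate explicitly so that the percentage has a known base.

```css
/* At 180 words per minute, one average word lasts 60 s / 180 = 1/3 s
   (about 333ms), so 200% of that is about 667ms. */
blockquote {
  speech-rate: 180;
  pause-before: 200%;  /* about 667ms at this speech rate */
  pause-after: 500ms;  /* an absolute time, independent of speech-rate */
}
```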
There is also a combination pause property
that sets both the pause-before and pause-after properties. If pause
is followed by one value, both properties are set to that
value; if there are two values, pause-before gets the first value and pause-after the
second.
For example, this rule inserts a 150-millisecond pause
both before and after any h2 element:
h2 { pause: 150ms; }
You can play a sound clip before or after a selected
element. The cue-before property plays
the clip before the element, and cue-after
plays the clip after the element. Values may be:
uri
To supply a sound file, specify the file's URI as the value.
none
Do not use a cue.
You can set both cues at once with the cue
combination property. If you supply one value, that clip
is played both before and after the element. You may also supply two values: the first is played before the element and the second after it. For example:
p.warning { cue: url("reveille.wav") url("retreat.wav"); }
This rule would play reveille.wav before each <p class='warning'>…</p> element, and play retreat.wav after it.
The play-during property causes a
recording to be played in the background during the
rendering of some element. Values may be:
uri
Retrieve the recording from the given URI. The value may be followed by either of two keywords:
mix
If this keyword is given, and the parent
element has also specified a recording with
the play-during property, mix
the parent's recording with this element's
recording. Without this keyword, the child
element's recording would replace that of the
parent.
repeat
If given, this keyword tells the agent to repeat the recording as many times as necessary to be heard behind the rendering of the element. The default is to play it only once. If the recording is longer than the rendering of the element, it is cut off when the element is finished rendering.
auto
If the parent element has a play-during recording, continue playing it
back as this element is rendered.
none
Do not play a recording during this element.
inherit
If the parent element has a play-during recording, start it over again
for the current element.
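The sketch below shows how the play-during values interact between parent and child elements. The sound file names and class names are hypothetical, and the rules assume a CSS2 aural user agent.

```css
/* Hypothetical file and class names; assumes a CSS2 aural user agent. */
body     { play-during: url("ambience.wav") repeat; } /* loops behind the document */
.storm   { play-during: url("thunder.wav") mix; }     /* thunder mixed over ambience */
.quiet   { play-during: none; }                       /* no background recording */
.chapter { play-during: auto; }                       /* parent's recording continues */
```

Without the mix keyword, the .storm recording would replace the ambience while that element is rendered instead of playing over it.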