Audiogum Remote Control and Natural Language Understanding services use common message formats to send control commands to a client device.
Firmware or apps integrating Remote Control, Natural Language Understanding, or both must interpret these commands and implement appropriate behaviour on the device for the features they support.
This document describes the vocabulary of command messages and guidance for implementation.
Command messages are TEXT messages in JSON format received on the Remote Control WebSocket.
Example:
{
"type": "remotecommand",
"actions": [
{ "action": "stop" }
]
}
In the case of Natural Language Understanding, command messages resulting from voice interaction have the type voiceresult. These may contain the same data and should invoke the same behaviour.
Generally a command message will include an actions field* with one or more actions. The action field of each indicates the type of intended action. Each action object carries any other data it needs, as described in the Actions tables below.
Firmware or app clients should ignore any commands where the action value is not recognised. Audiogum may add new actions to support new features over time without requiring a newly versioned API. Although the client can declare its supported actions to the Audiogum platform, this does not guarantee other actions will not be received - see Capabilities below.
Similarly, clients must ignore any additional JSON fields in the command message that they do not recognise, including fields inside the actions. Audiogum may augment the command vocabulary for existing actions over time. This will be done in a backward compatible way - i.e. new devices that understand the extra detail can benefit, while older ones can gracefully degrade to previous behaviour by ignoring the new detail.
(* It is possible for a command message to include no actions if it only has side effects. An example is a command containing only respond - see Voice feedback)
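The tolerance rules above can be sketched as a small dispatcher. This is a minimal illustration rather than a reference implementation: the Player class, the HANDLERS table and the handle_command function are hypothetical names, and only the type, actions and action fields are taken from the message format described here.

```python
import json


class Player:
    """Hypothetical stand-in for the device's playback engine."""

    def __init__(self):
        self.log = []

    def stop(self, action):
        self.log.append("stop")

    def pause(self, action):
        self.log.append("pause")


# Supported action names mapped to handler methods (assumed structure).
HANDLERS = {"stop": Player.stop, "pause": Player.pause}


def handle_command(player, raw_text):
    """Run every recognised action in a command message and skip the rest."""
    message = json.loads(raw_text)
    if message.get("type") not in ("remotecommand", "voiceresult"):
        return []
    handled = []
    # "actions" may be absent when the message only carries side effects.
    for action in message.get("actions", []):
        handler = HANDLERS.get(action.get("action"))
        if handler is None:
            continue  # unrecognised action value: ignore it
        handler(player, action)  # unknown extra fields are simply never read
        handled.append(action["action"])
    return handled
```

Because handlers only read the fields they know about, extra fields added later are ignored without any special handling.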
Clients declare which actions they support with the capabilities parameter when making a websocket connection or through the Natural Language Understanding REST API.
Capabilities must include all action values that are supported. See below for the full command message vocabulary, intended behaviour and guidance for implementation.
Capabilities must also specify the types of playback that a device supports. This could include any or all of the following: http, https, applesdk. http and https indicate that the device is capable of streaming playback from http or https streamurls returned by Audiogum. applesdk indicates that the device can handle Apple Music refs using MusicKit (this is only relevant for iOS devices).
Additional special features that are not represented as action values may also require capability values. For example the capability value expectreply enables additional conversation features for Natural Language Understanding.
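As a rough sketch of how a client might assemble its capability values, the helper below combines supported actions, playback types and feature flags into one list. The function name build_capabilities is hypothetical, and how the list is encoded into the capabilities parameter is not described in this section, so the sketch stops at producing the values.

```python
# Playback types named in this document.
PLAYBACK_TYPES = {"http", "https", "applesdk"}


def build_capabilities(supported_actions, playback_types, features=()):
    """Return a de-duplicated, sorted list of capability values."""
    unknown = set(playback_types) - PLAYBACK_TYPES
    if unknown:
        raise ValueError(f"unknown playback types: {sorted(unknown)}")
    return sorted(set(supported_actions) | set(playback_types) | set(features))
```

Deriving the action names from the same table that dispatches commands helps keep the declared capabilities and the implemented behaviour in step.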
Some commands are requests for data, for example the state of playback. Generally these have action names prefixed with "get", e.g. getplayer, getpresets, as listed in the Get state actions table.
Clients supporting these actions should respond by sending back a text message on the same web socket connection, in JSON form as described by the State data examples.
Each "get" command action includes a unique correlationid field. The same value should be included in the response message. This allows responses to be correlated with requests as necessary.
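The correlation rule can be illustrated with a small helper that wraps state data in a response and copies the correlationid across unchanged. The function name build_state_response is hypothetical; the type values used (such as player) follow the State data examples.

```python
import json


def build_state_response(command_action, response_type, state):
    """Serialise state data as a response that echoes the request's correlationid."""
    response = dict(state)
    response["type"] = response_type
    # Copy the value unchanged so the sender can match reply to request.
    if "correlationid" in command_action:
        response["correlationid"] = command_action["correlationid"]
    return json.dumps(response)
```

The returned string would then be sent as a text message on the same web socket connection that delivered the command.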
Command messages may contain a respond field indicating a voice response to play to the user. This can occur either on its own or simultaneously with other fields including actions.
{
"type": "remotecommand",
"respond": {
"text": "something to say",
"audio": "someaudiourl.mp3",
"languagecode": "en-GB"
}
}
Firmware clients implementing voice feedback may either play the media specified by the audio field or generate a voice based on text using a separate text-to-speech service. It is expected that the voice playback occurs before performing the action of the command (if any). To ensure the voice can be heard, music playback volume should be reduced or muted for the duration of the response.
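The ordering described above (lower the music, play the voice response, restore the music, then perform the actions) might look like the sketch below. FakeDevice and all of its methods are hypothetical placeholders for the device's own audio API, the playback calls are assumed to block until the voice has finished, and DUCKED_VOLUME is an arbitrary illustrative level.

```python
DUCKED_VOLUME = 10  # assumed music level while the voice response plays


class FakeDevice:
    """Hypothetical device API that records calls instead of playing audio."""

    def __init__(self, music_volume=40):
        self.music_volume = music_volume
        self.events = []

    def set_music_volume(self, value):
        self.music_volume = value
        self.events.append(("music_volume", value))

    def play_audio(self, url):
        self.events.append(("audio", url))

    def speak(self, text, languagecode):
        self.events.append(("tts", text, languagecode))

    def perform(self, action):
        self.events.append(("action", action["action"]))


def handle_with_voice(message, device):
    """Play any voice response first, then perform the actions."""
    respond = message.get("respond")
    if respond:
        previous = device.music_volume
        device.set_music_volume(min(previous, DUCKED_VOLUME))
        try:
            if respond.get("audio"):
                device.play_audio(respond["audio"])
            else:
                # Fall back to text-to-speech when no audio is supplied.
                device.speak(respond.get("text", ""), respond.get("languagecode"))
        finally:
            device.set_music_volume(previous)  # always restore the music level
    for action in message.get("actions", []):
        device.perform(action)
```

Restoring the level in a finally block means a failed voice playback does not leave the music permanently quiet.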
General controller actions.
Action | Description | Command message example |
---|---|---|
play | Play/resume playback, if any already started | { |
stop | Stop playback | { |
pause | Pause playback | { |
skipnext | Skip player to next track | { |
skipprev | Skip player to previous track | { |
rewind | Re-start current track from the beginning | { |
seek | Move player to specified offset in current track. The offset is specified in seconds; relative is optional and defaults to true. | { |
playpreset | Switch to a specified stored preset | { |
playsource | Switch to a specified input source | { |
volumeup | Increase volume by one increment or by a specified relative value. The parameters part is optional. See note below. | { { |
volumedown | Decrease volume by one increment or by specified relative value. The parameters part is optional. See note below. | { { |
setvolume | Set volume to specified value. The value is specified in the range 0 to 100. 100 should be considered the highest supported volume level. See note below. | { |
mute | Mute the audio output (without changing current volume value) | { |
unmute | Unmute the audio output (revert to current volume value) | { |
shuffle | Activate 'shuffle' playback mode | { |
unshuffle | Deactivate 'shuffle' playback mode | { |
repeat | Activate 'repeat' playback mode | { |
norepeat | Deactivate 'repeat' playback mode | { |
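For the seek action in the table above, the relative flag decides whether the offset is added to the current position or used as an absolute position. The helper below is a minimal sketch of that arithmetic. The name resolve_seek is hypothetical, and clamping the result to the start and end of the track is an assumption, not something this document specifies.

```python
def resolve_seek(current_offset, offset, relative=True, duration=None):
    """Return the absolute position in seconds that a seek should move to."""
    # relative defaults to true, so a bare offset moves from the current position.
    target = current_offset + offset if relative else offset
    target = max(0, target)  # assumed: never seek before the start
    if duration is not None:
        target = min(target, duration)  # assumed: never seek past the end
    return target
```

For example, a relative offset of -10 at 45 seconds into a track resolves to 35 seconds, while the same offset with relative set to false resolves to the start of the track.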
Volume has a range of 0 - 100. The commands for volume allow the user to set an absolute volume either as a percentage or on a scale of 0 - 10. We also support relative commands (e.g. louder, quieter, etc.) that give the client a delta to apply to the volume.
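The volume actions can be reduced to one function that returns the next volume on the 0 to 100 scale. This is a sketch under stated assumptions: next_volume and DEFAULT_STEP are hypothetical names, the size of "one increment" is device specific, and because this section does not show whether a relative value arrives signed, the sketch uses its magnitude and takes the direction from the action name.

```python
DEFAULT_STEP = 5  # assumed size of "one increment"; choose per device


def next_volume(current, action, value=None):
    """Return the volume (0 to 100) after applying a volume action."""
    if action == "setvolume":
        if value is None:
            return current
        target = value
    elif action == "volumeup":
        target = current + (abs(value) if value is not None else DEFAULT_STEP)
    elif action == "volumedown":
        target = current - (abs(value) if value is not None else DEFAULT_STEP)
    else:
        return current  # not a volume action
    return max(0, min(100, target))  # keep the result inside 0 to 100
```

Clamping at both ends means repeated "louder" or "quieter" requests settle at the limits instead of producing out-of-range values.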
Actions relating to Audiogum playables assume integration with Audiogum's playback features - see Firmware Integration and Playback documentation for details.
Action | Description | Command message example |
---|---|---|
playplayable | Play an Audiogum playable. The startindex parameter is optional and should be passed to Audiogum API only if present. | { |
refreshplayable | Refresh playable data without interruption so that next track will be from new data. The id of the playable provided in parameters may or may not be the same as the current playing playable. In either case the currently playing item should continue until it completes and then the player should move on to the first item from the new playable response. If the currently playing item has the flag continuous: true, this will not be possible. In this case, or if what is playing is not an Audiogum playable, the new playable should begin immediately as per playplayable. The startindex parameter is optional and should be passed to Audiogum API only if present. | { |
setpreset | Add a playable to a preset. The startindex parameter is optional. It should be passed to Audiogum API only if present, whenever the preset is used. | { |
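The refreshplayable rules in the table above amount to a choice between two behaviours. The sketch below expresses that choice. The function name refresh_mode and the shape of now_playing (a dict with source and item keys, loosely modelled on the player state example) are assumptions made for illustration.

```python
def refresh_mode(now_playing):
    """Return "after_current" or "immediate" for a refreshplayable action."""
    if not now_playing:
        return "immediate"  # nothing is playing
    if now_playing.get("source") != "playable":
        return "immediate"  # not an Audiogum playable
    item = now_playing.get("item") or {}
    if item.get("continuous") is True:
        return "immediate"  # continuous items never complete on their own
    return "after_current"  # let the current item finish, then switch
```

With "immediate" the client behaves as for playplayable; with "after_current" it swaps in the new playable data once the current item completes.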
Action | Description | Command message example |
---|---|---|
getdevicedetails | Respond with details about the host device. See also Get state commands. | { |
getplayer | Respond with player state details. See also Get state commands. | { |
getpresets | Respond with current preset state. See also Get state commands. | { |
Action | Description | Command message example |
---|---|---|
reboot | Reboot the device | { |
shutdown | Shut down the device and remain in the power-off state | { |
remotecontrolrefresh | Discard the current remotecontrol token, acquire a new one, and reconnect | { |
sendlogs | Send the speaker's log to the Audiogum Device Logs API (see Devices: Device Logs) | { |
The following examples describe the messages expected from the device in response to get... actions.
{
"type": "devicedetails",
"correlationid": "991001",
"deviceid": "abc123xyz",
"friendlyname": "My XYZ Speaker",
"devicetype": "XYZspeaker",
"serial": "1234-5678-9abc-def0",
"location": {
"country": "ie",
"city": "Dublin",
"latitude": 53.3389,
"longitude": -6.2595
},
"ip": "54.75.236.122",
"rssi": 46,
"powerlevel": 52,
"platformversion": "1.2.3",
"platform": "XYZbaseplatform"
}
{
"type": "player",
"correlationid": "123456",
"playstate": "playing",
"source": "playable",
"presetnumber": 1,
"offset": 45,
"playable": {
},
"item": {
},
"volume": {
"value": 30,
"mute": false
}
}
The fields source, offset, item, playable, sourcedata, presetnumber should be provided when relevant as defined for analytics play events.
playstate indicates the playing state of the player. Allowed values: idle, buffering, playing, paused.
volume should include the current volume as value mapped to the scale of 0 to 100, and if the player supports muting, the current state of mute, either true (muted) or false.
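A device whose native volume scale is not 0 to 100 needs to map its level before reporting it. The sketch below builds the playstate and volume parts of a player response under that assumption. The names to_percent, build_player_state and device_max are hypothetical, and the other player fields (and the correlationid) are left out for brevity.

```python
PLAYSTATES = {"idle", "buffering", "playing", "paused"}


def to_percent(level, device_max):
    """Map a native volume level (0..device_max) onto the 0 to 100 scale."""
    return round(level * 100 / device_max)


def build_player_state(playstate, level, device_max, mute=None):
    """Build the playstate and volume parts of a player response."""
    if playstate not in PLAYSTATES:
        raise ValueError(f"unsupported playstate: {playstate!r}")
    volume = {"value": to_percent(level, device_max)}
    if mute is not None:  # only report mute when the player supports muting
        volume["mute"] = bool(mute)
    return {"type": "player", "playstate": playstate, "volume": volume}
```

For a device with 30 native volume steps, build_player_state("playing", 9, 30, mute=False) reports a volume value of 30 with mute set to false.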
{
"type": "presets",
"correlationid": "123abc",
"presets": [
{
"presetnumber": 1,
"source": "playable",
"playable": {
"id": "f75d65fe39814a659af67ceb5853efc0",
"name": "Music Now"
}
},
{
"presetnumber": 2,
"source": "spotifyconnect",
"sourcedata": {
"service": "spotify",
"ref": "spotify:user:212pk…:playlist:51T9…",
"name": "My Example Playlist"
}
},
{
"presetnumber": 3,
"source": "hdmi"
}
]
}
This assumes the device has enumerable presets that may be assigned Audiogum playables. Any unassigned presets should be included with only the presetnumber field.
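The rule for unassigned presets can be sketched as follows. The function name build_presets_response and the assigned mapping are hypothetical, and numbering presets from 1 is an assumption based on the example above.

```python
def build_presets_response(correlationid, preset_count, assigned):
    """Build a presets response; assigned maps preset number to its data."""
    presets = []
    for number in range(1, preset_count + 1):
        # Unassigned presets get an entry with only the presetnumber field.
        entry = dict(assigned.get(number, {}))
        entry["presetnumber"] = number
        presets.append(entry)
    return {"type": "presets", "correlationid": correlationid, "presets": presets}
```

Listing every preset number, assigned or not, lets the receiver see how many presets the device has.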