Remote Control and Natural Language Understanding Commands

Audiogum Remote Control and Natural Language Understanding services use common message formats to send control commands to a client device.

Firmware or apps integrating Remote Control, Natural Language Understanding, or both must interpret these commands and implement the appropriate behaviour on the device for the features they support.

This document describes the command message vocabulary and provides guidance for implementation.


Concepts and implementation guidance

Command messages

Command messages are TEXT messages in JSON format received on the Remote Control WebSocket.

Example:

{
  "type": "remotecommand",
  "actions": [
    { "action": "stop" }
  ]
}

In the case of Natural Language Understanding, command messages resulting from voice interaction have the type voiceresult. These may contain the same data as remotecommand messages and should invoke the same behaviour.

Actions and compatibility

Generally, a command message will include an actions field* containing one or more actions. The action field of each action object indicates the type of intended action. Each action object also carries any additional data that the action requires (for example parameters or correlationid), as described in the action tables below.

Firmware or app clients should ignore any action whose action value is not recognised. Audiogum may add new actions to support new features over time without requiring a newly versioned API. Although the client can declare its supported actions to the Audiogum platform, this does not guarantee that other actions will never be received; see Capabilities below.

Similarly, firmware must ignore any additional JSON fields in the command message that it does not recognise, including fields inside the actions. Audiogum may augment the command vocabulary for existing actions over time. This will be done in a backward-compatible way: newer devices that understand the extra detail can benefit from it, while older ones gracefully degrade to their previous behaviour by ignoring it.

(* A command message may include no actions if it only has side effects, for example a command containing only a respond field; see Voice feedback.)
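A minimal sketch of these compatibility rules in Python, assuming the device keeps a table of handler functions keyed by action name (the handler table and the dispatch function are hypothetical, not part of the Audiogum API). It accepts both remotecommand and voiceresult messages, tolerates a missing actions field, skips actions it does not recognise, and lets each handler read only the fields it knows, so new fields pass through harmlessly.

```python
import json
from typing import Any, Callable, Dict, List

# A handler receives the whole action object and reads only the fields it knows.
Handler = Callable[[Dict[str, Any]], None]

# Both message types carry the same action vocabulary.
COMMAND_TYPES = {"remotecommand", "voiceresult"}


def dispatch(message_text: str, handlers: Dict[str, Handler]) -> List[str]:
    """Run each recognised action in a command message, in order.

    Returns the names of the actions that were handled. Unrecognised
    actions and unrecognised fields are ignored rather than rejected.
    """
    message = json.loads(message_text)
    if not isinstance(message, dict) or message.get("type") not in COMMAND_TYPES:
        return []
    handled: List[str] = []
    # "actions" may be absent, e.g. when the message only carries "respond".
    for action in message.get("actions") or []:
        if not isinstance(action, dict):
            continue
        handler = handlers.get(action.get("action"))
        if handler is None:
            continue  # unknown action: skip it, it may belong to a newer feature
        handler(action)
        handled.append(action["action"])
    return handled
```

For example, a message containing a stop action and an unknown future action runs only the stop handler and returns ["stop"].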

Capabilities

Clients declare which actions they support with the capabilities parameter when making a WebSocket connection or through the Natural Language Understanding REST API.

Capabilities must include all action values that are supported. See below for the full command message vocabulary, intended behaviour and guidance for implementation.

Capabilities must also specify the types of playback that a device supports. These can include one or more of the following: http, https, applesdk. http and https indicate that the device can stream playback from http or https stream URLs returned by Audiogum; applesdk indicates that the device can handle Apple Music refs using MusicKit (this is only relevant for iOS devices).

Additional special features that are not represented as action values may also require capability values. For example, the capability value expectreply enables additional conversation features for Natural Language Understanding.
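The sketch below assembles a capability list from the three kinds of value described above: action names, playback types and feature flags. Deriving the action entries from the same handler table used for dispatch (an assumption about device design, not a platform requirement) keeps the declared capabilities in step with what the device actually implements. How the list is encoded in the WebSocket connection or REST request is not described in this section, so the sketch stops at producing the list; build_capabilities is a hypothetical name.

```python
from typing import Iterable, List

# Playback types described in this section.
PLAYBACK_TYPES = {"http", "https", "applesdk"}


def build_capabilities(actions: Iterable[str], playback: Iterable[str],
                       features: Iterable[str] = ()) -> List[str]:
    """Merge supported actions, playback types and feature flags
    (such as expectreply) into one sorted list without duplicates."""
    playback_set = set(playback)
    unknown = playback_set - PLAYBACK_TYPES
    if unknown:
        raise ValueError(f"unknown playback types: {sorted(unknown)}")
    return sorted(set(actions) | playback_set | set(features))
```

For example, build_capabilities(["play", "stop"], ["https"], ["expectreply"]) returns ["expectreply", "https", "play", "stop"].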

Get state commands

Some commands are requests for data, for example state of playback. Generally these have action names prefixed with "get", e.g. getplayer, getpresets, as listed in the Get state actions table.

Clients supporting these actions should respond by sending a text message back on the same WebSocket connection, in JSON form as described in the State data examples.

Each "get" action includes a unique correlationid field. The response message should include the same value, which allows responses to be correlated with requests as necessary.
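A sketch of the response side, assuming a hypothetical collect callback that gathers the current state for a given response type. The type values (devicedetails, player, presets) are taken from the State data examples; build_state_response and STATE_TYPES are illustrative names. The essential step is that correlationid is copied unchanged from the request into the response.

```python
import json
from typing import Any, Callable, Dict, Optional

# Maps each "get" action to the "type" of its response message.
STATE_TYPES = {
    "getdevicedetails": "devicedetails",
    "getplayer": "player",
    "getpresets": "presets",
}


def build_state_response(action: Dict[str, Any],
                         collect: Callable[[str], Dict[str, Any]]) -> Optional[str]:
    """Return the JSON text to send back on the same WebSocket,
    or None if the action is not a supported get action."""
    response_type = STATE_TYPES.get(action.get("action"))
    if response_type is None:
        return None
    body = dict(collect(response_type))  # copy so the caller's dict is untouched
    body["type"] = response_type
    # Echo the request's correlationid so the response can be matched to it.
    body["correlationid"] = action.get("correlationid")
    return json.dumps(body)
```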

Voice feedback

Command messages may contain a respond field indicating a voice response to play to the user. This can occur either on its own or simultaneously with other fields including actions.

{
  "type": "remotecommand",
  "respond": {
    "text": "something to say",
    "audio": "someaudiourl.mp3",
    "languagecode": "en-GB"
  }
}

Firmware clients implementing voice feedback may either play the media specified by the audio field or synthesise speech from the text field using a separate text-to-speech service. The voice response is expected to play before the command's action (if any) is performed. To ensure the voice can be heard, music playback volume should be reduced or muted for the duration of the response.
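The ordering described above can be sketched as follows. Player is a hypothetical stand-in that records what it would do; a real device would duck its mixer and play or synthesise audio, and speak() is assumed to block until the response has finished. Restoring the music volume before running the actions is a design choice in this sketch: it means an action such as setvolume is not undone by the restore step.

```python
from typing import Any, Callable, Dict, List


class Player:
    """Hypothetical player that records events instead of producing audio."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def duck(self) -> None:
        self.events.append("duck")  # reduce or mute music playback

    def restore(self) -> None:
        self.events.append("restore")  # return music to its previous volume

    def speak(self, respond: Dict[str, Any]) -> None:
        # Play the supplied audio if present, otherwise use text-to-speech.
        if respond.get("audio"):
            self.events.append("play:" + respond["audio"])
        else:
            self.events.append("tts:" + respond.get("text", ""))


def handle_message(message: Dict[str, Any], player: Player,
                   run_actions: Callable[[List[Dict[str, Any]]], None]) -> None:
    """Play any voice response first, then perform the command's actions."""
    respond = message.get("respond")
    if respond:
        player.duck()
        try:
            player.speak(respond)
        finally:
            player.restore()  # restore even if playback fails
    run_actions(message.get("actions") or [])
```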


Actions

General controller actions. Each entry gives the action name and its description, followed by an example command message.

play: Play or resume playback, if any has already started.
{
  "action": "play"
}

stop: Stop playback.
{
  "action": "stop"
}

pause: Pause playback.
{
  "action": "pause"
}

skipnext: Skip the player to the next track.
{
  "action": "skipnext"
}

skipprev: Skip the player to the previous track.
{
  "action": "skipprev"
}

rewind: Restart the current track from the beginning.
{
  "action": "rewind"
}

seek: Move the player to the specified offset in the current track. The offset is specified in seconds. The relative parameter is optional and defaults to true, in which case the offset is applied to the current position; when it is false, the offset is a position measured from the start of the track.
{
  "action": "seek",
  "parameters": {
    "offset": 23,
    "relative": false
  }
}

playpreset: Switch to the specified stored preset.
{
  "action": "playpreset",
  "parameters": {
    "presetnumber": 3
  }
}

playsource: Switch to the specified input source.
{
  "action": "playsource",
  "parameters": {
    "name": "aux"
  }
}

volumeup: Increase the volume by one increment, or by the specified relative value. The parameters object is optional. See Volume Range below.
{
  "action": "volumeup"
}

{
  "action": "volumeup",
  "parameters": { "value": 30 }
}

volumedown: Decrease the volume by one increment, or by the specified relative value. The parameters object is optional. See Volume Range below.
{
  "action": "volumedown"
}

{
  "action": "volumedown",
  "parameters": { "value": 30 }
}

setvolume: Set the volume to the specified value, in the range 0 to 100, where 100 is the highest volume level the device supports. See Volume Range below.
{
  "action": "setvolume",
  "parameters": { "value": 30 }
}

mute: Mute the audio output without changing the current volume value.
{
  "action": "mute"
}

unmute: Unmute the audio output, reverting to the current volume value.
{
  "action": "unmute"
}

shuffle: Activate 'shuffle' playback mode.
{
  "action": "shuffle"
}

unshuffle: Deactivate 'shuffle' playback mode.
{
  "action": "unshuffle"
}

repeat: Activate 'repeat' playback mode.
{
  "action": "repeat"
}

norepeat: Deactivate 'repeat' playback mode.
{
  "action": "norepeat"
}
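A sketch of the seek arithmetic described for the seek action, assuming the device knows the current position and the track duration in seconds (seek_target is a hypothetical name). relative defaults to true when absent. Clamping the result to the track bounds is an assumption of this sketch; the action description does not say what should happen for an out-of-range offset.

```python
from typing import Any, Dict


def seek_target(action: Dict[str, Any], position: float, duration: float) -> float:
    """Return the new playback position in seconds for a seek action."""
    params = action.get("parameters") or {}
    offset = float(params.get("offset", 0))
    if params.get("relative", True):
        target = position + offset  # relative: move from the current position
    else:
        target = offset  # absolute: position from the start of the track
    # Assumed behaviour: keep the result within the track.
    return float(max(0.0, min(target, duration)))
```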

Volume Range

Volume has a range of 0 to 100. Users can set an absolute volume either as a percentage or on a scale of 0 to 10. Relative commands (e.g. "louder", "quieter") are also supported; these give the client a delta by which to adjust the volume.
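A sketch of how a client might apply the volume actions, assuming every value parameter it receives is already on the 0 to 100 scale (as the setvolume description states) and that results are clamped to that range. DEFAULT_STEP is a hypothetical stand-in for the device's own "one increment", which this document does not define; apply_volume is likewise an illustrative name. Mute state is deliberately left alone, matching the mute and unmute descriptions.

```python
from typing import Any, Dict

DEFAULT_STEP = 5  # hypothetical size of "one increment"; device-specific


def apply_volume(action: Dict[str, Any], current: int) -> int:
    """Return the new volume (0 to 100) after a volume action."""
    name = action.get("action")
    params = action.get("parameters") or {}
    if name == "setvolume":
        target = params.get("value", current)
    elif name == "volumeup":
        target = current + params.get("value", DEFAULT_STEP)
    elif name == "volumedown":
        target = current - params.get("value", DEFAULT_STEP)
    else:
        return current  # not a volume-changing action
    return max(0, min(100, int(target)))
```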

Playlisting actions

Actions relating to Audiogum playables assume integration with Audiogum's playback features - see Firmware Integration and Playback documentation for details.

playplayable: Play an Audiogum playable. The startindex parameter is optional and should be passed to the Audiogum API only if present.
{
  "action": "playplayable",
  "parameters": {
    "id": "[playableid]",
    "startindex": [startindex]
  }
}

refreshplayable: Refresh playable data without interrupting playback, so that the next track comes from the new data. The id of the playable provided in parameters may or may not be the same as that of the currently playing playable. In either case, the currently playing item should continue until it completes, and the player should then move on to the first item from the new playable response. This is not possible if the currently playing item has the flag continuous: true. In that case, or if what is playing is not an Audiogum playable, the new playable should begin immediately, as for playplayable. The startindex parameter is optional and should be passed to the Audiogum API only if present.
{
  "action": "refreshplayable",
  "parameters": {
    "id": "[playableid]",
    "startindex": [startindex]
  }
}

setpreset: Assign a playable to a preset. The startindex parameter is optional; if present, it should be passed to the Audiogum API whenever the preset is used.
{
  "action": "setpreset",
  "parameters": {
    "presetnumber": 3,
    "id": "[playableid]",
    "startindex": [startindex]
  }
}
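Two of the rules above are sketched below. First, startindex is forwarded only when the command includes it; the check is on key presence, so a startindex of 0 is still sent. Second, refreshplayable falls back to immediate playback when the current item is flagged continuous: true or is not from an Audiogum playable. The function names and the return labels "immediate" and "after_current" are hypothetical, and treating an idle player (no current item) as "immediate" is an assumption. The API request itself is covered by the Firmware Integration and Playback documentation and is not reproduced here.

```python
from typing import Any, Dict, Optional


def playable_request_params(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Parameters for the playable request; startindex only if present."""
    params: Dict[str, Any] = {"id": parameters["id"]}
    if "startindex" in parameters:  # presence check, so 0 is kept
        params["startindex"] = parameters["startindex"]
    return params


def refresh_mode(current_item: Optional[Dict[str, Any]],
                 playing_audiogum_playable: bool) -> str:
    """Decide how to apply refreshplayable.

    "after_current": let the current item finish, then start the new playable.
    "immediate": start the new playable now, as for playplayable.
    """
    if not playing_audiogum_playable or current_item is None:
        return "immediate"
    if current_item.get("continuous") is True:
        return "immediate"  # a continuous item never completes
    return "after_current"
```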

Get state actions

getdevicedetails: Respond with device details about the host device. See also Get state commands.
{
  "action": "getdevicedetails",
  "correlationid": "991001"
}

getplayer: Respond with player state details. See also Get state commands.
{
  "action": "getplayer",
  "correlationid": "123456"
}

getpresets: Respond with the current preset state. See also Get state commands.
{
  "action": "getpresets",
  "correlationid": "123abc"
}

Administrative actions

reboot: Reboot the device.
{
  "action": "reboot"
}

shutdown: Shut down the device and remain in the power-off state.
{
  "action": "shutdown"
}

remotecontrolrefresh: Discard the current remotecontrol token, acquire a new one, and reconnect.
{
  "action": "remotecontrolrefresh"
}

sendlogs: Send the speaker's log to the Audiogum Device Logs API (see Devices: Device Logs).
{
  "action": "sendlogs"
}

State data

The following examples describe the messages expected from the device in response to get... actions.

Device details

{
  "type": "devicedetails",
  "correlationid": "991001",
  "deviceid": "abc123xyz",
  "friendlyname": "My XYZ Speaker",
  "devicetype": "XYZspeaker",
  "serial": "1234-5678-9abc-def0",
  "location": {
    "country": "ie",
    "city": "Dublin",
    "latitude": 53.3389,
    "longitude": -6.2595
  },
  "ip": "54.75.236.122",
  "rssi": 46,
  "powerlevel": 52,
  "platformversion": "1.2.3",
  "platform": "XYZbaseplatform"
}

Player

{
  "type": "player",
  "correlationid": "123456",
  "playstate": "playing",
  "source": "playable",
  "presetnumber": 1,
  "offset": 45,
  "playable": {
  },
  "item": {
  },
  "volume": {
    "value": 30,
    "mute": false
  }
}

The fields source, offset, item, playable, sourcedata and presetnumber should be provided when relevant, as defined for analytics play events.

playstate indicates the playing state of the player. Allowed values: idle, buffering, playing, paused.

volume should include value, the current volume mapped to the scale of 0 to 100, and, if the player supports muting, mute, the current mute state: true (muted) or false.
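A sketch of building the player message in the shape of the example above, assuming the device passes its current state in as arguments (player_state is a hypothetical name). It rejects playstate values outside the allowed set, includes the optional fields only when they have a value, clamps volume to 0 to 100, and omits mute when the player cannot mute (signalled here by mute=None, an assumption of this sketch).

```python
from typing import Any, Dict, Optional

PLAYSTATES = {"idle", "buffering", "playing", "paused"}
# Fields included only when relevant, as for analytics play events.
OPTIONAL_FIELDS = ("source", "offset", "item", "playable", "sourcedata", "presetnumber")


def player_state(correlationid: str, playstate: str, volume: int,
                 mute: Optional[bool] = None, **fields: Any) -> Dict[str, Any]:
    """Build a player state message for a getplayer response."""
    if playstate not in PLAYSTATES:
        raise ValueError(f"invalid playstate: {playstate}")
    state: Dict[str, Any] = {
        "type": "player",
        "correlationid": correlationid,
        "playstate": playstate,
    }
    # Copy only known optional fields that have a value.
    for name in OPTIONAL_FIELDS:
        if fields.get(name) is not None:
            state[name] = fields[name]
    volume_obj: Dict[str, Any] = {"value": max(0, min(100, volume))}
    if mute is not None:  # leave mute out if the player cannot mute
        volume_obj["mute"] = mute
    state["volume"] = volume_obj
    return state
```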

Presets

{
  "type": "presets",
  "correlationid": "123abc",
  "presets": [
    {
      "presetnumber": 1,
      "source": "playable",
      "playable": {
        "id": "f75d65fe39814a659af67ceb5853efc0",
        "name": "Music Now"
      }
    },
    {
      "presetnumber": 2,
      "source": "spotifyconnect",
      "sourcedata": {
        "service": "spotify",
        "ref": "spotify:user:212pk…:playlist:51T9…",
        "name": "My Example Playlist"
      }
    },
    {
      "presetnumber": 3,
      "source": "hdmi"
    }
  ]
}

This assumes the device has enumerable presets that may be assigned Audiogum playables. Any unassigned presets should be included with only the presetnumber field.
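A sketch of building the presets message, assuming presets are numbered consecutively from 1 (as in the example) and that the device stores each assigned preset's source, playable or sourcedata in a dict keyed by preset number; presets_message and that storage layout are hypothetical. Unassigned presets appear with only their presetnumber.

```python
from typing import Any, Dict, List


def presets_message(correlationid: str, count: int,
                    assigned: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Build a presets response listing every preset from 1 to count."""
    presets: List[Dict[str, Any]] = []
    for number in range(1, count + 1):
        entry: Dict[str, Any] = {"presetnumber": number}
        # Unassigned presets keep only their presetnumber.
        entry.update(assigned.get(number, {}))
        presets.append(entry)
    return {"type": "presets", "correlationid": correlationid, "presets": presets}
```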