Remote Control

The Audiogum Remote Control system provides the means to control a connected device via a long-standing WebSocket connection over the internet.

Example use cases:

  • Control a device remotely via a companion app on a different network (e.g. from 4G cellular connection rather than home wifi network)
  • Control a device via a 3rd party voice control integration (e.g. Alexa Skills Kit or Google Home)
  • Control a device using Audiogum Natural Language Understanding

This document provides integration guidance in two parts: device firmware integration and controller integration.

The command message vocabulary for Remote Control is described separately here: Remote Control Commands


Security

To secure the control of the device and to protect the user's privacy, the Audiogum Remote Control system operates as follows:

  • Devices make a secure WebSocket connection (wss://) so that communication is encrypted
  • When making the connection, devices must provide both:
    • A client authenticated access token (the same as for other Audiogum API features requiring client authentication)
    • A remotecontrol token, previously requested from the Audiogum API and persisted for any subsequent connections
  • When attempting to send a command, a controller client must provide both:
    • A user authenticated access token (same as for other Audiogum API features requiring user authentication)
    • The current remotecontrol token of the device (which it must have obtained from the device previously, e.g. via a local network interface)

The remotecontrol token requirements ensure that no controller client can issue commands to the wrong device, either maliciously or accidentally, for example by enumerating other device identifiers such as MAC addresses or serial numbers. The remotecontrol token contains encrypted data about the device and is also random such that it will not be the same if generated again, even for the same device.

This also provides a means to revoke remote control access from any previous users or controlling apps simply by refreshing the remotecontrol token on the device. This could be done on factory reset for example.


Device firmware integration

The general approach to integrating Remote Control in device firmware is as follows:

  • Obtain a client authenticated token for the Audiogum API (see the Firmware Integration documentation)
  • Obtain a remotecontrol token from Audiogum API and persist it for future connections
  • Specify device capabilities (what command actions can be supported)
  • Establish a secure WebSocket connection
  • Implement handler code to respond to remote control command messages

The Audiogum Embedded C SDK (libaudiogum) provides much of the necessary plumbing, but any client platform capable of making secure WebSocket connections and handling JSON data can use the Web APIs directly. Both approaches are included here for reference.

Obtaining remotecontrol token

When using libaudiogum, calling ag_start_remotecontrol_loop requests a remotecontrol token automatically (if one is not already stored) and persists it at the file location specified.

When using the Web APIs directly, a remotecontrol token can be obtained from the main Audiogum REST API with POST /v1/remotecontrol/tokens. The request needs no body, but a device access token is required.

POST /v1/remotecontrol/tokens
Authorization: Bearer {access_token}

Example response:

{
"remotecontroltoken": "v1::dSxwOotdwTT3nCQ5MPx4mg==::FOHfjq1g5BE0byibonVtDYNTE38asbPXuHwbXN+Kp1IF+46A4i/WCaN1O+VPNGTyvAWnYcBF2gSXeitp0KUx2VGQiY2ZP2h9Vt7Eme+HPjE="
}

The device should make this request only once and then persist the remotecontrol token for future use. If it requests a new token, controlling apps that have been granted access will lose that access.
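The request-once-then-persist behaviour can be sketched as follows. This is a minimal illustration rather than a reference implementation: the API_BASE host is a placeholder (this document does not state the REST API base URL), and the function names and the plain-text token file are assumptions made for the example.

```python
import json
import os
import urllib.request

# Assumption: placeholder host. Replace with the real Audiogum REST API base URL.
API_BASE = "https://api.example.com"


def build_token_request(access_token):
    """Build the body-less POST /v1/remotecontrol/tokens request."""
    return urllib.request.Request(
        API_BASE + "/v1/remotecontrol/tokens",
        method="POST",
        headers={"Authorization": "Bearer " + access_token},
    )


def request_token(access_token):
    """Send the request and return the remotecontroltoken field of the response."""
    with urllib.request.urlopen(build_token_request(access_token)) as response:
        return json.load(response)["remotecontroltoken"]


def load_or_request_token(path, access_token, fetch=request_token):
    """Return the persisted token, requesting and saving one only if none exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    token = fetch(access_token)
    with open(path, "w", encoding="utf-8") as f:
        f.write(token)
    return token
```

Calling load_or_request_token on every start-up keeps the same token across restarts, so controlling apps that already hold it keep working.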

Establishing WebSocket connection

When using libaudiogum, a connection is made by calling ag_start_remotecontrol_loop.

When using the Web APIs directly, make a WebSocket upgrade request to wss://remote.audiogum.com/v1/remotecontrol. The following parameters must be provided, either as HTTP headers or as query string parameters:

Header              Query string parameter  Example
Authorization       access_token            Authorization: Bearer v1::YIiKfEuko4wOXS...
                                            ?access_token=v1::YIiKfEuko4wOXS...
remotecontroltoken  remotecontroltoken      remotecontroltoken: v1::dSxwOotdwTT3nCQ5MP...
                                            &remotecontroltoken=v1::dSxwOotdwTT3nCQ5MP...
capabilities        capabilities            capabilities: play,stop,skipnext,skipprev
                                            &capabilities=play,stop,skipnext,skipprev

If using query parameters, don't forget to URL-encode the tokens.
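As an illustration of the query string form, the sketch below builds the upgrade URL with every parameter percent-encoded. The function name is an assumption for this example; the endpoint and parameter names are the ones listed above.

```python
from urllib.parse import urlencode

REMOTE_CONTROL_URL = "wss://remote.audiogum.com/v1/remotecontrol"


def build_connection_url(access_token, remotecontrol_token, capabilities):
    """Return the upgrade URL with all three parameters percent-encoded."""
    query = urlencode({
        "access_token": access_token,
        "remotecontroltoken": remotecontrol_token,
        "capabilities": ",".join(capabilities),  # comma-separated action values
    })
    return REMOTE_CONTROL_URL + "?" + query
```

Encoding matters because the tokens contain characters such as ":", "+", "/" and "=". An unencoded "+" in a query string is commonly decoded as a space, which would corrupt the token.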

Capabilities

The capabilities parameter specified when making the connection indicates to the service which features the device can support. This is to allow controlling applications to know what kind of commands can be sent.

Capabilities must include all action values that the device supports. See Remote Control Commands for full command message vocabulary, intended behaviour and guidance for implementation.

Capabilities must also specify the types of playback that a device supports. This could include one or all of the following: http, https, applesdk. The values http and https indicate that the device can stream playback from http or https streamurls returned by Audiogum. The value applesdk indicates that the device can handle Apple Music refs using MusicKit (this is only relevant for iOS devices).

Additional special features that are not represented as action values may also require capability values. For example the capability value expectreply enables additional conversation features for Natural Language Understanding.

When using libaudiogum, the actions_understood parameter for ag_start_remotecontrol_loop must be provided as an AG_ACTION bit flag. The optional voice_capabilities parameter is provided similarly.

For WebSocket connections, the capabilities header or query parameter described above takes a comma-separated list of all the action values that the device implements. For the REST API, a JSON string array is sent.
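The two representations of the same capability list can be produced as in the sketch below. The helper names are hypothetical; the capability values in the usage note are ones mentioned in this document.

```python
import json


def normalize_capabilities(values):
    """Drop duplicates while keeping order, and reject values containing a comma."""
    seen = []
    for value in values:
        if "," in value:
            raise ValueError("capability values must not contain commas: " + value)
        if value not in seen:
            seen.append(value)
    return seen


def as_connection_value(values):
    """Comma-separated form for the capabilities header or query parameter."""
    return ",".join(normalize_capabilities(values))


def as_json_array(values):
    """JSON string array form, as used by the REST API."""
    return json.dumps(normalize_capabilities(values))
```

For example, a device that supports four actions, https playback and the expectreply feature would pass ["play", "stop", "skipnext", "skipprev", "https", "expectreply"] to as_connection_value when connecting.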

Handling command messages

Once connected, the device may receive TEXT messages from the WebSocket. These are command messages as described in Remote Control Commands. Note that some of these require responses to be sent back to the service.

When using libaudiogum, ag_start_remotecontrol_loop requires a callback function pointer, action_callback. This callback must implement the intended behaviour and/or return appropriate response data.

When implementing directly against the WebSocket API, each command is received as an individual TEXT message, in JSON form, as described in Remote Control Commands. Responses, if required, should be sent similarly as JSON formatted TEXT messages on the same WebSocket.
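One way to structure the handler code is a small dispatcher keyed on the action field of each command. The sketch below only illustrates that structure. The handlers, the device state and the shape of the getplayer reply are hypothetical; the real message and response formats are defined in Remote Control Commands.

```python
import json


def make_dispatcher(handlers):
    """Return a function mapping one TEXT message to an optional TEXT reply."""
    def dispatch(text):
        command = json.loads(text)
        if not isinstance(command, dict):
            return None  # not a command object: ignored in this sketch
        handler = handlers.get(command.get("action"))
        if handler is None:
            return None  # action we did not advertise: ignored in this sketch
        reply = handler(command)
        return None if reply is None else json.dumps(reply)
    return dispatch


# Hypothetical device state and handlers, for illustration only.
player = {"playstate": "playing"}


def handle_stop(command):
    player["playstate"] = "stopped"
    return None  # assumption: no reply is sent for this action


def handle_getplayer(command):
    # Assumption: reply shape loosely modelled on the player state example.
    return {"type": "player", "playstate": player["playstate"]}


dispatch = make_dispatcher({"stop": handle_stop, "getplayer": handle_getplayer})
```

The WebSocket receive loop would call dispatch for each incoming TEXT message and send the returned string, if any, back on the same socket.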

Maintaining connection

To ensure the device can receive commands at any time, it should automatically attempt to re-connect whenever the WebSocket connection is closed. This can happen for a variety of reasons, such as idle timeouts or temporary network connectivity issues.

To prevent idle timeouts closing the connection and potentially missing commands whilst re-connecting, we also advise that devices send a "ping" message to the service every 30 seconds.

When using libaudiogum, once the loop has been started with ag_start_remotecontrol_loop, the WebSocket connection is re-connected automatically if it drops. Additionally, "ping" messages are sent automatically every 30 seconds to prevent idle timeouts.

When implementing directly against the WebSocket API, devices should re-connect in the same way as described above if the connection drops.

To implement the "ping" message, send a TEXT message on the WebSocket in JSON form as follows:

{"type":"ping"}

The service will respond with a TEXT message containing an empty JSON object, which may be ignored:

{}
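The ping and re-connect behaviour can be sketched independently of any particular WebSocket library. In the sketch below the connection object is hypothetical: it is assumed to offer recv(timeout), returning a TEXT message or None on timeout and raising ConnectionError when the socket closes, and send(text). The one-second pause before re-connecting is also an assumption, not a documented requirement.

```python
import json
import time

PING_INTERVAL_SECONDS = 30
PING_MESSAGE = '{"type":"ping"}'


def is_ping_reply(text):
    """True for the empty JSON object the service sends back after a ping."""
    try:
        return json.loads(text) == {}
    except ValueError:
        return False


def run_session(conn, on_command, clock=time.monotonic):
    """Serve one connection until conn.recv raises ConnectionError."""
    next_ping = clock() + PING_INTERVAL_SECONDS
    while True:
        timeout = max(0.0, next_ping - clock())
        text = conn.recv(timeout)  # hypothetical API: None means timed out
        if text is not None and not is_ping_reply(text):
            on_command(text)
        if text is None or clock() >= next_ping:
            conn.send(PING_MESSAGE)
            next_ping = clock() + PING_INTERVAL_SECONDS


def run_with_reconnect(connect, on_command, max_sessions=None, pause=time.sleep):
    """Re-connect whenever a session ends; max_sessions=None loops forever."""
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        try:
            run_session(connect(), on_command)
        except OSError:  # includes ConnectionError raised by connect() or recv()
            pass
        sessions += 1
        pause(1)  # assumed delay so a failing network is not hammered
```

A production loop would typically add a growing back-off between attempts; the fixed pause here just keeps the sketch short.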

Revocation

To revoke remote access from all users/apps who have previously been granted access, delete the remotecontrol token, then request and persist a new one for subsequent connections.

In the case of using libaudiogum, this can be done by calling ag_delete_remote_control_token and then cancelling and re-starting the remote control loop.

It is recommended that on factory reset a device should revoke remote control access, either by simply wiping the appropriate part of the file system or as described above.
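For a device that implements directly against the Web APIs, revocation amounts to replacing the persisted token. The sketch below assumes the token is stored in a plain file and that the caller supplies a function which requests a new token from POST /v1/remotecontrol/tokens. Both names are hypothetical.

```python
import os


def revoke_remote_access(token_path, request_new_token):
    """Delete the persisted token, then obtain and persist a replacement."""
    try:
        os.remove(token_path)
    except FileNotFoundError:
        pass  # nothing persisted yet, so nothing to revoke
    new_token = request_new_token()
    with open(token_path, "w", encoding="utf-8") as f:
        f.write(new_token)
    return new_token
```

After calling this, the device should close its current WebSocket connection and re-connect with the new token, so that holders of the old token can no longer reach it.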


Controller integration

Audiogum Remote Control service provides REST API features for controller apps and services to send commands to and receive state data from device clients.

Using these features requires user authentication, as described on the authentication page, and knowledge of the remotecontrol token of the target device.

API                                      Purpose
HEAD /v1/user/remotecontrol              Check whether a device is currently connected
GET /v1/user/remotecontrol               Get details of connected device including capabilities
POST /v1/user/remotecontrol/commands     Send commands to a device
GET /v1/user/remotecontrol/state/{type}  Get state data from a device

Check if a device is connected

Example request:

HEAD /v1/user/remotecontrol?remotecontroltoken=v1::X28+U9f98YDX4...
Authorization: Bearer {access_token}

Response if device connected:

204 No Content

or if not:

404 Not Found
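A controller could wrap this check as in the sketch below. The API_BASE host is a placeholder, since this document does not state the REST API base URL, and the function names are assumptions. The remotecontrol token is percent-encoded in the query string, on the assumption that the URL-encoding advice for device connections applies here too.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Assumption: placeholder host. Replace with the real Audiogum REST API base URL.
API_BASE = "https://api.example.com"


def build_connected_check(access_token, remotecontrol_token):
    """Build the HEAD /v1/user/remotecontrol request for one device."""
    query = urlencode({"remotecontroltoken": remotecontrol_token})
    return urllib.request.Request(
        API_BASE + "/v1/user/remotecontrol?" + query,
        method="HEAD",
        headers={"Authorization": "Bearer " + access_token},
    )


def is_device_connected(access_token, remotecontrol_token,
                        opener=urllib.request.urlopen):
    """Return True on 204, False on 404; other HTTP errors are re-raised."""
    request = build_connected_check(access_token, remotecontrol_token)
    try:
        with opener(request) as response:
            return response.status == 204
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise
```

The opener parameter exists only so the function can be exercised without network access; normal callers can leave it at its default.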

Get details of connected device

Example request:

GET /v1/user/remotecontrol?remotecontroltoken=v1::X28+U9f98YDX4...
Authorization: Bearer {access_token}

Response:

200 OK
{
  "deviceid": "abc123xyz",
  "capabilities": ["play", "stop"]
}

Send a command to a device

Example request:

POST /v1/user/remotecontrol/commands
Authorization: Bearer {access_token}
{
  "remotecontroltoken": "v1::X28+U9f98YDX4...",
  "command": {
    "action": "stop"
  }
}

The command object represents the command message sent to the device via its WebSocket connection and should be formed as described on the Remote Control Commands page.

Response:

202 Accepted
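The request above can be assembled as in the following sketch. The API_BASE host is a placeholder and the Content-Type header is an assumption, as this document specifies neither. The function names are likewise illustrative.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with the real Audiogum REST API base URL.
API_BASE = "https://api.example.com"


def build_command_request(access_token, remotecontrol_token, command):
    """Build POST /v1/user/remotecontrol/commands for one command object."""
    body = json.dumps({
        "remotecontroltoken": remotecontrol_token,
        "command": command,
    })
    return urllib.request.Request(
        API_BASE + "/v1/user/remotecontrol/commands",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",  # assumed, not stated in this document
        },
    )


def send_command(access_token, remotecontrol_token, command):
    """Send the command and report whether the service answered with status 202."""
    request = build_command_request(access_token, remotecontrol_token, command)
    with urllib.request.urlopen(request) as response:
        return response.status == 202
```

For example, send_command(token, rc_token, {"action": "stop"}) sends the stop command shown in the example request.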

Get state from device

GET /v1/user/remotecontrol/state/player?remotecontroltoken=v1::X28+U9f98YDX4...
Authorization: Bearer {access_token}

Response:

200 OK
{
  "type": "player",
  "playstate": "playing",
  "source": "playable",
  "presetnumber": 1,
  "offset": 45,
  "playable": { ... },
  "item": { ... },
  "volume": {
    "value": 30,
    "mute": false
  }
}

The type part of the request (e.g. player) maps to a "get" command with an action such as getplayer sent to the device via its WebSocket connection as described on the Remote Control Commands page. The service waits for a response from the device before returning the result to the client.
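Fetching and reading the state can be sketched as follows. The API_BASE host is a placeholder and the function names are assumptions. The summary helper reads only the playstate and volume fields shown in the example response.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: placeholder host. Replace with the real Audiogum REST API base URL.
API_BASE = "https://api.example.com"


def build_state_request(access_token, remotecontrol_token, state_type):
    """Build GET /v1/user/remotecontrol/state/{type} for one device."""
    query = urlencode({"remotecontroltoken": remotecontrol_token})
    path = "/v1/user/remotecontrol/state/" + quote(state_type, safe="")
    return urllib.request.Request(
        API_BASE + path + "?" + query,
        headers={"Authorization": "Bearer " + access_token},
    )


def get_state(access_token, remotecontrol_token, state_type):
    """Return the decoded JSON state; blocks until the device has replied."""
    request = build_state_request(access_token, remotecontrol_token, state_type)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_player_state(state):
    """Return a short description built from playstate and volume."""
    volume = state.get("volume", {})
    summary = "{}, volume {}".format(state.get("playstate"), volume.get("value"))
    if volume.get("mute"):
        summary += " (muted)"
    return summary
```

Because the service waits for the device before responding, a caller may want to pass a timeout to urlopen so that a slow or unresponsive device does not block the controller indefinitely.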


Demo Web Player

We have a demo web player to help test integrations.