Messages

All control messages are exchanged in JSON format; the only exception is the audio stream, which is sent as binary data. The following control messages are supported.

Client Messages

The following messages can be sent to the server:

| Action | Message | Description |
| --- | --- | --- |
| Start Session | {"action": "start"} | The connection with the Speech to Text engine is initialised. If the connection is successful and the websocket is ready to process data, you will receive a "state: listening" message. |
| Stop Session | {"action": "stop"} | The connection with the Speech to Text engine is stopped. The service first returns the remaining blob that still has to be processed. Once the connection with the Speech to Text engine is closed, you will receive a "state: stopped" message. It is not possible to start a new session once you have stopped it. |
| Send audio data | <binary> | Binary data formatted as PCM WAVE, mono, 16 kHz. |
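As a minimal sketch of the client messages above, the control messages can be built with the standard json module and raw audio can be split into binary frames. The helper names, the chunk size, and the 16-bit sample width are assumptions made for illustration and are not part of the service's API; actually sending the frames over a websocket is left out.

```python
import json
from typing import Iterator


def start_message() -> str:
    """JSON control message that starts a session."""
    return json.dumps({"action": "start"})


def stop_message() -> str:
    """JSON control message that stops the session."""
    return json.dumps({"action": "stop"})


def audio_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split raw audio bytes into binary frames.

    chunk_size is an assumption: 3200 bytes is 100 ms of mono audio
    at 16 kHz if samples are 16 bits wide (also an assumption).
    """
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]
```

The control messages would be sent as text frames and each chunk as a binary frame, but only after the server has reported that it is listening.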

Server messages

The following messages can be received from the server:

Status changes

| Action | Message | Description |
| --- | --- | --- |
| Server listening | {"state": "listening"} | The connection with the Speech to Text engine is initialised. If the connection is successful and the websocket is ready to process data, you will receive a "state: listening" message. After receiving this message, you can send audio. |
| Server stopped listening | {"state": "stopped"} | The connection with the Speech to Text engine is stopped. The service first returns the remaining blob that still has to be processed. Once the connection with the Speech to Text engine is closed, you will receive a "state: stopped" message. It is not possible to start a new session once you have stopped it. |
| | {"state": "shutting_down", "at": 1234567890} | The real-time engine sends this message to let you know that it will shut down at a specific time in the future. The message is sent one hour before shutting down and gives you enough time (one hour by default) to either finish |
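The status messages above can be followed with a small state tracker on the client side. This is only a sketch: the class name, the "idle" label for a session that has not started yet, and the decision to store "at" unchanged are assumptions, since the unit of "at" is not specified here.

```python
import json
from typing import Optional


class SessionTracker:
    """Tracks the session state from the server's status messages."""

    def __init__(self) -> None:
        self.state = "idle"  # hypothetical label: no session started yet
        self.shutdown_at: Optional[int] = None

    def handle(self, raw: str) -> None:
        """Update the tracker from one JSON message sent by the server."""
        message = json.loads(raw)
        state = message.get("state")
        if state in ("listening", "stopped"):
            self.state = state
        elif state == "shutting_down":
            # Stored as received; the unit of "at" is not documented here.
            self.shutdown_at = message.get("at")

    @property
    def can_send_audio(self) -> bool:
        """Audio may only be sent while the server is listening."""
        return self.state == "listening"
```

A client could check can_send_audio before sending each binary frame, and use shutdown_at to decide when to wrap up a session.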

Transcription results

Partial results

The first result sent after receiving a binary blob is a partial result. It contains the spoken text detected so far and may still change.

{
    "partial": "Je hoort natuurlijk zeker"
}
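Because a partial may still change, a client would typically replace the text it is showing rather than append to it. A minimal sketch of that idea, using a hypothetical LiveCaption class:

```python
import json


class LiveCaption:
    """Keeps only the most recent partial text."""

    def __init__(self) -> None:
        self.text = ""

    def handle(self, raw: str) -> bool:
        """Update the caption from a partial message.

        Returns True if the message was a partial, False otherwise.
        """
        message = json.loads(raw)
        if "partial" not in message:
            return False
        # Replace instead of append: a newer partial supersedes the old one.
        self.text = message["partial"]
        return True
```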

Full result

When the real-time engine is confident about a full result it will send it in the following format:

{
    "result": [
        [ "Je", 12046, 12286, 1 ],
        [ "hoort", 12286, 12526, 1 ],
        [ "natuurlijk", 12526, 12796, 1 ],
        [ "zeker", 12796, 13096, 1 ],
        [ "in", 13096, 13156, 1 ],
        [ "'s-gravenhage", 13156, 13666, 1 ],
        [ "verhalen", 13666, 14055, 0.999713 ]
    ],
    "text": "Je hoort natuurlijk zeker in 's-gravenhage verhalen"
}

The message is an object with two keys, result and text. The result key holds the spoken words together with their metadata. Each entry is formatted as follows:

     [ "word", time_start, time_stop, confidence ]

time_start and time_stop are points in time measured in the amount of audio processed (including silence); they do not correspond to the actual session duration. This makes it possible to send audio at up to twice real-time speed. The confidence is a float that indicates how confident the engine is that the word is accurate (for example, 0.999713). The text key contains the full text of all words detected.
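As an illustration of the format described above, a full result can be unpacked into one record per word. The Word dataclass, the function names, and the 0.9 default threshold are assumptions for this sketch; the time values are kept exactly as received.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    time_start: float  # position in the processed audio, as received
    time_stop: float
    confidence: float


def parse_full_result(raw: str) -> Tuple[str, List[Word]]:
    """Return the full text and the per-word entries of a result message."""
    message = json.loads(raw)
    # Each entry is [ "word", time_start, time_stop, confidence ].
    words = [Word(*entry) for entry in message["result"]]
    return message["text"], words


def low_confidence(words: List[Word], threshold: float = 0.9) -> List[Word]:
    """Words whose confidence is below the (hypothetical) threshold."""
    return [word for word in words if word.confidence < threshold]
```

Keeping the per-word entries separate from the text makes it straightforward to highlight uncertain words or to align words with the audio.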

Error messages

| Message | Description |
| --- | --- |
| {"error": "Session not started"} | The client sent binary data (audio stream) but has not started a session yet. The data will not be processed. |
| {"error": "backend already listening"} | The client tried to start a new session while a backend session is already running. |
| {"error": "restarting of sessions is not supported"} | The client tried to start a new session after finishing a previous one. This is not supported; disconnect after finishing the session. |
| {"error": "unable to start backend"} | The real-time engine was unable to connect to a backend system. This can have several causes, which are not specified further. Contact support if the problem persists. |
| {"error": "engine_not_responding"} | This error is raised when a request for a session in the backend system gets no response. Contact support if the problem persists. |
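One way to handle these errors on the client is to inspect the top-level key of every incoming message and raise on "error". The ServerError exception and the classify function below are hypothetical names used for this sketch only.

```python
import json


class ServerError(Exception):
    """Raised when the server sends an {"error": ...} message."""


def classify(raw: str) -> str:
    """Return the kind of a server message: state, partial, or result.

    Raises ServerError for error messages and returns "unknown" for
    anything that matches none of the documented message types.
    """
    message = json.loads(raw)
    if "error" in message:
        raise ServerError(message["error"])
    for key in ("state", "partial", "result"):
        if key in message:
            return key
    return "unknown"
```

A receive loop could call classify on each text frame, dispatch on the returned kind, and log or surface a ServerError to the user.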