Start a connection
You can reach our API service by using the WebSocket Secure (WSS) protocol. The endpoint is:
wss://realtime.scriptix.io/v2/realtime
Authentication and configuration are provided via query parameters (see below).
Query parameters
| Parameter | Value | Description | Required |
|---|---|---|---|
| token | string | Scriptix Realtime API token used for authentication (recommended for browser-based WebSocket clients) | Yes |
| language | string | ISO 639-1 language code for the speech-to-text session, for example en | No |
| type | string | Model type: fast (default) or quality | No |
Example connection URL:
wss://realtime.scriptix.io/v2/realtime?token=your-api-token&language=en&type=fast
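The connection URL can also be assembled programmatically so that parameter values are escaped correctly. The following is a minimal sketch using only the Python standard library; the function name `build_realtime_url` and its argument names are illustrative and not part of the Scriptix API.

```python
from urllib.parse import urlencode

BASE_URL = "wss://realtime.scriptix.io/v2/realtime"


def build_realtime_url(token, language=None, model_type=None):
    """Build the WebSocket URL from the documented query parameters.

    `token` is required; `language` and `model_type` are optional and
    are left out of the URL when not given.
    """
    params = {"token": token}
    if language is not None:
        params["language"] = language
    if model_type is not None:
        # The documentation lists only these two model types.
        if model_type not in ("fast", "quality"):
            raise ValueError("type must be 'fast' or 'quality'")
        params["type"] = model_type
    # urlencode percent-escapes any characters that are unsafe in a URL.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_realtime_url("your-api-token", language="en", model_type="fast")
```

With the values above, `url` matches the example connection URL shown earlier.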
Request headers (Alternative for server-side clients)
Server-side WebSocket clients can optionally send the API token in one of the following request headers instead of the token query parameter:
| Header | Value | Description |
|---|---|---|
| x-zoom-s2t-key | Scriptix Realtime API Token | API key of type real-time needed for authorization |
| x-api-key | Scriptix Realtime API Token | Alternative header name for API key |
| api-key | Scriptix Realtime API Token | Alternative header name for API key |
Note: Browser-based WebSocket connections cannot use custom headers, so the token query parameter is required for web applications.
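The choice between the two authentication styles can be sketched as follows. This is an illustration, not an official client: the helper `build_handshake` is hypothetical, and it assumes that `language` and `type` are still passed as query parameters when the token travels in a header, which the documentation does not state explicitly. How the returned headers are handed to the WebSocket handshake depends on the client library you use.

```python
from urllib.parse import urlencode

BASE_URL = "wss://realtime.scriptix.io/v2/realtime"
# Header names listed in the table above.
TOKEN_HEADERS = ("x-zoom-s2t-key", "x-api-key", "api-key")


def build_handshake(token, *, in_browser, language=None, model_type=None,
                    header_name="x-api-key"):
    """Return (url, headers) for opening the WebSocket connection.

    Browser clients cannot set custom headers, so the token goes into
    the query string; server-side clients send it in a header instead.
    """
    if header_name not in TOKEN_HEADERS:
        raise ValueError(f"unsupported header name: {header_name}")
    params = {}
    headers = {}
    if in_browser:
        params["token"] = token
    else:
        headers[header_name] = token
    # Assumption: session options remain query parameters in both cases.
    if language is not None:
        params["language"] = language
    if model_type is not None:
        params["type"] = model_type
    url = BASE_URL + ("?" + urlencode(params) if params else "")
    return url, headers


server_url, server_headers = build_handshake(
    "your-api-token", in_browser=False, language="en"
)
```

Keeping the token out of the URL on the server side avoids it showing up in places where full URLs are commonly recorded, such as proxy or access logs.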
Transcription results
Partial results
The first results sent after the service receives audio data are partials. A partial contains the text detected so far for the current segment and may change as more audio is processed. Each update carries the full, incrementally growing text and replaces the previous partial.
{
  "text": "hi how are",
  "is_final": false,
  "offset_ms": 1234,
  "stability": 0.8
}
| Field | Type | Description |
|---|---|---|
| text | string | Growing text that builds incrementally. Replaces previous partial. |
| is_final | boolean | Always false for partial results |
| offset_ms | integer | Position in audio stream (milliseconds) for synchronization |
| stability | float | Confidence score between 0 and 1 |
Note: Partials are only sent when speech is detected. No results are sent during silence.
Final results
When the realtime engine is confident about a transcription segment, it sends a final result. Finals are emitted after approximately 15 words or after 30 seconds of accumulated audio.
{
  "text": "hi how are you doing today",
  "is_final": true,
  "offset_ms": 1234,
  "words": [
    [" hi", 0, 200, 0.95],
    [" how", 200, 400, 0.92],
    [" are", 400, 600, 0.94],
    [" you", 600, 800, 0.91],
    [" doing", 800, 1000, 0.93],
    [" today", 1000, 1300, 0.96]
  ]
}
| Field | Type | Description |
|---|---|---|
| text | string | Finalized transcription text that won't change |
| is_final | boolean | Always true for final results |
| offset_ms | integer | Position in audio stream (milliseconds) for synchronization |
| words | array | Word-level timestamps: [word, start_ms, end_ms, confidence] |
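Because each entry in words is a positional array, it can help to unpack it into named fields before use. The sketch below does this with a dataclass; the names `Word` and `parse_words` are illustrative. It keeps the word text exactly as received (including the leading space seen in the example) and makes no assumption about whether the word timings are relative to the segment or to the whole stream.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str          # word text as sent, e.g. " hi"
    start_ms: int      # start time in milliseconds
    end_ms: int        # end time in milliseconds
    confidence: float  # confidence value for this word


def parse_words(message):
    """Convert the `words` array of a final result into Word objects.

    Returns an empty list when the message has no `words` field,
    which is the case for partial results.
    """
    return [
        Word(text=text, start_ms=start, end_ms=end, confidence=conf)
        for text, start, end, conf in message.get("words", [])
    ]


final_result = json.loads("""
{
  "text": "hi how are you doing today",
  "is_final": true,
  "offset_ms": 1234,
  "words": [
    [" hi", 0, 200, 0.95],
    [" how", 200, 400, 0.92],
    [" are", 400, 600, 0.94],
    [" you", 600, 800, 0.91],
    [" doing", 800, 1000, 0.93],
    [" today", 1000, 1300, 0.96]
  ]
}
""")
words = parse_words(final_result)
```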
Example flow
Partial: {"text": "hi", "is_final": false, "offset_ms": 100, "stability": 0.6}
Partial: {"text": "hi how", "is_final": false, "offset_ms": 100, "stability": 0.7}
Partial: {"text": "hi how are", "is_final": false, "offset_ms": 100, "stability": 0.8}
Partial: {"text": "hi how are you", "is_final": false, "offset_ms": 100, "stability": 0.85}
... more partials ...
Final: {"text": "hi how are you doing today", "is_final": true, "offset_ms": 100, "words": [...]}
Partial: {"text": "I'm", "is_final": false, "offset_ms": 2500, "stability": 0.6} ← New segment starts
Partial: {"text": "I'm doing", "is_final": false, "offset_ms": 2500, "stability": 0.7}
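A client consuming this flow typically keeps the finalized segments and shows the latest partial after them. The following is a minimal sketch of that logic, independent of any WebSocket library; the class name `TranscriptAssembler` is hypothetical, and joining segments with a single space is a display choice rather than something the API prescribes.

```python
import json


class TranscriptAssembler:
    """Track finalized segments plus the current, still-changing partial."""

    def __init__(self):
        self.finals = []   # texts of segments that will no longer change
        self.partial = ""  # latest partial of the segment in progress

    def feed(self, raw_message):
        """Process one JSON message and return the text to display."""
        message = json.loads(raw_message)
        if message["is_final"]:
            # A final closes the current segment and supersedes its partials.
            self.finals.append(message["text"])
            self.partial = ""
        else:
            # Each partial replaces the previous one; it is not appended.
            self.partial = message["text"]
        return self.display()

    def display(self):
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


assembler = TranscriptAssembler()
flow = [
    {"text": "hi", "is_final": False, "offset_ms": 100, "stability": 0.6},
    {"text": "hi how", "is_final": False, "offset_ms": 100, "stability": 0.7},
    {"text": "hi how are you doing today", "is_final": True, "offset_ms": 100, "words": []},
    {"text": "I'm", "is_final": False, "offset_ms": 2500, "stability": 0.6},
]
for item in flow:
    shown = assembler.feed(json.dumps(item))
```

After the last message, `shown` holds the finalized first segment followed by the partial of the new segment.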