Reference16r1:Concept App Service Transcriptions
Revision as of 11:34, 30 January 2026
Applies To
- innovaphone from version 16r1
Overview
The Transcription Service converts audio input into text using automatic speech recognition (ASR) models. It is located between client services that generate or capture audio data and the transcription backends that perform the actual speech-to-text processing.
These transcription backends can run either as a locally hosted AI service within the same environment or as an external service accessed through a compatible API.
The Transcription Service itself is responsible for managing sessions, handling parallel requests, and coordinating the data flow between clients and the selected backend. The ASR model behind the backend performs the actual transcription.
The service is designed to work with OpenAI-compatible APIs, so users can freely choose their preferred backend provider.
Licensing
Using the Transcriptions Service, including the Transcriptions App, requires the newly introduced UCC license.
Installation
Go to the Settings App (PBX Manager) and open the "AP app installer" plugin. The App Store is shown in the right panel. Hint: if you access it for the first time, you need to accept the "Terms of Use of the innovaphone App Store".
- In the search field in the top right corner of the store, search for "Transcriptions" and click on it
- Select the proper firmware version, for example "Version 16r1", and click on install
- Tick "I accept the terms of use" and continue by clicking on the yellow install button
- Wait until the installation has finished
- Close and reopen the Settings App (PBX Manager) in order to refresh the list of available colored AP plugins
- Click on "AP transcriptions", then on "+ Add an App", and then on the "Transcriptions API" button
- Enter a "Name" that is used as the display name (all characters allowed) and the "SIP" name that is used as the administrative name (no spaces, no capital letters), e.g. Name: Transcriptions API, SIP: transcriptions-api
- Choose an LLM (model) from the dropdown
- Tick the appropriate template to distribute the App (the app is needed on the user object of every user who wants to use the Transcriptions API)
- Click OK to save the settings. A green check mark is shown to confirm that the configuration is valid
How it works
- A client service that requires transcription initiates the process by sending a transcription request via a WebSocket connection.
- The transcription service creates a session, assigns a transcription ID, and returns a dedicated HTTP endpoint to the client.
- The client uploads the audio data to the transcription service using HTTP connections, targeting the provided endpoint.
- The transcription service forwards the received audio data to the configured ASR backend (for example, a Whisper-compatible API) over HTTP.
- The ASR backend performs the transcription and returns the result to the transcription service, which then forwards the outcome back to the client.
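The steps above can be pictured with a small in-memory model. This is only a sketch of the control flow, not the service's actual protocol: the class name, the method names, the `t1`-style IDs and the `/upload/<id>` endpoint layout are all hypothetical, and the WebSocket and HTTP transports are replaced by plain method calls so the example runs without a network.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for the ASR backend: audio bytes in, result dict out.
Backend = Callable[[bytes], dict]


@dataclass
class TranscriptionService:
    backend: Backend
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), init=False)
    # Maps each dedicated endpoint to its transcription ID (session cleanup omitted).
    _sessions: Dict[str, str] = field(default_factory=dict, init=False)

    def request_transcription(self) -> dict:
        """First two steps: the client asks for a transcription (WebSocket in the real service)."""
        transcription_id = f"t{next(self._ids)}"      # hypothetical ID format
        endpoint = f"/upload/{transcription_id}"      # hypothetical endpoint layout
        self._sessions[endpoint] = transcription_id
        return {"id": transcription_id, "endpoint": endpoint}

    def upload(self, endpoint: str, audio: bytes) -> dict:
        """Remaining steps: the client uploads audio (HTTP in the real service)."""
        transcription_id = self._sessions[endpoint]   # KeyError for unknown endpoints
        result = self.backend(audio)                  # forward the audio to the ASR backend
        return {"id": transcription_id, "result": result}  # hand the outcome back to the client


def fake_backend(audio: bytes) -> dict:
    # Pretend transcription so the sketch is self-contained.
    return {"text": f"<{len(audio)} bytes of audio>"}


service = TranscriptionService(backend=fake_backend)
session = service.request_transcription()
reply = service.upload(session["endpoint"], b"\x00" * 16)
print(session)
print(reply)
```

Keeping the session table keyed by endpoint mirrors the idea that each request gets its own dedicated upload target, which is what lets several transcriptions run in parallel without mixing up their results.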
Transcriptions Flow Overview

Transcriptions - App Service
The App Service implements the API to a remote transcription server (e.g. Whisper).
It can be configured in the Settings App (PBX Manager), see Reference16r1:Apps/PbxManager/App_myApps_Transcriptions
- The Remote service URL defines where requests are sent.
- The API key is required to access the selected backend and authenticates the service with the remote provider.
- The Model value defines which model the backend should use for transcription.
These parameters are stored as a configuration and are forwarded to the backend. At present, these values must be entered manually in the Settings plugin.
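To illustrate how the three settings could end up in a request, here is a minimal sketch that builds (but does not send) an OpenAI-style transcription request. It assumes that the Remote service URL is the API base, that the backend follows the OpenAI convention of a multipart POST to `/audio/transcriptions` with `file` and `model` fields, and that the key is sent as a Bearer token. The function name, the example host and the placeholder key are hypothetical; how the service itself assembles its requests is not documented here.

```python
import urllib.request
import uuid


def build_transcription_request(base_url: str, api_key: str, model: str,
                                audio: bytes, filename: str = "audio.wav") -> urllib.request.Request:
    """Build a multipart/form-data request in the OpenAI transcription style (assumed format)."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",  # assumed OpenAI-compatible path
        data=model_part + file_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",             # API key setting
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Hypothetical values standing in for the three settings.
req = build_transcription_request(
    base_url="https://asr.example.com/v1",
    api_key="sk-placeholder",
    model="whisper-1",
    audio=b"RIFF....",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; that step is left out so the sketch can be run without a reachable backend.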
The service does not validate or confirm these values. It assumes that all given values are correct and only uses them for communication with the backend. Since users are able to choose their own providers, they are also responsible for selecting a fitting model and understanding the limitations of the models (such as supported audio formats, size limits, etc.).
Furthermore, applications that require transcription functionality must explicitly consume the Transcription Service API and present this functionality in their own user interface, as is done in Conference Transcriptions.
Transcriptions App
The service also provides a user interface where audio files can be uploaded and transcribed directly.
Once the transcription is complete, a simple summary can be generated and exported as a PDF (basic version).
Audio files can be selected using the Choose audio file button. The transcription process and its results are displayed on the same screen.
Neither the uploaded audio data nor the generated transcription text is stored by the service.

Troubleshooting
To troubleshoot this App Service, enable the trace flags App, Database, and HTTP-Client in your App instance.
Limitations
- Limitations such as maximum audio size, supported languages, or handling multilingual audio mainly depend on the selected model and provider, and may vary based on the user’s provider choice.
- Differences in response structure can also occur. These responses are forwarded unchanged, since they may contain important metadata for the client, such as timestamps.
- The service does not validate the selected model. Choosing a suitable model is therefore the user’s responsibility.
- Transcriptions may contain misheard words or spelling inaccuracies, especially in cases of background noise or strong accents.
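Because responses are forwarded unchanged, a consuming application may want to read them defensively. The sketch below is one possible approach, not part of the service: it assumes the OpenAI-style field names `text` and `segments` (with `start`, `end` and `text` per segment), falls back to treating the body as plain text when it is not JSON, and uses a hypothetical function name. Other providers may use different fields.

```python
import json
from typing import Any, List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def extract_transcript(raw: str) -> Tuple[str, List[Segment]]:
    """Return the transcript text and any timestamped segments found in a backend response."""
    try:
        payload: Any = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip(), []          # plain-text response body

    if not isinstance(payload, dict):
        return str(payload), []         # JSON, but not the expected object shape

    text = str(payload.get("text", ""))
    segments: List[Segment] = []
    for seg in payload.get("segments") or []:
        # Keep only segments that carry both timestamps (assumed field names).
        if isinstance(seg, dict) and "start" in seg and "end" in seg:
            segments.append(
                (float(seg["start"]), float(seg["end"]), str(seg.get("text", "")).strip())
            )
    return text, segments


print(extract_transcript('{"text": "Hello"}'))
print(extract_transcript("Hello world\n"))
```

Returning an empty segment list instead of raising keeps the caller's code path the same whether or not the chosen provider supplies timestamps.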