Reference16r1:Concept App Service Transcriptions: Difference between revisions

Revision as of 11:34, 30 January 2026

FIXME: This article is still work in progress

Applies To

innovaphone from version 16r1

Overview

The Transcription Service converts audio input into text using automatic speech recognition (ASR) models. It is located between client services that generate or capture audio data and the transcription backends that perform the actual speech-to-text processing.

These transcription backends can run either as a locally hosted AI service within the same environment or as an external service accessed through a compatible API.

The Transcription Service itself manages sessions, handles parallel requests, and coordinates the data flow between clients and the selected backend. The ASR model in the backend performs the actual transcription.

The service is designed to work with OpenAI-compatible APIs, so users can freely choose their preferred backend provider.

Licensing

Using the Transcriptions Service, including the Transcriptions App, requires the newly introduced UCC license.

Installation

Go to the Settings App (PBX Manager) and open the "AP app installer" plugin. The App Store is shown in the right panel. Hint: if you access it for the first time, you need to accept the "Terms of Use of the innovaphone App Store".

In the search field in the top right corner of the store, search for "Transcriptions" and click on it.
Select the proper firmware version, for example "Version 16r1", and click on install.
Tick "I accept the terms of use" and continue by clicking the yellow install button.
Wait until the installation has finished.
Close and reopen the Settings App (PBX Manager) to refresh the list of available colored AP plugins.
Click on "AP transcriptions", then on "+ Add an App", and then on the "Transcriptions API" button.
Enter a "Name", which is used as the display name (all characters allowed), and the "SIP" name, which is the administrative field (no spaces, no capital letters), e.g. Name: Transcriptions API, SIP: transcriptions-api.
Choose an LLM (model) from the dropdown.
Tick the appropriate template to distribute the App (the App is needed at the user object of every user who wants to use the assistant API).
Click OK to save the settings. A green check mark indicates that the configuration is valid.

How it works

A client service that requires transcription initiates the process by sending a transcription request via a WebSocket connection.
The transcription service creates a session, assigns a transcription Id, and returns a dedicated HTTP endpoint to the client.
The client uploads the audio data to the transcription service using HTTP connections, targeting the provided endpoint.
The transcription service forwards the received audio data to the configured ASR backend (for example, a Whisper-compatible API) over HTTP.
The ASR backend performs the transcription and returns the result to the transcription service, which then forwards the outcome back to the client.
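
The sequence above can be modelled as a small in-memory sketch. It is not the real service API: the class name, method names, the endpoint path layout and the fake backend are assumptions made for illustration, and the WebSocket and HTTP transports are replaced by plain method calls.

```python
import itertools
from typing import Callable, Dict

# Hypothetical stand-in for an ASR backend: takes audio bytes, returns text.
Backend = Callable[[bytes], str]


class TranscriptionServiceSketch:
    """In-memory model of the session flow; not the real service API."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._ids = itertools.count(1)
        self._sessions: Dict[str, int] = {}  # endpoint path -> transcription ID

    def request_transcription(self) -> dict:
        """Create a session (requested via WebSocket in the real flow)."""
        transcription_id = next(self._ids)
        endpoint = f"/upload/{transcription_id}"  # assumed path layout
        self._sessions[endpoint] = transcription_id
        return {"id": transcription_id, "endpoint": endpoint}

    def upload_audio(self, endpoint: str, audio: bytes) -> dict:
        """Receive audio (via HTTP in the real flow), forward it, return the result."""
        transcription_id = self._sessions.pop(endpoint)  # KeyError for unknown sessions
        text = self._backend(audio)
        return {"id": transcription_id, "text": text}


def fake_backend(audio: bytes) -> str:
    # Placeholder for a Whisper-compatible backend; reports only the payload size.
    return f"<{len(audio)} bytes transcribed>"


service = TranscriptionServiceSketch(fake_backend)
session = service.request_transcription()
result = service.upload_audio(session["endpoint"], b"\x00\x01\x02")
print(session, result)
```

Each session gets its own endpoint, which is one simple way to keep parallel requests apart; how the actual service separates them is not specified here.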

Transcriptions Flow Overview

Transcriptions - App Service

The App Service implements the API to a remote transcription server (e.g. Whisper).

It can be configured in the Settings App (PBX Manager App), see Reference16r1:Apps/PbxManager/App_myApps_Transcriptions.

The Remote service Url defines where requests are sent.

The API key is required to access the selected backend and authenticates the service with the remote provider.

The Model value defines which model the backend should use for transcription.

These parameters are stored as a configuration and are forwarded to the backend. At present, these values must be entered manually in the Settings plugin.

The service does not validate or confirm these values. It assumes that all given values are correct and only uses them for communication with the backend. Since users are able to choose their own providers, they are also responsible for selecting a fitting model and understanding the limitations of the models (such as supported audio formats, size limits etc.).
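
To illustrate how the three settings relate to a backend call, the following sketch builds (but does not send) a request in the shape that OpenAI-compatible transcription endpoints conventionally accept: a multipart form with a `file` and a `model` field and a bearer token in the `Authorization` header. The function name, the example URL and the file name are assumptions; how the App Service itself assembles its requests is not documented here.

```python
import urllib.request
import uuid


def build_transcription_request(remote_url: str, api_key: str, model: str,
                                audio: bytes,
                                filename: str = "audio.wav") -> urllib.request.Request:
    """Build a multipart POST carrying the model name and the audio file."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field "model": taken from the Model setting.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # Form field "file": the raw audio payload.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode() + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        remote_url,  # taken from the Remote service Url setting
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # taken from the API key setting
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Hypothetical values; nothing is sent over the network here.
request = build_transcription_request(
    "https://asr.example.invalid/v1/audio/transcriptions", "my-api-key", "whisper-1", b"RIFF...")
print(request.get_method(), request.full_url)
```

Because the values are passed through without validation, a wrong URL, key or model name only becomes visible as an error response from the backend.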

Furthermore, applications that require transcription functionality must explicitly consume the Transcription Service API and expose this functionality in their own user interface, as is done in Conference Transcriptions.

Transcriptions App

The service also provides a user interface where audio files can be uploaded and transcribed directly.

Once the transcription is complete, a simple summary can be generated and exported as a PDF (basic version).

Audio files can be selected using the "Choose audio file" button. The transcription progress and its results are displayed on the same screen.

Neither the uploaded audio data nor the generated transcription text is stored by the service.

Screenshot: Transcriptions App (ReferenceConceptTranscriptionsAppServiceTranscriptionsApp.png)

Troubleshooting

To troubleshoot this App Service, enable the trace flags App, Database, and HTTP-Client in your App instance.

Limitations

Limitations such as maximum audio size, supported languages, or the handling of multilingual audio depend mainly on the selected model and provider.
The response structure can also differ between providers. Responses are forwarded unchanged, since they may contain important metadata for the client, such as timestamps.
The service does not validate the selected model. Choosing a suitable model is therefore the user's responsibility.
Transcriptions may contain misheard words or spelling inaccuracies, especially in cases of background noise or strong accents.
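
Since responses are forwarded unchanged, a client may have to cope with more than one response shape. The sketch below assumes two shapes in the style of OpenAI-compatible APIs: a plain object with a `text` field, and a more verbose object with a `segments` list carrying `start`, `end` and `text`. The helper names are hypothetical, and other providers may return different structures.

```python
from typing import List, Tuple


def extract_text(response: dict) -> str:
    """Return the transcript, preferring a top-level "text" field."""
    text = response.get("text")
    if isinstance(text, str):
        return text
    # Fall back to joining segment texts if only segments are present.
    segments = response.get("segments") or []
    return " ".join(seg.get("text", "").strip() for seg in segments).strip()


def extract_timestamps(response: dict) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) per segment, or an empty list if none exist."""
    return [
        (float(seg["start"]), float(seg["end"]), seg.get("text", "").strip())
        for seg in response.get("segments") or []
        if "start" in seg and "end" in seg
    ]


plain = {"text": "Hello world"}
verbose = {"segments": [{"start": 0.0, "end": 1.2, "text": " Hello"},
                        {"start": 1.2, "end": 2.0, "text": " world"}]}
print(extract_text(plain), extract_text(verbose), extract_timestamps(verbose))
```

Reading optional fields defensively keeps the client working when the provider or model is changed in the settings.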
