Reference16r1:Concept App Service Transcriptions

From innovaphone wiki

Revision as of 10:34, 30 January 2026

FIXME: This article is still work in progress

Applies To

  • innovaphone from version 16r1


Overview

The Transcription Service converts audio input into text using automatic speech recognition (ASR) models. It sits between client services that generate or capture audio data and the transcription backends that perform the actual speech-to-text processing.

These transcription backends can run either as a locally hosted AI service within the same environment or as an external service accessed through a compatible API.

The Transcription Service itself is responsible for managing sessions, handling parallel requests, and coordinating the data flow between clients and the selected backend. The ASR model in the backend, whether hosted locally or externally, performs the actual transcription.

The service is designed to work with OpenAI-compatible APIs, enabling clients to freely choose their preferred backend provider.
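
As a rough illustration of what "OpenAI-compatible" means for a backend, the sketch below builds (without sending) a multipart request for the `POST /v1/audio/transcriptions` endpoint of the OpenAI API, using its `file` and `model` form fields. The base URL, API key, model name, and the helper name `build_transcription_request` are placeholders chosen for this example. This is not the Transcription Service's own code; it only shows the request shape such a backend is expected to accept.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, filename, audio_bytes):
    """Build (but do not send) an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field "model": name of the ASR model the backend should use.
        (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="model"\r\n\r\n'
            f'{model}\r\n'
        ).encode(),
        # Form field "file": the raw audio data.
        (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'
        ).encode() + audio_bytes + b"\r\n",
        # Closing boundary of the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values; replace them with those of the chosen provider.
request = build_transcription_request(
    "http://localhost:8000", "example-key", "whisper-1", "call.wav", b"RIFF"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The structure of the JSON answer depends on the backend and the requested response format.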

Licensing

Using the Transcriptions Service, including the Transcriptions App, requires the newly introduced UCC license.

Installation

Go to the Settings App (PBX manager) and open the "AP app installer" plugin. The App Store is shown in the right panel. Hint: if you access it for the first time, you will need to accept the "Terms of Use of the innovaphone App Store".

  • In the search field in the top right corner of the store, search for "Transcriptions" and click on it
  • Select the proper firmware version, for example "Version 16r1", and click on install
  • Tick "I accept the terms of use" and continue by clicking the yellow install button
  • Wait until the installation has finished
  • Close and reopen the Settings App (PBX manager) in order to refresh the list of available colored AP plugins
  • Click on "AP transcriptions", then on "+ Add an App", and then on the "Transcriptions API" button
  • Enter a "Name" that is used as the display name (all characters allowed) and a "SIP" name that is the administrative field (no spaces, no capital letters), e.g. Name: Transcriptions API, SIP: transcriptions-api
  • Choose a model from the dropdown
  • Tick the appropriate template to distribute the App (the App is needed on the user object of every user who wants to use the Transcriptions API)
  • Click OK to save the settings; a green check mark indicates that the configuration is valid

Transcriptions Flow Overview

/Diagramm.png

  • A client service that requires transcription initiates the process by sending a transcription request via a WebSocket connection.
  • The transcription service creates a session, assigns a transcription ID, and returns a dedicated HTTP endpoint to the client.
  • The client uploads the audio data to the transcription service using HTTP connections, targeting the provided endpoint.
  • The transcription service forwards the received audio data to the configured ASR backend (for example, a Whisper-compatible API) over HTTP.
  • The ASR backend performs the transcription and returns the result to the transcription service, which then forwards the outcome back to the client.
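
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the order of events: the class, method, and field names, the `/upload/<id>` path format, and the choice to drop the session after delivery are assumptions made for this example and do not reflect the actual innovaphone message formats. The real service uses WebSocket and HTTP where the sketch uses plain method calls.

```python
import uuid


class TranscriptionServiceSketch:
    """In-memory stand-in for the service; all names here are hypothetical."""

    def __init__(self, asr_backend):
        self.asr_backend = asr_backend  # callable: audio bytes -> result dict
        self.sessions = {}              # upload endpoint -> transcription ID

    def open_session(self):
        # Steps 1-2: a request arrives (via WebSocket in the real service);
        # the service creates a session and returns an ID plus an endpoint.
        transcription_id = uuid.uuid4().hex
        endpoint = f"/upload/{transcription_id}"  # assumed path format
        self.sessions[endpoint] = transcription_id
        return {"id": transcription_id, "endpoint": endpoint}

    def upload(self, endpoint, audio_bytes):
        # Steps 3-5: the client uploads audio (via HTTP in the real service);
        # the service forwards it to the backend and passes the result back.
        if endpoint not in self.sessions:
            raise KeyError("unknown upload endpoint")
        result = self.asr_backend(audio_bytes)
        # Assumption: the session is dropped once the result is delivered.
        del self.sessions[endpoint]
        return result


def fake_backend(audio_bytes):
    # Placeholder for a call to a Whisper-compatible API.
    return {"text": f"<{len(audio_bytes)} bytes transcribed>"}


service = TranscriptionServiceSketch(fake_backend)
session = service.open_session()
print(service.upload(session["endpoint"], b"\x00\x01\x02"))
```

Keying sessions by their upload endpoint lets several clients upload in parallel without their audio being mixed up, which mirrors the session management role described in the overview.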

Transcriptions - App Service

The App Service performs tasks in the following areas:

  • Implements the API to a remote transcription server (e.g. Whisper)

It can be configured in the Settings App (PBX Manager App), see Reference16r1:Apps/PbxManager/App_myApps_Transcriptions

Transcriptions App

The service also provides a user interface where audio files can be uploaded and transcribed directly.

Once the transcription is complete, a simple summary can be generated and exported as a PDF (basic version).

Audio files can be selected using the "Choose audio file" button. The transcription progress and its results are displayed on the same screen.

Neither the uploaded audio data nor the generated transcription text is stored by the service.

/ReferenceConceptTranscriptionsAppServiceTranscriptionsApp.png

Troubleshooting

To troubleshoot this App Service, enable the trace flags App, Database, and HTTP-Client in your App instance.

Limitations

  • Limitations such as maximum audio size, supported languages, or handling multilingual audio mainly depend on the selected model and provider, and may vary based on the user’s provider choice.
  • Differences in response structure can also occur. These responses are forwarded unchanged, since they may contain important metadata for the client, such as timestamps.
  • The service does not validate the selected model. Choosing a suitable model is therefore the user’s responsibility.
  • Transcriptions may contain misheard words or spelling inaccuracies, especially in cases of background noise or strong accents.
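
Because responses are forwarded unchanged, a client has to cope with differing response structures itself. The following sketch shows one defensive way to read a response. It assumes OpenAI-style shapes, where a plain JSON response carries a `text` field and a verbose response may additionally carry `segments` with `start`, `end`, and `text`. Other providers may use different fields, and the helper name `summarize_response` is made up for this example.

```python
def summarize_response(response):
    """Return (text, segments) from a transcription response dict.

    Missing keys are tolerated, since the response structure depends on
    the selected provider and model.
    """
    text = response.get("text", "")
    segments = [
        (seg.get("start"), seg.get("end"), seg.get("text", "").strip())
        for seg in response.get("segments", [])
    ]
    return text, segments


# A plain response without timestamps.
print(summarize_response({"text": "hello"}))

# A verbose response with one timestamped segment.
verbose = {
    "text": "hi there",
    "segments": [{"start": 0.0, "end": 1.2, "text": " hi there"}],
}
print(summarize_response(verbose))
```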

Related Articles