Reference16r1:Concept App Service Transcriptions: Difference between revisions


Latest revision as of 15:16, 30 January 2026

FIXME: This article is still work in progress

Applies To

  • innovaphone Transcription Service from version 16r1


Overview

The Transcription Service converts audio input into text using automatic speech recognition (ASR) models. Conceptually, it sits between the client services that generate or capture audio data and the transcription backend that performs the actual speech-to-text processing. The backend can run either as a locally hosted AI service within the same environment or as an external service accessed through a compatible API.

The Transcription Service itself is responsible for managing sessions, handling parallel requests, and coordinating the data flow between clients and the selected backend. The ASR model in the backend performs the actual transcription.

The service is designed to work with OpenAI-compatible APIs, enabling clients to freely choose their preferred backend provider.
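
As an illustration of what "OpenAI-compatible" conventionally means for a transcription backend, the following sketch builds (but does not send) a multipart request in the shape used by OpenAI's audio transcription endpoint: a bearer token plus the form fields `model` and `file`. The host `asr.example.com`, the key, the file name and the helper name are placeholders, and whether the configured Remote Service URL is a base URL or the full endpoint is an assumption here; the service's own request handling is not documented on this page.

```python
import io
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, filename, audio_bytes):
    """Build (but do not send) a multipart POST in the OpenAI-compatible shape.

    Hypothetical helper for illustration; not part of the innovaphone API.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain form field: the model the backend should use.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: the audio data itself.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(
    "https://asr.example.com/v1",  # placeholder base URL
    "sk-placeholder",              # placeholder API key
    "whisper-1",                   # example model name
    "call.wav",
    b"RIFF....",                   # stand-in for real audio bytes
)
print(req.get_method(), req.full_url)
```

A provider that accepts this request shape can in principle be swapped for another without changes on the client side, which is the practical benefit of targeting a shared API convention.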

Licensing

Using the Transcriptions Service, including the Transcriptions App, requires the newly introduced UCC license.

Installation

Go to the Settings App (formerly known as PBX manager) and open the "AP app installer" plugin. The App Store is shown in the right panel. Hint: if you access it for the first time, you will need to accept the "Terms of Use of the innovaphone App Store".

  • In the search field in the top right corner of the store, search for "Transcriptions" and click on it.
  • Select the proper firmware version, for example "Version 16r1", and click on install.
  • Tick "I accept the terms of use" and continue by clicking on the yellow install button.
  • Wait until the installation has finished.
  • Close and reopen the Settings App (PBX manager) to refresh the list of available colored AP plugin icons.
  • Open the "AP transcriptions" plugin by clicking on it, select " + Add an App", and then click on the "Transcriptions API" button.
  • Enter a "Name" (used as display name; all characters allowed) and the "SIP" name (no spaces, no capital letters), e.g. Name: Transcriptions API, SIP: transcriptions-api.
  • Enter the Remote Service URL (destination URL of the service provider).
  • Enter the API Key (provided by the service provider).
  • Enter the model name (must be supported by the configured provider).
  • Tick the appropriate template to distribute the App (the app must be assigned to the user object of every user who wants to use the Transcriptions API).
  • Click OK to save the settings. A green check mark will be shown to confirm that the configuration is good.

How it works

  • A client service that requires transcription initiates the process by sending a transcription request via a WebSocket connection.
  • The transcription service creates a session, assigns a transcription ID, and returns a dedicated HTTP endpoint to the client.
  • The client uploads the audio data to the transcription service using HTTP connections, targeting the provided endpoint.
  • The transcription service forwards the received audio data to the configured ASR backend (for example, a Whisper-compatible API) over HTTP.
  • The ASR backend performs the transcription and returns the result to the transcription service, which then forwards the outcome back to the client.
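
The steps above can be sketched as a small in-memory model. This is not the service's implementation: the class and method names, the `/upload/<id>` path layout, the explicit `finish` step and the fake backend are all hypothetical, and the WebSocket and HTTP transports are replaced by plain method calls. It only illustrates how one session per request keeps parallel uploads apart.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    transcription_id: str
    upload_path: str                       # hypothetical endpoint layout
    chunks: list = field(default_factory=list)


class TranscriptionServiceSketch:
    """In-memory stand-in for the session handling described above."""

    def __init__(self, backend):
        self._backend = backend            # callable: bytes -> dict
        self._sessions = {}

    def open_session(self):
        # Steps 1-2: a request arrives (via WebSocket in the real service);
        # a session with its own ID and dedicated upload endpoint is created.
        tid = uuid.uuid4().hex
        session = Session(tid, f"/upload/{tid}")
        self._sessions[session.upload_path] = session
        return session

    def upload(self, path, audio):
        # Step 3: the client sends audio to the endpoint it was given.
        self._sessions[path].chunks.append(audio)

    def finish(self, path):
        # Steps 4-5: forward the audio to the backend, hand the result back.
        session = self._sessions.pop(path)
        return self._backend(b"".join(session.chunks))


def fake_backend(audio):
    # Placeholder for the ASR backend; a real one would return a transcript.
    return {"text": f"<{len(audio)} bytes transcribed>"}


service = TranscriptionServiceSketch(fake_backend)
first = service.open_session()
second = service.open_session()

# Interleaved uploads from two clients end up in separate sessions.
service.upload(first.upload_path, b"abc")
service.upload(second.upload_path, b"defghij")
service.upload(first.upload_path, b"de")

result_first = service.finish(first.upload_path)
result_second = service.finish(second.upload_path)
print(result_first, result_second)
```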

Transcriptions Flow Overview

Figure: Transcriptions Flow Overview

Transcriptions - App Service

The App Service implements an API for using a remote transcription provider (e.g. Whisper).

It can be configured in the Settings App (PBX Manager App); see Reference16r1:Apps/PbxManager/App_myApps_Transcriptions.

  • The Remote Service URL defines the target URL of the remote transcription provider.
  • The API Key is required to access the selected backend and to authenticate with the remote provider.
  • The Model defines which model the backend should use for transcription.

These parameters are stored as a configuration and are forwarded to the backend. At present, these values must be entered manually in the Settings plugin.

The service does not validate or confirm these values. It assumes that all given values are correct and uses them only for communication with the backend. Since users are able to choose their own providers, they are also responsible for selecting a fitting model and understanding its limitations (such as supported audio formats, size limits, etc.).
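
Because the values are passed through unchecked, it can help to sanity-check them before entering them in the Settings plugin. The following is a minimal sketch of such a pre-check; the function name and the specific rules are assumptions for illustration, and passing the check says nothing about whether the provider actually accepts the key or supports the model.

```python
from urllib.parse import urlparse


def check_backend_config(remote_service_url, api_key, model):
    """Return a list of obvious problems with the three settings.

    Hypothetical helper; the Transcription Service performs no such check.
    """
    problems = []

    parsed = urlparse(remote_service_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Remote Service URL must be an absolute http(s) URL")

    # Stray whitespace from copy and paste is a common cause of rejected keys.
    if not api_key or api_key != api_key.strip():
        problems.append("API Key is empty or has surrounding whitespace")

    if not model.strip():
        problems.append("Model is empty")

    return problems


ok = check_backend_config("https://asr.example.com/v1", "sk-placeholder", "whisper-1")
bad = check_backend_config("asr.example.com/v1", " sk-placeholder ", "")
print(ok)
print(bad)
```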

Furthermore, applications that require transcription functionality must explicitly consume the Transcription Service API and expose this functionality in their own user interface, as the Conference Transcriptions app does.

Transcriptions App

The service also provides a user interface where audio files can be uploaded and transcribed directly.

Once the transcription is complete, a simple summary can be generated and exported as a PDF (basic version).

Audio files can be selected using the "Choose audio file" button. The transcription process and its results are displayed on the same screen.

Neither the uploaded audio data nor the generated transcription text is stored by the service.

Figure: Transcriptions App Review

Troubleshooting

To troubleshoot the Transcriptions App Service, activate the traceflags App, Database and HTTP-Client in your Transcriptions App instance.

Limitations

  • Limitations such as maximum audio size, supported languages, or the handling of multilingual audio depend mainly on the selected model and provider, and therefore vary with the provider chosen.
  • Differences in response structure can also occur. These responses are forwarded unchanged, since they may contain important metadata for the client, such as timestamps.
  • The service does not validate the selected model. Choosing a suitable model is therefore the user’s responsibility.
  • Transcriptions may contain misheard words or spelling inaccuracies, especially in cases of background noise or strong accents.
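
Since responses are forwarded unchanged, a client may want to normalize them itself. The sketch below assumes the field names of OpenAI's `json` and `verbose_json` response formats (`text`, and `segments` with `start`, `end`, `text`) and also accepts a plain-text response; other providers may use different fields, and the function name is hypothetical.

```python
def extract_transcript(response):
    """Normalize a backend response to {"text": str, "segments": list}.

    Accepts a plain string or a dict in the assumed OpenAI-style shape.
    """
    if isinstance(response, str):
        return {"text": response.strip(), "segments": []}

    # Timestamps are kept when present, since they are useful client metadata.
    segments = [
        {
            "start": seg.get("start"),
            "end": seg.get("end"),
            "text": seg.get("text", "").strip(),
        }
        for seg in response.get("segments") or []
    ]

    text = response.get("text")
    if text is None:
        # Fall back to joining the segments if no top-level text is given.
        text = " ".join(seg["text"] for seg in segments)

    return {"text": text.strip(), "segments": segments}


plain = extract_transcript({"text": " Hello world. "})
verbose = extract_transcript(
    {"text": "Hi there.", "segments": [{"start": 0.0, "end": 1.2, "text": " Hi there."}]}
)
only_segments = extract_transcript(
    {"segments": [{"start": 0.0, "end": 0.5, "text": " Hi"}, {"start": 0.5, "end": 1.0, "text": " there"}]}
)
print(plain["text"], verbose["segments"], only_segments["text"])
```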

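Related Articles

  • SDK Documentation - Transcriptions API: https://sdk.innovaphone.com/16r1/web1/com.innovaphone.transcriptions/com.innovaphone.transcriptions.htm
  • Reference16r1:Apps/PbxManager/App_myApps_Transcriptions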