Howto:Creating fine announcements and music on hold

From innovaphone-wiki


The innovaphone PBX can play audio files in Waiting Queues, as Music on Hold and through XML commands, so many “audio scenarios” are possible: from simple ones such as customized MOH up to complex setups such as small IVRs.

But often the sound quality is very poor when you call. Music on hold with ugly sound, or a self-made IVR with nearly incomprehensible messages, does not really give “additional value” to your customer or improve their image.

A pity, because you are working with a marvellous system that performs perfectly if the input is done well. So if you have “experience” with embarrassing voice prompts or poor music reproduction, reading this article may help. If you are just planning to offer audio services to your customer, read it too. If you are not sure about audio: read it!


Applies To

This information applies to

innovaphone PBX

More Information

Remember that if the caller's impression is disastrous, the result will be bad publicity for your customer. In that case it may be better to use tones and the standard MOH.

One fundamental point is having good audio source files for all “audio services”. The best result is achieved with a professional speaker in a recording studio. If you have this kind of quality in mind for your solution, great: contact a recording studio and the source will be perfect.

But many customers want to play a “home-made” audio file: a quick and, in most cases, simple solution without the complications and delays of dealing with recording studios and the like. Audio is available on every PC, so click-and-go solutions are frequently used.

This article gives you some hints on what to observe when producing reasonably good-sounding “handmade” audio files.

So here is how to simulate a professional appearance.


innovaphone operates in the switching market and is not an “audio company”. This article just tries to help people who have even less experience than we do in creating audio files for telephony use (definitely a lot of people …). So do not expect the ultimate audio advice from us; the internet is full of that anyway. And if you are a PC expert, some statements and tips will be superfluous for you, and you can and should use your own professional or alternative tools.

All the software, freeware and shareware mentioned in this article is named for information only, not as an invitation to use it; we do not recommend any application. Always check the copyrights and related issues yourself. Remember that music is a “protected good”: you have to observe authors' rights and, generally, the relevant regulations in your country. In some countries, playing any music on a “public” trunk line also means an obligation to pay fees. Note also that all the samples you find here are for training purposes only and not for production use. All trademarks and companies named in this article are used only as examples. innovaphone accepts no responsibility for them or for the use of those products.

Problem Details

The production of your audio application in an innovaphone PBX can be split into the following steps:

  1. Creating the audio file
  2. Normalization of the audio source
  3. Compiling the audio source into the telephone formats
  4. Copying the telephone-format files to the destination directory
  5. Setup in your innovaphone PBX.

This last step is not described in this article; you will find many articles about it in this wiki: just follow the links at the end of this article.

1. Creating the audio file

Voice files

The simplest way to produce telephony voice prompts is to use the telephone set, for example with the XML application from Howto:Simple Message Management.

In this example the recording is done with a telephone set. That is nice for self-made prompts, but you will always hear that the recording was made on a phone set, and the result cannot be edited with standard software.

Another way to produce voice prompts is text-to-speech software (TTS). Acapela, for example, has a nice interface and product; you can try a demo on

The disadvantage of computer-generated prompts is that you immediately hear the “computer voice”. On the other hand, it is much better than many self-made recordings. So it is not an optimal solution, but you at least get consistent quality, which is better in many cases. TTS also solves the problem if you don't have a native speaker.

Better results are possible by recording the announcement with a microphone directly on your PC (which is also the option people like). You can do this with an application or simply with the Sound Recorder in the Accessories folder of Windows XP or Vista/Windows 7. Built-in laptop microphones are not recommended; external microphones are better. The cheapest are the microphones built into gaming headsets. The advantage of such headsets is that you can also use them for the sound check: you will hear more detail with a headset than with external speakers. Headsets for softphones will also work fine. If you record the prompts with a microphone, observe the following hints:

  • Female speakers are better than male ones, because female voices generally have better sound quality and are more understandable on bad connections.
  • Write down the text for the speaker and print it out. Do that even if the sentence is apparently simple and short.
  • If you have several prompts, try to record them all in one single session. In the next session, an hour or a day later, the voice will have changed and your setup will never be quite the same. If you cannot record all prompts in one session and have to record an announcement later, at least try to reproduce the setup (distance of the microphone from the mouth, mixer settings and similar). You will still hear small differences: not nice if you compose a sentence from single prompts or play one message after another, but a minor problem if just one new text is played.
  • Even if you have to produce single prompts, it is better to record all of them (or at least a batch of them) in one session and then create the single prompts in post-production.
  • Drink a gulp of water before recording. This keeps the lips moist and you get less “pop” at the beginning of a word.
  • Record everything two or more times and select the best version at the end.
  • Record in the best quality you can; usually the recorder's default setup already provides this (typically 44.1 kHz, 16-bit stereo). Don't worry about the telephony limitations at this step. After recording in stereo, create a mono version that keeps the high sample rate and start post-production with this file. With a good source file it is easier to spot small imperfections and correct them.
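If your editor has no explicit mono function, the stereo-to-mono step can be sketched with Python's standard `wave` and `struct` modules. This is a minimal sketch assuming 16-bit little-endian PCM (the usual WAV layout); the function names are hypothetical, and averaging the left and right channels is just one common way to downmix. The sample rate is deliberately left unchanged.

```python
import struct
import wave


def stereo_to_mono(pcm: bytes) -> bytes:
    """Average left and right channels of interleaved 16-bit little-endian PCM."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    # Floor division of the sum keeps the result inside the 16-bit range.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count - 1, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


def wav_stereo_to_mono(src, dst) -> None:
    """Write a mono copy of a 16-bit stereo WAV, keeping the original sample rate."""
    with wave.open(src, "rb") as reader:
        if reader.getnchannels() != 2 or reader.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = reader.getframerate()
        pcm = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)  # e.g. 44100: do not downsample yet
        writer.writeframes(stereo_to_mono(pcm))
```

Usage, with hypothetical file names: `wav_stereo_to_mono("prompt_stereo.wav", "prompt_mono.wav")`.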

Now comes the most important step for voice prompts: post-production. The raw audio file will have many imperfections (no studio, no professional speaker, no professional equipment) that will be noticeable even on the phone, but we can correct most of them:

Pop: when a speaker starts talking, opening the lips causes a “pop”, and the microphone records it. To limit this effect, the speaker should drink water before talking. Have you seen pictures or videos of recording studios where people are singing and there is a strange round filter, like a disc, in front of the microphone?


That filter is there to limit this effect. But with our €30 microphone and an amateur speaker we can hardly avoid it, so you have to remove the pop manually.


In the picture you can see two typical “pops” produced just by opening the mouth. You can easily replace them with silence.

Breath: in a longer sentence people breathe in the middle, and always between sentences. Normally we are not aware of this, but in a recording and on the phone you hear it clearly. Professional speakers don't do that; they have had special theatre and speech training and control their breathing perfectly. So do not stress your speaker; correct it manually instead: replace those noises with silence.

Silence: within a sentence and between words there are periods of silence. The recorded “silence” is never real “digital” silence but, at best, a “background noise”.


In the picture you can see the end of a sentence followed by the recorded silence. In the following screenshot the “digital” silence was inserted manually:


There should also be a short period of silence at the beginning and end of a recording. Here too, the only way to achieve a good result is manual correction.
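Manual correction stays the rule, but finding the quiet stretches in a long recording can be automated. The sketch below only lists candidate spans (consecutive blocks whose peak stays below a threshold) so you can listen to each one before replacing it with digital silence in your editor. It assumes mono 16-bit samples as a Python list; the threshold of 500 (roughly -36 dBFS) and the 20 ms block size are hypothetical starting values to tune by ear.

```python
def find_quiet_spans(samples, rate, threshold=500, block_ms=20):
    """Return (start, end) sample ranges whose peak level stays below threshold.

    The spans are only candidates: listen to each one before silencing it.
    """
    block = max(1, rate * block_ms // 1000)
    spans = []
    for start in range(0, len(samples), block):
        end = min(start + block, len(samples))
        peak = max(abs(s) for s in samples[start:end])
        if peak < threshold:
            if spans and spans[-1][1] == start:
                spans[-1] = (spans[-1][0], end)  # extend the previous quiet span
            else:
                spans.append((start, end))
    return spans
```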

For post-production you will need good sound editing software, which should at least be able to convert formats, edit audio, generate silence and adjust levels. A fine product that does all of this is Cool Edit (not free, but a trial version is available and the license costs only about $30). Of course you can also use any other sound editing software. If you play the recorded file with such a tool, you will immediately see (and hear) the pops, the breaths and the non-silence, and you can simply replace them with digital silence. Also insert silence at the beginning (always place a header of at least 100 ms at the beginning of a prompt) and at the end of the recording (also at least 100 ms). If you do not, the file will start with a “click”, or produce a “click” each time it is repeated in a Waiting Queue (WQ).
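The two edits described here, replacing a pop or breath with digital silence and adding a silent header and trailer, can be sketched on a list of mono 16-bit samples. This is a minimal illustration, not what Cool Edit does internally; positions are sample indices, and the function names are hypothetical. The 100 ms default comes from the recommendation above; the span boundaries should ideally sit on zero crossings, as described in the zero-cross paragraph of this article.

```python
def replace_with_silence(samples, start, end):
    """Return a copy with samples[start:end] set to digital silence (0)."""
    out = list(samples)
    start, end = max(0, start), min(end, len(out))
    if start < end:
        out[start:end] = [0] * (end - start)
    return out


def pad_with_silence(samples, rate, ms=100):
    """Add `ms` milliseconds of digital silence before and after the prompt."""
    pad = [0] * (rate * ms // 1000)
    return pad + list(samples) + pad
```

At 8 kHz, 100 ms of padding is 800 samples on each side.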

When you replace sound with silence (and generally when editing), always observe the zero-level point. In Cool Edit this feature is called “Zero cross adjust”: pressing F4 moves the cursor to a zero crossing. If you insert or delete sound, it is a good idea to do it where the amplitude is zero:


In the picture you see the first point, which is good for cutting, and the second one, which is bad. The result of a bad cut is a “click” in the playback. So remember to always edit sound at a zero crossing of the wave. Having observed all this, we now have a raw audio file without pops, with no breathing, perfect silence, a zero-level header and zero-level ending, and no “surprise” levels in between because all cuts were made at zero crossings. The first step towards perfect audio reproduction is done.
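The idea behind a zero-cross adjustment can be sketched as snapping a cut position to the nearest sample where the waveform is zero or changes sign. This only illustrates the principle; it is not Cool Edit's actual algorithm. It assumes mono 16-bit samples as a Python list, and the function names are hypothetical.

```python
def is_zero_crossing(samples, i):
    """True if the waveform is zero at i or changes sign between i-1 and i."""
    if samples[i] == 0:
        return True
    return i > 0 and (samples[i - 1] < 0) != (samples[i] < 0)


def nearest_zero_crossing(samples, pos):
    """Return the zero-crossing index closest to pos, or None if there is none."""
    if not samples:
        return None
    pos = min(max(pos, 0), len(samples) - 1)
    for offset in range(len(samples)):
        for i in (pos - offset, pos + offset):  # earlier index wins a tie
            if 0 <= i < len(samples) and is_zero_crossing(samples, i):
                return i
    return None
```

For example, in `[100, 50, -20, -80, -30, 40, 90]` a cursor at index 3 snaps to index 2, where the wave goes from positive to negative.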

Music files

Music files are available in many formats; MP3, for example, is very popular. First of all, convert the source to .wav format (wave file). You can do that by opening the music file in audio software and then converting and saving it in wave format. Of course, the audio software has to support that. But what if there is no “source file”, just a web player or a movie?

An easy way to get a wave file from any source is to record directly from the audio port of the PC using a sound grabber. In this case your audio editor does not need to understand the codec or “format” or have access to the original audio file: if the PC plays the sound, you can grab it. The sample in this article was made by grabbing a video.

A nice sound grabber is

In Vista, and generally on a PC with a built-in microphone, first check that the microphone is disabled; otherwise you will record, through this microphone, the sound coming from the PC's speaker as well as your own voice, and the result will be poor. If the microphone is disabled, the grabber detects this and records directly from the sound card (you have to disable all microphones, otherwise the grabber assumes that you have just muted the recording). The level can be controlled with the built-in mixer of the operating system. Of course all these tools also do many other things, but recording directly from the sound card is the function we need to get the wave files.
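Because the grab level is set with the OS mixer, it is worth checking the captured file for overload before going further. The sketch counts samples stuck at the 16-bit limits, which is a simple hint of clipping; it assumes mono 16-bit samples as a Python list, and the function name is hypothetical. Ideally the result is zero; if not, lower the mixer level and grab again.

```python
def clipping_ratio(samples):
    """Fraction of 16-bit samples sitting at full scale (a hint of clipping)."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    return clipped / len(samples)
```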

2. Normalization of the audio source

The audio files can be played perfectly on the PC with a player (audio player, RealPlayer and similar): just double-click the file and playback starts. Now you have to convert the wave file to the required format (8 kHz, 16 bit mono). The format conversion itself is simple and can be done with an audio editor (in Cool Edit: “adjust sample rate”). You must also adjust the volume, and that is a more complex problem which we should analyse and understand well.

In telecommunications, volume is standardized. Levels must be consistent along the whole chain of a voice transmission, and therefore all telecom equipment produces and processes voice at a standard level; in effect, we have to respect a worldwide audio setup. If you feed in a “louder” signal, the result is distortion. This distortion can be created in the first component (your telephone), in the last one (the phone of the far party) or at any other point of the path (which may include hundreds of devices) if the maximum levels are not respected. The “PC world” works similarly, but its levels are different. The effect is that a sound which is fine on the PC will be distorted when played in the PBX, whereas a file recorded by the PBX (for example a voicemail) will sound very quiet on a PC (and you have to turn up the PC volume). Remember that the innovaphone PBX just streams voice data and performs no volume adjustment.
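A sample-rate converter does more than drop samples: going from 44.1 kHz to 8 kHz, everything above 4 kHz must be filtered out first, or it folds back (aliases) into the audible band. The sketch below shows the principle with a Hann-windowed sinc low-pass, assuming mono 16-bit samples as a Python list. The cutoff at 90% of the lower Nyquist frequency and `half_width=16` are hypothetical quality/speed choices, and pure Python is slow, so for real files use your audio editor.

```python
import math


def resample(samples, src_rate, dst_rate, half_width=16):
    """Band-limited resampling of 16-bit samples with a Hann-windowed sinc kernel.

    half_width is the number of sinc zero crossings on each side of the kernel.
    """
    cutoff = 0.9 * min(1.0, dst_rate / src_rate)  # fraction of the source Nyquist
    reach = half_width / cutoff  # kernel half-length in source samples
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for j in range(n_out):
        t = j * src_rate / dst_rate  # position of output sample j in source samples
        lo = max(0, math.ceil(t - reach))
        hi = min(len(samples) - 1, math.floor(t + reach))
        acc = 0.0
        for i in range(lo, hi + 1):
            x = i - t
            arg = cutoff * x
            sinc = 1.0 if arg == 0 else math.sin(math.pi * arg) / (math.pi * arg)
            window = 0.5 + 0.5 * math.cos(math.pi * x / reach)  # Hann taper
            acc += samples[i] * cutoff * sinc * window
        out.append(max(-32768, min(32767, round(acc))))  # clamp to 16 bit
    return out
```

With these settings a 1 kHz tone passes almost unchanged, while a 6 kHz tone, which would otherwise alias to 2 kHz, is strongly attenuated.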

So the volume of the audio file becomes important, and you must reduce it before converting the format. By how much? Let's put it this way: by at least 80%. So if 100% (red, overflow) is the maximum on a PC, just treat 20% as your 100%. In some cases even an 85% reduction is fine, but note that a signal that is too quiet produces another problem: the playback will have drop-outs (for a second or two the volume drops, resulting in an unstable reproduction).
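One way to apply the “20% as 100%” rule is peak normalization: scale the whole file so that its loudest sample reaches 20% of full scale. Reading the percentage as a peak amplitude is an assumption; the sketch works on mono 16-bit samples as a Python list and the function names are hypothetical. In decibels relative to full scale (dBFS), 20% corresponds to about -14 dBFS and 15% (the 85% reduction) to about -16.5 dBFS.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value


def normalize_peak(samples, target_fraction=0.20):
    """Scale samples so the loudest one reaches target_fraction of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_fraction * FULL_SCALE / peak
    return [round(s * gain) for s in samples]


def fraction_to_dbfs(fraction):
    """Express a fraction of full scale in dBFS (decibels relative to full scale)."""
    return 20 * math.log10(fraction)
```

Peak normalization lets a single spike set the gain for the whole file, so remove pops and other spikes before normalizing.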

If you download the samples you can hear what all this means in reality: one file is at the “normal” PC level, the second one is the “20% version”. When you play the 20% version on your PC, you will find it very quiet. You will also find the .g711 version of the 20% file to check it on your PBX. Remember that even this level is considered “loud” and is at the limit!


The picture shows the sound levels: the first is the original wave level, perfect for playback on a PC but unusable on a PBX. The second is the level needed for telephony use.

A sound considered “acceptably loud” when testing between IP phones could be “too loud” for an external caller and even cause slight distortion. This happens very often, because people test the volume with a phone set in hands-free mode. The playback sounds fine, but if you call from outside it is distorted. Why? On-net in the PBX all parameters are perfect, and innovaphone devices handle audio data very well. But once the call goes out to an external carrier, a single link in the path with slightly too much amplification is enough to cause distortion. GSM connections, where a different codec is used, are especially sensitive. To avoid that, remember: keep your volume down; it is better too quiet than too loud! It may not sound so impressive in a demonstration to your customer or boss (because playback is quiet in hands-free mode), but it works perfectly with the handset and, most importantly, from outside. A far party expects a normal sound level even with MOH, so all sounds should be as loud as normal speech.
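Listening tests remain essential, but a level reading gives you a repeatable number for comparing prompts and MOH files with each other. The sketch reports the peak and RMS level of mono 16-bit samples in dBFS. The article gives no RMS target, so treat the numbers as a consistency check between your own files rather than an official limit; the function name is hypothetical.

```python
import math


def level_dbfs(samples, full_scale=32768):
    """Return (peak, rms) level of 16-bit samples in dBFS."""
    if not samples:
        return float("-inf"), float("-inf")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    def to_db(value):
        return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

For a square wave at half scale both values are about -6 dBFS; for speech and music the RMS level is well below the peak.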

Music is reproduced much better in G711 than in G729 (a coder optimized for human speech and definitely not for music). Sometimes external calls already have poor quality at their origin, and if you complicate the situation with a high-compression codec like G729, playback of audio files becomes bad; so use G711 if possible. This is especially important for IVR systems, where callers at least have to understand what the options are.
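G.711 does no modelling of speech at all: each 16-bit sample at 8 kHz is simply companded to an 8-bit code, which is why music survives it much better than G.729. As an illustration, here is a sketch of the A-law variant following the widely published reference algorithm. It produces raw A-law bytes only; whether your PBX uses A-law or μ-law for its .g711 files, and which file layout it expects, are assumptions you must check. For production, use the innovaphone tools described in this article. The function names are hypothetical.

```python
import struct

# Segment end points for the 13-bit magnitude (classic G.711 reference tables).
SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(sample):
    """Encode one signed 16-bit sample as an 8-bit G.711 A-law code."""
    pcm = sample >> 3  # keep the 13 most significant bits
    if pcm >= 0:
        mask = 0xD5  # sign bit set, even bits inverted
    else:
        mask = 0x55  # sign bit clear, even bits inverted
        pcm = -pcm - 1
    seg = next((i for i, end in enumerate(SEG_END) if pcm <= end), 8)
    if seg >= 8:  # out of range: clamp to the largest code
        return 0x7F ^ mask
    quant = (pcm >> 1) if seg < 2 else (pcm >> seg)
    return ((seg << 4) | (quant & 0x0F)) ^ mask


def alaw_to_linear(code):
    """Decode an 8-bit A-law code back to a signed 16-bit sample."""
    code ^= 0x55
    value = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        value += 8
    else:
        value = (value + 0x108) << (seg - 1)
    return value if code & 0x80 else -value


def encode_alaw(pcm: bytes) -> bytes:
    """Convert 16-bit little-endian mono PCM (8 kHz) to raw A-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_alaw(s) for s in samples)
```

Digital silence encodes to 0xD5 in A-law, and a round trip through `alaw_to_linear` shows the quantization error, which grows with the signal level.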

Typically there is no reason at all to use G729 if a call comes in and ends in the Waiting Queue. The call stays “in the box” (or at least goes from and to the switch) and does not cross the customer's IP network, so there is no bandwidth problem. Later, when the call is connected to a phone set, you can switch back to G729.
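The payload bit rates behind this bandwidth argument follow from the codec definitions (payload only; RTP/UDP/IP overhead comes on top):

```latex
\begin{aligned}
R_{\mathrm{G.711}} &= 8000\ \tfrac{\text{samples}}{\text{s}} \times 8\ \tfrac{\text{bit}}{\text{sample}} = 64\ \text{kbit/s},\\
R_{\mathrm{G.729}} &= 8\ \text{kbit/s} = \tfrac{1}{8}\, R_{\mathrm{G.711}}.
\end{aligned}
```

G.711 therefore needs eight times the payload bandwidth of G.729, which only matters on IP links where bandwidth is scarce.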

3. Compiling the audio source into the telephone formats

This method is deprecated - rather use the innovaphone audio converter located at

Telephony codecs are basically designed to transport voice, not music. So it is easy for a codec to transport understandable voice, but music is much more complex. The more “synthetic” the music, the more easily it can be converted, because its waveforms are simpler.

Our sample is quite a hard job for a codec: a female singer with drums, natural bass, guitar and piano.

There are two ways to convert the wave file into G7xx files: the DOS application Softcod, or the compiling function of the innovaphone PBX. Using softcod.exe is already described in a wiki article:

Convert wave files to G7xxx with softcod

Both possibilities are described in

How to convert wave files in to G7xx coder files for the_HTTP_interface

Please note that, believe it or not, there is a definite difference in quality between the two methods where music is concerned. Softcod is PC software and the coding is done on the PC. When you copy a file to the drive cx0, the coding from wave to G711 is done by the innovaphone chipset.

So if quality matters, use the innovaphone PBX for compiling: the result is crystal-clear sound without distortion! If you have problems with the CF drive, read the next chapter.

4. Copying the telephone-format files to the destination directory

Unfortunately, Microsoft Windows has problems correctly mapping network drives such as the CF. In Vista, for example, it is nearly impossible (make one mistake in the setup and you will never reach the drive again); in XP you cannot access the drive if your PBX address includes a port number, and similar. Forget all that: a very good shareware tool solves the problem, as described in

Webdav access to /drive/cf0 slow.

DataFreeway gives you fast and reliable access, and all necessary operations are possible immediately. However, drag and drop between Explorer and DataFreeway is not supported, so you have to copy and paste the wave file the “normal” way. Remember that the drive has to be “cx0” and not “cf0” if you want the wave files to be compiled automatically. Compiling wave files takes time and causes a heavy processor load, so if possible do it during a quiet period (or on a separate box).
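If you prefer to script the copy instead of using a WebDAV client, an HTTP PUT is the standard WebDAV way to upload a file. The sketch uses only Python's `urllib`. The URL layout `/drive/cx0/<file>` is inferred from the drive names in this article and the linked WebDAV article; treat it, plain HTTP and the authentication scheme as assumptions to verify on your own PBX. Host, file and function names are hypothetical. As noted above, uploading to cx0 triggers compiling and loads the processor.

```python
import urllib.request


def build_upload_request(pbx_host, filename, data, drive="cx0"):
    """Build an HTTP PUT request that uploads a file to the PBX drive.

    ASSUMPTION: the PBX accepts WebDAV PUT under /drive/<drive>/ on plain HTTP.
    """
    url = f"http://{pbx_host}/drive/{drive}/{filename}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def upload(pbx_host, filename, data, user, password, drive="cx0"):
    """Send the PUT request, answering digest or basic authentication challenges."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{pbx_host}/", user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    request = build_upload_request(pbx_host, filename, data, drive)
    with opener.open(request, timeout=30) as response:
        return response.status
```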

Now set up your innovaphone PBX and check the audio quality with IP phones, but also by calling from outside with a GSM phone. Remember that customer calls come in from outside; internal hold is not the goal. If both checks are OK, you've done a professional job!

Related Articles

Reference:Administration/PBX/Objects/Waiting Queue

Installing the voicemail/music on hold on a compact flash card

Howto:Moh and waiting queue files for testing
