Howto:Creating fine announcements and music on hold

Applies To

This information applies to

innovaphone PBX

More Information

The innovaphone PBX can play audio files in Waiting Queues, as Music on Hold and through XML commands, so a wide range of “audio scenarios” is possible, from simple ones like customized MOH up to complex setups like small IVRs. But often the sound quality is very poor when you call: music on hold with ugly sound or a self-made IVR with nearly incomprehensible messages does not give any “additional value” to your customer, nor does it improve their image. That is a pity, because you are in front of a marvelous system which works perfectly if the input is done well. So if you have “experience” with embarrassing voice prompts or terrible music reproduction, reading this article may help. If you just plan to offer audio services to your customer, read it too. If you are not sure about audio: read it! Remember that if the impression on a caller is disastrous, the result will be de-marketing for your customer. In that case it may be better to use tones and the standard MOH.

One fundamental point is having good audio source files for all “audio services”. The best result is achieved with a professional speaker in a recording studio. If you have this level of quality in mind for your solution, we are very happy: contact a recording studio and the source material will be perfect. But many customers want to play a “home-made” audio file, a quick and in most cases simple solution without the complications and delays of dealing with recording studios and the like. Audio is available on every PC, so click-and-go solutions are frequently used. This article gives you some hints on what to observe when producing reasonably good-sounding “handmade” audio files. So here is how to simulate a professional appearance.

Important

innovaphone is operating in the switching market and is not an “audio company”. This article just tries to help people who have even less experience than we do in creating audio files for telephony use (definitely a lot of people …). So do not expect the ultimate audio hints from us; the internet is full of that stuff anyway. And if you are a PC freak, some statements and tips will be obsolete for you, and you can and should use your own professional or alternative tools.

All software, freeware and shareware mentioned in this article is named for information only, not to invite you to use it; we do not recommend any application. Always inform yourself about copyrights and related matters as well. Remember that music is a “protected good” and that you have to observe authors' rights and, in general, the relevant regulations in your country. In some countries playing any music on a “public” trunk line also means an obligation to pay charges. Note that all the samples you find here are just samples for training purposes and not for production use. All trademarks and companies named in this article are used just as examples. innovaphone is not responsible for them nor for the usage of those products.

Problem Details

The production of your audio application in an innovaphone PBX can be split into the following steps:

1. Creating the audio file
2. Normalization of the audio source
3. Compiling the audio source in the telephone formats
4. Copy of the telephone format files in the destination directory
5. Setup in your innovaphone PBX.

This last step will not be described in this article; you will find a lot of articles about that in this wiki, just follow the links at the end of this article.

1. Creating the audio file

Voice files

The simplest way to produce telephony voice prompts is to use the telephone set, for example using this XML: Howto:Simple Message Management. In this sample the recording is done with a telephone set, which is nice for self-made prompts, but you will always hear that the recording was made with a phone. No editing with standard software is possible.

Another way of producing voice prompts is text-to-speech software (TTS). For example, Acapela has a nice interface and product; you can try a demo at http://www.acapela-group.com/text-to-speech-interactive-demo.html The disadvantage of computer-generated prompts is that you immediately hear the “computer voice”. On the other hand, it is much better than many self-made recordings. So it is not an optimal solution, but you at least get “constant quality”, which is better in many cases. TTS also solves the problem of not having a native speaker.

Better results are possible by recording the announcement with a microphone directly on your PC (this is also the option people like). You can do this with an application or just with the audio recorder tool in the Accessories folder of your XP or Vista/Win7 operating system. Built-in laptop microphones are not recommended; external microphones are better. The cheapest option is the microphone built into a gaming headset. The advantage of such headsets is that you can also use them for the sound check: you will hear more details with a headset than with external speakers. Headsets for softphones will also work fine. If you record the prompts with a microphone, observe the following hints:

Female speakers are better than male ones, because female voices generally have better sound quality and are easier to understand on bad connections.
Write down the text for the speaker and print it out. Do that even if the sentence is apparently simple and short.
If you have several prompts, try to record them all in one single session. In the next session, after an hour or a day, the voice will have changed and your setup will never be the same. If it is not possible to record all prompts in one session and you have to record an announcement later, try at least to reproduce the setup (distance of the microphone from the mouth, mixer setup and similar). You will hear small differences anyway. That is not nice if you compose a sentence from single prompts or play one message after another; it is a minor problem if just one new text is prompted.
Even if you have to produce single prompts, it is better to record all of them (or at least a bunch of them) in one session and then create the single prompts in post-production.
Drink a gulp of water before recording. This moistens the lips, and you get less “pop” at the beginning of a word.
Record everything two or more times and select the best version at the end.
Record in the best quality you can. The recorder will usually already have such a setup by default (typically 44 kHz, 16 bit, stereo); do not worry about the telephony limitations at this step. After recording in stereo, build a mono version that keeps the high sample rate and start post-production with this file. With a good source file you will recognize small imperfections more easily and can correct them easily.
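The last hint (build a mono version while keeping the high sample rate) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method used for the samples of this article: it assumes an uncompressed 16 bit stereo WAV file, and the function names `stereo_to_mono` and `convert_file` are made up for this illustration. Any audio editor does the same job with a mouse click.

```python
import sys
import wave
from array import array


def stereo_to_mono(samples):
    """Average interleaved 16-bit L/R samples into a single mono channel."""
    mono = array("h")
    for i in range(0, len(samples) - 1, 2):
        # Averaging (instead of summing) keeps the result inside the 16-bit range.
        mono.append((samples[i] + samples[i + 1]) // 2)
    return mono


def convert_file(src_path, dst_path):
    """Read a 16-bit stereo WAV and write a mono WAV at the same sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        samples = array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    mono = stereo_to_mono(samples)
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate is deliberately left untouched
        dst.writeframes(mono.tobytes())
```

A call such as `convert_file("prompt_stereo.wav", "prompt_mono.wav")` (both file names are placeholders) would then produce the mono file to be used for post-production.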

Now follows the most important step for voice prompts: post-production. The raw audio file will have a lot of imperfections (no studio, no professional speaker and no professional equipment) which will be audible even on the phone. But we can correct most of them:

Pop: when a speaker starts talking, the opening of the lips causes a “pop”, and a microphone records it perfectly. To limit this effect the speaker should drink water before talking. Have you seen pictures or videos of recording studios where people are singing and there is a strange round filter, like a disk, in front of the microphone? This is to limit this effect. But our €30 microphone and amateur speaker practically cannot avoid it, so you have to remove the pop manually.

In the picture you can see two typical “pops” produced just by opening the mouth. You can easily replace them with silence.

Breath: in a longer sentence people will breathe in the middle, and they always breathe between sentences. Normally we are not aware of this, but in a recording and on the phone you hear it perfectly. Professional speakers do not do that; they have had special theater and speech training and control their breathing perfectly. So do not stress your speaker but correct it manually: replace those noises with silence.

Silence: within a sentence and between words there are periods of silence. The recorded “silence” is never a real “digital” silence but, let's say, a “background noise” (in the best case). There should also be a short period of silence at the beginning and at the end of a recording. Here, too, the only way to achieve a good result is manual correction.

In the picture you can see the end of a sentence followed by the recorded silence. In the following screenshot the “digital” silence was inserted manually:

For post-production you will need a good sound editing software which should be able at least to convert formats, edit audio, generate silence and adjust levels. A fine product which does all that is cool edit (http://www.coolrecordedit.com/); it is not free, but there is a trial version available and the license costs only about $30. Of course you can also use any other sound editing software. If you play the recorded file with such a tool you will immediately see (and hear) the pop, the breath and the non-silence, and you can simply replace them with digital silence. Also insert silence at the beginning (remember to always place a header of at least 100 ms at the beginning of a prompt) and at the end of the recording (also at least 100 ms). If you do not observe this, the file will start with a “click” or, if repeated in a Waiting Queue (WQ), produce a “click” on each repetition. If you replace sound with silence (and when editing in general), always observe the zero level point. In cool edit this feature is called “Zero cross adjust”: pressing F4 moves the cursor to a zero level crossing point. If you insert or delete sound, it is a good idea to do that where the amplitude is zero:
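For readers who prefer a script over an audio editor, the two silence operations described here (replacing a pop or breath with digital silence, and adding a header and trailer of at least 100 ms) can be sketched as follows. This is a minimal sketch assuming 16 bit mono samples held in memory; the helper names `ms_to_samples`, `silence_span` and `pad_with_silence` are hypothetical and not part of any innovaphone or cool edit tool.

```python
from array import array


def ms_to_samples(ms, rate):
    """Number of samples that make up `ms` milliseconds at `rate` Hz."""
    return rate * ms // 1000


def silence_span(samples, start, end):
    """Return a copy with samples[start:end] replaced by digital silence (zeros)."""
    out = array("h", samples)
    for i in range(max(start, 0), min(end, len(out))):
        out[i] = 0
    return out


def pad_with_silence(samples, rate, ms=100):
    """Return a copy with `ms` milliseconds of zeros before and after the audio."""
    pad = array("h", [0] * ms_to_samples(ms, rate))
    return pad + array("h", samples) + pad
```

At 44.1 kHz a 100 ms header corresponds to 4410 zero samples, so `pad_with_silence(prompt, 44100)` makes the prompt 8820 samples longer. The start and end positions passed to `silence_span` should be chosen at zero level points, as explained in the text.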

In the picture, the first point is a good place for a cut and the second one is a bad place. The result of a bad cut is a “click” during playback. So remember that you always have to edit sound at a zero crossing point of the wave. Having observed all that, we now have a raw audio file without pops, without breath, with perfect silence, a zero level header, a zero level ending and no “surprise” levels in between, because you cut at zero crossings. The first step towards perfect audio reproduction is done.
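What an editor function like “Zero cross adjust” does can be approximated by the following sketch. It is an assumption about the general idea, not a description of how cool edit implements it: starting from the position where you intended to cut, it searches outwards for the nearest sample where the wave is zero or changes sign. The function name `nearest_zero_crossing` is made up for this example.

```python
def nearest_zero_crossing(samples, index):
    """Return the position closest to `index` where the waveform crosses zero.

    Position i counts as a crossing if sample i is zero or if the sign
    changes between sample i-1 and sample i. If the signal never crosses
    zero, the original index is returned unchanged.
    """
    def is_crossing(i):
        return samples[i] == 0 or (samples[i - 1] < 0) != (samples[i] < 0)

    n = len(samples)
    for distance in range(n):
        # Look one step further to the left and to the right each round.
        for i in (index - distance, index + distance):
            if 1 <= i < n and is_crossing(i):
                return i
    return index
```

Cutting or inserting silence at the returned position keeps the level jump as small as possible, which is what avoids the “click”.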

Music files

Music files are available in many formats; MP3, for example, is very popular. First of all, bring the source into .wav format (Wave file). You can do that by opening the music file in an audio software and then converting and saving it in wave format. Of course the audio software has to support that. But what if there is no “source file” but just a web player or a movie? An easy way to get a wave file from any source is to record directly from the audio port of the PC using a sound grabber. In this case your audio editor does not have to understand the codec or “format” or have access to the original audio file: if the PC can play the sound, you can grab it. The sample in this article was made by grabbing the sound of a video. A nice sound grabber is http://www.audiograbber.com-us.net/ In Vista, and generally on PCs with a built-in microphone, first check that the microphone is disabled; otherwise you will record through this microphone the sound played by the PC speakers as well as your own talking, and the result is poor. If the microphone is disabled, the grabber will detect that and record directly from the sound card (you have to disable all microphones, otherwise the grabber presumes that you have merely muted the recording). The level can be controlled using the built-in mixer of the OS. Of course all those tools also do many other things, but inline recording is the one we are interested in to get the wave files.

2. Normalization of the audio source

The audio files can be played perfectly on the PC with a player (audio player, real player and similar). Just double-click the file and playback starts. Now you have to convert the wave file to the requested format (8 kHz, 16 bit, mono). File format conversion is simple and can be done using an audio editor (in cool edit: “adjust sample rate”). You must also adjust the volume, and that is a more complex problem which we should analyze and understand well.

In telecommunications, volume is a standardized matter. The volume must be standard along the whole chain of the voice transmission, and therefore all telecom equipment “produces” and processes voice with a standard volume. So we have to respect a kind of worldwide audio setup. If you try to submit a “louder” signal, distortion will be the result. If maximum levels are not respected, this distortion can be created in the first component (your telephone), in the last one (the phone of the far end) or at any other point of the path (which may include even hundreds of devices). The “PC world” works in a similar way, but the levels are different. The effect is that if you play a sound in the PBX which is OK on the PC, the result will be distortion, while if you play on a PC a file recorded by the PBX (for example a voice mail), it will be very quiet (and you have to adjust the PC volume). Remember that the innovaphone PBX is just streaming voice data and performs no volume adjustment. So the volume of the audio file becomes important, and you must limit it before adjusting the format.

How much? Let us put it this way: reduce it by at least 80%. So if 100% = red = overflow on a PC, just take 20% as your 100%. In some cases even an 85% reduction is fine, but note that a file which is too quiet produces another problem: the playback will have drop-outs (for a second or two the volume drops, all in all an unstable reproduction).

If you download the samples you can hear what all this means in reality: one file has the “normal” PC level, the second one is the “20% version”. When you play the 20% version on your PC you will find it very quiet. You will also find the .g711 version of the 20% file to check it on your PBX. Remember that even this level is considered “loud” and at the limit!
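The “take 20% as 100%” rule can be illustrated with a small calculation. The sketch below rests on an assumption that the article does not state explicitly: that the percentage refers to the peak sample amplitude relative to 16 bit full scale (the point where a PC level meter turns red). Under this assumption the loudest sample is scaled down to 20% of full scale, which corresponds to roughly -14 dB. The names `scale_to_peak` and `to_decibels` are hypothetical; the sample rate conversion to 8 kHz is not shown and is better left to the audio editor.

```python
import math
from array import array

FULL_SCALE = 32767  # largest positive value of a 16-bit sample


def scale_to_peak(samples, target=0.20):
    """Scale samples so the loudest one sits at `target` of 16-bit full scale."""
    if not 0 < target <= 1:
        raise ValueError("target must be in the range (0, 1]")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array("h", samples)  # pure silence, nothing to scale
    gain = target * FULL_SCALE / peak
    return array("h", (int(round(s * gain)) for s in samples))


def to_decibels(ratio):
    """Express an amplitude ratio in decibels (0.20 is about -14 dB)."""
    return 20 * math.log10(ratio)
```

For the 85% reduction mentioned in the text you would call `scale_to_peak(samples, target=0.15)`. Whichever value you choose, check the result on a real phone, because a file that is too quiet suffers from the drop-outs described above.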

Installation

Configuration

Known Problems

Howto:Moh and waiting queue files for testing
