Course11:Basic - VoIP Protocols

From innovaphone-wiki


This book describes the different protocol classes used in VoIP communications.



The idea behind Voice over IP is to transport voice (or video) streams between two or more IP endpoints. Since the transported data is real-time media, special conditions must be fulfilled. To construct and synchronize the voice streams from the received IP packets, each packet must be tagged with a timestamp. Since none of the existing transport protocols offered this stamping mechanism, a new protocol called RTP (Real-time Transport Protocol) was introduced.

The services provided by RTP include:
  • Payload-type identification - Indication of what kind of content is being carried
  • Sequence numbering - PDU sequence number
  • Time stamping - allows synchronization and jitter calculations
  • Delivery monitoring
However, before RTP packets are sent between the communication partners, the parameters for the RTP session must be negotiated. This negotiation is done using a signalling protocol.
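
The RTP services listed above correspond to fields of the fixed 12-byte RTP header. The following is a minimal Python sketch of how such a header could be built and read back. It assumes RTP version 2 with no padding, no header extension and no CSRC entries; the function names are illustrative only and not part of any innovaphone API.

```python
import struct

# Fixed RTP header: 2 flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6  # version 2 in the two most significant bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(
        byte0,
        byte1,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(data):
    """Decode the fixed RTP header from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,  # what kind of content is carried
        "sequence": sequence,          # detects loss and reordering
        "timestamp": timestamp,        # synchronization and jitter
        "ssrc": ssrc,                  # identifies the sending source
    }


# Example: a G.711 A-law packet (static payload type 8)
header = pack_rtp_header(payload_type=8, sequence=1, timestamp=160, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to play the samples out at the right moment.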

[Figure: VoIP protocols]

As already mentioned, VoIP protocols can be divided into two categories: signalling and media protocols. While RTP is currently the only media protocol in use, the list of signalling protocols is rather long. The most important signalling protocols, and also the only ones implemented by innovaphone, are H.323 and SIP.


A codec converts analog audio signals into digital signals for transmission. A receiving device then converts the digital signals back to analog using an audio decompressor, for playback. The term codec is a combination of 'coder-decoder'.

The list of existing codecs is rather long; however, there are some standards released by the ITU-T that are supported by most manufacturers. These are also the only codecs used by innovaphone devices:
  • G.711 A/U
  • G.729
  • G.723
  • G.726
  • G.722 (only supported on the gateways 6010, 3010, 0010, 1060 and phones IP222, IP232)
  • G.722.2 (only supported on the gateways 6010, 3010, 0010, 1060 and phones IP222, IP232)
OK, so we have 6 codecs, but what is the difference between them?

Each of them uses its own compression algorithm, resulting in different audio quality and bandwidth requirements.


Codec       Theoretical bandwidth   Real bandwidth
G.711 A/U   64 kbit/s               80 - 100 kbit/s
G.729       8 kbit/s                24 - 30 kbit/s
G.723       ~6 kbit/s               20 - 30 kbit/s
G.726       16/24/32/40 kbit/s      40 - 60 kbit/s
G.722       -                       80 - 100 kbit/s
G.722.2     -                       10 - 40 kbit/s

The discrepancy between the two bandwidth figures comes from the use of a packet-switched network: every block of voice data carries additional headers. The RTP packets are encapsulated in UDP packets, the UDP packets in IP packets, and so on.
[Figure: Bandwidth requirements]
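
The effect of this encapsulation can be estimated with a little arithmetic. The sketch below is a rough calculation, not a measurement. It assumes a packetization interval of 20 ms, IPv4 without options, and plain Ethernet framing (14-byte header plus 4-byte frame check sequence, no VLAN tag); other intervals or link layers give different results. The function name is made up for this example.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20
ETHERNET_BYTES = 18  # assumption: 14-byte header + 4-byte FCS, no VLAN tag


def voip_bandwidth_kbits(codec_kbits, packet_ms=20, link_overhead=ETHERNET_BYTES):
    """Estimate one-way bandwidth in kbit/s including protocol headers."""
    # Bytes of encoded audio carried in a single packet
    payload_bytes = codec_kbits * 1000 / 8 * packet_ms / 1000
    packet_bytes = (
        payload_bytes
        + RTP_HEADER_BYTES
        + UDP_HEADER_BYTES
        + IPV4_HEADER_BYTES
        + link_overhead
    )
    packets_per_second = 1000 / packet_ms
    return packet_bytes * 8 * packets_per_second / 1000


g711_ip_only = voip_bandwidth_kbits(64, link_overhead=0)   # IP level only
g711_ethernet = voip_bandwidth_kbits(64)                   # including Ethernet
g729_ip_only = voip_bandwidth_kbits(8, link_overhead=0)
```

Under these assumptions G.711 needs about 80 kbit/s at the IP level and about 87 kbit/s on Ethernet, while G.729 needs about 24 kbit/s at the IP level. The headers are the same size for every codec, so the relative overhead is much larger for low-bitrate codecs.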

Audio quality

The perceived quality of a conversation depends heavily on the communication partners and is therefore hard to measure. However, the ITU-T standardized a method by which each codec can be assigned a score.

These scores are called MOS (Mean Opinion Score) values. A MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest and 5 the highest perceived audio quality.

[Table: MOS value per codec (G.711 A/U and others)]


As can be seen from the bandwidth figures above, G.729 is a good alternative to G.711 when bandwidth is limited. However, G.729, G.723 and G.726 share one important drawback: they cannot be used to relay fax signals. These codecs compress the audio signal so heavily that vital information required by fax machines is lost. Therefore T.38, a dedicated protocol for relaying fax over IP, was introduced. T.38 has a bandwidth requirement of 14.4 kbit/s (14,400 bit/s).


Besides audio, it is also possible to negotiate the use of video via H.323/SIP. Before transmission via RTP, the video data captured by a video device (e.g. a camera) is compressed using the H.264 codec; the receiver decompresses it for display.
H.264 is the successor of H.263 and was designed to cover application scenarios like HDTV, portable video, multimedia, video conferencing and many more.
Depending on the resolution and image refresh rate (fps) of the capturing device, bandwidth requirements start at 80 kbit/s for one-way communication.
innovaphone supports video over IP with its V10 firmware in the UC client myPBX.


H.323 is a system specification that describes the use of several ITU-T and IETF protocols. The protocols that comprise the core of almost any H.323 system are:

  • H.225.0 Registration, Admission and Status (RAS), which is used between an H.323 endpoint and a Gatekeeper to provide address resolution and admission control services.
  • H.225.0 Call Signalling, which is used between any two H.323 entities in order to establish communication.
  • H.245 control protocol for multimedia communication, which describes the messages and procedures used for capability exchange, opening and closing logical channels for audio, video and data, control and indications.
  • Real-time Transport Protocol (RTP), which is used for sending or receiving multimedia information (voice, video, or text) between any two entities.

Many H.323 systems also implement other protocols that are defined in various ITU-T recommendations to provide supplementary services support or deliver other functionality to the user. Some of those recommendations are:

  • H.235 series describes security within H.323, used by innovaphone for password and SRTP key encryption.
  • H.450 series describes various supplementary services (e.g. Call Pickup, MWI).

In addition to those ITU-T recommendations, H.323 utilizes various IETF Request for Comments (RFCs) for media transport and media packetization, including the Real-time Transport Protocol (RTP).

[Figure: Typical H.323 stack]

H.323 Architecture

The H.323 system defines several network elements that work together in order to deliver rich multimedia communication capabilities. Those elements are Terminals, Multipoint Control Units (MCUs), Gateways, and Gatekeepers. Collectively, terminals, multipoint control units and gateways are often referred to as endpoints.

While not all elements are required, at least two terminals are required in order to enable communication between two people. In most H.323 deployments, a gatekeeper is employed in order to, among other things, facilitate address resolution.

H.323 Network Elements


Terminals are the most fundamental elements in any H.323 system, as they are the devices that users normally encounter. They typically take the form of an IP phone.

Multipoint Control Units (MCU)

A Multipoint Control Unit (MCU) is responsible for managing multipoint conferences. In more practical terms, an MCU is a conference bridge not unlike the conference bridges used in the PSTN today. The most significant difference, however, is that H.323 MCUs might be capable of mixing or switching video, in addition to the normal audio mixing done by a traditional conference bridge.


Gateways are devices that enable communication between H.323 networks and other networks, such as PSTN or ISDN networks. If one party in a conversation is utilizing a terminal that is not an H.323 terminal, then the call must pass through a gateway in order to enable both parties to communicate.


A Gatekeeper is an optional component in the H.323 network that provides a number of services to terminals, gateways, and MCU devices. Those services include endpoint registration, address resolution, admission control, user authentication, and so forth. Of the various functions performed by the gatekeeper, address resolution is the most important as it enables two endpoints to contact each other without either endpoint having to know the IP address of the other endpoint.

Gatekeepers may be designed to operate in one of two signalling modes, namely "direct routed" and "gatekeeper routed" mode. Direct routed mode is the most efficient and most widely deployed mode. In this mode, endpoints utilize the RAS protocol in order to learn the IP address of the remote endpoint, and a call is established directly with the remote device. In the gatekeeper routed mode, call signalling always passes through the gatekeeper. While the latter requires the gatekeeper to have more processing power, it also gives the gatekeeper complete control over the call and the ability to provide supplementary services on behalf of the endpoints. innovaphone devices work in "gatekeeper routed" mode.

Mapped to innovaphone devices these H.323 elements would correspond to:

  • Terminal -> IP phone (e.g. IP200, IP150)
  • MCU -> innovaphone gateway (e.g. IP6010) working as conference server
  • Gateway -> all innovaphone gateways. (e.g. IP22, IP24, IP302, IP6010 etc.)
  • Gatekeeper -> all devices running the innovaphone PBX (e.g. IP302, IP6010, etc.)

H.323 - Signaling

As mentioned in the previous chapter, H.323 defines three main protocols for call signalling: RAS, H.225 and H.245.


RAS (Registration, Admission and Status) is a communication protocol between an H.323 Terminal and a Gatekeeper. Unlike the other H.323 signalling protocols, RAS uses UDP as its underlying transport protocol.

The main functions of RAS are:
  • Gatekeeper Discovery
  • Registration of Terminals at the Gatekeeper
  • Call admission and address resolution

[Figure: RAS]

When an endpoint is powered on, it will generally either send a gatekeeper request (GRQ) message to "discover" gatekeepers that are willing to provide service, or send a registration request (RRQ) directly to a gatekeeper that is predefined in the system’s administrative setup. Gatekeepers answer a GRQ with a gatekeeper confirm (GCF); the endpoint then selects a gatekeeper and registers with it by sending an RRQ. In both cases the gatekeeper responds to the RRQ with a registration confirm (RCF). At this point, the endpoint is known to the network and can place and receive calls.

When an endpoint wishes to place a call, it will send an admission request (ARQ) to the gatekeeper. The gatekeeper will then resolve the address and return the address of the remote endpoint in the admission confirm message (ACF). The endpoint can then place the call.
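
The RAS exchange described above can be illustrated with a toy model. The sketch below is not an H.323 implementation: real RAS messages are ASN.1-encoded and sent over UDP, whereas this example uses plain Python dictionaries. The class name, the dictionary keys and the addresses are invented for illustration. The admission reject (ARJ) answer used for unknown callees is a regular RAS message that the text above does not cover.

```python
class ToyGatekeeper:
    """Very small model of gatekeeper discovery, registration and admission."""

    def __init__(self):
        self.registrations = {}  # alias -> (ip, port) of the endpoint

    def handle(self, message):
        kind = message["type"]
        if kind == "GRQ":
            # Discovery: this gatekeeper is willing to provide service
            return {"type": "GCF"}
        if kind == "RRQ":
            # Registration: remember where the alias can be reached
            self.registrations[message["alias"]] = message["address"]
            return {"type": "RCF"}
        if kind == "ARQ":
            # Admission: resolve the callee's alias to a transport address
            address = self.registrations.get(message["callee"])
            if address is None:
                return {"type": "ARJ"}
            return {"type": "ACF", "address": address}
        raise ValueError("unsupported RAS message: " + kind)


gatekeeper = ToyGatekeeper()
gatekeeper.handle({"type": "GRQ"})
gatekeeper.handle({"type": "RRQ", "alias": "bob", "address": ("192.0.2.20", 1720)})
answer = gatekeeper.handle({"type": "ARQ", "callee": "bob"})
```

After the ACF, the calling endpoint knows the address to which the call signalling has to be sent.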


Once the address of the remote endpoint is resolved using RAS, the terminal uses H.225 to establish, control and end an H.323 call. H.225 call signalling is based on the ISDN call setup procedures described in the Q.931 / Q.930 standards. Put simply, H.225 is an IP implementation of the ISDN D-channel procedures.

[Figure: H.225]

Using the example above, we will discuss the basic signalling methods in H.225. We will concentrate on the "gatekeeper routed" mode, since this is the common method used with innovaphone devices.
The call is started by Alice sending a SETUP message (1) to the gatekeeper. A SETUP ACKNOWLEDGE message (2) notifies the caller that the request is being processed. The gatekeeper forwards the SETUP message (3) to Bob's terminal, which normally starts ringing. This is indicated by the Alerting message (4). If Bob picks up the call, a Connect message (5) is sent to the gatekeeper and then forwarded to Alice.

The Call Termination is signalled by the Release Complete message (6).


While H.225 is used to signal a call request to the remote terminal, it lacks methods for opening the RTP channels needed to transport voice/video data. This task is performed by the H.245 protocol.
The main functions of H.245 are:
  • exchange of terminal capabilities (e.g. supported audio codecs)
  • master/slave determination
  • establish, control and terminate logical channels (RTP/RTCP)
Capability Negotiation

Of the functionality provided by H.245, capability negotiation is arguably the most important, as it enables devices to communicate without having prior knowledge of the capabilities of the remote entity. H.245 enables rich multimedia capabilities, including audio, video, text, and data communication. For transmission of audio, video, or text, H.323 devices utilize both ITU-defined codecs and codecs defined outside the ITU. Codecs that are widely implemented by H.323 equipment include:

  • Video codecs: H.261, H.263, H.264
  • Audio codecs: G.711, G.729, G.729a, G.723.1, G.726
  • Text codecs: T.140

When an H.323 device initiates communication with a remote H.323 device and when H.245 communication is established between the two entities, the Terminal Capability Set (TCS) message is the first message transmitted to the other side.

Master/Slave Determination

After sending a TCS message, H.323 entities (through H.245 exchanges) will attempt to determine which device is the "master" and which is the "slave." This process, referred to as Master/Slave Determination (MSD), is important, as the master in a call settles all negotiation conflicts between the two devices.

Logical Channel Signaling

Once capabilities are exchanged and master/slave determination steps have completed, devices may then open "logical channels" or media flows. This is done by simply sending an Open Logical Channel (OLC) message and receiving an acknowledgement message. Upon receipt of the acknowledgement message, an endpoint may then transmit audio or video to the remote endpoint.

H.323 Fast Connect and H.245 Tunneling

The original H.323 signalling protocols underwent many changes in order to shorten the time needed for the RTP session establishment.

Fast Connect (FC)

The first thing to improve was the rather long H.245 message handshake and the support of "early media". Early media is a term used for the setup of RTP channels between the communication partners, before the call has been accepted (Connect) by both endpoints. This feature is used to play announcements or dialtones to the waiting caller.

[Figure: H.323 simple call flow]

As shown in the picture above, the OLC (Open Logical Channel) message, an H.245 message, is sent encapsulated in an H.225 message (Connect, Alerting). The drawback of Fast Connect is that it can be used only in a homogeneous environment (all devices are compatible). By sending the OLC without first checking the capabilities (TCS) of the remote terminal, it is assumed that both terminals support the same set of capabilities.

As shown in the picture above, the RTP stream (red) goes directly from endpoint to endpoint. However, some customer scenarios require the voice stream to pass through the PBX (e.g. firewall traversal). innovaphone defined the term 'Media Relay' for gateways working in this mode. Redirecting voice data through the PBX has one main disadvantage: it creates a high CPU load on the 'relaying' gateway. Therefore this option is usually off by default and must be enabled manually.

H.245 Tunnelling

This method is the logical enhancement of the Fast Connect procedure, since it encapsulates not only the OLC messages but every H.245 message in H.225 messages. As a result, the separate H.245 TCP connection between the conversation partners is not needed. This saves processing power as well as TCP sockets on the innovaphone hardware, and also eases firewall traversal.

Extended Fast Connect (EFC)

EFC speeds up the renegotiation of logical channel attributes during a conversation (e.g. when a terminal is put on hold and receives MoH). Instead of running through the complete H.245 handshake process, a change of RTP attributes is done by sending a single OLC message to the remote endpoint. Upon its receipt, the terminal will close the old logical channel and open a new one using the newly obtained parameters.

Each of these enhancements to the original H.323 protocol is implemented by innovaphone. To improve interoperability with 3rd party vendors, it is possible to disable FC and H.245 tunnelling at the GW interface.

H.450 Supplementary Services

H.450 refers to a set of standards created by the International Telecommunication Union (ITU) to define several Supplementary Services for the packet-based telecommunication protocol known as H.323. It parallels another set of standards known as QSIG, which defines similar services for ISDN-based networks.

  • H.450.1, Supplementary Services Framework
Explains the general mechanism for delivering supplementary services. Supplementary services messages are exchanged by means of ROSE (Remote Operations Service Element).

  • H.450.2, Call Transfer Supplementary Service
Explains, for example, how party B can turn an active call between party A and party B into a call between party A and a new party C.

  • H.450.3, Call Diversion Supplementary Service
Explains, for example, how an IP phone can activate a diversion to a cell phone, or how an IP phone can query whether it has any active diversion.

  • H.450.4, Call Hold Supplementary Service
Explains, for example, how a call can be put on hold and be fed with music on hold.

  • H.450.5, Call Park and Pickup Supplementary Service
Think of a big warehouse, where a call for Mrs. Smith comes in at the front desk. The front desk parks the call and broadcasts via the intercom: "Mrs. Smith, please 223". Mrs. Smith then walks to the nearest wall phone, dials 223 and gets the call.

  • H.450.6, Call Waiting Supplementary Service
Explains how to signal a second call to an IP phone already engaged in an active call.

  • H.450.7, Message Waiting Indication Supplementary Service
Explains elements related to voicemail systems and how these can be implemented by means of H.323.

  • H.450.8, Name Identification Supplementary Service
Explains how names are displayed or how to intentionally call incognito.

  • H.450.9, Call Completion Supplementary Service
Explains how to schedule an automatic call-back request in case of a remote party being busy in a call or being absent for a while and becoming available later on.

  • H.450.10, Call Offer Supplementary Service
A variation of Call Waiting. Also known as "Camp-On".

  • H.450.11, Call Intrusion Supplementary Service
Explains how e.g. the secretary of a CEO can intentionally and legally intrude into a call of her boss, in order to communicate urgent information.

  • H.450.12, Common Information Additional Network Feature for H.323
A means to communicate additional miscellaneous information between endpoints. E.g. whether certain features are available, and/or allowed.


SIP clients typically use TCP or UDP (typically on port 5060 and/or 5061) to connect to SIP servers and other SIP endpoints. SIP is primarily used for setting up and tearing down voice or video calls. However, it can be used in any application where session initiation is a requirement, such as event subscription and notification, terminal mobility and so on. A large number of SIP-related RFCs define the behaviour for such applications. The voice/video media itself is carried by a separate protocol, typically RTP.

SIP works in concert with several other protocols and is only involved in the signalling portion of a communication session. SIP is a carrier for the Session Description Protocol (SDP), which describes the media content of the session, e.g. what IP ports to use, the codec being used etc. In typical use, SIP "sessions" are simply packet streams of the Real-time Transport Protocol (RTP). RTP is the carrier for the actual voice or video content itself.

[Figure: Typical SIP stack]

SIP is similar to HTTP and shares some of its design principles: it is human-readable and request-response structured. SIP shares many HTTP status codes, including the familiar '404 Not Found'.

SIP Architecture

SIP User Agents (UAs) are the end-user devices used to create and manage a SIP session. A SIP UA has two main components:
  • the User Agent Client (UAC), which sends SIP requests and receives the responses to them,
  • the User Agent Server (UAS), which responds to SIP requests sent by the peer.
SIP UAs may work in point-to-point mode. Typical implementations of a UA are SIP softphones, SIP hardphones and SIP-enabled ATAs.

SIP also defines server network elements. Although two SIP endpoints can communicate without any intervening SIP infrastructure, which is why the protocol is described as peer-to-peer, this approach is impractical for a public service. There are various implementations that can act as SIP servers:

Proxy Server:

A Proxy Server is responsible for routing incoming call requests to the intended recipient.
[Figure: SIP Proxy]

Upon receiving a call (msg. 1) from one UA (Alice), the SIP Proxy looks up the address of the callee (Bob) at the registrar responsible for this domain (msg. 2). The server then creates a new SIP session to Bob and forwards signalling messages between both endpoints. This corresponds to the "gatekeeper routed" mode in H.323, as the Proxy always remains in the signalling path.

Redirect Server:

Like the Proxy Server, the Redirect Server is responsible for the correct routing of incoming calls.
[Figure: SIP Redirect]

First, Alice sends a call request (msg. 1) to her assigned Redirect Server. After the Redirect Server has performed the address lookup (msg. 2) for Bob, the address details are passed to Alice's client. It is now up to the UAs to establish the call. The SIP server is out of the signalling path from this point on, while Alice and Bob exchange SIP messages directly. This corresponds to the "direct routed" mode in H.323.

Registrar Server:

The role of this server type is to accept and check the credentials of incoming UA registrations. If the client is allowed to use the SIP service, the Registrar stores its IP address, creating a mapping between SIP username and IP address. A Proxy or Redirect server can then query the Registrar to find out the IP address under which a UA can be reached.
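
The difference between the three server roles can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: messages are plain dictionaries instead of real SIP messages, credentials are not checked, and all class, function and key names as well as the addresses are invented for illustration. The "302 Moved Temporarily" answer is the standard SIP redirection response that carries the new contact address.

```python
class ToyRegistrar:
    """Stores the username -> contact address mapping of registered UAs."""

    def __init__(self):
        self.bindings = {}

    def register(self, user, contact):
        self.bindings[user] = contact

    def lookup(self, user):
        return self.bindings.get(user)


def redirect_server(registrar, invite):
    """Tell the caller where the callee is; the caller then calls directly."""
    contact = registrar.lookup(invite["to"])
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return {"status": 302, "reason": "Moved Temporarily", "contact": contact}


def proxy_server(registrar, invite, send):
    """Forward the request to the callee and stay in the signalling path."""
    contact = registrar.lookup(invite["to"])
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return send(contact, invite)


registrar = ToyRegistrar()
registrar.register("bob", "192.0.2.20:5060")
invite = {"method": "INVITE", "from": "alice", "to": "bob"}

redirected = redirect_server(registrar, invite)
forwarded = proxy_server(
    registrar, invite, send=lambda contact, msg: {"status": 180, "via": contact}
)
```

In the redirect case the caller receives Bob's contact address and has to send a new request itself; in the proxy case the caller only ever talks to the proxy.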

innovaphone combined the SIP Proxy server and the Registrar server functionality in the PBX software. As the PBX is also H.323 capable (Gatekeeper), it can communicate simultaneously with both H.323 and SIP clients.

SIP - Signalling

SIP is a request-response structured protocol, very similar to HTTP. SIP requests are used to initiate an action, for example a phone conversation. SIP responses indicate whether a request succeeded or failed, and in the latter case, why it failed.

The most important requests are INVITE and ACK, used for call establishment, and BYE, used for call termination. The number of responses is rather large; the most commonly used is 200 OK, which confirms that a request succeeded.

Have a look at the Wikipedia articles on SIP request methods and SIP response codes for complete lists.
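
Because SIP is text based, a response can be interpreted with simple string handling. The following sketch splits a SIP status line and classifies the status code by its first digit, in the same way HTTP status codes are grouped. It is a minimal illustration with made-up function names, not a SIP parser; real messages also contain headers and possibly a body.

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line):
    """Split a status line such as 'SIP/2.0 180 Ringing' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def response_class(status_code):
    """Return the response class that the first digit of the code stands for."""
    return RESPONSE_CLASSES.get(status_code // 100, "unknown")


version, code, reason = parse_status_line("SIP/2.0 180 Ringing")
ringing_class = response_class(code)   # provisional: the call is in progress
ok_class = response_class(200)         # success: the request was accepted
```

Provisional responses such as 100 Trying or 180 Ringing only report progress; a request is completed by a final response from one of the other classes.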

SIP Call Establishment and Call Termination

[Figure: SIP signalling]

Call Establishment

As shown in the picture above, Alice initiates the call by sending an Invite message (1) to the Proxy Server of the domain. The server answers Alice's Invite with a 100 Trying message (2), indicating that the request is being processed. In order to route the call to the correct IP endpoint, the call server must request Bob's current location (3) from the Registrar responsible for the domain. After successfully completing the IP address lookup, the proxy forwards the Invite message (4) to Bob's phone.
The called UA responds with a 180 Ringing message (5), indicating that the call request was accepted and the phone is ringing. Upon receiving the 180 Ringing message, the proxy server forwards it to Alice's UA. Finally, a 200 OK message (6) is sent to the server when Bob picks up the phone. As with the previous 180 Ringing, this response is also forwarded to Alice. Alice confirms the receipt of the 200 OK with an ACK message (7).

As shown in the graphic above, certain messages (Invite, 180 Ringing and 200 OK) carry an additional encapsulated SDP packet. The SDP part is used to negotiate the RTP parameters for this session.
The SDP in the Invite contains a list of the UA's supported codecs and the IP address and port number on which it wants to receive RTP packets. It is up to the called UA to choose the codec for the session. It does so by comparing the received codec list with its own list and selecting a preferred codec from the codecs that appear in both lists.
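
The codec selection described above can be sketched as follows. The example reads the audio media line of an SDP offer and picks the first codec from the callee's own preference list that was also offered. It is a simplified illustration: it only understands a handful of static RTP payload type numbers (0 = G.711 µ-law, 8 = G.711 A-law, 9 = G.722, 4 = G.723, 18 = G.729), ignores dynamic payload types and all other SDP attributes, and uses invented function names and example addresses.

```python
# Static RTP payload types for some of the codecs discussed in this book
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 9: "G722", 4: "G723", 18: "G729"}


def offered_codecs(sdp_text):
    """Return the codecs named in the first audio media line of an SDP body."""
    for line in sdp_text.splitlines():
        if line.startswith("m=audio "):
            # Format: m=audio <port> <protocol> <payload type> ...
            payload_types = line.split()[3:]
            return [
                STATIC_PAYLOAD_TYPES[int(pt)]
                for pt in payload_types
                if pt.isdigit() and int(pt) in STATIC_PAYLOAD_TYPES
            ]
    return []


def choose_codec(offered, local_preference):
    """Pick the most preferred local codec that the caller also offered."""
    for codec in local_preference:
        if codec in offered:
            return codec
    return None  # no common codec, the media session cannot be set up


offer = (
    "v=0\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "m=audio 49170 RTP/AVP 8 0 18\r\n"
)
codecs = offered_codecs(offer)
chosen = choose_codec(codecs, local_preference=["G729", "PCMA"])
```

In this example the caller offers G.711 A-law, G.711 µ-law and G.729; a callee that prefers G.729 (for instance on a link with limited bandwidth) selects it because it appears in both lists.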

Using the exchanged SDP information, two RTP channels (one for each direction) are set up between the UAs. The RTP packets do not pass through the SIP proxy but go directly between the communicating endpoints.

Call Termination

Eventually one of the conversation participants (in our example Alice) will hang up, resulting in a BYE message (8) being sent to the SIP proxy. The message is forwarded to Bob and confirmed using the 200 OK (9) response. Both UAs will now clear their RTP channels and return to idle mode.

H.323 vs. SIP

As we know by now, SIP and H.323 are two competing signalling protocols. You will find lengthy debates on the Internet discussing which protocol is better and will prevail over the other.

The H.323 specification was written by the ITU-T (the standardization sector of the International Telecommunication Union), a group of telecommunications specialists who also developed the ISDN standard. Therefore H.323 relies heavily on message formats and structures used in ISDN (i.e. Q.931) and offers good interoperability with PSTN networks.

The SIP standard was introduced by the IETF (Internet Engineering Task Force), a group of network protocol specialists also responsible for other famous network protocols like HTTP. This is also the reason for the similarity between HTTP and SIP request and response messages.

As it stands, both protocols have reached a mature state and are constantly being developed and improved. It is very probable that both protocols will continue to coexist in the future.

For a complete comparison of H.323 and SIP, please have a look at the "H323 vs SIP" article listed under the recommended web links.

When using innovaphone devices, however, one should always prefer H.323 over SIP. The main reason is that innovaphone's H.323 stack implements more features than the SIP stack.

The main features only available in H.323 are:
  • Group indications (e.g. Park, Pickup, Chef/Secretary)
  • Enhanced/simplified phone registration using Hardware-ID or Gatekeeper - Discovery
  • Call Completion
  • Call Intrusion
  • innovaphone PBX Master-Slave relationship
SIP should be used in these cases:
  • connection to a SIP provider
  • connection to a third party application/phone supporting only SIP (e.g. OCS/Lync, Samwin CBC, Bria)


VoIP protocols are divided into two types: signalling protocols and media protocols.

Signalling protocols:
  • H.323
  • SIP
Media Protocols:
  • RTP
Signalling protocols are used for call establishment and termination, while the media protocol transports voice or video data.

The recommended signalling protocol in most innovaphone environments is H.323.

Recommended web links:

  • H.323 (Wikipedia)
  • H.225 (Wikipedia)
  • H.245 (Wikipedia)
  • SIP (Wikipedia)
  • H323 vs SIP