Saturday, April 28, 2012

RealTime Multimedia in Presence of Firewalls and Network Address Translation

Our purpose in this post is to understand how real-time multimedia technologies work under the limitations caused by firewalls and NAT devices. To give an example, say you would like to connect from your smartphone to your laptop (which is in another network, behind another router) to send something. However, the firewall in front of your laptop blocks all connections coming from the network your smartphone is in. Now imagine the same problem in real-time applications: you would like to be reachable for calls at any time, regardless of what infrastructure you are behind, while you are away from home and moving around quickly. Therefore the application that you and the people you interact with are using needs to adapt itself somehow and overcome the limitations of firewalls and NAT devices, so that no packets are dropped and the quality of the conversation or service stays as planned.

First of all, for introduction, let's recall the terms we need to know to cover this topic.


 

What is a Firewall?

 

Basically, a firewall is a wall that protects your computer's ports (a computer has thousands of ports that an attacker could try to reach) by defining what kinds of connections are allowed into and out of your network.

http://static.ddmcdn.com/gif/firewall.gif

I will not go into particulars, but those who want to know more about firewalls can check this blog: http://howto-hsk.blogspot.com/2012/03/what-is-firewall.html

But briefly guys, do not disable your firewall! We do not want to mess with hackers :)


Basically, a firewall:
 - Blocks all incoming traffic except established connections.
   * All communication must be initiated from inside.
 - Might block certain protocols.
   * Some vendors consider UDP dangerous.
 - Keeps track of established communication in a table (located in the firewall/NAT device).
   * TCP connections are easy to mark as ESTABLISHED because TCP is a connection-based protocol. UDP is connectionless, so a UDP communication path is considered ESTABLISHED only once it has been responded to.
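To make the "table of established communication" idea concrete, here is a minimal sketch in Python. The class name, the state names and the addresses are my own assumptions for illustration; a real firewall tracks much more (timeouts, TCP flags, and so on).

```python
# Hypothetical sketch of a stateful firewall's connection table.
# A flow is (protocol, inside_ip, inside_port, outside_ip, outside_port).

class StatefulFirewall:
    def __init__(self):
        # flow -> "NEW" (we sent something) or "ESTABLISHED" (the peer answered)
        self.table = {}

    def outbound(self, proto, in_ip, in_port, out_ip, out_port):
        """Traffic initiated from inside is always allowed and remembered."""
        flow = (proto, in_ip, in_port, out_ip, out_port)
        self.table.setdefault(flow, "NEW")
        return True

    def inbound(self, proto, out_ip, out_port, in_ip, in_port):
        """Inbound traffic passes only if it answers a remembered flow."""
        flow = (proto, in_ip, in_port, out_ip, out_port)
        if flow not in self.table:
            return False  # nobody inside asked for this: drop it
        self.table[flow] = "ESTABLISHED"  # the path has now been responded to
        return True


fw = StatefulFirewall()
# An unsolicited packet from outside is dropped.
print(fw.inbound("udp", "203.0.113.5", 5060, "192.168.1.10", 40000))
# After we send first, the reply on the same flow is accepted.
fw.outbound("udp", "192.168.1.10", 40000, "203.0.113.5", 5060)
print(fw.inbound("udp", "203.0.113.5", 5060, "192.168.1.10", 40000))
```

Notice that the UDP flow only becomes ESTABLISHED when the reply arrives, which is the behaviour described in the last bullet.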

What is NAT  ( Network Address Translation )  ? 

Cisco says: Network Address Translation allows a single device, such as a router, to act as agent between the Internet (or "public network") and a local (or "private") network.



Imagine that there are lots of computers and devices connected to the Internet behind a FIREWALL. Thanks to a NAT device, let's say located on the firewall, these devices use only one shared, public IP address to reach the outside. As far as I know, NAT was introduced because of the shortage of IPv4 addresses, so that many devices could share one public IP address outside the firewall. I believe that most of you have a NAT router at home. When you use your laptop + smartphone + desktop at home, this is what is going on in your router.


NAT is divided into two groups: Source NAT and Destination NAT.

Source NAT: When you initiate a connection from inside, destination hosts outside your firewall will see a public IP address and port assigned by Source NAT.
- Only the receiver can see which public port the sender was given.
* The mapping may vary between destination hosts (when you communicate with one destination host you get one public address/port, while for another destination host you may end up with a different one).

Destination NAT: When you have a service behind a firewall that you want the outside world to be able to use, you need to tell the NAT to map a public port to this service so that people on the other side of the wall can access it.
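The two directions can be sketched as a tiny translation table in Python. This is only an illustration under my own assumptions: the class name `Nat`, the starting port 50000 and all addresses are made up, and a real NAT also tracks protocols and timeouts.

```python
# Hypothetical sketch of a NAT translation table.

class Nat:
    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (private_ip, private_port) -> public_port
        self.in_map = {}   # public_port -> (private_ip, private_port)

    def source_nat(self, private_ip, private_port):
        """Outbound: replace the private source with the shared public address."""
        key = (private_ip, private_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out_map[key]

    def add_port_forward(self, public_port, private_ip, private_port):
        """Destination NAT rule configured by hand for a service on the inside."""
        self.in_map[public_port] = (private_ip, private_port)

    def destination_nat(self, public_port):
        """Inbound: find the inside host for a public port, or None to drop."""
        return self.in_map.get(public_port)


nat = Nat("198.51.100.1")
print(nat.source_nat("192.168.1.10", 5000))   # laptop going out
print(nat.source_nat("192.168.1.11", 5000))   # smartphone going out
nat.add_port_forward(8080, "192.168.1.20", 80)
print(nat.destination_nat(8080))              # outside world reaching the service
```

Both inside devices share the public IP and are told apart only by the public port the NAT picked for them.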



It should be stated that firewall and NAT devices are usually in the same box, meaning that they are related to each other in many respects.

What do we mean by Real Time Multimedia ? 

We mean a system that works in real time (e.g. people can call you at any time when you are out) irrespective of what network infrastructure you are behind.

If you want a better definition here it is :
Definition: Real-time multimedia refers to applications in which multimedia data has to be delivered and rendered in real time; it can be broadly classified into interactive multimedia and streaming media. http://encyclopedia.jrank.org/articles/pages/6877/Real-Time-Multimedia.html

 

However, firewalls and NAT devices cause some problems here.
Thus, I will try to explain what those problems are and what mechanisms are being developed to overcome them.

When we talk about real-time multimedia, you might imagine the services below:

Types of services : Unified communications
- Voice
- Video
- Chat / presence
- Application sharing

Unified communications have some characteristics, such as:
- Real-time properties are needed.
- We try to find the shortest / best path for media because we do not want to face delay and jitter.

 * Delay: Delay is basically latency in the communication. When you say something on Skype, your girlfriend hears it 10 seconds after you actually said it. It might cause some problems :P

 * Jitter: Jitter is the amount of variation in latency/response time, in milliseconds.
 Let's give an example: assume that you expect your girlfriend to meet you every day for dinner at 7:30. However, the time your gf shows up is not stable: sometimes she shows up around 9:00, sometimes at 5:59 (waits for you, then gives up and leaves :P I do not know what kind of girlfriend does that). What would you do in this case?
The point here is that this variation in your girlfriend's arrival time will surely decrease the quality of your relationship.


I believe you got my point :) You can apply this scenario to Skype. If the jitter is low, the quality of your real-time voice communication will be better ;)
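If you prefer numbers to girlfriends, here is a small Python sketch that computes delay and jitter from send and receive timestamps. The timestamps are invented, and "mean absolute difference between consecutive latencies" is just one simple way to measure jitter that I picked for illustration; real RTP stacks use a smoothed estimator instead.

```python
from statistics import mean

def latencies(sent_ms, received_ms):
    """Per-packet delay: receive time minus send time, in milliseconds."""
    return [r - s for s, r in zip(sent_ms, received_ms)]

def mean_jitter(delays_ms):
    """Average change in delay between consecutive packets."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0

# Made-up example: packets sent every 20 ms.
sent = [0, 20, 40, 60]
steady = [50, 70, 90, 110]    # always 50 ms late: high delay, no jitter
bumpy = [50, 70, 120, 115]    # delay changes from packet to packet

print(latencies(sent, steady), mean_jitter(latencies(sent, steady)))
print(latencies(sent, bumpy), mean_jitter(latencies(sent, bumpy)))
```

The steady stream has delay but zero jitter, while the bumpy one has both, and it is the bumpy one that ruins a voice call.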



So what about firewalls ? Let's talk about it by giving some examples . 

Say I would like to call my gf on Skype, and she has a firewall that does not allow my network to reach hers. What happens in this case is that she will not even know that I made a call, because the packet has been dropped by her firewall.

And assuming I have no firewall, if she makes the call (she knows my IP and port number), the call works fine because I can use the same connection that she opened to call me.

However, what happens if we both have firewalls? This is where a 3rd party comes into the picture. This can be a registrar that registers each party's IP and port number, telling others "you can reach this registered guy by using this port and IP". For instance, in order for me and my girlfriend to communicate, we both need to use the registrar because we have firewalls that do not let us communicate directly, right? I register myself in the registrar's database with my IP and port, and so does my gf. This way, the registrar works as a relay service, letting us communicate without a problem.

You might ask: why do we need a registrar when my firewall could somehow announce my IP and port to my girlfriend's firewall itself? In this scenario, it simply does not work. The IP address and port number that the registrar records are probably valid only towards the registrar. If her firewall tries to reach that IP and port directly, it may fail, because my NAT device might translate my address into different numbers for each destination outside my network, so I might simply show up with a different IP and port number towards my gf's firewall.
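Here is a very small Python sketch of the registrar idea. The class, the user names and the addresses are hypothetical; the only point is that the registrar stores the address it observes on the incoming registration (the public one, after NAT), not the private address the device thinks it has.

```python
# Hypothetical registrar: remembers where each user can currently be reached.

class Registrar:
    def __init__(self):
        self.bindings = {}  # user -> (ip, port) as observed by the registrar

    def register(self, user, observed_ip, observed_port):
        """Store the source address seen on the registration packet."""
        self.bindings[user] = (observed_ip, observed_port)

    def lookup(self, user):
        """Tell a caller where to reach the user, or None if unknown."""
        return self.bindings.get(user)


registrar = Registrar()
# My laptop is 192.168.1.10:5060 at home, but the registrar only sees
# what my NAT put on the wire.
registrar.register("me", "198.51.100.1", 50000)
registrar.register("gf", "203.0.113.77", 41234)
print(registrar.lookup("gf"))
```

Whether the looked-up address also works for a direct call depends on the NAT behaviour discussed in this post, which is why the registrar may have to keep relaying.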


So what is the connection of Firewalls and NAT devices with Multimedia ?

To recap;

- Firewall/NAT devices interfere and block communications.
* Cannot send packets to a private address from Internet.
* Firewalls only accept outbound communications initiated from inside.
* NAT : packets from the same port may be seen with different src by different hosts.



What about multimedia ?
 - Most protocols for real-time multimedia:
* Use multiple ports for a single application (1 port for audio + 1 port for video, and so on).
* Have not been designed with NAT devices in mind. They can send IP + port numbers in the payload. (Oops, this is no good.) The scheme breaks down at NAT devices, because a NAT rewrites the addresses in the packet header but not the ones inside the payload.
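The "IP + port in the payload" problem is easy to show with a toy packet in Python. The dictionary layout and the payload text are made up for illustration (real signalling such as SIP/SDP looks different); the NAT function here is a plain one that touches only the header.

```python
# Toy model: a packet is a dict with header fields and an opaque payload.

def nat_rewrite(packet, public_ip, public_port):
    """Rewrite only the header source, as a plain NAT without any
    application-level helper would do. The payload is left untouched."""
    rewritten = dict(packet)
    rewritten["src_ip"] = public_ip
    rewritten["src_port"] = public_port
    return rewritten


packet = {
    "src_ip": "192.168.1.10",
    "src_port": 5060,
    # The application tells the peer where to send media, using the
    # only address it knows: its private one.
    "payload": "send audio to 192.168.1.10:40000",
}

on_the_wire = nat_rewrite(packet, "198.51.100.1", 50000)
print(on_the_wire["src_ip"], on_the_wire["src_port"])
print(on_the_wire["payload"])
```

The receiver sees a public source address in the header but is told to send audio to a private address it can never reach.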

We will be talking about TCP/UDP protocols, so we had better have a review of them at this point. 

http://www.it20.info/misc/pictures/TCP-clouds-UDP-clouds-design-for-fail-and-AWS1.jpg

If we imagine using TCP for a video conference, it is basically no good, because ordering matters in TCP. Assume that packet 1 and packet 2 are received by your device, but packet 3 does not make it (a router or switch might drop it because of too much traffic). The packets that arrive afterwards are then held in the receiver's stack instead of being handed to the application, in order to keep everything in sequence. In our scenario this means that packets 4, 5, 6 and so on cannot be delivered before the receiver gets packet 3. Once the sender has retransmitted packet 3, the receiver's stack knows that the order is complete again (from packet 3 up to the latest packet), and all the held packets are delivered at once. The spacing at which the receiver gets the packets is ruined. And this means jitter, which we cannot tolerate in real-time apps.
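The packet 3 story can be simulated in a few lines of Python. This is a simplified sketch with invented arrival times (in milliseconds) and sequence numbers starting at 1; it only models the in-order delivery rule, not real TCP.

```python
def in_order_delivery(arrivals):
    """arrivals: list of (arrival_time_ms, seq).
    Returns (delivery_time_ms, seq) pairs when packets are handed to the
    application strictly in sequence, the way TCP does it."""
    buffered = set()
    next_seq = 1
    delivered = []
    for time_ms, seq in sorted(arrivals):
        buffered.add(seq)
        # Hand over everything that is now contiguous.
        while next_seq in buffered:
            delivered.append((time_ms, next_seq))
            next_seq += 1
    return delivered

def as_arrived_delivery(arrivals):
    """UDP-like: each packet is handed over the moment it arrives."""
    return sorted(arrivals)

# Packet 3 was dropped and its retransmission arrives late, at 130 ms.
arrivals = [(0, 1), (20, 2), (60, 4), (80, 5), (100, 6), (130, 3)]
print(in_order_delivery(arrivals))
print(as_arrived_delivery(arrivals))
```

With in-order delivery, packets 3 to 6 all reach the application at 130 ms in one burst; with as-arrived delivery only packet 3 is late and the rest keep their 20 ms spacing.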

That's why we prefer UDP over TCP for media transfer, although we might lose some packets with UDP. If we have to use TCP for some reason on some hops, then UDP should be preferred for the rest of the hops.

Note: I will talk about UDP and TCP comprehensively in my future posts.

Mechanisms to enable real-time multimedia services to work well in the presence of firewalls and NAT devices:

These are essentially ways to detect and handle firewalls and NAT devices.

 

1) STUN (Session Traversal Utilities for NAT)

2) TURN (Traversal Using Relays around NAT )

3) (N)ICE ( Interactive Connectivity Establishment ) 

4) Tunneling

5) Modifying / involving the firewall. 

Solutions 4 and 5 are generally not possible to apply. We cannot ask a company for access to its firewall configuration, right? :) We can, but the answer will be "No" unless you work for that company as a network admin or something.


Before we outline these protocols, let's first talk about typical Firewall / NAT configurations. These might be the cases on the firewall.

  1. All TCP/UDP allowed + NAT is used on the firewall
  2. All TCP allowed, but UDP is only allowed from the address:port that was sent to, and no NAT
  3. All TCP allowed from inside
  4. All UDP / TCP blocked
  5. All direct access to outside is forbidden. (internet access is available by web proxy ) 

 It might also be beneficial to talk about the categorization of NAT devices. There are three categories.

  1. Endpoint-independent mapping (the most open one): when you as a source start a connection to a destination host, the same public mapping is reused for other destination hosts as well.
  2. Address-dependent mapping: the mapping depends only on the destination address, so it is reused for any port on that same destination host, but not for other hosts.
  3. Address- and port-dependent mapping: as the name shows, the mapping between source and destination is unique (source address:port + destination address:port), so no other host, and no other port on the same host, can reuse it.
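The three categories differ only in what the NAT uses as the key of its mapping table, which can be sketched in Python like this. The class name, mode strings, starting port and addresses are my own assumptions for illustration.

```python
# Hypothetical NAT whose mapping key depends on the category.

MODES = ("endpoint-independent", "address-dependent", "address-and-port-dependent")

class MappingNat:
    def __init__(self, mode, first_port=50000):
        if mode not in MODES:
            raise ValueError("unknown mode: " + mode)
        self.mode = mode
        self.next_port = first_port
        self.table = {}

    def _key(self, src, dst):
        if self.mode == "endpoint-independent":
            return (src,)            # destination does not matter
        if self.mode == "address-dependent":
            return (src, dst[0])     # only the destination IP matters
        return (src, dst)            # destination IP and port both matter

    def public_port(self, src, dst):
        """Return the public port used for packets from src to dst."""
        key = self._key(src, dst)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return self.table[key]


me = ("192.168.1.10", 5000)
host_a, host_a_other_port, host_b = ("203.0.113.5", 3478), ("203.0.113.5", 3479), ("203.0.113.9", 3478)

for mode in MODES:
    nat = MappingNat(mode)
    ports = [nat.public_port(me, d) for d in (host_a, host_a_other_port, host_b)]
    print(mode, ports)
```

Only in the endpoint-independent case does every destination see me on the same public port, which is what makes that kind of NAT the friendliest for direct media.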

1) STUN (Session Traversal Utilities for NAT)

* Client/server based protocol.
* Designed to allow detection of firewall / NAT properties
(detects whether we can connect directly or whether we need a 3rd party).
* Discovers what properties the NAT has, and which public address is assigned to an address on the private network.
* Discovers whether the public address seen by one destination host/port can also be used by another host.


Thanks to STUN servers, the parties can learn each other's public IP and port. The devices behind the firewall use STUN servers to figure out this public IP and port. No translation of addresses is done by the STUN servers.
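For the curious, here is roughly what the exchange looks like on the wire, as a Python sketch based on my reading of RFC 5389: a 20-byte Binding Request, and a response carrying an XOR-MAPPED-ADDRESS attribute. It handles IPv4 only and does no networking. The response builder is a stand-in I wrote just to exercise the parser, not a real server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """Header only: type, length 0, magic cookie, 12-byte transaction id."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id

def build_binding_response(transaction_id, ip, port):
    """Stand-in for the server: report the source address it observed."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)  # 0x01 = IPv4
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr

def parse_xor_mapped_address(message):
    """Return (ip, port) from a Binding Success response, else None."""
    if len(message) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(20 + length, len(message))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


tid = os.urandom(12)
request = build_binding_request(tid)
response = build_binding_response(tid, "198.51.100.1", 50000)
print(len(request), parse_xor_mapped_address(response))
```

In real use you would send the request over UDP to a STUN server and parse whatever comes back; the address in the answer is the public IP and port your NAT gave you.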

 What happens if STUN fails?

For instance, my firewall/NAT might block all UDP for some specific purpose, let's say. In this case STUN over UDP fails, and communication with my device is attempted over TCP instead. It succeeds because my NAT device accepts TCP connections. After getting through via TCP, the communication with the destination host can then continue via UDP (because the connection is already established and UDP needs less setup than TCP).


However, in this case of using TCP as an alternative to UDP, we might have problems such as distortion in sound or video. When a packet gets dropped, TCP retransmits it, and we end up with the delay and jitter that we do not want in real-time apps.

2) TURN (Traversal Using Relays around NAT ) 

 * Let's say that my device behind the firewall asks a TURN server for a public port. The server then forwards all traffic arriving on this port to my device. However, this uses twice as much bandwidth: imagine two devices with a TURN server in the middle. The first leg of traffic is the other device sending something to the server, and the second leg is the server forwarding that traffic to my device.

Of course, it is always much better if direct communication is possible. But in case it is not possible, TURN is one of the solutions.

With TURN, both UDP and TCP communication is possible.
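The "two legs" cost can be seen in a toy relay written in Python. This is not the TURN protocol itself, just a sketch of the allocate-then-forward idea with made-up names, ports and addresses.

```python
# Toy relay: hands out public ports and forwards whatever arrives on them.

class Relay:
    def __init__(self, public_ip, first_port=49152):
        self.public_ip = public_ip
        self.next_port = first_port
        self.allocations = {}  # relay port -> client address behind the NAT
        self.bytes_in = 0      # leg 1: peer -> relay
        self.bytes_out = 0     # leg 2: relay -> client

    def allocate(self, client_addr):
        """Client asks for a public port; peers will send to this address."""
        port = self.next_port
        self.next_port += 1
        self.allocations[port] = client_addr
        return self.public_ip, port

    def relay(self, relay_port, data):
        """Forward data arriving on an allocated port to its client."""
        client = self.allocations.get(relay_port)
        if client is None:
            return None  # nobody allocated this port: drop
        self.bytes_in += len(data)
        self.bytes_out += len(data)
        return client, data


relay = Relay("203.0.113.7")
relayed_addr = relay.allocate(("198.51.100.1", 50000))
relay.relay(relayed_addr[1], b"x" * 160)  # one 160-byte audio frame
print(relayed_addr, relay.bytes_in + relay.bytes_out)
```

Every frame is counted once coming in and once going out, so the relay carries twice the media bandwidth that a direct path would need.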


At this point, before we go through the remaining mechanisms, let's talk a little bit about the difference between guaranteed delivery and real-time applications.

Guaranteed delivery is what we deal with when we download something (like an mp3 or other media). In this case, we do not care about latency that much. All we care about is that the download completes, right?

What about a video conference? Or a call from your family on Skype? Can you tolerate jitter or delay?
The answer should be "NO", because even slight jitter, bursts or delay in the sound packets might distort the conversation, meaning that the parties may not understand each other.


3) (N)ICE ( Interactive Connectivity Establishment )

It is another, more recent protocol for negotiating channels for communication.
It uses STUN, TURN and other technologies under the hood.
It basically gathers as many address/port pairs as possible for the parties that want to communicate with each other. These pairs are kept in prioritized order so that sender and receiver can try them one by one until the communication is set up.

Thanks to ICE, we can avoid a situation in which one party thinks the other party sees him (in a video conference) while actually no stream is being received on the other side. ICE deals with this kind of miscommunication problem.


- End-to-end protocol between clients
- Servers are not directly involved (but TURN and STUN servers can be used when needed)
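The "prioritized order" can be illustrated in Python with the candidate priority formula recommended in RFC 5245, as far as I understand it: direct (host) addresses rank highest, STUN-discovered (server reflexive) ones next, and TURN relays last. The candidate list itself is invented for the example.

```python
# Recommended type preferences from RFC 5245 (higher is tried first).
TYPE_PREFERENCE = {
    "host": 126,   # address on the device itself
    "prflx": 110,  # peer reflexive, learned during connectivity checks
    "srflx": 100,  # server reflexive, learned via STUN
    "relay": 0,    # relayed address, allocated on a TURN server
}

def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)"""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)

def order_candidates(candidates):
    """Sort candidates so the most preferred one is tried first."""
    return sorted(candidates, key=lambda c: candidate_priority(c["type"]), reverse=True)


# Hypothetical candidates gathered by one client.
candidates = [
    {"type": "relay", "addr": ("203.0.113.7", 49152)},
    {"type": "host", "addr": ("192.168.1.10", 40000)},
    {"type": "srflx", "addr": ("198.51.100.1", 50000)},
]

for cand in order_candidates(candidates):
    print(candidate_priority(cand["type"]), cand["type"], cand["addr"])
```

The relay ends up at the bottom of the list, so it is only used when the cheaper direct paths have failed their connectivity checks.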


4) Tunneling (RealTunnel)

A client on the local network (let's assume we have a mobile phone) discovers its network connectivity with the help of STUN and some other techniques. This application, RealTunnel, works together with RealTunnel servers located in different geographical locations to find an available transport mechanism for the peers in a call.

RealTunnel is a product that can be used when everything else has failed.

5) Modifying / involving the firewall. ( Control the FIREWALL) 

Actually, we are all familiar with modifying our firewalls from when we want to play games (after installation you are asked to change your firewall settings to let the application do some stuff, right?). However, we know that for gamers nothing is more important than just starting to play, which can be a big mistake. It might have its consequences ;)

Such operations are not allowed in a corporate network.

Another way of involving the firewall, for voice and video, is for example to use a Session Border Controller. Traffic bypasses the firewall and goes through the Session Border Controller, which in effect works as a firewall itself and handles the protocols that it knows.

A Session Border Controller is an application-level gateway that masks the fact that there are different addresses inside and outside by rewriting them in the payload.


About Application Level Gateways (check the wiki for more info about Session Border Controllers):
Wiki says : " It allows customized NAT traversal filters to be plugged into the gateway to support address and port translation for certain application layer "control/data" protocols such as FTP, BitTorrent, SIP, RTSP, file transfer in IM applications etc. In order for these protocols to work through NAT or a firewall, either the application has to know about an address/port number combination that allows incoming packets, or the NAT has to monitor the control traffic and open up port mappings (firewall pinhole) dynamically as required."


However, Application Level Gateways:
* interfere at the application level, which breaks the firewall principle. Each time the high-level protocol changes, the application-level gateway also needs to adapt to the new conditions. It is not always possible to modify encrypted data. In some scenarios an application-level gateway might not be a good solution.


Some protocols that have difficulties with NAT and FIREWALLS

  1. SIP (Session Initiation Protocol) is the preferred protocol for VoIP = The protocol can be used for creating, modifying and terminating two-party (unicast) or multiparty (multicast) sessions. Sessions may consist of one or several media streams. (WIKI says so.) The TURN, STUN and ICE protocols were made by the SIP community.
  2. H.323 and FIREWALL/NAT = An older protocol with functionality similar to SIP.
  3. OSCAR (AIM, ICQ) and FIREWALL/NAT = Supports most types of firewalls and NATs. Has mechanisms similar to those used for SIP.
  4. Skype and Firewall/NAT = Developed by the people behind KaZaA. Skype is encrypted (we do not know how it works). It uses your connection to relay others' calls, meaning that it uses your bandwidth. When it crashed in 2007, one suspected reason was that many users were behind firewalls, so relaying was not really possible under those conditions.
  5. MSNP (MSN) will be replaced by Skype! Microsoft bought Skype!
  6. XMPP
  7. IPsec is also subject to this problem. Encrypted VPN solutions use IPsec. IPsec uses AH (Authentication Header), an additional header that authenticates the information in the IP header. It ensures that the packet header information is not changed by a firewall or any other mechanism. However, changing the header is the whole deal of NAT :) IPsec also uses ESP (Encapsulating Security Payload), which encapsulates and encrypts the IP payload and protects it with an integrity check, meaning that port modifications are not possible (because the port numbers are in the payload). So if someone tries to modify it, the packet gets dropped.
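The IPsec point can be demonstrated with an ordinary keyed hash in Python. To be clear, this is not the real AH packet format; it is only a sketch of the principle that an integrity tag computed over the source address stops verifying once a NAT rewrites that address. The key, addresses and field layout are made up.

```python
import hashlib
import hmac

def integrity_tag(key, src_ip, dst_ip, payload):
    """Keyed hash over header addresses and payload (AH-like idea only)."""
    message = (src_ip + "|" + dst_ip + "|").encode() + payload
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, src_ip, dst_ip, payload, tag):
    """Receiver recomputes the tag from what it actually received."""
    return hmac.compare_digest(integrity_tag(key, src_ip, dst_ip, payload), tag)


key = b"shared-secret"
payload = b"hello"
tag = integrity_tag(key, "192.168.1.10", "203.0.113.5", payload)

# Untouched packet: the tag verifies.
print(verify(key, "192.168.1.10", "203.0.113.5", payload, tag))
# After source NAT the receiver sees the public address: the tag no longer matches.
print(verify(key, "198.51.100.1", "203.0.113.5", payload, tag))
```

The NAT is doing its normal job, yet from the receiver's point of view the packet looks tampered with, so it is dropped.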



What about IPv6 and FIREWALL / NAT?
- Do you think we will need NAT in IPv6? The reason we use NAT is that we have too few IP addresses, so in this sense we will not need NAT.

However, we can still make use of NAT for:

* Security: topology hiding, to prevent host counting by attackers.
* Backward compatibility.
- If you have a large private network with one million devices with static IP addresses, then when changing Internet provider you do not want to modify those 1 million machines. In brief, we will most probably still need NAT devices.


To SUM UP ! (Finally :)


There is still no single standard that supports all these scenarios.

However, discovery and traversal tools can help find the best path!
STUN, TURN, ICE.

Basically, when two ends want to communicate and firewalls block them from communicating directly, a third-party mechanism such as a registrar is needed for registration. After that, they will know which port and IP to use for direct communication, if possible.

The same problems will most probably occur in IPv6.






PS: Almost all of the content of this post is taken from a presentation given by Knut OMANG at the University of Oslo.








