Monday, May 14, 2012

Overlay Networks

Let's talk a little bit about overlay networks, a term we often hear when the topic is distributed systems. An overlay network is a network that is built (virtually) on top of another network. It is similar in spirit to the multi-layer concept: you add another layer that helps the layers beneath it perform certain operations.

Wikipedia again has a clear definition for this concept: "Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network".


Cloud computing is one example of an overlay network: its nodes run on top of the Internet.
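To make the "virtual link = physical path" idea concrete, here is a minimal sketch. All names (routers `r1`..`r4`, hosts `hostA`/`hostB`) are made up for illustration; the point is only that one logical overlay link can hide several underlay hops.

```python
# Sketch: an overlay "link" between two nodes is really a path of
# physical (underlay) hops. All names here are illustrative.

# Underlay: physical adjacency between routers
underlay = {
    "r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2", "r4"], "r4": ["r3"],
}

# Overlay: two end hosts connected by one *logical* link, which
# corresponds to a path through the underlay
overlay_links = {
    ("hostA", "hostB"): ["r1", "r2", "r3", "r4"],
}

def overlay_hop_count(a, b):
    """A single overlay hop may cost many underlay hops."""
    path = overlay_links[(a, b)]
    return len(path) - 1

print(overlay_hop_count("hostA", "hostB"))  # one logical link, 3 physical hops
```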


< more details will be outlined soon as more particulars will be covered >

Thursday, May 3, 2012

IP Multicast and CacheCast

Single Source Multiple Destinations Data Transmission in the INTERNET

Let's first familiarize ourselves with the terms we are supposed to know before going into the particulars of this topic.

There are three ways of addressing in the IPv4 standard.

Unicast: one sender -> one destination
In computer networking, unicast transmission is the sending of messages to a single network destination identified by a unique address.
 

What is Broadcast ?
Let's say we have 1000 receivers and each stream requires 30 Mbps of bandwidth. What happens if we want to send a packet to all of these destinations with separate unicast transmissions? Let's do the math: we would need 30 Gbps of outgoing capacity to deliver this packet to all 1000 destinations. What the heck :) , right? The content is exactly the same, though.
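A back-of-the-envelope check of the arithmetic above:

```python
# Unicast cost for identical content: one 30 Mbps stream per receiver.
receivers = 1000
stream_mbps = 30

total_mbps = receivers * stream_mbps  # 30,000 Mbps of duplicated data
print(total_mbps / 1000, "Gbps")      # -> 30.0 Gbps
```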


"Broadcast is when a single device is transmitting a message to all other devices in a given address range. This broadcast could reach all hosts on the subnet, all subnets, or all hosts on all subnets. Broadcast packets have the host (and/or subnet) portion of the address set to all ones. By design, most modern routers will block IP broadcast traffic and restrict it to the local subnet." http://www.inetdaemon.com/tutorials/internet/ip/addresses/unicast_vs_broadcast.shtml


What is Multicast?
Wikipedia has a clear definition, as usual:
In computer networking, multicast is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source.
Diagram: http://upload.wikimedia.org/wikipedia/commons/3/30/Multicast.svg

OK, so we have one source, and the purpose is to send a message from that source to multiple destinations.




"This allows for communication that resembles a conference call. Anyone from anywhere can join the conference, and everyone at the conference hears what the speaker has to say. The speaker's message isn't broadcasted everywhere, but only to those in the conference call itself. A special set of addresses is used for multicast communication." http://www.inetdaemon.com/tutorials/internet/ip/addresses/unicast_vs_broadcast.shtml


What about CacheCast?

Note that, as the existence of each new technology suggests, there is always a problem with the current system: some optimization is needed to make the system behave better in terms of speed, security, and so on, and a new solution is produced to meet that need.

Here, CacheCast is a solution to the problems that today's Internet faces when dealing with multicast. However, it does not solve these problems in the network layer; it does so in the link layer, by caching.


Here is the outline of this post.

  1. Multicasting in the Internet
    • Basis and critique
  2. CacheCast
    • Internet Redundancy 
    • Packet caching systems
    • CacheCast Design
    • CacheCast Evaluation
      • Efficiency
      • Computational complexity 
      • Environmental impact




Multicasting in the Internet

In the beginning of the Internet, sending a packet to multiple destinations had to be done in unicast style, meaning that a source had to send its message to each destination separately, at different times.

We used to waste link capacity with unicast. As you can see from the picture above, our source has 6 outgoing transmissions in this scenario, each carrying the same data. That is absolutely no good. That's why many mechanisms have been developed to make the "one source, multiple destinations" scenario more efficient.


The solution is to send one copy over the outgoing link and have the network replicate it so that all destinations receive it.
This way we save bandwidth and link capacity.

There have been two solutions for doing this multicast. The first one proposed was to add the list of receivers' addresses to the packet, so that it gets replicated inside the Internet and the destinations receive it, as shown in the picture below.

However, we run into a problem when we would like to send this postcard to many people. What happens then? We cannot avoid the redundant overhead in this case.

As you can see, the list is so long that there is only a small space left for the actual message: "Hi".
The best one can do with this approach is to keep the list of receivers small and address larger audiences with separate packets. In short, this approach does not scale well.


The second approach is to create groups and put individuals into those groups. The network is responsible for replicating the data to the groups and their members.



To compare the two approaches: adding each individual to a list, as in the first approach, makes the packet grow without bound as the list gets bigger, whereas the second approach does not have this problem, since the header always carries a single group address.
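A rough sketch of this comparison in numbers (the header sizes below are simplified assumptions: a 20-byte base IPv4 header, 4 bytes per IPv4 address, a typical 1500-byte Ethernet MTU):

```python
# Why the explicit-list approach does not scale: every destination
# address (4 bytes in IPv4) goes into the header, while the
# group-based approach always uses a single group address.

IPV4_ADDR_BYTES = 4
MTU = 1500  # typical Ethernet MTU, bytes

def explicit_list_header(n_receivers, base_header=20):
    return base_header + n_receivers * IPV4_ADDR_BYTES

def group_header(base_header=20):
    return base_header + IPV4_ADDR_BYTES  # one group address, regardless of n

for n in (10, 100, 1000):
    hdr = explicit_list_header(n)
    print(n, "receivers ->", hdr, "byte header, exceeds MTU:", hdr >= MTU)

# With 1000 receivers the header (4020 bytes) alone exceeds a
# 1500-byte MTU -- no room left for the actual message ("Hi").
```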


OK, now we focus on the second approach (the group-based one). We have one source, we specify which group we want to send the packet to, and the network handles the remaining parts of the operation, such as replication. However, the network replicates the data without knowing who the sender is, which can be dangerous if the sender is malicious.

In the beginning, when a node wanted to join a channel (where a group exists and the members of that group can benefit from the channel), there was no verification or authorization. Anyone could join and transmit. It was open.

Then multicast evolved so that only a single designated sender was able to send packets; that sender had total control over everything. This was called single-source multicast.

Afterwards, many protocols showed up to improve things; however, the deployment of multicast did not grow, because of the problems discussed before. The security and congestion-control issues have not been solved.

Multicast has some problems inherent in its architecture. An extremely distributed protocol that handles group management might be developed, but deploying such a protocol is not really feasible, since it involves changing the routers across the Internet via software updates.

Did IP multicast fail?
In practice, IP multicast survives mainly in networks where the Internet provider has total control, such as IPTV; no one can use this IP multicast from outside the provider's network.

A single-source, multiple-destination transport mechanism is becoming fundamental!
However, at present the Internet does not provide an efficient multi-point transport mechanism.


In conclusion, we mentioned two approaches. The first one was "datagram routing for Internet multicasting" (an explicit list of destinations in the IP header), which did not scale well and died.

The second one was "host groups". It is now taken as the reference model, and new improvements are being tried in that area.

Both of these approaches operate in the network layer.

Moreover, a third approach has shown up, a kind of peer-to-peer network: clients receive input from the server and pass it on to other clients, reducing the server's burden. This solution lives in the application layer, though, and even though the server's burden shrinks, the clients use unicast to forward the data to the others, which still creates redundancy on the links.

CacheCast


Let's first talk a little bit about redundancy. The source of the redundancy is that the server transmits the same data over and over again to multiple destinations.

The same data traverses the same path multiple times, which wastes link capacity.
The idea to solve this problem is packet caching. Let's first check the graph below.

Our source sends the payload to destination A, as you can see in the graph. As the payload travels towards destination A, it is saved at each hop along the way.

The next time our source wants to send the same payload to B, each hop knows that the payload has already been saved at the next hop, so there is no need to send it again; only the packet header is forwarded.

Basically, we cache the payload locally. However, you might ask some questions, such as:
- How do we determine whether a packet payload is in the next hop's cache?
- How do we compare packet payloads?
- What size should the cache be?

CacheCast is an example of a packet caching system; it answers these questions in the following way.

Let's recall some terms, such as link and router.
A link is a logical point-to-point connection.
Its throughput is limited, in bits per second, so it is beneficial to avoid redundancy on it.

A router is a switching node. It performs three main tasks per packet:
- TTL update
- Checksum recalculation (the checksum covers the IP header only)
- Destination IP address lookup

These tasks do not depend on the packet size; for the router it does not matter how big the packet payload is.
The main burden for the router is the destination IP address lookup.
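The three per-packet tasks listed above can be sketched as follows. This is a toy illustration, not real router code: the forwarding table, the prefix match, and the checksum are all heavily simplified placeholders.

```python
# Toy sketch of the three per-packet router tasks:
# TTL update, header checksum recalculation, destination lookup.

forwarding_table = {"10.0.0.0/8": "eth0", "0.0.0.0/0": "eth1"}  # toy table

def ip_checksum(packet):
    # Placeholder: real routers recompute the 16-bit one's-complement
    # checksum over the IP *header* only -- independent of payload size.
    return (packet["ttl"] + hash(packet["dst"])) & 0xFFFF

def forward(packet):
    packet["ttl"] -= 1                        # 1. TTL update
    if packet["ttl"] <= 0:
        return None                           # drop (ICMP handling omitted)
    packet["checksum"] = ip_checksum(packet)  # 2. checksum recalculation
    # 3. destination lookup (longest-prefix match reduced to a toy check)
    if packet["dst"].startswith("10."):
        return "eth0"
    return forwarding_table["0.0.0.0/0"]

pkt = {"ttl": 64, "dst": "10.1.2.3", "checksum": 0}
print(forward(pkt))  # -> "eth0"
```

Note that none of the three steps touches the payload, which is exactly why payload size does not matter to the router.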

So what is a link cache about?



Caching is done on a per-link basis.
The cache management unit (CMU) at the link entry removes payloads that are already stored at the link exit.
The cache store unit (CSU) at the link exit restores payloads from a local cache; it acts like a remote memory for the link entry.

The link cache size must be minimized: on a 10 Gbps link the transmission budget per cache operation is approximately 72 ns before the link goes idle, while a modern memory read/write cycle takes roughly 6-20 ns. This means the link processing must be simple enough to complete within this limited time.

As for the link queue, at present it is scaled to about 250 ms of traffic, meaning that a queue for a 10 Gbps link already needs roughly 315 MB. The size of the memory is not the problem; the memory read speed is.
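The CMU/CSU interplay can be sketched like this. It is a simplified model of the idea described above, not the actual CacheCast implementation: a tiny FIFO of slots stands in for the real cache, and one object simulates both ends of the link.

```python
# Sketch of per-link payload caching: the CMU at the link entry strips
# payloads already cached at the exit; the CSU at the exit restores them.
from collections import OrderedDict

CACHE_SLOTS = 4  # must stay small: link processing has a tiny time budget

class LinkCache:
    def __init__(self):
        self.cmu = OrderedDict()  # payload_id -> slot (link entry side)
        self.csu = {}             # slot -> payload  (link exit side)

    def send(self, payload_id, payload):
        """Return what actually crosses the link."""
        if payload_id in self.cmu:
            return ("header-only", self.cmu[payload_id])  # payload cached
        if len(self.cmu) >= CACHE_SLOTS:
            _, slot = self.cmu.popitem(last=False)        # evict oldest
        else:
            slot = len(self.cmu)
        self.cmu[payload_id] = slot
        self.csu[slot] = payload
        return ("full-packet", slot, payload)

    def receive(self, frame):
        """CSU side: restore the payload for header-only frames."""
        if frame[0] == "header-only":
            return self.csu[frame[1]]
        return frame[2]

link = LinkCache()
first = link.send("p1", b"Hi")   # full packet crosses the link
second = link.send("p1", b"Hi")  # only the header crosses
print(first[0], second[0])
print(link.receive(second))      # payload restored at the link exit
```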


The point is that a source of redundant data must support link caches.

Server support in CacheCast:
- The server can transmit packets carrying the same data within a minimum time interval
- The server can mark its redundant traffic
- The server can provide additional information that simplifies link cache processing

The aim of server support is to batch requests for the same data and send the responses within a minimum time interval.

Moreover, a CacheCast packet carries an extension header describing the packet payload:
* Payload ID (a unique payload identifier; payloads are compared by this ID)
* Payload Size (the size of the redundant part of the packet)
* Index (for administration purposes)
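The three fields can be modeled with a small structure. The field types and widths here are my assumptions for illustration, not the actual CacheCast wire format:

```python
# Illustrative model of the CacheCast extension header fields.
from dataclasses import dataclass

@dataclass
class CacheCastHeader:
    payload_id: int    # unique ID; payloads are compared by ID, not by bytes
    payload_size: int  # size of the redundant (cacheable) part of the packet
    index: int         # cache-slot index, for administration

hdr = CacheCastHeader(payload_id=0xABCD, payload_size=1400, index=3)
print(hdr)
```

Comparing payloads by ID rather than by content is what keeps the per-packet cache lookup cheap enough for the tight link-time budget mentioned earlier.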




When multiple packets carrying the same data are transmitted, the redundancy is removed: only the first packet carries the payload, and the remaining header-only packets follow immediately behind it (the packet train concept).


Briefly, the source (server):
  • batches transmissions of the same data to multiple destinations
  • builds the CacheCast headers
  • transmits the packets in the form of a packet train
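The server-side steps above can be sketched as follows. The structures are illustrative (plain dicts instead of real headers), but they show the packet-train shape: one full packet followed by header-only packets for the remaining destinations.

```python
# Sketch of CacheCast server-side batching: one payload, many
# destinations, sent back-to-back as a "packet train" so the link
# caches see the redundant copies within their small time window.

def build_packet_train(payload, destinations, payload_id):
    """First packet carries the payload; the rest are header-only."""
    train = []
    for i, dst in enumerate(destinations):
        pkt = {"dst": dst, "payload_id": payload_id}
        if i == 0:
            pkt["payload"] = payload  # only the first packet is full-size
        train.append(pkt)
    return train

train = build_packet_train(b"news", ["A", "B", "C"], payload_id=7)
print(len(train), "payload" in train[0], "payload" in train[1])  # 3 True False
```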


I have to stop here due to limited time. I will complete this post when I have the chance.