Wednesday, September 05, 2007

It's All About the Traceability

I just came across a very interesting blog post, "Log Everything All the Time", about logging information in a production environment.

As a software developer, I want to elaborate on it a little further:

1. It is all about the Traceability!

  In a real production environment, something is always failing somewhere: a router/switch dies or gets rebooted, connections time out when the firewall is jammed, SSL certificates expire, and there are DB upgrades, OS patches, and so on. The failure could happen just as a new merchant customer sends out his second credit authorization request, or just as the approval of a $10,000 financial transaction arrives. In the end, you will grab all the forensic information you can find to diagnose it.

2. All this logging is not really for the Operations Team or the Support Engineers; it is actually for the developers themselves.

  To borrow a saying of our Infrastructure Architect: "If there is anything you wish to see when you get called at 2 AM, you should log it for yourself."

3. Strong Infrastructure Support

  Obviously, to make "Log Everything All the Time" work, you need a very powerful Logging Server/Bus to consume all the generated log data, and a powerful Log Miner or Search Tool to help you query, filter, and correlate it. Fortunately, in my current working environment we have this kind of infrastructure available, so we have no excuse not to write logging code.

  Given a production environment with a firewall, load balancer, front service gateway, application server, and backend DB, if it is not feasible to log literally everything, you should at least log every request/response going into and out of each system/component and every critical state change in each system/component, plus any other state change wherever possible. Again, it is all about the traceability.
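  As a rough illustration of that minimum, here is a sketch in Python using only the standard logging module: a decorator logs every request coming into and every response going out of a component, a helper logs state changes, and every record carries a trace id so entries from different hops can be correlated later in the log search tool. The component names, the "trace=" field layout and the authorize function are hypothetical, not taken from any real system.

```python
import functools
import logging
import uuid

# Hypothetical logger name; in production these records would be shipped to
# the central Logging Server/Bus rather than printed to stderr.
logger = logging.getLogger("gateway")


def traced(component):
    """Decorator: log every request entering and response leaving `component`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(request, trace_id=None):
            # Reuse the caller's trace id so hops can be correlated;
            # mint a new one only at the edge of the system.
            trace_id = trace_id or uuid.uuid4().hex
            logger.info("trace=%s component=%s IN request=%r", trace_id, component, request)
            try:
                response = func(request, trace_id=trace_id)
            except Exception:
                # logger.exception also records the stack trace.
                logger.exception("trace=%s component=%s FAILED", trace_id, component)
                raise
            logger.info("trace=%s component=%s OUT response=%r", trace_id, component, response)
            return response
        return wrapper
    return decorator


def log_state_change(trace_id, component, entity, old, new):
    """Log a state transition of some entity inside a component."""
    logger.info("trace=%s component=%s STATE %s: %s -> %s",
                trace_id, component, entity, old, new)


@traced("auth-service")  # hypothetical component name
def authorize(request, trace_id):
    # Hypothetical business logic: approve, but log the state transition.
    log_state_change(trace_id, "auth-service", request["txn_id"], "RECEIVED", "APPROVED")
    return {"txn_id": request["txn_id"], "status": "APPROVED"}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    authorize({"txn_id": "T1", "amount": 10000})
```

  Passing the same trace id from the gateway down to the DB layer is what lets you reconstruct a single transaction at 2 AM by searching for one value.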

Thursday, August 16, 2007

Can you design a good benchmark to compare Apache Mina and Python Twisted?

There is an interesting blog entry, "Mina and Twisted Matrix benchmark with update", which claims that Twisted outperforms Apache Mina in the blogger's benchmark.

It actually raised a very interesting question: can we design a good and fair benchmark to evaluate Apache Mina and Python Twisted?

The following are some rough thoughts on what might be a good benchmark for evaluating Apache Mina and Twisted for TCP-based networking applications:

1. The benchmark should run on a platform where both Mina and Twisted can use their most advanced non-blocking networking support, such as Linux with epoll; otherwise we are not really benchmarking the two frameworks on a level playing field.

2. Since both Apache Mina and Twisted have built-in protocol processing components, we can define a simple text protocol with a format like "[Header][PayLoadLength][PayLoad]", plus a binary protocol that sends images, to exercise protocol processing. Based on these, we can create a set of sample protocol messages of different sizes for the benchmark.

3. We can create a test driver that opens a TCP connection to the server and sends a sample protocol message. The server decodes the message, constructs a response in a different format such as "[Header][PayLoad][PayLoadLength]" that echoes back the received payload, and closes the TCP connection. The driver then processes the response and verifies that the server returned the payload correctly.

We can increase the number of concurrent connections and compare the total connections established, the processing time, and the total payload size, all measured on the driver side.

4. We can create a test driver that opens a persistent TCP connection to the server and keeps sending protocol messages, while the server keeps echoing back the received payload in the same way as in 3.

We can increase the number of concurrent connections and compare the total connections established, the processing time, and the total payload size, all measured on the driver side.

5. Since both Apache Mina and Twisted support building both network clients and servers, the tests can be run in the following combinations:
-- Apache Mina Client against Twisted Server
-- Twisted Client against Apache Mina Server
-- Apache Mina Client against Apache Mina Server
-- Twisted Client against Twisted Server
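To make items 2 to 4 a little more concrete, here is a framework-neutral Python sketch (standard library only, with asyncio standing in for either Mina or Twisted) of the text protocol codec and a one-shot driver in the style of item 3. The field widths (a 4-byte ASCII header such as REQ1/RSP1 and an 8-digit zero-padded decimal length) and all function names are my own assumptions for illustration; the real benchmark would of course point the driver at Mina and Twisted servers.

```python
import asyncio
import time

HEADER_LEN = 4  # assumed: fixed-size ASCII header such as b"REQ1" / b"RSP1"
LENGTH_LEN = 8  # assumed: payload length as 8 zero-padded ASCII digits

def _length_field(n: int) -> bytes:
    """Encode a length as a fixed-width ASCII decimal field."""
    if n >= 10 ** LENGTH_LEN:
        raise ValueError("payload too large for the length field")
    return str(n).zfill(LENGTH_LEN).encode("ascii")

def encode_request(header: bytes, payload: bytes) -> bytes:
    """Driver side: build [Header][PayLoadLength][PayLoad]."""
    if len(header) != HEADER_LEN:
        raise ValueError("header must be exactly %d bytes" % HEADER_LEN)
    return header + _length_field(len(payload)) + payload

def decode_request(buf: bytes):
    """Server side: return (header, payload, leftover), or None if incomplete."""
    prefix = HEADER_LEN + LENGTH_LEN
    if len(buf) < prefix:
        return None
    length = int(buf[HEADER_LEN:prefix])
    if len(buf) < prefix + length:
        return None
    return buf[:HEADER_LEN], buf[prefix:prefix + length], buf[prefix + length:]

def build_response(payload: bytes) -> bytes:
    """Server side: echo the payload as [Header][PayLoad][PayLoadLength]."""
    return b"RSP1" + payload + _length_field(len(payload))

def verify_response(response: bytes, sent: bytes) -> bool:
    """Driver side: check that a *complete* response echoes `sent`."""
    if len(response) < HEADER_LEN + LENGTH_LEN or response[:HEADER_LEN] != b"RSP1":
        return False
    payload, trailer = response[HEADER_LEN:-LENGTH_LEN], response[-LENGTH_LEN:]
    return trailer.isdigit() and int(trailer) == len(payload) and payload == sent

async def handle_connection(reader, writer):
    """Stand-in echo server for item 3: one request, one response, then close."""
    buf, msg = b"", None
    while msg is None:
        chunk = await reader.read(65536)
        if not chunk:  # client went away before sending a full message
            break
        buf += chunk
        msg = decode_request(buf)
    if msg is not None:
        writer.write(build_response(msg[1]))
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def one_shot_client(port: int, payload: bytes) -> bool:
    """Item 3 driver: connect, send one message, read until the server closes."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(encode_request(b"REQ1", payload))
    await writer.drain()
    response = await reader.read()  # read() with no size reads until EOF
    writer.close()
    await writer.wait_closed()
    return verify_response(response, payload)

async def run_benchmark(concurrency: int, payload: bytes):
    """Return (verified responses, payload bytes echoed, elapsed seconds)."""
    server = await asyncio.start_server(handle_connection, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    start = time.perf_counter()
    results = await asyncio.gather(
        *(one_shot_client(port, payload) for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    server.close()
    await server.wait_closed()
    ok = sum(results)
    return ok, ok * len(payload), elapsed

if __name__ == "__main__":
    for size in (16, 1024, 64 * 1024):  # sample messages of different sizes
        print(size, asyncio.run(run_benchmark(100, b"x" * size)))
```

Because the response carries its length at the end, the item 3 driver can only frame it once the server closes the connection. On the persistent connections of item 4, the driver could instead read exactly HEADER_LEN + len(payload) + LENGTH_LEN bytes (for example with StreamReader.readexactly), since it knows what it sent. The length prefix also makes the request framing byte-safe, so the same codec could carry the binary image payloads from item 2.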

Friday, April 20, 2007

Python Twisted and Apache Mina, Two Great Asynchronous Network Application Frameworks

For the last 10 months I have been working on a highly scalable asynchronous network messaging gateway project using Apache MINA (Multipurpose Infrastructure for Network Applications), a Java network application framework based on Java NIO.

One challenge in developing a highly scalable asynchronous network messaging gateway is testing it for functionality, load, and performance. I eventually developed several Test Simulators and Drivers using Python Twisted, "an event-driven networking engine written in Python".

Twisted is obviously THE Python framework if you want to develop a scalable asynchronous network application in Python. Personally, I think Apache Mina has a very good chance of becoming the de facto Java framework for developing NIO-based network applications.

If you compare Twisted with Mina, both adopt the "reactor pattern", and both aim to encapsulate the complexity of non-blocking network I/O behind a simple and elegant framework for developers. In fact, they are very similar at the conceptual level, as the following table shows:

Python Twisted | Apache MINA
Protocol | ProtocolEncoder/ProtocolDecoder
LineReceiver | TextLineEncoder/TextLineDecoder
protocol.ClientFactory | IoConnector
protocol.ServerFactory | IoAcceptor
Deferred | IoFuture
callback chain/errback chain | IoFilterChain
reactor ThreadPool | ExecutorFilter
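To make the "reactor pattern" mentioned above a bit more concrete, here is a toy single-threaded reactor in Python built on the standard selectors module. It is only a sketch of the pattern under simplifying assumptions: it is not how Twisted's reactor or MINA's IoAcceptor are actually implemented, and the class names MiniReactor and EchoProtocol are hypothetical.

```python
import selectors
import socket


class MiniReactor:
    """Toy single-threaded reactor: wait for readiness, dispatch to a handler."""

    def __init__(self):
        # DefaultSelector picks the best mechanism available, e.g. epoll on Linux.
        self.selector = selectors.DefaultSelector()

    def add_reader(self, sock, handler):
        sock.setblocking(False)
        self.selector.register(sock, selectors.EVENT_READ, handler)

    def remove_reader(self, sock):
        self.selector.unregister(sock)

    def run_once(self, timeout=None):
        # Demultiplex: block until some registered socket is readable,
        # then dispatch each ready socket to its handler callback.
        for key, _events in self.selector.select(timeout):
            key.data(key.fileobj)

    def run_forever(self):
        while True:
            self.run_once()


class EchoProtocol:
    """Per-connection handler, loosely analogous to a Twisted Protocol's dataReceived."""

    def __init__(self, reactor):
        self.reactor = reactor

    def __call__(self, sock):
        data = sock.recv(4096)
        if data:
            self.data_received(sock, data)
        else:  # empty read: the peer closed the connection
            self.reactor.remove_reader(sock)
            sock.close()

    def data_received(self, sock, data):
        # For brevity this writes directly; a real reactor would buffer output
        # and register for EVENT_WRITE when the socket is not writable.
        sock.sendall(data)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    reactor = MiniReactor()
    reactor.add_reader(server_side, EchoProtocol(reactor))
    client_side.sendall(b"ping")
    reactor.run_once(timeout=1.0)
    print(client_side.recv(4096))  # the echoed bytes
```

Both frameworks hide exactly this select-and-dispatch loop from the application developer, and add what the sketch leaves out, such as output buffering, connection factories, and a thread pool for slow handler work so it does not block the event loop.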