Tuesday, April 12, 2011

Google Backs Up All Gmail Messages with Tape

Fortune magazine published a very interesting article, "Google goes to the tape to get lost emails back". It reported that Google does use tape to back up all Gmail messages, and the author estimated that Google might use up to 200,000 tapes for all Gmail accounts.

You can also find related information in the Gmail Blog post "Gmail back soon for everyone".

Wednesday, February 23, 2011

Oracle Database 11g R2 Direct NFS Clonedb

Direct NFS Clonedb is currently undocumented and, most notably, it does not use any file system snapshot technology. You can find more information on Kevin Closson's blog and in his presentation.

A step-by-step setup procedure can be found in the blog post "Direct NFS (DNFS) Clonedb in Oracle Database 11g Release 2 (Patchset 11.2.0.2)".

Saturday, November 20, 2010

Use ZFS Snapshots to Back Up and Restore Oracle DB without Using Backup Mode

To back up a 24x7 Oracle production database, you usually have to put the database in backup mode before starting the backup and take it out of backup mode at the end. One side effect of running in backup mode is heavy redo log activity, which increases the load on the production server and makes the redo logs grow very rapidly. Backup mode also adds operational complexity when you want to create a point-in-time backup across federated databases, such as the multiple Oracle databases used by SAP.

Storage administrators have long used snapshot technology to create instant point-in-time backups of file systems. In a notable breakthrough, Oracle has published the My Oracle Support note "Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]", which officially supports a crash-consistent snapshot of an Oracle database that is not in backup mode as a valid Oracle backup.

If you run Oracle DB in ARCHIVELOG mode and put all Oracle control files, data files and redo log files on a snapshot file system that preserves write ordering for every file in a snapshot, such as NetApp Data ONTAP or Solaris ZFS, you can simply create a point-in-time snapshot of the file system as a valid Oracle backup without putting the database in backup mode, and you can restore the database to the snapshot time by restoring that snapshot. The same holds for federated databases: if you put all of them under one parent snapshot file system, a single snapshot serves as a point-in-time backup of all of them.

If your Oracle database runs on OpenSolaris/ZFS, you can use a ZFS snapshot as an Oracle backup as follows:

1. Create an "oracledb" ZFS file system with the following sub-directories:
    -- controlfiles, storing all the Oracle control files
    -- datafiles, storing all the Oracle data files
    -- redologs, storing all the Oracle online and archived redo log files

2. While the Oracle database is running, simply create a point-in-time snapshot with the ZFS snapshot command:
    # zfs snapshot oracledb@12am-11-22-2010

3. Later, restore the Oracle database to the snapshot time with the ZFS rollback command:
    (1) Shut down the Oracle database
    (2) Roll back the "oracledb" ZFS file system to the snapshot
        # zfs rollback oracledb@12am-11-22-2010
    (3) Start up the Oracle database

No matter how large your Oracle database is, ZFS creates the snapshot instantly and preserves write ordering within it. Such a snapshot is called a crash-consistent snapshot (or image): restoring it is equivalent to the Oracle database server losing power unexpectedly at the snapshot time. When the database is restarted, Oracle performs automatic instance recovery and brings the database to the last committed transaction recorded in the online redo log. As you can see, backing up and restoring an Oracle database with ZFS snapshots is very simple.
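The steps above map directly onto zfs command lines. The Java sketch below only builds those commands (and can run them through ProcessBuilder); the class name, the "yyyy-MM-dd-HHmm" snapshot label and the single "oracledb" file system from step 1 are assumptions, and shutting Oracle down and starting it up around the rollback is left to your usual DBA tooling. If controlfiles, datafiles and redologs were separate child datasets instead of sub-directories, you would need "zfs snapshot -r", which snapshots all descendants atomically, plus one rollback per dataset.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns steps 2 and 3 into zfs command lines. */
final class ZfsOracleBackup {
    // Hypothetical naming scheme "<dataset>@yyyy-MM-dd-HHmm"; names sort by time.
    private static final DateTimeFormatter LABEL = DateTimeFormatter.ofPattern("yyyy-MM-dd-HHmm");

    // One ZFS file system holding the controlfiles, datafiles and redologs directories.
    private final String dataset;

    ZfsOracleBackup(String dataset) {
        this.dataset = dataset;
    }

    String snapshotName(LocalDateTime when) {
        return dataset + "@" + LABEL.format(when);
    }

    /** Step 2: taken while Oracle keeps running; no backup mode involved. */
    List<String> snapshotCommand(LocalDateTime when) {
        return List.of("zfs", "snapshot", snapshotName(when));
    }

    /**
     * Step 3 (2): only valid after Oracle has been shut down. Without -r,
     * zfs rollback refuses to go back past the most recent snapshot; with -r
     * it destroys every snapshot newer than the target.
     */
    static List<String> rollbackCommand(String snapshot, boolean destroyNewerSnapshots) {
        List<String> cmd = new ArrayList<>(List.of("zfs", "rollback"));
        if (destroyNewerSnapshots) {
            cmd.add("-r");
        }
        cmd.add(snapshot);
        return cmd;
    }

    /** Runs a command (needs root or delegated zfs permissions) and fails on a non-zero exit. */
    static void run(List<String> cmd) throws Exception {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException(String.join(" ", cmd) + " exited with " + exitCode);
        }
    }
}
```

For example, a nightly scheduler could call ZfsOracleBackup.run(new ZfsOracleBackup("oracledb").snapshotCommand(LocalDateTime.now())). Keeping command construction separate from execution makes the commands easy to log and review before anything touches the pool.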

(Note: To learn more about the My Oracle Support note "Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]", read the good white paper "Using Crash-Consistent Snapshot Copies as Valid Oracle Backups" published by NetApp.)

Saturday, August 21, 2010

Building New Storage Applications beyond BC/DR with Copy-on-Write Snapshot File System

Backup, restore and disaster recovery are usually tedious IT operations built around slow tape devices. But, as Mr. Backup described in his blog post "Can you have a backup system based solely on snapshots and replication?", the new copy-on-write snapshot file systems, such as NetApp Data ONTAP and Solaris ZFS, enable us to develop a new type of storage application that supports fast backup, restore and recovery.

A snapshot-based data protection system periodically takes snapshot copies of a production system to protect against data loss and corruption, and the snapshot copies can be replicated to a remote system for BC/DR. Using snapshots has several advantages: the fastest recovery, since no data conversion is needed; space savings, since incremental snapshots store only the changed data; and ease of use.
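To make the space-saving and fast-recovery claims concrete, here is a toy copy-on-write volume in Java. It is an illustration of the general idea only, not how Data ONTAP or ZFS lay out data; the class and method names are hypothetical. A snapshot copies just the table of block pointers, a write installs a new block instead of overwriting the old one, and a rollback simply makes an old table live again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy copy-on-write volume, for illustration only: snapshots share unchanged blocks. */
final class CowVolume {
    private List<byte[]> live = new ArrayList<>();                 // block table of the live data
    private final Map<String, List<byte[]>> snapshots = new HashMap<>();

    CowVolume(int blocks, int blockSize) {
        for (int i = 0; i < blocks; i++) {
            live.add(new byte[blockSize]);
        }
    }

    /** A snapshot copies only the block table (pointers), never the data itself. */
    void snapshot(String name) {
        snapshots.put(name, new ArrayList<>(live));
    }

    /** Copy-on-write: a write puts a new block in the live table; old blocks stay intact. */
    void write(int block, byte[] data) {
        live.set(block, data.clone());
    }

    byte[] read(int block) {
        return live.get(block);
    }

    byte[] readSnapshot(String name, int block) {
        return snapshots.get(name).get(block);
    }

    /** Restore without converting or copying data: the snapshot's table becomes live again. */
    void rollback(String name) {
        live = new ArrayList<>(snapshots.get(name));
    }

    /** Physical blocks in use: distinct blocks referenced by the live table or any snapshot. */
    int physicalBlocks() {
        Set<byte[]> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        distinct.addAll(live);
        snapshots.values().forEach(distinct::addAll);
        return distinct.size();
    }
}
```

With 4 blocks and one snapshot, physicalBlocks() stays at 4; changing one block raises it to 5, because only the changed block costs extra space.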

On top of a snapshot data protection system, we can start building new storage applications beyond BC/DR:

1. Managing Snapshots across local and remote storage systems, including scheduling, retention, history, monitoring and reporting

2. Integrating snapshots with business applications; combined with a snapshot manager, this enables a new type of always-available storage application.
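The retention part of a snapshot manager (item 1) is mostly a small decision function. The sketch below assumes a hypothetical policy, "keep the newest N snapshots plus the newest snapshot of each of the last D days that have one"; real products offer their own policies, and the class name is made up. It only decides what to prune; deleting, for example with "zfs destroy", is left to the caller.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical retention rule: newest N snapshots plus the newest one of each of the last D days. */
final class RetentionPolicy {
    private final int keepLatest;
    private final int keepDaily;

    RetentionPolicy(int keepLatest, int keepDaily) {
        if (keepLatest < 0 || keepDaily < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        this.keepLatest = keepLatest;
        this.keepDaily = keepDaily;
    }

    /** Returns the snapshot times the policy would delete, oldest first. */
    List<LocalDateTime> toPrune(List<LocalDateTime> snapshotTimes) {
        List<LocalDateTime> newestFirst = new ArrayList<>(snapshotTimes);
        newestFirst.sort(Comparator.reverseOrder());

        Set<LocalDateTime> keep = new HashSet<>();
        // Rule 1: the most recent keepLatest snapshots.
        for (int i = 0; i < Math.min(keepLatest, newestFirst.size()); i++) {
            keep.add(newestFirst.get(i));
        }
        // Rule 2: the newest snapshot of each of the keepDaily most recent days that have one.
        Set<LocalDate> days = new HashSet<>();
        for (LocalDateTime t : newestFirst) {
            if (days.size() >= keepDaily) {
                break;
            }
            if (days.add(t.toLocalDate())) {
                keep.add(t);
            }
        }

        List<LocalDateTime> prune = new ArrayList<>();
        for (LocalDateTime t : newestFirst) {
            if (!keep.contains(t)) {
                prune.add(t);
            }
        }
        prune.sort(Comparator.naturalOrder());
        return prune;
    }
}
```

Because the function is pure, the same policy can be evaluated for the local and the remote (replicated) copies, and its output can double as the history/reporting record.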

As an example of integrating snapshots with business applications, Ronny Egner demonstrated in his post "Creating database clones with ZFS really FAST" how easy it is to clone an Oracle database instance when the original instance runs on ZFS. If you apply this idea to an Oracle standby database running on ZFS, you can easily start a cloned copy of the standby database to run reports, while the original database stays in standby mode without disruption.
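On the storage side, the standby-clone idea comes down to a snapshot plus a writable ZFS clone of it. The Java sketch below builds those zfs commands; the dataset names and mount point are hypothetical, and the Oracle-side steps needed to open the cloned files as a separate reporting instance are not shown.

```java
import java.util.List;

/** Hypothetical helper: zfs commands to clone a standby database's file system for reporting. */
final class ZfsStandbyClone {
    /** Snapshot the standby file system; the standby itself is left untouched. */
    static List<String> snapshot(String standbyFs, String label) {
        return List.of("zfs", "snapshot", standbyFs + "@" + label);
    }

    /** Writable clone of that snapshot, mounted where the reporting instance expects its files. */
    static List<String> cloneForReports(String standbyFs, String label, String cloneFs, String mountpoint) {
        return List.of("zfs", "clone", "-o", "mountpoint=" + mountpoint, standbyFs + "@" + label, cloneFs);
    }

    /** Cleanup order matters: a snapshot cannot be destroyed while a clone still depends on it. */
    static List<List<String>> cleanup(String standbyFs, String label, String cloneFs) {
        return List.of(
                List.of("zfs", "destroy", cloneFs),
                List.of("zfs", "destroy", standbyFs + "@" + label));
    }
}
```

Since the clone shares all unchanged blocks with the snapshot, it consumes extra space only for the blocks the reporting instance writes.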

Updated on Oct 10: The technical white paper "Oracle Database Cloning Solution Using Oracle's Sun ZFS Storage Appliance And Oracle Data Guard" is available on the Oracle MAA Best Practices web site, along with "Oracle Fusion Middleware Disaster Recovery Solution using Oracle's Sun ZFS Storage Appliance".

Saturday, July 03, 2010

Reliable Video Streaming and Distribution through Multisource CDN Delivery

For the past two years, I have been working on the Netflix Streaming Infrastructure Engineering Team, building two back-end servers, the Watch Now Middle-tier Server (WNS) and the Bravia TV Streaming Server, to support streaming Netflix movies to PCs, Xbox, PS3, set-top boxes (STBs) and TVs such as the Sony BRAVIA TV.

One key function of WNS is to generate dynamic movie stream URLs pointing to a CDN (Content Delivery Network). At the scale of the Netflix streaming service, movie content has to be distributed to CDNs for devices to download and play. Because a CDN failure, even a short one, could degrade or even take down the Netflix streaming service, Netflix distributes its movie content to multiple CDNs, and this is the key to achieving reliable movie streaming:

1. It allows Netflix to shift the movie streaming traffic of a device category among CDNs when those devices have problems downloading movies from one CDN, as mentioned in the Netflix blog posts "Streaming Performance" and "Netflix Trying for Consistent Excellence on Streaming".

2. Since the movie content is available on multiple CDNs, in practice you don't have to keep an in-house backup copy of all of it, which cuts the cost of in-house storage.
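The post does not describe how WNS actually chooses a CDN, so the following Java sketch is only an illustration of the traffic-shifting idea in point 1: each device category gets an ordered CDN preference list, and CDNs currently reported as failing for that category are skipped. The class name, CDN names and URL layout are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical multi-CDN URL selector. Each device category has its own ordered
 * CDN preference list; CDNs reported as failing for that category are skipped,
 * so new stream URLs shift to the next CDN that also holds the content.
 */
final class MultiCdnUrlSelector {
    private final Map<String, String> cdnBaseUrls;        // CDN name -> base URL
    private final Map<String, List<String>> preferences;  // device category -> CDN names, best first

    MultiCdnUrlSelector(Map<String, String> cdnBaseUrls, Map<String, List<String>> preferences) {
        this.cdnBaseUrls = Map.copyOf(cdnBaseUrls);
        this.preferences = Map.copyOf(preferences);
    }

    /** Builds a stream URL on the first preferred CDN that is not currently failing. */
    String streamUrl(String deviceCategory, String movieId, Set<String> failingCdns) {
        for (String cdn : preferences.getOrDefault(deviceCategory, List.of())) {
            String base = cdnBaseUrls.get(cdn);
            if (base != null && !failingCdns.contains(cdn)) {
                return base + "/" + movieId;
            }
        }
        throw new IllegalStateException("no healthy CDN for device category " + deviceCategory);
    }
}
```

Keeping the preference lists per device category means a problem that only affects, say, one TV platform on one CDN moves only that platform's traffic.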

More generally, using multiple CDNs makes it possible to build a content vault service that lets small internet content providers store, distribute and deliver content reliably. It is great to know that some companies are already working on this, as mentioned in "Former CDN Founder Launches Plan to Disrupt the CDN Market" at NewTeeVee.com.

Saturday, September 20, 2008

Migrate Web Service From XFire to Apache CXF

Recently, we migrated our production web services from XFire 1.2.6 to Apache CXF 2.1.1, and it went very well, so I want to share our experience with Apache CXF.

Why We Wanted to Migrate:

1. The XFire project itself has been discontinued

According to the XFire and Celtix Merge FAQ, the XFire open source project has been discontinued. Apache CXF is the continuation of the XFire project and is considered XFire 2.0, with many new features and a ton of bug fixes.

2. Using Spring 2.5 to develop Java middle-tier servers

Spring 2.5 has become a de facto standard for Java middle-tier server development, and we have also adopted it internally as the standard framework for building Java middle-tier servers. But XFire is built and bundled with the old Spring 1.0, which became a roadblock preventing us from using some new Spring 2.5 features, such as Spring AOP. Migrating to Apache CXF 2.1.1 solves this problem immediately, since Apache CXF is built on top of Spring 2 and works with Spring 2.5 without any problem.

3. Using new CXF features that are not available in XFire:

  • Since Apache CXF is JAX-WS compliant, we can safely use the CXF JAX-WS front-end and develop web services with JAX-WS annotations

  • We want to inject certain HTTP headers on both the client and server sides for network routing and load balancing, and this should be kept separate from the business logic of the web services. The CXF interceptor framework is a perfect tool for implementing this kind of feature

  • CXF supports both dynamic and statically generated JavaScript WS clients, which lets us build a simple soapUI-like tool inside the web browser; it works perfectly with our lightweight jQuery-based admin tool/console for monitoring and debugging
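The header-injection bullet relies on the interceptor pattern: cross-cutting steps run in an "in" chain before the service and an "out" chain after it, so the business logic never touches HTTP headers. The plain-Java sketch below shows only that pattern; none of its types are CXF's API, and the header names are made up. In CXF itself this would typically be a class extending AbstractPhaseInterceptor, registered for a specific phase, that edits the header map stored under Message.PROTOCOL_HEADERS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Plain-Java sketch of the interceptor idea; none of these types are CXF's API. */
final class HeaderInjectionSketch {
    /** Hypothetical message: HTTP-style headers plus a payload. */
    static final class WsMessage {
        final Map<String, List<String>> headers = new LinkedHashMap<>();
        final String body;

        WsMessage(String body) {
            this.body = body;
        }
    }

    /** One cross-cutting step that runs outside the business logic. */
    interface Interceptor {
        void handle(WsMessage message);
    }

    /** Adds a fixed header value, e.g. a routing hint for a load balancer. */
    static final class HeaderInterceptor implements Interceptor {
        private final String name;
        private final String value;

        HeaderInterceptor(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public void handle(WsMessage message) {
            message.headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Ordered list of interceptors applied to one message. */
    static final class InterceptorChain {
        private final List<Interceptor> interceptors = new ArrayList<>();

        InterceptorChain add(Interceptor interceptor) {
            interceptors.add(interceptor);
            return this;
        }

        void run(WsMessage message) {
            for (Interceptor interceptor : interceptors) {
                interceptor.handle(message);
            }
        }
    }

    /** Server side: "in" chain on the request, business logic, "out" chain on the response. */
    static final class Endpoint {
        private final InterceptorChain in;
        private final InterceptorChain out;
        private final UnaryOperator<String> service;  // sees only the payload, never the headers

        Endpoint(InterceptorChain in, InterceptorChain out, UnaryOperator<String> service) {
            this.in = in;
            this.out = out;
            this.service = service;
        }

        WsMessage invoke(WsMessage request) {
            in.run(request);
            WsMessage response = new WsMessage(service.apply(request.body));
            out.run(response);
            return response;
        }
    }
}
```

Adding or changing a routing header then means registering a different interceptor, with no change to the service code.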

Issues We Encountered:

Of course, there were some CXF issues we had to resolve:

1. Lightweight SOAP Message Validation

For backward compatibility with existing XFire clients and servers, we use the CXF JAX-WS front-end with Aegis data binding, and we wanted a simple, lightweight SOAP message validation as an alternative to full-fledged schema validation. We achieved this by adding an "XML Validation Interceptor" at the right place in the CXF interceptor chain. You can find more details here on the CXF User Mailing List.
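The post does not spell out which checks the "XML Validation Interceptor" performs, so the rules below are an assumption: the request must be well-formed XML, its root must be a SOAP 1.1 Envelope with exactly one Body, and the first element inside the Body must be the expected operation. This Java sketch uses only the JDK's DOM parser (with DOCTYPE declarations disabled); the class name is hypothetical, and wiring it into a CXF interceptor is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical lightweight checks used instead of full schema validation. */
final class LightweightSoapValidator {
    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Returns an empty Optional if the request passes, otherwise a short reason. */
    static Optional<String> validate(String xml, String expectedOperation) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // throw on fatal errors instead of printing
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return Optional.of("not well-formed XML: " + e.getMessage());
        }
        Element envelope = doc.getDocumentElement();
        if (!isSoap(envelope, "Envelope")) {
            return Optional.of("root element is not a SOAP 1.1 Envelope");
        }
        Element body = null;
        for (Node n = envelope.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isSoap(n, "Body")) {
                if (body != null) {
                    return Optional.of("more than one Body");
                }
                body = (Element) n;
            }
        }
        if (body == null) {
            return Optional.of("missing Body");
        }
        Node op = body.getFirstChild();
        while (op != null && !(op instanceof Element)) {
            op = op.getNextSibling(); // skip whitespace and comments
        }
        if (op == null || !expectedOperation.equals(op.getLocalName())) {
            return Optional.of("unexpected operation");
        }
        return Optional.empty();
    }

    private static boolean isSoap(Node node, String localName) {
        return node instanceof Element
                && SOAP11_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }
}
```

These checks cost one DOM parse per request and catch truncated or misrouted messages without maintaining schemas for the Aegis-bound types.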

2. WS Client Connection Pool

The XFire WS client is built on top of Commons HttpClient 3.0, which allows us to set up a connection pool to control concurrent requests and the load on the client side. But since CXF does not use Commons HttpClient as its client transport layer, this is no longer available with CXF.

We partially resolved this problem by building a simple CommonsHttpClientTransportFactory with a simple CommonsHttpClientConduit. It would be nice if CXF provided a full-fledged CommonsHttpClientConduit. More details can be found here and here on the CXF User Mailing List.
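The custom CommonsHttpClientConduit itself is not shown in the post. As a swapped-in, stdlib-only illustration of the goal it served (capping concurrent requests and client-side load), the Java sketch below bounds in-flight calls with a fair Semaphore and fails fast when no "connection" frees up in time. The class name and timeout behavior are assumptions; unlike a real pool, it does not reuse HTTP connections.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a pooled WS client: at most maxConnections calls in flight. */
final class BoundedWsClient {
    private final Semaphore permits;
    private final long acquireTimeoutMillis;

    BoundedWsClient(int maxConnections, long acquireTimeoutMillis) {
        this.permits = new Semaphore(maxConnections, true); // fair: waiting callers are served in order
        this.acquireTimeoutMillis = acquireTimeoutMillis;
    }

    /** Runs one WS request while holding a permit; gives up if none frees up in time. */
    <T> T call(Callable<T> request) throws Exception {
        if (!permits.tryAcquire(acquireTimeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no connection available within " + acquireTimeoutMillis + " ms");
        }
        try {
            return request.call();
        } finally {
            permits.release(); // always returned, even if the request throws
        }
    }

    int availableConnections() {
        return permits.availablePermits();
    }
}
```

Failing fast with a TimeoutException, rather than queueing without limit, keeps an overloaded back end from piling up blocked client threads.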

Apache CXF Actually Performs Better:

To evaluate the performance of CXF, we did one round of load testing with two WS servers behind one load balancer, one using XFire and the other using Apache CXF. The results showed that the performance of CXF is comparable to that of XFire.

We also ran a similar deployment in the real production environment to compare performance, and our new servers using CXF are working very well. In fact, CPU utilization on the servers using CXF is lower than on the existing servers using XFire. This was a pleasant surprise, since we originally expected CXF to consume more resources, given that it has to build two interceptor chains, "in" and "out", for each request.