
A look at Facebook’s efficient information architecture

Facebook is truly a success story, not only in the social networking space but in the technology sector as a whole. It has overtaken IBM in market capitalization (US$212 billion versus US$161.6 billion for IBM), keeps itself relevant through acquisitions, and is constantly innovating. It is the back-end innovation, however, that offers valuable lessons for enterprises. By keeping its backbone running efficiently and cheaply, Facebook ensures it can meet its long-term infrastructure needs. So what is Facebook doing?

Meeting transactional needs

Facebook processes 930 million photo uploads daily from its 1.35 billion users, along with 6 billion likes and 12 billion messages sent every day. That places a heavy load on its data centres and generates a great deal of heat, so Facebook lowers its cooling costs by drawing on cold outdoor air.
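
To put those volumes in perspective, the back-of-the-envelope Python sketch below converts the daily figures quoted above into average per-second rates (averages only; real traffic peaks well above these numbers):

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted in the article
daily_volumes = {
    "photo uploads": 930_000_000,
    "likes": 6_000_000_000,
    "messages": 12_000_000_000,
}

for name, per_day in daily_volumes.items():
    per_second = per_day / SECONDS_PER_DAY
    print(f"{name}: ~{per_second:,.0f} per second on average")

# photo uploads: ~10,764 per second on average
# likes: ~69,444 per second on average
# messages: ~138,889 per second on average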

Open Compute Project

In 2011, Facebook started the Open Compute Project. Though by 2013 the designs had not yet made it into live data centres, the project's goal is to improve energy efficiency. Another way Facebook drives efficiency is through HHVM, its virtual machine for PHP, which uses just-in-time (JIT) compilation to obtain superior performance. Together with HPHP, which optimizes PHP source code, this lets Facebook cache database and web data while its code runs very efficiently. Facebook reports a five- to six-fold efficiency advantage with HHVM over Apache. As a result, its data centres can still perform at peak demand, even with servers under full load.
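
The caching the article refers to is commonly implemented as a look-aside cache that sits between the web code and the database. The Python sketch below is a generic illustration of that pattern, not Facebook's actual code; the in-process dict and run_db_query() function are hypothetical stand-ins for a caching tier and a real database call:

import time

# Hypothetical stand-ins: a simple dict plays the role of the cache tier,
# and run_db_query() represents a slow database lookup.
cache = {}
CACHE_TTL_SECONDS = 60

def run_db_query(key):
    # Placeholder for a real database query.
    return {"key": key, "fetched_at": time.time()}

def get_with_cache(key):
    """Look-aside caching: serve from the cache when fresh, otherwise hit the DB."""
    entry = cache.get(key)
    if entry is not None and time.time() - entry["stored_at"] < CACHE_TTL_SECONDS:
        return entry["value"]                      # cache hit
    value = run_db_query(key)                      # cache miss: query the database
    cache[key] = {"value": value, "stored_at": time.time()}
    return value

profile = get_with_cache("user:1234:profile")      # first call hits the database
profile = get_with_cache("user:1234:profile")      # second call is served from the cache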

Efficiency with News feed rack

Facebook keeps recent activity stored in RAM. The leaf aggregator, which holds the data, takes the ranking algorithm's results into account and consolidates the information when responding to a request. When the aggregator issues a query, it is sent out in parallel to 40 servers in a rack. The system gathers the returned data, ranks it by user interest, and displays the top results.
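
The Python sketch below illustrates that fan-out-and-aggregate pattern under stated assumptions: query_leaf() and interest_score() are hypothetical stand-ins for a request to one leaf server and for the ranking algorithm, and only the figure of 40 servers per rack comes from the article.

import concurrent.futures
import random

NUM_LEAF_SERVERS = 40  # servers per rack, as described above

def query_leaf(leaf_id, user_id):
    # Stand-in: each leaf returns the recent stories it holds in RAM.
    return [{"story_id": f"leaf{leaf_id}-story{i}", "raw_signal": random.random()}
            for i in range(5)]

def interest_score(story, user_id):
    # Stand-in for the real ranking model.
    return story["raw_signal"]

def build_feed(user_id, top_n=20):
    # Fan the query out to all leaf servers in parallel...
    with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_LEAF_SERVERS) as pool:
        results = pool.map(lambda leaf_id: query_leaf(leaf_id, user_id),
                           range(NUM_LEAF_SERVERS))
    # ...then consolidate the responses, rank by user interest, and keep the top stories.
    stories = [story for leaf_result in results for story in leaf_result]
    stories.sort(key=lambda s: interest_score(s, user_id), reverse=True)
    return stories[:top_n]

feed = build_feed(user_id="user:1234")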

Data storage

Facebook uses Hadoop for the data warehouse that handles its data processing. Since data I/O (input/output) is intense for functions like graph search, Facebook uses flash sled storage. Each unit has between 256 and 512 GB of RAM. A 20-server configuration could then provide 320 CPU cores, 3 TB of RAM, and 30 TB of flash.
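
As a rough sanity check of those aggregate figures, the arithmetic below divides them evenly across the 20 servers; the resulting per-server averages are simple derivations, not published specifications:

# Illustrative arithmetic only, based on the totals quoted above.
servers = 20
total_cpu_cores = 320
total_ram_tb = 3
total_flash_tb = 30

print(f"CPU cores per server: {total_cpu_cores / servers:.0f}")            # 16
print(f"RAM per server:       {total_ram_tb * 1024 / servers:.0f} GB")     # ~154 GB
print(f"Flash per server:     {total_flash_tb * 1024 / servers:.0f} GB")   # ~1,536 GB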

 

No virtualization

Facebook does not appear to use virtualization in its architecture. The company's site is constantly active, so its servers are rarely idle. Virtualization makes sense when workloads sit idle: consolidating them lets an organization shrink its server count by eliminating or shutting down the underused machines. Facebook, by contrast, builds with scale in mind, and balancing utilization across servers that are always busy yields greater efficiency.
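
The arithmetic below illustrates that trade-off. The utilization figures are made-up examples, not Facebook measurements; the point is simply that consolidation pays off for lightly used fleets and offers little for fleets that are already busy:

import math

def servers_needed(server_count, avg_utilization, target_utilization=0.8):
    """Rough estimate of how many hosts a virtualized setup needs for the same load."""
    return math.ceil(server_count * avg_utilization / target_utilization)

print(servers_needed(100, 0.10))  # lightly used fleet: ~13 hosts, big consolidation win
print(servers_needed(100, 0.75))  # constantly busy fleet: ~94 hosts, little to gain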

Future expectations

There is a trend toward faster NICs (network interface controllers). Network bandwidth will become a limiting factor for data centres, so network speeds will need to advance, and that will transform the way data centres run. For Facebook, faster networks will mean changing the way it writes the code for its services.

 
