Next week Montreal’s Yellow Pages Group will welcome its first data scientist.
The hire will mark the end of a frenzied first phase of overhauling the advertising network’s IT infrastructure to handle big data applications that give advertisers a better look at the return on their investment, particularly on its online site.
But despite having spent “multiple millions” so far consolidating 18 data centres into three, bringing an outsourced business intelligence application in-house and constructing Hadoop clusters to house it, the company’s IT director says going into big data doesn’t mean CIOs and IT managers have to re-invent the wheel.
“You want to add big data to the capabilities you have now,” Richard Langlois told a big data conference in Toronto this week. “Big data is something you address when you do data architecture, network architecture, application architecture.”
“What you know about business intelligence, data centre architecture, development, strategic planning … we use all of that and big data.”
Yellow Pages is transitioning from a largely print medium to online, where ads are no longer merely static, he said in an interview. “We need to provide location-based content (maps, user recommendations) so you bring business to the advertiser,” he said, and that brings in revenue.
Langlois was hired at the end of 2012 to transform the company by accelerating the move to digital. Despite competition for eyeballs from Google and other online sources, today it has some 276,000 Canadian advertisers out of a potential pool of 1.2 million.
The company has two analytics applications: Anametrix, used internally, and Yellow Pages Analytics, used by advertisers to crunch the usual metrics (visitors, page views, key performance indicators) and calculate potential return on investment. The trick is using big data (one table has 52 billion records) with millisecond response times.
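As a rough illustration of the arithmetic behind such a dashboard, here is a minimal Python sketch of an ROI calculation from basic ad metrics. The field names and figures are hypothetical, not drawn from Yellow Pages’ actual product:

```python
def roi(revenue_attributed: float, ad_spend: float) -> float:
    """Return on investment expressed as a fraction of spend."""
    return (revenue_attributed - ad_spend) / ad_spend

# Illustrative monthly metrics for one advertiser (hypothetical values).
metrics = {
    "visitors": 12_400,
    "page_views": 31_000,
    "leads": 310,              # e.g. calls or map look-ups driven by the ad
    "revenue_per_lead": 45.0,  # the advertiser's own estimate
    "ad_spend": 9_800.0,
}

revenue = metrics["leads"] * metrics["revenue_per_lead"]
print(f"Estimated ROI: {roi(revenue, metrics['ad_spend']):.1%}")
```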
When he was hired, Yellow Pages Analytics was outsourced to a California firm. But it was decided that to bring the company fully into the digital age the application would have to be brought in-house. Langlois was given a blank cheque and 18 months to do an infrastructure overhaul, part of a four-year plan.
A flow chart of Yellow Pages Analytics he showed at the conference looks like the innards of a nuclear reactor. Briefly, there are 16 servers plus two primary nodes for the Hadoop cluster, and five database machines in production for availability and load balancing. There will be an analytics services layer for business intelligence dashboards, to make it easier for small and mid-size companies to do analytics and so the IT department doesn’t have to do extract, transform and load (ETL) work.
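The point of such a services layer is that dashboards read pre-computed rollups rather than triggering ETL over raw events on every request. A minimal Python sketch of that pattern, with an assumed event schema and hypothetical names, might look like this:

```python
from collections import defaultdict
from datetime import date

# Raw events as they might land in the Hadoop cluster (illustrative schema).
raw_events = [
    {"advertiser": "acme-plumbing", "day": date(2014, 5, 1), "page_views": 1},
    {"advertiser": "acme-plumbing", "day": date(2014, 5, 1), "page_views": 1},
    {"advertiser": "acme-plumbing", "day": date(2014, 5, 2), "page_views": 1},
]

# Heavy aggregation done once, batch-side, rather than per dashboard query.
rollups = defaultdict(int)
for event in raw_events:
    rollups[(event["advertiser"], event["day"])] += event["page_views"]

def page_views(advertiser: str, day: date) -> int:
    """Services-layer lookup: a constant-time read, no ETL in the request path."""
    return rollups[(advertiser, day)]

print(page_views("acme-plumbing", date(2014, 5, 1)))  # -> 2
```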
This is a very simplified explanation of what was done in some 14 months, and it doesn’t include improving the collection of data or creating new applications such as a unified tracking tool and methodology. Despite the complexity (or, rather, the long list of things that had to be done), Langlois maintains a big data roadmap is no different from an enterprise architecture roadmap.
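For a sense of what a unified tracking tool implies, here is a hedged sketch of a single event schema shared across channels, so downstream analytics see consistent fields regardless of where an interaction happened. All field names are assumptions for illustration, not Yellow Pages’ design:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class TrackingEvent:
    """One schema for every channel, so analytics jobs never branch on source."""
    advertiser_id: str
    channel: str     # e.g. "web", "mobile", "maps"
    action: str      # e.g. "impression", "click", "call"
    timestamp: datetime

event = TrackingEvent(
    advertiser_id="acme-plumbing",
    channel="maps",
    action="click",
    timestamp=datetime.now(timezone.utc),
)
print(asdict(event))
```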
It is important, he added, to decouple projects from architecture, so that when a project changes the architecture doesn’t have to.
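One way to read that principle in code is to have projects depend on a stable interface, so an implementation swap doesn’t ripple through the architecture. A hypothetical Python sketch, with names invented for illustration:

```python
from abc import ABC, abstractmethod

class MetricsStore(ABC):
    """The stable architectural contract that projects code against."""
    @abstractmethod
    def page_views(self, advertiser_id: str) -> int: ...

class InMemoryStore(MetricsStore):
    """One interchangeable implementation; a Hadoop-backed store would be another."""
    def __init__(self, data: dict):
        self._data = data
    def page_views(self, advertiser_id: str) -> int:
        return self._data.get(advertiser_id, 0)

def render_dashboard(store: MetricsStore, advertiser_id: str) -> str:
    # Callers never change when the store implementation is swapped.
    return f"{advertiser_id}: {store.page_views(advertiser_id)} page views"

print(render_dashboard(InMemoryStore({"acme-plumbing": 42}), "acme-plumbing"))
```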