With enterprises trying to squeeze more out of every bit of data they collect, keeping data from various sources consistent is crucial to avoiding duplication and aiding machine processing.
To help expand knowledge in the field, business and legal news provider Thomson Reuters is expanding its relationship with the University of Waterloo by funding a research chair in data cleansing, to be held by professor Ihab Ilyas of the university’s Cheriton School of Computer Science.
His work includes investigating new methods for storing, cleaning, and curating data. As holder of the research chair, Ilyas will continue focusing on integrating and curating data in an effort to overcome the problem of data silos and help businesses make better use of their data.
The research chair is an extension of work on structuring unstructured data that the company and the university have already started, said Brian Zubert, director of Thomson Reuters’ Waterloo Lab, which is located in the city’s Communitech startup incubator. The year-old lab is one of three TR has in the world (the others are in London and Boston, with two more coming this year) and has seven full-time data scientists with a mandate to get close to TR customers and to academic and startup talent.
Some corporate datasets are “dirty,” Zubert explained — text may refer to Microsoft as MS, for example, or use “U” as a shorthand for “you.” This is a particular problem in older databases where data fields were shorter than they are today and users had to use abbreviations.
For machine processing, data has to be well-structured, clean, consistent and accurate, he said; otherwise it’s impossible for users to make decisions from it. “Accuracy becomes extraordinarily important, and that’s where research like this really plays well not only at Thomson Reuters but also our customers.”
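To make the “dirty” data problem concrete, here is a minimal sketch of one common cleansing step: mapping known abbreviations to a single canonical form. It is only an illustration, not a description of the methods used by Ilyas or Thomson Reuters. The alias table and the function name `normalize` are hypothetical, and the two entries are taken from Zubert’s examples.

```python
import re

# Hypothetical alias table; a real one would come from a curated reference list.
ALIASES = {
    "ms": "Microsoft",
    "u": "you",
}

# Match runs of letters so that only whole words are ever replaced.
_WORD = re.compile(r"[A-Za-z]+")


def normalize(text, aliases=ALIASES):
    """Replace whole-word aliases (case-insensitive) with their canonical form."""
    def swap(match):
        word = match.group(0)
        # Words that are not in the table are returned unchanged.
        return aliases.get(word.lower(), word)

    return _WORD.sub(swap, text)


print(normalize("MS told U about the update"))
```

A lookup like this covers only abbreviations someone has already listed, and it cannot tell when “MS” means something other than Microsoft, which is part of why data cleansing remains a research problem.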
The research chair was one of several announcements the two organizations made Wednesday to extend their partnership. Other moves include:
·Thomson Reuters is providing a deal on software licences so over 600 students and researchers at the faculty of mathematics and the school of accounting and finance at Waterloo can use its Eikon financial market transaction platform. That platform will give students the opportunity to apply financial theory to practice, Zubert said;
·Collaboration with the Problem Lab, a new program run by Larry Smith, author and adjunct associate professor with Waterloo’s department of economics and the Conrad Centre. The Problem Lab helps students find and understand real-world problems, which can aid those who want to create startups;
·Four positions for undergraduate- and graduate-level UWaterloo students (three co-op: UX design, data science, and startup engagement, and one PhD internship: data science) will be created in the TR Waterloo lab. In addition, the lab will create a full-time position for a master’s or PhD graduate.
All this adds to existing TR and UWaterloo collaborations in e-discovery research and in helping students in master’s programs.
In addition to its lab in Waterloo, Thomson Reuters recently expanded its research and development operations in Canada by opening a centre for cognitive computing in Toronto. It’s now looking for experienced data scientists and technology researchers.
“By tapping into some of the brightest talent from the university at earlier levels you’re able to build your own talent pipeline and at the same time help improve the skills of students while they are in school so they have what you need when you want to hire them full time,” Zubert said.