By Precise
We spoke by phone with Julie Lockner, senior analyst and vice president with ESG, to get her take on the latest trends in data management. Lockner is a leading industry expert in the structured data management market, focusing on database and database management, data warehouse and business intelligence, and transaction processing solutions. She shares her views on the top data management challenges for IT today, and why CIOs need to plan ahead for “Big Data.”
Precise: What are some leading trends in data management right now?
JL: Data analytics is among the top five priorities for IT as businesses are looking to be more competitive, lower costs and drive revenues. I am particularly interested in how Big Data will have an impact on existing data analytics processes and platforms, and how companies that have already invested in Oracle and other architectures plan to integrate Big Data analytics systems. Newer frameworks like (Apache) Hadoop are about processing massive volumes of data using standard server-grade compute clusters. Another trend of interest is the rising importance of performance and the need for more real-time analytics. In surveys, when we ask about IT challenges for managing databases, performance is one of the top three issues. When you are handling and processing more data, performance often suffers.
Precise: What’s your view of “Big Data” and what does this mean for CIOs?
JL: The term dates back to Nutch, an open-source search engine project that was intended to better index big data volumes. Today, it means much more. Vendors have since taken advantage of the term and applied it to storage, networks and analytics. At ESG, we define Big Data as a data set that exceeds the boundaries and size of normal computer processing. The boundary is vague, because today it might be 2 terabytes (TB) of data but in time, that will change. If you’re trying to run a process and the data set is so large that you cannot get the results when you need them, or if you have really complex data sources or content with complex or non-relational structure, you may need a different approach. If you just add CPU to solve the problem, that can get expensive very quickly. Using a grid-based or cluster-based compute environment is much cheaper from a capital expenditure perspective. CIOs need to recognize when their data volumes are starting to affect performance and IT’s ability to meet business requirements in a timely and cost-effective manner.
Precise: What’s the role of Hadoop in Big Data?
JL: Hadoop is open source software composed of the MapReduce programming framework and a distributed file system storage foundation, HDFS, which takes large data sets and stores them across a cluster. MapReduce takes a program written to run on Hadoop and runs it where the data is located rather than moving the data to the computer. You don’t need any special hardware. The advantage is that you can do multiple tasks at once. I like to give the example of a tour bus that arrives at McDonald’s. If there is only one person taking orders, it’s going to take a while before you get served. With more people working the counter, the tour bus of people will get served faster. If you brought the servers onto the tour bus, the people never have to leave their seats. Along the same lines, using Hadoop, you can serve many “orders” at once without moving the data, which is a lot more efficient when data volumes are massive. A common characteristic of organizations that adopt Hadoop is that they have hundreds of databases and hundreds of terabytes or more of database data.
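To make the MapReduce idea above more concrete, here is a minimal sketch in plain Python. It does not use Hadoop's actual API; it only imitates the flow on one machine. The `partitions` list is a hypothetical stand-in for blocks of data that HDFS would spread across cluster nodes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not Hadoop names.

```python
from collections import defaultdict

# Hypothetical stand-in for data blocks stored on different cluster nodes.
partitions = [
    ["big data big cluster"],
    ["data cluster data"],
]

def map_phase(lines):
    """Map step: runs against one partition and emits a (word, 1) pair per word."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(mapped_outputs):
    """Shuffle step: group the emitted values by key across all partitions."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: combine the grouped values for each key into one total."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(partitions):
    # On a real cluster each map task would run in parallel on the node
    # holding its partition; here the tasks simply run one after another.
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))

counts = word_count(partitions)
print(counts)
```

In terms of the tour bus analogy, each call to `map_phase` is a server taking orders at the seats (the work goes to the data), and only the small per-word tallies travel on to be combined.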
Precise: Hadoop is a free open source development platform. So what are the associated costs of using it?
JL: It can be much more cost-effective than adding databases and hardware if the application and data are suited for a Hadoop-based application. If the data needs to be transformed unnaturally and processed or analyzed all at once, you could spend $1 million on an Oracle system versus something like $300,000 on a Hadoop cluster. You will still incur development costs to build the application and support it, and there is risk associated with developing an application that is based on an emerging platform. Finally, the number of skilled developers experienced with large-scale Hadoop deployments is still low compared with the population of Oracle DBAs.
Precise: How are Big Data and Hadoop affecting the vendor community?
JL: Just as with Linux, vendors will start to develop their own flavors. IBM, Oracle, and EMC will build their own versions of Hadoop. Server vendors will end up making a big play here and storage vendors will have integrated solutions.
Precise: What will be the relationship between Big Data and application performance management?
JL: If you put more data on the same infrastructure, performance suffers first. APM will help identify applications that are starting to buckle under processing demands, and which may need a different architecture. As Big Data frameworks become more pervasive, APM vendors will need to provide extensions to these frameworks in their dashboards. In the short term, IT may need another monitoring tool for Hadoop, but ideally, they’re going to want a single performance management solution. APM vendors will need to adapt to those requirements and support technologies such as Hadoop. My next research project will be about how IT will incorporate Big Data strategies into their existing infrastructure. How will they manage these environments and what new tools will they need? Right now, it could incur significant costs for an organization to take on one of these projects. Research and government organizations have been doing this for a long time, but it’s all new to businesses. Will there be a payoff? That’s what I’d like to find out.





