Friday, 4 July 2014

THE EVOLUTION OF ANALYTICS

THE EVOLUTION OF ANALYTICS - PART 1

An analysis of the Business Intelligence industry: past, present and predictions for the future.

History of analytics

There are many examples throughout history that showcase how others have presented their data and findings to the world. The old adage that a picture is worth a thousand words rings true. For as long as people have been analysing data, they have been using visualisations to share their findings. Before the dawn of the computer age, these hand-drawn graphs were highly influential in the political and national arenas of their time. Florence Nightingale’s graphical illustration of the key causes of mortality during war, for example, showed at a glance the deaths of soldiers from preventable causes and led directly to improvements in military hospitals:



Charles Joseph Minard’s graphic depicting the Russian campaign of 1812 showed the relationship between the number of soldiers, falling temperatures and the distance travelled by the soldiers, allowing for better planning on the part of military advisers:

                                                                 

William Playfair’s chart of 1821 compared weekly wages of a good mechanic with the price of a quarter of wheat, showing the decline in buying power of the labour force as part of his book showcasing the causes of the fall of powerful and wealthy nations:




With the onset of the industrial age, data analytics became a vital tool for business. From the first time-management exercises conducted by Frederick Winslow Taylor in the late 19th century to the analytics utilised by Henry Ford’s assembly line to measure the pacing of production, this field began to command more and more attention.

As computers became more prevalent in business, further developments here led to the creation of systems that would capture and make use of business data such as Enterprise Resource Planning Systems, Customer Relationship Management systems, data warehouses and a variety of hardware and software tools to further aid the cause.[i]

The Evolution of Analytics

Analytics has since grown more and more prominent. Today nearly all organisations have some sort of methodology to track and utilise their data as well as dedicated roles responsible for sustaining and managing this growth.

Analytics 1.0: Highly Scientific and only for the larger players

As more and more businesses embraced the power and competitive advantages that analytics could bring, it became obvious that a deep understanding of important business phenomena gave management a better ability to make decisions about the various processes in the organisation. It was during this era that the Enterprise Data Warehouse began to be used to capture information, with BI software developed to query and report on it.[ii]

BI has been the mainstream term used to describe the organisational intelligence software packages that many companies use to connect to their data. These packages come in all sorts of varieties, from the simple to configure and implement to the more complicated and powerful. Out of this space a few market leaders emerged, including IBM, SAS, SAP, Cognos and QlikView.

Traditionally, the characteristics of BI environments in organisations were such that:
  • The software was configured, maintained, and administered by IT
  • Few users had broad flexibility to customize or create their own reports (most users were generally limited to pre-defined reports and prompts)
  • The vast majority of the reports generated contained some combination of grid-style data points and basic visuals such as line graphs, bar charts, and pie charts
  • The logic behind the reports was limited to what could be generated through standard SQL programming language constructs

By today’s standards, these limitations would not be acceptable to most users, but the restrictions were mainly a consequence of the power and availability of technology of the time. As the available technology has become more powerful, the demand for the outputs of BI systems has also grown, increasing the need to change how the environment works.

Analytics 2.0: The Social Media revolution and more break-throughs

The next evolution came with the advent of what is commonly referred to as “Big Data”. Firms began to amass large amounts of internet-based social media information in addition to their own internal data,[iii] giving them further insight into their customers. For the organisations that were equipped to properly analyse this data, it proved to be a valuable source of competitive advantage. During this time innovative technologies were created, acquired and mastered, and revolutionary ways to handle the data volumes came about in both hardware and software technology.[iv] These include the open-source Hadoop software framework, cloud-based software environments, in-memory engines and NoSQL databases, to name a few.

Today, the BI system has changed and is no longer governed by IT. Now there is more flexibility, and more users than ever are able to explore the data, discover new insights and share the results.

  • New, powerful BI tools remove the restriction that users could only access the reports they were given
      o Users are no longer constrained in experimenting with new metrics and views of data. They now have the ability to explore their data (the term “data discovery” was coined to describe this process), although there are still controllable limits on what data is accessible due to hardware and security limitations.

  • Advanced visualisations and interactive dashboards
      o No longer just bar and pie charts with static drill-downs: charts, graphs and maps are now explorable and can be linked to real-time data updates, enabling faster insights.

  • BI tools no longer constrained to standard SQL programming language logic
      o More and more tools now offer advanced analytic techniques, including predictive analysis, and are not restricted to standard SQL logic. The use of platforms like Hadoop and Teradata has expanded the types of processing that can be applied and the avenues that can be explored with data.

This means the modern BI environment is no longer limited to just standard reports provided by a small team (usually connected to an IT department) but rather BI is becoming more of a self-service space, where visuals and interactivity are the norm. It leads to a blurring of the lines between IT and business users.[v]

Analytics 3.0: Empowering more users

Analytics 3.0 is seen as the next stage of the evolutionary chart for this industry and comes about when analytics becomes ingrained in almost all of an organisation’s actions. Regardless of whether that organisation makes or moves or consumes things, or produces or provides services, it will have access to information and data to report and analyse. With Analytics 3.0, the organisation can use the power of data analytics to create more valuable products and services.[vi]

In this era, the concepts of data discovery and exploration become even more important factors for organisational success, leading to greater empowerment for both internal and external users who can now “quickly plug-in, model, and analyse new data sources while still leveraging enterprise metadata and data”.[vii]

Much of the progression has been a consequence of better and faster hardware support systems, the advent of cloud computing and the move towards the availability of highly capable infrastructure via offerings such as infrastructure as a service (IaaS) and platform as a service (PaaS). With the new cloud-hosted, browser-based software model, users are no longer reliant on the responsiveness of their organisation’s IT department, something that has traditionally been a source of great consternation for many users.[viii]

BI tools in general have improved and now feature self-service capabilities. The potential benefits of a self-service model include:
  • Analysts have more time to concentrate on analysing reports as opposed to preparing them
  • Users are empowered to discover data themselves, rather than relying on a reports team who might not have a full understanding of the data
  • Usability of reports has improved, especially for non-traditional BI users
  • IT workload is reduced so they can concentrate on addressing any data requests more quickly and efficiently

Examples of self-service BI tools such as SuperDataHub, Tableau and QlikView are shown below:









Data volumes increase dramatically

Recently there has been not only a staggering uptick in the volume of data produced and collected by businesses, but also a steady increase in the awareness of the power of data analytics.

The combined effect is that users are increasingly dissatisfied with prescriptive reports and dashboards that are handed down to them and that do not evolve. This is especially evident where the reports raise new questions, and users are then unable to obtain the answers quickly enough to take advantage of a market condition or situation.[ix]

Self-service analytics can thrive if it can keep up with user demand and is the key towards changing the perception of software from being a cost centre to being a fundamental underpinning to organisational success.

Lessons from the National Statistics Offices and the Open Data Movement

Before moving on it is important to take a look at how the providers of the biggest self-service platforms in the world have handled the growing demand for data and the lessons that can be learned.  Governments around the world have been leading the way in self-service analytics for some time, with the data they release via their National Statistical Organisations (NSOs).

Many of these NSOs have been using some sort of self-service portal to serve up the statistics they collect. This includes the Australian Bureau of Statistics, US Census, Office of National Statistics (UK) and many others. These portals mostly provide large amounts of data for dissemination purposes and use by researchers, statisticians and the public, who can access datasets if they have the appropriate accreditation.

Lessons can be learned from the way these organisations organise the protection of private information and balance that responsibility with their mandate to ensure that enough data is released so that valuable insights can be gained by researchers and the like.

A newcomer to the discussion is the Open Data Movement which, at its very essence, is about the release and dissemination of government datasets. It stems from the ideas of “Government 2.0”, which is defined as “the use of technology to encourage a more open, transparent and engaging form of government, where the public has a greater role in forming policy and has improved access to government information.”[x]

The Open Data Movement encourages the notion that government data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the Open Data Movement are similar to those of other "open" movements such as open source, open hardware, open content, and open access.

As John Wilbanks, VP Science at Creative Commons, says “numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery, we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.”[xi]

Open data sites are beginning to become more and more prominent around the world. They include the Data.Gov websites seen in many countries including the USA, Australia and UK.

There are great benefits from this model of openness for society and government. These include economic benefits from innovations made using the data as well as social benefits from more transparent governments.

The ability of data to help empower users is important, but the potential for inappropriate use must also be considered. This is especially true where the release of confidential data can lead to severe repercussions.

To combat this, companies will usually prepare aggregated data to hide any confidential data but often this aggregated data cannot deliver the required abundance of information needed to gain insights.

In most environments, protecting and confidentialising data is time consuming, so a balance must be struck between what can safely be released and the time and costs required to prepare it.

Back to Analytics: What does all this mean?

Bearing in mind the trend towards self-service analytics and the lessons learned from the Open Data Movement, the next question we might ask is “where do we go from here?”

To attempt to answer this, we might ask what are the concerns of those involved in deploying analytical or BI solutions, and what limitations still exist with current generation software? How would these need to be addressed before it really does become the norm in all businesses and all industries?



Limitations with Current Generation technologies and options to solve them

Even with all the software advances that have improved modern analytics tools, there are still limits on the level of interaction available to users in this community.
Barriers are in place that, in varying ways, prohibit greater take up of self-service analytics and BI solutions. We look at a few here in detail and offer potential solutions based on existing products and current day practices that may not have been self-evident:

  • Problem: Insights have been mainly the domain of specially trained staff

There is still the notion that data analytics is the domain of only a handful of specially trained staff coming from either specific educational backgrounds or having received specialised training to operate the tools needed to gain insights from the data. This notion can be traced back to the fact that a lot of the current technologies require extensive technical expertise to operate and even the simpler ones still require some coding knowledge.                                                                   


The belief is that not enough of these types of people exist so there is high demand for them but not enough supply.

A report by E-Skills (UK) and SAS sees the need for more big data specialists over the coming years. The demand for big data specialists will grow over the next 5 years by 243% to 69,000 in the UK alone.[xii]


The above graphic comes from Gartner’s 2012 report on how to deliver Self-Service BI[xiii]. It sees a divide between information consumers and power users. This line in the sand propagates the image that there are distinct differences between users in terms of involvement with data and data tools and the skill level necessary to interact with data. This leads to problems like the supply and demand of users with the necessary knowledge to do research on the data a company holds.

“The first mistake we made was in the organisational model. Centralised, IT-dominated BI teams are not conducive to empowering end users.”[xiv]

A team that blends IT and business skills is in a much better position to service this need than a strictly IT focused one.

Solution:

Instead of waiting for users to mature via traditional methods (i.e. creating more and more specialised users), the new suite of analytics applications removes the need to divide users into distinct groups of power users and information consumers. With the right tools, the majority of staff members become empowered enough to call themselves power users too.

However, it is important to have the right tools in place before this can happen. If the tools are still complex then users will still need to be trained to find the answers they seek. On the other hand, if the tools are intuitive, easy to use and require little training then users are more likely to become involved and start getting real benefits from data analytics.

Whilst providing technical training to users is in some way beneficial, doing so takes time and with software technologies constantly evolving, going down this path means that ongoing training and development is likely required. This will ultimately prove costly to organisations in terms of their time, money and efforts, all of which could be better spent elsewhere.

It is both more efficient and cost-effective to give users access to tools that require little to no training because they are intuitive and simple to use allowing the user to focus on more value adding tasks in the day to day running of the business.

Whenever a user starts working with a new software tool, there is always a divide between their starting state and their ability to use the software, as they would typically need training, reading and course materials and practice time to even be able to start using and finding the answers they want or need:


However, self-service models that are intuitive and easy to use can help reduce the gap and the time to reach benefits and make that divide much smaller: 



The full move to having the complexity divide much smaller is an ongoing piece of work and involves the use of further techniques like smarter predictive analytics and augmented intelligence (topics I will discuss in other blogs).

Search-based BI tools with “Google-like” interfaces allow users to get started right away exploring data with little training. Analysts do not spend substantial amounts of time preparing reports but rather, can create reports with a few clicks and provide value by spending more time providing insights into the data.

This type of environment also means changes for the typical IT service staff. They are no longer required to be heavily involved in the report building process when the right tools are in place. The important thing to remember here is that whilst business users may start to do what traditionally was the role of IT resources, there is still going to be a need for IT resources. It’s just that their role will evolve from being report providers or creators to being solely focused on data management: custodianship, security, privacy and efficiency in getting the data to the right people.


  •  Problem: Aggregated datasets to answer organisational questions


Another issue is that although the current tools may have started to move towards a self-service model, they are only doing so over limited datasets. A lot of solutions in this space have to aggregate the information available, either for security reasons or due to the amount of time it takes to prepare large datasets for self-service dissemination.

These tools will serve up analytics to the limit of what can be achieved within hardware and software capacity. Tools like this do not necessarily connect to all the data to begin with and may require a lot of configuration in the build process rather than just being able to plug and play. Hardware, software, implementation or timing constraints mean that even with all the right accessibility and authorisations in place, a user might still be limited to looking at only a portion of the available data.

This is especially true where a program claims to be self-service for its end users but is really looking at a small sample of the data or a pre-aggregated report. The user can explore the information in the view, but if the view is limited it cannot really be considered a full self-service option. Others have already made a decision about what data is made available and what is not. And the end user might not even be aware that information is missing.
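To make this limitation concrete, here is a minimal Python sketch. The records, the field names and the `pre_aggregated` view are all invented for illustration; the only point is that a view summarised along one dimension cannot answer a question about another.

```python
from collections import Counter

# Hypothetical unit-level records (illustrative data only).
records = [
    {"region": "North", "age_group": "18-34", "sales": 120},
    {"region": "North", "age_group": "35-54", "sales": 80},
    {"region": "South", "age_group": "18-34", "sales": 50},
    {"region": "South", "age_group": "35-54", "sales": 150},
]


def total_by(rows, key):
    """Sum sales for each distinct value of the given field."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["sales"]
    return dict(totals)


# The view someone else chose to publish: sales by region only.
pre_aggregated = total_by(records, "region")

# A new question (sales by age group) needs the underlying records;
# it cannot be derived from the regional totals alone.
by_age_group = total_by(records, "age_group")

print(pre_aggregated)
print(by_age_group)
```

A user who only ever receives `pre_aggregated` has no way of knowing that the age group breakdown exists, let alone of exploring it.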

Of course an organisation needs to control what it can show but doing this in collaboration with users allows the users to set the agenda. This leads to a better user experience and less time iteratively creating and updating reports. By giving users access to most of the data available rather than a small amount, the organisation can also ensure that there is future proofing against having to create new reports when additional information is needed.

Solution:

To solve this problem, we need a software tool that automatically gives the full set of data to users so they can decide what is important. The tools must be capable of looking at entire datasets and have the ability to give this power to all users not just a select few. Any restrictions on who can see what data should only be imposed due to business rules, not hardware or software limitations.
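As a rough illustration of restrictions that come from business rules rather than from the tool itself, the Python sketch below filters the fields a user can see based on their role. The role names, the `FIELD_RULES` table and the `visible_view` function are hypothetical and do not describe any particular product.

```python
# Hypothetical business rules: the fields each role is allowed to see.
FIELD_RULES = {
    "analyst": {"region", "age_group", "sales"},
    "public": {"region", "sales"},
}


def visible_view(rows, role):
    """Return every row, keeping only the fields the role may see.

    Roles with no rule defined see nothing at all.
    """
    allowed = FIELD_RULES.get(role)
    if not allowed:
        return []
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [
    {"region": "North", "age_group": "18-34", "sales": 120},
    {"region": "South", "age_group": "35-54", "sales": 150},
]

print(visible_view(rows, "analyst"))
print(visible_view(rows, "public"))
```

In this sketch every role sees every row, and the only thing that varies is the level of detail, which is decided by a rule the organisation can read and change.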

In this system, end users create the reports they want to see. The data providers can build a few pre-packaged reports as a guide if they want, but they no longer need to handle all the report building, freeing them up for work that provides other value added benefits to the data.

The tool also needs a feedback loop for end users, to understand their data needs and ensure those needs are met in building any future self-service capabilities. 


  • Problem: Privacy Concerns


There are serious privacy challenges faced by organisations that collect and disseminate personal and business information.

While statistical information can lead to insights into trends, growth and demographics, organisations dealing with this information must be careful not to disclose private information.

In the past official statistics providers have given external researchers and analysts limited and tightly controlled access to the microdata from their censuses and surveys because of their duty to protect the privacy of their survey respondents.

Typically this controlled access takes the form of in-house or remotely accessed data laboratories or research centres, or the provision of pre-confidentialised sample files. All of these scenarios typically involve a statistics provider’s staff having to do some form of manual review and vetting of the information generated in response to a data query before it is delivered back to the researcher.

The demands to release greater volumes of data with increasing levels of detail are becoming more and more the norm, especially in light of open data policies of federal and state governments.

Experiences that were usually felt by National Statistical Organisations (NSOs) are now being felt by a lot more private and public organisations.

Ensuring confidentiality of the data gathered by an organisation is a necessity to ensure that individuals and organisations are not reluctant to provide information, and to maintain their trust.

Solution: 

There are a variety of disclosure control methods that play an important role in helping companies achieve a certain level of confidentiality. For example:

  • Aggregation - creating summary tables (“cubes”)
  • Confidentialisation of microdata - sampling or perturbing the values of data records so that an anonymous set can be safely released
  • Confidentialisation of tabular data - concealing or adjusting values in aggregate data before being released
  • Business rules - controlling the level of detail in queries using pre-defined rules
  • Trust and access control - providing more detailed information to trusted groups
  • Monitoring - recording and reviewing the types of queries executed by users

When selecting the appropriate disclosure control methodology, organisations need to strike the right balance between making information available and meeting their privacy obligations. The ideal solution will be one that conceals just enough data to meet those obligations. Perturbation is typically the best method for achieving this.
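Two of the tabular methods above can be sketched in a few lines of Python. This is a simplified illustration only: the threshold, the rounding base and the sample counts are assumptions, and random rounding is used here as one simple form of perturbation rather than the specific technique used by any particular product or statistics office.

```python
import random


def suppress_small_cells(table, threshold=3):
    """Conceal counts below the threshold by replacing them with None."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}


def random_round(table, base=3, seed=None):
    """Perturb each count by rounding it to a neighbouring multiple of `base`.

    A count already on a multiple is left alone. Otherwise it is rounded up
    with probability remainder / base and down the rest of the time, so the
    expected value of the released count equals the true count.
    """
    rng = random.Random(seed)
    rounded = {}
    for cell, n in table.items():
        remainder = n % base
        if remainder == 0:
            rounded[cell] = n
        elif rng.random() < remainder / base:
            rounded[cell] = n + (base - remainder)
        else:
            rounded[cell] = n - remainder
    return rounded


# Hypothetical table of counts by region.
counts = {"North": 1, "South": 14, "East": 9, "West": 2}

print(suppress_small_cells(counts))
print(random_round(counts, seed=42))
```

Suppression hides the small cells entirely, whereas perturbation keeps every cell in the table but makes each released value slightly uncertain, which is why it tends to conceal less of the data.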

More on perturbation can be found here: http://www.spacetimeresearch.com/s=perturbation&Submit=Search

This is a topic I will discuss in further detail in later blogs.

  • Problem: Information overload and over reliance on machine based rules

Information overload problem

Current generation technology can now capture data faster than ever before. There is a danger that users might become overwhelmed by all this information. If the ability of users to understand the abundance of reports and data out there cannot keep up with the amount of information collected, then there is as much chance of burying the useful information as of uncovering it.[xv] However this is not necessarily a problem of too much information, as long as the right tools are in place to help users manage the information.

Machine based rules problem
Additionally, in the current climate, there are a limited number of users with the capability and know-how to traverse these huge databases. This leads to another part of the information overload problem: an over-reliance on machine driven analysis.

For example, the National Security Agency in the US has a separation step for its Big Data repository that strips out “noise”. But it’s possible that what the software perceives as noise is in fact a signal; a signal that could have been seen if there was human intervention in the process.

Solution:
Software becomes part of the solution here – but it is vital that the software is easy to use. If it is easy to use it can help to create an information economy, where all members of a company have the potential to mine data. They can all add value by becoming managers of information, data miners, data analysers, and data explorers.

It simply becomes a numbers game. In the past, users struggled to understand the wealth of information because there were not enough users. By adopting tools that are easy enough for all employees to use, user numbers can increase dramatically.

Whilst it appears to be useful to create smarter systems and algorithms that can automatically find the relevant correlations in data, an over reliance on software algorithms can bring its own problems.

By increasing the numbers of competent users we can create an environment where the rules written into any data dissemination engines are reviewed and re-reviewed by many human eyes. This vastly reduces the chances that important data will fall through the cracks.


  •  Problem: BI User discussion

This last section showcases the problems noted from the “2014 Analytics, BI, and Information Management Survey”. This survey was conducted with 248 respondents answering questions on organisations using or planning to deploy data analytics, BI or statistical analysis software[xvi].

  1. 59% said data quality problems are the biggest barrier to successful analytics or BI initiatives
  2. 44% said "predicting customer behaviour" is the biggest factor driving interest in big data analysis
  3. 47% listed "expertise being scarce and expensive" as the primary concern about using big data software
  4. 58% listed "accessing relevant, timely or reliable data" as their organisation's biggest impediment to success regarding information management


Solutions:

Data quality problems

Organisations can go a long way towards eliminating data quality problems by implementing a single source of truth, and by applying and maintaining proper metadata practices.

To ensure a single source of truth, data captured by the business must be recorded only once and held in a single area, accessible by the different enterprise software systems. Whether those software systems are across geographic areas or not, reading from the one system ensures that everyone looks at the same consistent data and issues.

Metadata is data that serves to provide context or additional information about other data, such as information about the title, subject, or author of a document. It may also describe the conditions under which the data stored in a database was acquired, such as its accuracy, date, time, method of compilation and processing.

Proper metadata practices mean that users of the data know exactly where it is from and understand any contextual information that is necessary for analysing the data.
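As a small illustration, the metadata described above could travel with a dataset as a simple record. The Python sketch below is only one possible shape: the `DatasetMetadata` class, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Context that accompanies a dataset (hypothetical field names)."""

    title: str
    subject: str
    author: str
    collected_on: date
    method: str
    accuracy_note: str

    def describe(self) -> str:
        """Return a one-line summary of where the data came from."""
        return (
            f"{self.title} ({self.subject}), compiled by {self.author} "
            f"on {self.collected_on.isoformat()} using {self.method}; "
            f"accuracy: {self.accuracy_note}"
        )


# Illustrative values only.
meta = DatasetMetadata(
    title="Retail sales by region",
    subject="retail",
    author="Finance team",
    collected_on=date(2014, 6, 30),
    method="point-of-sale extract",
    accuracy_note="unaudited",
)

print(meta.describe())
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch, so that the context cannot be changed by accident once the data has been published.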

Furthermore, using the Generic Statistical Business Process Model (GSBPM), an international best practice model, helps ensure that the data lifecycle management process guides organisations to get the most from their data and metadata, while determining appropriate use and retention so they can safely navigate the various legislative minefields.[xvii]


Predicting behaviour

In order to use data to predict customer behaviour, users need to have access to appropriate analytical tools. 

For example, statistical methods and functions have traditionally been available only in specialised statistical software packages. As BI tools develop, these techniques are becoming more mainstream.

Of course, it is important to involve the users of the data in this process, to understand how they currently use and manipulate data to gain insights and then work out ways that software can automate all or part of that process.

Later posts will talk about the exciting world of predictive analytics, so stay tuned.


Expertise limitations

We have already looked at the problem of expertise limitations. As the tools become easier to use, more and more users will be able to take advantage of the power of data analytics, no longer having to rely on a select few individuals with the software expertise.


Accessing relevant, timely or reliable data

Software can help to solve the problem of reliable and up to date information. It is vitally important that the analytical tools make it as easy as possible to update the data. It must become an easily automated process, as opposed to a time consuming and highly manual exercise.

It is also important that data only needs to be updated once, in a single source of truth, rather than having to update many different databases.


Conclusion

Being aware of the gaps and limitations of current generation technology allows software developers to look at creating the capabilities that end users will start demanding in the future.

It is clear that disclosure control will become increasingly important: data must only be made available to users with the right credentials, and the system must automate this process as much as possible, making it easier to protect data and distribute it.

Finally, the tools must be easy to use and intuitive, helping to build a smarter user base from the ground up, and increasing the number of insights that can be gained as the number of users looking at the data grows.


References 

Davenport, Thomas H.; Harris, Jeanne G. (2007). Competing on analytics : the new science of winning. Boston, Mass.: Harvard Business School Press. ISBN 978-1-4221-0332-6.


Analytics 3.0 Article Harvard Business Review - http://hbr.org/2013/12/analytics-30/ar/1



SuperDataHub – www.superdatahub.com











http://spacetimeresearch.com/products/superstar-platform/

