An analysis of the Business Intelligence industry: past, present and predictions for the future.
Limitations of current-generation technologies
and options to solve them
Even with all the software advances that have improved modern
analytics tools, there are still limits on the level of interaction available
to users. Barriers remain that, in varying ways, inhibit
greater take-up of self-service analytics and BI solutions. We look at a few of them here
in detail and offer potential solutions, based on existing products and current-day
practices, that may not be self-evident:
- Problem: Insights have been mainly the domain of specially trained staff
The belief is that too few such people exist: demand for them is high,
but supply is short.
A report by E-Skills (UK) and SAS sees the need for more big
data specialists over the coming years, with demand in the UK alone
predicted to grow by 243% over the next five years, to 69,000 specialists.[xii]
Gartner’s 2012 report on how to deliver self-service BI[xiii]
draws a divide between information consumers and power users. This line in
the sand propagates the image that there are distinct differences between users
in terms of their involvement with data and data tools, and in the skill level necessary
to interact with data. That divide leads to problems like the imbalance between
supply of, and demand for, users with the necessary knowledge to do research on the data a company holds.
“The first mistake we made was in the organisational
model. Centralised, IT-dominated BI
teams are not conducive to empowering end users.”[xiv]
A team that blends IT and business skills is in a much
better position to service this need than a strictly IT-focused one.
Solution:
Instead of waiting for users to mature via traditional methods (i.e. creating ever more specialised users), the new suite of analytics applications removes the need to divide users into distinct
groups of power users and information consumers. With the right tools, the majority
of staff members become empowered enough to call themselves power
users too.
However, it is important to have the right tools in place
before this can happen. If the tools are still complex then users will still
need to be trained to find the answers they seek. On the other hand, if the
tools are intuitive, easy to use and require little training then users are
more likely to become involved and start getting real benefits from data
analytics.
Whilst providing technical training to users has some benefit,
it takes time, and with software technologies constantly
evolving, going down this path means that ongoing training and development will
likely be required. This ultimately proves costly to organisations in
time, money and effort, all of which could be better spent elsewhere.
It is both more efficient and more cost-effective to give users access
to tools that require little to no training because they are intuitive and
simple to use, allowing users to focus on more value-adding tasks in the day-to-day
running of the business.
Whenever a user starts working with a new software tool, there
is a divide between their starting state and their ability to use the
software: they typically need training, reading and course materials,
and practice time before they can even start finding the answers they
want or need. Self-service models that are intuitive and easy to use can
make that divide much smaller, reducing the time it takes to reach real benefits.
Search-based
BI tools with “Google-like” interfaces allow users to get started right away
exploring data with little training. Analysts no longer spend substantial
amounts of time preparing reports; they can create them with a few
clicks and add value by spending more time drawing insights from the
data.
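To make the idea concrete, here is a deliberately naive sketch in Python (using pandas) of how a search-style interface can reduce to mapping free-text terms onto filters and aggregation over a dataset. The column names and toy figures are invented for illustration, not taken from any particular product.

```python
import pandas as pd

# Toy dataset standing in for an organisation's sales data.
sales = pd.DataFrame({
    "region":  ["North", "South", "North", "East"],
    "product": ["widgets", "widgets", "gadgets", "widgets"],
    "revenue": [1200, 950, 640, 480],
})

def search(query: str) -> pd.DataFrame:
    """Very naive 'Google-like' search: keep rows where every query
    term matches some text column, then aggregate the result."""
    text_cols = sales.select_dtypes("object").columns
    mask = pd.Series(True, index=sales.index)
    for term in query.lower().split():
        mask &= sales[text_cols].apply(
            lambda col: col.str.lower().eq(term)).any(axis=1)
    return sales[mask].groupby("region", as_index=False)["revenue"].sum()

print(search("widgets north"))  # North-region widget revenue, no SQL needed
```

A real search-driven BI product does far more (synonyms, ranking, natural-language parsing), but the principle of "type terms, get an aggregate" is the same.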
This type of environment also changes the role of typical
IT service staff. With the right tools in place, they no longer need to be
heavily involved in the report-building process. The important thing
to remember is that while business users may start to do what was
traditionally the work of IT resources, IT resources will still be needed;
their role simply evolves from report provider or creator to a focus on data
management: custodianship, security and privacy, and efficiency in getting the data to the
right people.
- Problem: Aggregated datasets to answer organisational questions
Another issue is that although the current tools may have
started to move towards a self-service model, they are only doing so over
limited datasets. A lot of solutions in this space have to aggregate the
information available, either for security reasons or due to the amount of time
it takes to prepare large datasets for self-service dissemination.
These tools will serve up analytics to the limit of what can
be achieved within hardware and software capacity. Tools like this do not
necessarily connect to all the data to begin with and may require a lot of
configuration in the build process rather than just being able to plug and
play. Hardware, software, implementation or timing constraints mean that even
with all the right accessibility and authorisations in place, a user might
still be limited to looking at only a portion of the available data.
This is especially true where a program claims to be
self-service for its end users but is really looking at a small sample of the
data or a pre-aggregated report. The user can explore the information in the view,
but if the view is limited it cannot really be considered a full self-service
option. Someone else has already decided what data is made available
and what is not, and the end user might not even be aware that information is missing.
Of course an organisation needs to control what it can show,
but doing so in collaboration with users allows the users to set the agenda. This
leads to a better user experience and less time spent iteratively creating and
updating reports. By giving users access to most of the available data rather
than a small slice, the organisation also future-proofs itself against having
to create new reports whenever additional information is
needed.
Solution:
To solve this problem, we need software that
automatically gives users the full set of data so they can decide what is
important. The tools must be capable of looking at entire datasets and of
giving this power to all users, not just a select few. Any restrictions
on who can see what data should be imposed only by business rules, never by
hardware or software limitations.
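As a minimal sketch of that principle (the role names, columns and rules below are invented), the restriction is a business rule applied on top of the full dataset at query time, rather than a pre-cut extract baked into the tool:

```python
import pandas as pd

# The full dataset stays connected; what each user sees is trimmed
# only by business rules, never by hardware or software limits.
records = pd.DataFrame({
    "department": ["HR", "Sales", "Sales", "Finance"],
    "salary":     [70000, 55000, 62000, 80000],
})

# Hypothetical business rules: which departments each role may query.
ACCESS_RULES = {
    "analyst":   {"Sales"},
    "executive": {"HR", "Sales", "Finance"},
}

def visible_data(role: str) -> pd.DataFrame:
    """Apply the business rule for `role` over the full dataset."""
    allowed = ACCESS_RULES.get(role, set())
    return records[records["department"].isin(allowed)]

print(visible_data("analyst"))    # Sales rows only
print(visible_data("executive"))  # everything
```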
In this system, end users create the reports they want to
see. The data providers can still build a few pre-packaged reports as a guide if they
want, but they no longer need to handle all the report building, freeing them
up for other work that adds value to the data.
The tool also needs a feedback loop for end users, to
understand their data needs and ensure those needs are met in building any
future self-service capabilities.
- Problem: Privacy Concerns
There are serious privacy
challenges faced by organisations that collect and disseminate personal and
business information.
While statistical information can
lead to insights into trends, growth and demographics, organisations dealing
with this information must be careful not to disclose private information.
In the past official statistics
providers have given external researchers and analysts limited and tightly
controlled access to the microdata from their censuses and surveys because of
their duty to protect the privacy of their survey respondents.
Typically this controlled access
takes the form of in-house or remotely accessed data laboratories or research
centres, or the provision of pre-confidentialised sample files. All of these
scenarios typically involve a statistics provider’s staff having to do some
form of manual review and vetting of the information generated in response to a
data query before it is delivered back to the researcher.
The demands to release greater
volumes of data with increasing levels of detail are becoming more and more the
norm, especially in light of open data policies of federal and state
governments.
Pressures that were once felt mainly by National
Statistical Organisations (NSOs) are now being felt by many
more private and public organisations.
Ensuring the confidentiality of the data an organisation
gathers is essential both to maintain the trust of the individuals and
organisations who provide it and to keep them willing to keep providing
it.
There are a variety of disclosure
control methods that play an important role in helping companies achieve a
certain level of confidentiality. For example:
- Aggregation – creating summary tables (“cubes”)
- Confidentialisation of microdata – sampling or perturbing the values of data records so that an anonymous set can be safely released
- Confidentialisation of tabular data – concealing or adjusting values in aggregate data before being released
- Business rules – controlling the level of detail in queries using pre-defined rules
- Trust and access control – providing more detailed information to trusted groups
When selecting the appropriate disclosure control
methodology, organisations need to strike the right balance between making
information available and meeting their privacy obligations. The ideal solution
will be one that conceals just enough data to meet those obligations. Perturbation
is typically the best method for achieving this.
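As a small sketch of one common perturbation scheme, unbiased random rounding, the Python below nudges each cell count to a multiple of a base so that exact small values cannot be read off, while the released figures remain correct on average. The table, base and region names are illustrative assumptions only.

```python
import random

def randomly_round(count, base=3, rng=random):
    """Unbiased random rounding: move `count` to a multiple of `base`,
    rounding up with probability remainder/base and down otherwise,
    so the expected value of the released figure equals the true count."""
    remainder = count % base
    if remainder == 0:
        return count
    if rng.random() < remainder / base:
        return count + (base - remainder)
    return count - remainder

true_counts = {"North": 17, "South": 3, "East": 1}
released = {region: randomly_round(n) for region, n in true_counts.items()}
print(released)  # e.g. {'North': 18, 'South': 3, 'East': 0} - small cells hidden
```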
More on perturbation can be found here: http://www.spacetimeresearch.com/s=perturbation&Submit=Search
This is a topic I will discuss in further detail in later blogs.
- Problem: Information overload and over-reliance on machine-based rules
Information overload problem
Current generation technology can now capture data faster
than ever before. There is a danger that users might become overwhelmed by all
this information. If the ability of users to understand the abundance of
reports and data out there cannot keep up with the amount of information
collected, then there is as much chance of burying the useful information as of
uncovering it.[xv] However, this is not necessarily a problem of too
much information, so long as the right tools are in place to help users manage
it.
Machine-based rules problem
Additionally, in the current climate, there are a limited
number of users with the capability and know-how to traverse these huge
databases. This leads to another part of the information overload problem: an
over-reliance on machine-driven analysis.
For example, the National Security Agency in the US has a
separation step for its Big Data repository that strips out “noise”. But it is
possible that what the software perceives as noise is in fact a signal: one
that could have been spotted had there been human intervention in the process.
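The sketch below illustrates the remedy rather than any agency's actual pipeline: instead of a single hard threshold that silently discards everything it classes as noise, borderline records (the score thresholds here are invented) are routed to a human review queue.

```python
# Thresholds are illustrative: above one we trust the machine to keep,
# below the other we trust it to discard, and the grey zone in between
# goes to human eyes instead of silently disappearing.
AUTO_KEEP_ABOVE = 0.8
DISCARD_BELOW = 0.2

def triage(scored_records):
    """Split (record, score) pairs into keep / human-review / discard."""
    keep, review, discard = [], [], []
    for record, score in scored_records:
        if score >= AUTO_KEEP_ABOVE:
            keep.append(record)
        elif score >= DISCARD_BELOW:
            review.append(record)  # a weak signal gets a second chance
        else:
            discard.append(record)
    return keep, review, discard

scored = [("event-a", 0.95), ("event-b", 0.45), ("event-c", 0.05)]
keep, review, discard = triage(scored)
print(review)  # ['event-b'] goes to an analyst rather than the bin
```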
Solution:
Software becomes part of the solution here – but it is vital
that the software is easy to use. If it is easy to use it can help to create an
information economy, where all members of a company have the potential to mine
data. They can all add value by becoming managers of information, data miners,
data analysers, and data explorers.
It simply becomes a numbers game. In the past, the wealth of
information went under-explored because there were not enough capable
users. By adopting tools that are easy enough for all employees to use, user
numbers can increase dramatically.
Whilst it appears useful to create smarter systems and
algorithms that can automatically find the relevant correlations in data, an
over-reliance on software algorithms brings its own problems.
By increasing the number of competent users, we create an
environment where the rules written into any data dissemination engine are
reviewed and re-reviewed by many human eyes. This vastly reduces the chance that
important data will fall through the cracks.
- Problem: Issues raised by BI users
This last section showcases problems noted in the “2014
Analytics, BI, and Information Management Survey”, which polled 248 respondents
at organisations using or planning to deploy data analytics, BI or statistical
analysis software[xvi].
- 59% said data quality problems are the biggest barrier to successful analytics or BI initiatives
- 44% said "predicting customer behaviour" is the biggest factor driving interest in big data analysis
- 47% listed "expertise being scarce and expensive" as the primary concern about using big data software
- 58% listed "accessing relevant, timely or reliable data" as their organisation's biggest impediment to success regarding information management
Solutions:
Data quality problems
Organisations can go a long way towards eliminating data
quality problems by implementing a single source of truth, and by applying and
maintaining proper metadata practices.
To ensure a single source of truth, data captured by the
business must be recorded only once and held in a single place, accessible by the
different enterprise software systems. Whether those systems are spread
across geographic areas or not, reading from the one system ensures that
everyone looks at the same, consistent data.
Metadata is data that provides context or
additional information about other data, such as the title,
subject or author of a document. It may also describe the conditions under
which the data stored in a database was acquired, such as its accuracy, the date and
time of collection, and the method of compilation and processing.
Proper metadata practices mean that users of the data know
exactly where it comes from, as well as understanding any contextual
information necessary for analysing it.
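As a hedged illustration of what proper metadata might capture alongside a dataset, the record below is a sketch, not a standard schema; the fields simply mirror the items listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetMetadata:
    """The context a user needs before analysing a dataset."""
    title: str
    subject: str
    author: str
    acquired_at: datetime          # when the data was collected
    accuracy_note: str             # known caveats about precision
    compilation_method: str        # how the data was compiled
    processing_steps: list = field(default_factory=list)

meta = DatasetMetadata(
    title="Regional sales extract",
    subject="Monthly revenue by region",
    author="Data warehouse team",
    acquired_at=datetime(2014, 6, 30),
    accuracy_note="Figures rounded to the nearest thousand",
    compilation_method="Aggregated from point-of-sale feeds",
    processing_steps=["deduplicated", "currency-normalised"],
)
print(meta.accuracy_note)
```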
Furthermore, use of the Generic Statistical Business
Process Model (GSBPM), an international best-practice model for the data
lifecycle, helps guide organisations to get the most
from their data and metadata while determining appropriate use and retention,
so they can safely navigate the various legislative minefields.[xvii]
Predicting behaviour
In order to use data to predict customer behaviour, users
need access to appropriate analytical tools: for example, statistical
methods and functions that have traditionally been available only in
specialised statistical software packages. As BI tools develop, these
techniques are becoming more mainstream.
Of course, it is important to involve the users of the data in
this process, to understand how they currently use and manipulate data to gain
insights and then work out ways that software can automate all or part of that
process.
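As a tiny sketch of the kind of statistical-package technique now within reach of BI tools (assuming scikit-learn is available; the customer features and labels are invented), a logistic regression can turn past behaviour into a renewal probability:

```python
from sklearn.linear_model import LogisticRegression

# Invented features per customer: [visits last month, support tickets]
X = [[12, 0], [3, 4], [8, 1], [1, 6], [15, 0], [2, 5]]
# 1 = customer renewed, 0 = customer churned
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Estimated probability that a customer with 5 visits and 2 tickets renews.
print(model.predict_proba([[5, 2]])[0][1])
```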
Later posts will explore the exciting world of predictive analytics, so stay tuned.
Expertise limitations
We have already looked at the problem of expertise
limitations. As the tools become easier to use, more and more users will be
able to take advantage of the power of data analytics, no longer having to rely
on a select few individuals with the software expertise.
Accessing relevant, timely or reliable data
Software can help solve the problem of reliable and up-to-date
information. It is vitally important that analytical tools make it as
easy as possible to update the data: updating must become an easily automated
process, as opposed to a time-consuming and highly manual exercise.
It is also important that data only needs to be updated once,
in the single source of truth, rather than
having to update many different databases.
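A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for the single source of truth (the table and figures are invented): a scheduled job writes each figure once, in one place, and every consumer reads the same value.

```python
import sqlite3

# Stand-in for the single source of truth; every BI tool reads from here.
conn = sqlite3.connect("single_source_of_truth.db")
conn.execute("""CREATE TABLE IF NOT EXISTS daily_revenue (
                    day TEXT PRIMARY KEY,
                    amount REAL)""")

def refresh(day, amount):
    """Idempotent update: safe to rerun from a scheduler, and the
    figure is corrected once, in one place, for every consumer."""
    conn.execute("INSERT OR REPLACE INTO daily_revenue VALUES (?, ?)",
                 (day, amount))
    conn.commit()

refresh("2014-07-01", 18250.0)  # e.g. called nightly by a scheduler
print(conn.execute("SELECT * FROM daily_revenue").fetchall())
```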
Conclusion
Being aware of the gaps and limitations of current
generation technology allows software developers to look at creating the
capabilities that end users will start demanding in the future.
It is clear that disclosure control will become increasingly
important: data must only be made available to users with the right
credentials, and the system must automate this process as much as possible,
making it easier to protect data and distribute it.
Finally, the tools must be easy to use and intuitive, helping to
build a smarter user base from the ground up, and increasing the number of
insights that can be gained as the number of users looking at the data grows.
References