Friday 30 May 2014

Art of Knowledge - Episode 4



The title of this video stood out to me as I did my daily trawl through YouTube. There's so much content out there in the online space that I doubt I will ever be able to learn all of it. It's much like knowing we live in a cosmos that is vast and seemingly infinite: we do indeed look insignificant, but I think it is this insignificance that is special and must be embraced. If life really does look insignificant, then take the good out of that. The limited nature of the world we occupy, the life we live and the time we have should push us to realise that a life spent worrying is a wasted one. Focusing on negatives is a waste; what we should thrive on is the positive, even if it is just a silver lining in a cloud of negativity.

Anyway, this little video I'd like to share is all about your life's purpose. It's an amazingly simple idea that uses questions as the tool by which you can understand who you are and where you're going.

Check the video out here:




So, if after watching that video there is a disconnect between where you think you should be heading and what your answers tell you, then maybe it's time for a re-think and re-focus on what will give you the most joy.

As a bit of fun I even did it and am more than happy to share:

Q: Who am I?
A: Mark Monfort

Q: What do you love to do?
A: I love to teach people about analytics

Q: Who do you do this for?
A: Mainly businesses but also anyone involved in data and more importantly anyone willing to learn

Q: What do they want or need?
A: This one is a bit tougher and I think the answer will change over time, especially as I gain more experience and learn. For now I have a strong sense that the people I do this for want the ability to get better insights out of their data and find the meanings and patterns there that will deliver the most value to them.

Q: How do they change or transform as a result of what they get from you?
A: I believe they can take better actions, make better decisions, ask better questions when it comes to the decision to buy analytics tools, re-think processes they are doing now and kick-start the learning process that so many businesses need if they are to survive.



In doing something like this it all boils down to why, why, why. We become less and less enthused and curious about the world around us as we grow up, and it's a shame. The status quo doesn't need to stay that way if we can find a better way, and most importantly, data can teach us this.

This isn't to say that this is all about how we find life's purpose in the business world or in analytics particularly but more about how we find purpose in whatever it is that we do.

Not sure if people here are familiar with the term "elevator pitch", but it's a useful thing to know and a hard thing to create. If, whenever people ask what you do, you have to stop and think, or what you do say is long-winded (and boring), then the elevator pitch is for you. Essentially it's a way of boiling down what you do to its very essence, and in that raw state it's clean, simple and surprisingly effective.

Videos like this help you create that pitch. So what's mine?

"I help people get the best insights from their data so that they can enrich their lives and the lives of others"

Something like this in your line of work, whatever it is, will almost certainly prompt the asker to question you some more. They'll ask how you do that. Here's what I say.

"I do this through helping them transform what they do from being people who work tirelessly on the data to being people who have the data work for them"

Again this begs a further question, perhaps: why do you do this?

"I do this because I am passionate about solving problems and I believe that technology and a command over data through leading edge analytical software is the way forward"


Whether it's pitching your company, going to a job interview, or telling someone you've just met what you do, this video will definitely help.

The most important thing to do with all this though is to share, share with others and get them to share with others further. If we as a society can say what we do, and do it clearly and with passion then the world will indeed be a better place.


Monday 26 May 2014

Art of Knowledge Series - Episode 3

In this video we see General Stanley McChrystal discuss the need for active sharing of knowledge and how it can lead to better outcomes than keeping this information behind closed doors.

It's a lesson the US military learned the hard way, but it is useful in many other facets of business, especially as data increasingly becomes an asset for organisations.

General McChrystal shows how they had to change the culture significantly, moving from 'knowledge is power' to 'sharing is power'. Instead of 'need to know', they asked 'who needs to know, and how do we get the information to them?'.



This is true for other organisations too and ultimately the benefits of sharing outweigh the danger of withholding information.

I see a huge need for more sharing of data across today's private and public sectors, and of course there is no blanket approach to doing this. Each organisation needs to work out its own system and its own audience. Information is only valuable when it gets to the people who can take the right course of action because of it.

Technology is making it easier to share knowledge, and the ideas that come out of this sharing will exist because organisations took the steps to make the data available.

Predictive Analytics

In this post I look at Predictive Analytics from its history until now and discuss some of the rules that should be used when assessing potential solutions. This particular topic is important because more and more I am seeing opportunities for this type of analysis to help not just individual firms but the community at large.


History of Predictive Analytics
The field of Predictive Analytics is seen as a next step in the evolution of Business Intelligence software and capabilities. However, it is not a completely new idea, as it was a sub-component of the Expert Systems arena of the 70s, 80s and 90s. Let's have a look at that history for a moment.

According to Wikipedia, an Expert System is "a computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning about knowledge, represented primarily as if–then rules rather than through conventional procedural code... Expert systems were among the first truly successful forms of AI software" http://en.wikipedia.org/wiki/Expert_system

Some examples of expert systems and the problems they addressed are shown here:

Category | Problem addressed | Examples
Interpretation | Inferring situation descriptions from sensor data | Hearsay (speech recognition), PROSPECTOR
Prediction | Inferring likely consequences of given situations | Preterm Birth Risk Assessment
Diagnosis | Inferring system malfunctions from observables | CADUCEUS, MYCIN, PUFF, Mistral
Design | Configuring objects under constraints | Dendral, Mortgage Loan Advisor, R1 (DEC VAX configuration)
Planning | Designing actions | Mission Planning for Autonomous Underwater Vehicle
Monitoring | Comparing observations to plan vulnerabilities | REACTOR
Debugging | Providing incremental solutions for complex problems | SAINT, MATHLAB, MACSYMA
Repair | Executing a plan to administer a prescribed remedy | Toxic Spill Crisis Management
Instruction | Diagnosing, assessing, and repairing student behaviour | SMH.PAL, Intelligent Clinical Training, STEAMER
Control | Interpreting, predicting, repairing, and monitoring system behaviours | Real Time Process Control, Space Shuttle Mission Control
http://en.wikipedia.org/wiki/Expert_system

So what happened to these systems? As much as they provided huge amounts of competitive advantage, they also had disadvantages at the time. One of the most common was known as the knowledge engineering problem. Expert systems, especially the ones solving complex problems such as credit card fraud detection, required huge investments in the form of experts and their time. These experts would build the required rules so that the expert systems could be effective, and not every organisation could afford to maintain such a resource. Additionally, the difficulty of maintaining and integrating such large systems, along with the limited technologies of the day, proved to be the limiting factor on systems that were really ahead of their time.
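To make the 'if–then rules' idea a little more concrete, here is a minimal sketch of forward chaining over hand-written rules in Python. The rules and facts are hypothetical examples; real systems like MYCIN encoded thousands of expert-authored rules, often with certainty factors.

```python
# Minimal sketch of an if-then rule engine in the spirit of classic expert systems.
# The rules and facts below are hypothetical examples, not taken from any real system.

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new conclusions can be inferred."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Each rule is a (set of conditions, conclusion) pair authored by a domain expert.
rules = [
    ({"transaction_overseas", "card_used_at_home_today"}, "location_conflict"),
    ({"location_conflict", "high_value_purchase"}, "flag_for_review"),
]

observed = {"transaction_overseas", "card_used_at_home_today", "high_value_purchase"}
print(forward_chain(observed, rules))
# The inferred facts include 'location_conflict' and 'flag_for_review'.
```

The knowledge engineering problem mentioned above is visible even here: someone had to write those rules by hand, and keeping thousands of them consistent was the expensive part.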

In terms of making a comeback though, expert systems can assist in a variety of areas, especially now that networked systems are in place to help overcome the limitations of the past.


Predictive Analytics Providers
There are a number of Predictive Analytics tools available from IT software vendors large and small, ranging from SAS, SAP and IBM to Azavea, and even the open-source language R.

More here - http://en.wikipedia.org/wiki/Predictive_analytics


What can Predictive Analytics Solve?
There are a number of areas where Predictive Analytics is already being used or could prove to be quite useful. A few examples of where this can help various industries are shown below:

Law Enforcement
Companies such as IBM or Azavea (www.azavea.com) have created tools that process historical crime information along with geographic, demographic and other data such as weather information to build a forecast of criminal patterns into the future. In practice this means a system that tells officers to be at a certain location during a specific time frame because there is a high statistical probability of criminal activity at that time. This helps better prioritise the missions of police officers on the front line and has been shown to reduce crime rates in cities where such tools have been deployed.
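As a rough illustration of the kind of model involved (not IBM's or Azavea's actual method), the sketch below fits a simple regression to hypothetical historical incident counts per grid cell and time slot, then scores every cell for an upcoming period:

```python
# Rough sketch of hotspot forecasting on hypothetical data. Real products combine
# many more inputs (demographics, weather, events) and far more careful validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical history: one row per (grid cell, hour of day, day of week)
# with the number of incidents observed in that slot.
n = 5000
X = np.column_stack([
    rng.integers(0, 100, n),   # grid cell id
    rng.integers(0, 24, n),    # hour of day
    rng.integers(0, 7, n),     # day of week
])
y = rng.poisson(lam=1.0 + (X[:, 1] > 20) * 2.0)  # more incidents late at night

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Score every cell for Friday 11pm and direct patrols to the highest-risk few.
candidates = np.column_stack([np.arange(100), np.full(100, 23), np.full(100, 4)])
expected = model.predict(candidates)
print("Highest-risk cells:", np.argsort(expected)[-5:][::-1])
```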

Emergency Services
Similarly, Emergency Services can use models much like those in Law Enforcement to better plan where their assets need to be located and at what times. Load forecasting capabilities, such as those from Azavea, can show the volumes of incidents expected across various areas of a jurisdiction. Early warning and risk forecasting can also be achieved with similar models.

Education
In the Education sector, universities can use Predictive Analytics to gain forward forecasting capabilities over targeted markets such as international students. Feeding in information that can affect the choice of overseas students to study here is important, because fluctuations in the Australian dollar against foreign currencies today can have repercussions months into the future. Tools that provide these capabilities give universities a better ability to target various market segments and improve their load and retention outcomes.

Health

In the Health sector, Predictive Analytics can be used to improve patient care and reduce costs. Risks to patients or healthcare providers are better forecast, and per-patient predictions can lead to better treatment decisions.

Furthermore, there are some more general applications of predictive analytics.

Predictive Search
Examples include personalisation of advertising to the point where it is based on your social media relationships, your location, or even your in-store or buying behaviour. Companies involved in this type of learning include the likes of Facebook, Evernote and Google. Google, in this case, provides contextual information that relates to the way you interact online.


Transaction profiling
This sort of technique relies on datasets of user behaviour compiled over time and enables the software to accurately assess fraud or credit risk within transaction systems. It is quite useful at large banks or lending organisations and is highly complex.
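As a much-simplified sketch of the idea (real bank systems use far richer profiles and models), the example below flags transactions that sit well outside a customer's own spending history; the customer data is made up:

```python
# Simplified sketch of transaction profiling: flag purchases that sit far
# outside a customer's own historical spending pattern. Hypothetical data only.
import statistics

history = {  # customer_id -> past transaction amounts
    "cust_1": [32.50, 18.00, 45.20, 27.80, 39.99, 22.10],
}

def is_suspicious(customer_id, amount, threshold=3.0):
    past = history[customer_id]
    mean = statistics.mean(past)
    stdev = statistics.stdev(past)
    # Flag if the amount is more than `threshold` standard deviations above the mean.
    return amount > mean + threshold * stdev

print(is_suspicious("cust_1", 40.00))   # False - in line with past behaviour
print(is_suspicious("cust_1", 950.00))  # True  - well outside the profile
```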


The goal of Predictive Analytics
The point of any report is the question it answers, and in the case of Predictive Analytics that question is 'What will happen next?'.

Vendors that have the capability to do this and more will ultimately create more value for clients than those who don't.





Furthermore, the following diagram shows the five stages of reporting capability. Whilst predictive technologies are purported to be the next wave of Business Intelligence features, they are merely the next step in the evolution.



As shown above, the real goal is the activation and application of these predictions, embedding and employing these capabilities across the rest of the organisation. This ensures a greater ability to deliver decision-based actions thanks to better tools.

The combination of better systems and abundant data should give analysts a better ability to actually analyse the data rather than spend most of their time chasing it.



10 simple rules for getting the most out of Predictive Analytics
These rules have been built from my own experience in the Business Intelligence community and from working with, and learning from, very capable vendors like Azavea (www.azavea.com).

Some of the rules can relate to more mainstream Business Intelligence issues as well and a lot are common sense but as I've learned throughout my career, documentation is king!


1. Garbage in, garbage out

This is the starting point and the most important part of the whole idea of Predictive Analytics, the data you will use.

It is not simply a matter of the more data you have the better. Just because you have a Big Data source does not mean it is useful. 

The quality of the data is of utmost importance because with any system, the bad data you put in becomes the bad results you see at the end.

Systems that handle this well are the ones that have a good handle on their metadata.
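As a minimal sketch, assuming the data arrives as a pandas DataFrame with hypothetical column names, here is the kind of basic quality gate worth running before anything is fed into a predictive model:

```python
# Minimal data-quality gate before any modelling. The checks and column names
# here are hypothetical examples, not a complete quality framework.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        "negative_amounts": int((df["amount"] < 0).sum()) if "amount" in df else None,
    }

df = pd.DataFrame({
    "amount": [120.0, None, -5.0, 120.0],
    "region": ["VIC", "NSW", "NSW", "VIC"],
})
print(quality_report(df))
```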

2. Control
"A good biz decision trumps a good algorithm" - http://abbottanalytics.blogspot.com.au/2013/11/a-good-business-objective-beats-good.html

The algorithm is only as good as the analyst controlling the software. The user cannot be taken out of the picture completely, because no amount of algorithmic complexity can account for the complexity of human interactions, and for every 1,000 rules built into a system there are bound to be cases that were not accounted for.

Human intervention must be possible in any system as a fail-safe.


3. Study the greats, and adapt
A great place to start for any firm looking to dive into Predictive Analytics is to look at how other companies and industries are solving problems using this capability. If you can find a common theme that their model addresses and relate it to yours it can save hours of time spent trying to build something from the ground up.

For example, the components that make up a Predictive solution for Law Enforcement can work equally well in an Emergency Services situation or even something else like Education. The tenet that holds them together is the idea of loading various factors of human behaviour into a solution that finds the required correlations.

Adaptive technologies are well suited to this as it would be quite restrictive to choose a model that is not configurable to work in other industries. 


4. Pictures are worth a thousand words 
It's very important to visualise the data, and the innovations that couple Predictive Analytics with geographic mapping are highly valuable.

Whilst trendlines on graphs also give key insights, mapping the forecast outcomes can provide clearer pictures to those who need them, such as front-line officers in Law Enforcement.

Predictions used in other areas, whilst not necessarily needing mapping, could still be better served with other optimisation techniques and enhancements that help provide better decision analysis.


5. Think long term fix, not short term band-aid
Organisations looking at this should have a long term goal in mind for their use of Predictive Analytics and tie the goal to their overall business strategy.

This helps gain buy-in from the rest of the organisation, as this field of work is best done by drawing on the expertise of others both within and outside the organisation.


6. Create a feedback loop
This is necessary as it allows the system to continuously improve. The feedback loop from those who gather the data and create the outputs to those who use them is important, because these interactions highlight the human behaviour that so many predictive solutions ultimately rely upon. It's about using human behaviour to solve the puzzle that is human behaviour, and no system can do this without a proper feedback mechanism.

In the Law Enforcement example this works when Commanders, Analysts and Officers all work together towards a common goal on certain crime types and communicate regularly, updating each other on new datasets to look out for or giving feedback when certain predictions are not effective.

Continuously performing these checks and balances is important to be able to assess the validity of the system.


7. Data, data everywhere, not a drop to drink
The more data that can be obtained, the better the predictions, and this improved accuracy is important.

New sources of information are being created every day, and some organisations are looking at the vast amounts of data provided by the Internet of Things (http://en.wikipedia.org/wiki/Internet_of_Things).

As this becomes prevalent, and more and more items are measured and their data collected, it becomes a new source of insight into behaviour.


8. Measure, measure, measure... the data and the vendor

Measuring the data is important because it relates to how accurate the system is and can lead to being able to calculate the Return on Investment (ROI) of the solution in place. It can also provide a broader idea of the opportunity cost associated with not putting a Predictive Analytics capability in place.
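A small illustration of what 'measuring' might look like in practice; the hit-rate calculation is generic, and the benefit and cost figures are entirely hypothetical:

```python
# Sketch of measuring a deployed model: hit rate against actuals, then a
# simple ROI estimate. All figures below are hypothetical.
predicted_hotspots = {"cell_12", "cell_45", "cell_71", "cell_03"}
actual_incident_cells = {"cell_12", "cell_45", "cell_88"}

hits = predicted_hotspots & actual_incident_cells
hit_rate = len(hits) / len(predicted_hotspots)
print(f"Hit rate: {hit_rate:.0%}")            # 50%

# Rough ROI: estimated benefit of incidents prevented vs annual solution cost.
estimated_benefit = 250_000   # hypothetical value of prevented incidents per year
solution_cost = 100_000       # hypothetical licensing + staff time per year
roi = (estimated_benefit - solution_cost) / solution_cost
print(f"Estimated ROI: {roi:.0%}")            # 150%
```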


Equally, the software vendors must be measured too. When talking to vendors, organisations should ask whether they have analysis examples that show the efficacy of their solution. This proves how serious a vendor is and also reveals their motivation.

Why is this important? Because it is best to be aligned with a vendor that has similar goals, whether that be public safety or another objective. It also shows whether they are thinking of long-term solutions or just short-term fixes that will eventually prove costly.


9. Connections, abundant connections

The systems you work with must be able to take advantage of new forms of connections to databases and files. Architectural styles like REST and open standard formats like JSON are highly important, because the more connectible the Predictive Analytics software is, the more data it will be able to reach.

Furthermore, data from various custodians or providers, such as ESRI for mapping, the ABS for statistical data or the Bureau of Meteorology for weather data, must be brought in to make some of these analyses useful and more accurate. Some of these agencies already provide their own types of connections, so systems that can incorporate these will help your organisation get to a solution faster than those that have to build things from the ground up.
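To show what 'connectible' means in practice, here is a minimal sketch that pulls a JSON dataset from a RESTful endpoint into a table. The URL and parameters are placeholders, not a real ABS or Bureau of Meteorology API:

```python
# Minimal sketch of pulling data from a RESTful API that returns JSON.
# The endpoint below is a placeholder, not a real ABS or Bureau of Meteorology URL.
import requests
import pandas as pd

API_URL = "https://example.org/api/v1/observations"  # hypothetical endpoint

response = requests.get(API_URL, params={"region": "VIC", "year": 2014}, timeout=30)
response.raise_for_status()

records = response.json()          # assumes the API returns a JSON array of objects
df = pd.DataFrame(records)         # now ready to join with internal data
print(df.head())
```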

10. Privacy, it matters

So the big elephant in the room is privacy, and the question of when prediction goes too far.

There is the story of Target in the US, which sent out unsolicited advertisements to a woman it predicted was pregnant because its system spotted a change in her buying behaviour: http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html

Privacy will always be a key theme and one that must be balanced delicately. The safety of a community and the safety of personal information must be of utmost importance to any firm that goes into this field of analysis and intelligence.

Proper procedures must be put in place to help avoid issues with Predictive Analytics; this will allow organisations to keep the trust of the people whose data they rely on.


Conclusion
The evolution of analytics has certainly come a long way, but many of today's ideas stem from those of the past. Restrictions on hardware and software may have held them back, but the pioneers of the 70s to 90s certainly knew the potential for success that smarter systems and Predictive Analytics could offer.

So as technologies continue to evolve and become more aligned to human needs, the ability to better network the various sources of information and provide faster, more accurate forecasts will help transform society.

I hope that the organisations that do this do it with the best of societal intentions in mind and maintain the far-reaching goals that will help both them and the communities they serve.

The world is changing because of analytics and I can certainly see it's for the better.




Special thanks to Jeremy Heffner of Azavea for his help and insights into this article.




Tuesday 20 May 2014

Crunching the Budget



So the guys over at the Crawford School of Public Policy have been crunching the budget numbers and it's quite interesting to see how various families will be affected (Click here). It definitely looks like the higher-income earners aren't going to be affected all that much.

For more on the story have a look at their report in detail: Click here

Also have a look at this interactive graph of some of the data they collected:


I've also attached some more information from the ABS showing the number of households in various income brackets ($ per week). Click the drop-down on the top left-hand side to see how various brackets have changed over time:


Monday 19 May 2014

Statistics for a better world - Part 1

So my company Space-Time Research specialises in analytics tools for large data sets. We've partnered with many government agencies over the past few years including the Australian Bureau of Statistics (ABS).

One of our newer tools that allows for this is called SuperDataHub, and we've preloaded it with some of the statistics the ABS has collected.

Have a look here:


Additionally, you can click on the "i" button on the top right-hand side to see the metadata for this chart.
It's interactive too. Click on each state name to turn it on or off. You can also zoom into the data by selecting specific sections.

NAPLAN.... we were warned!

I don't have kids. I do know some people who do, though, and I am sure that the subject of NAPLAN is a contentious one.

I'll look at this from my point of view so please feel free to share your thoughts.

Reading something like this, after seeing it on the 7:30 Report, makes you wonder: http://www.abc.net.au/news/2014-05-19/naplan-study-finds-school-testing-program-not-achieving-goals/5463004

Have we heard this before? Yes indeed, when the initial reforms were being brought in: http://www.smh.com.au/national/education/naplanstyle-testing-has-failed-us-schools-20110501-1e395.html

Anyway, I've had a focus on the education sector recently, and a great opinion that I find highly logical comes from Sir Ken Robinson (the renowned educator from the UK/US).


I bet he'd have a lot to say about NAPLAN. Standardised testing is needed, but when it becomes the only thing that kids (and their creativity) are measured by, then we are failing our kids.

You might ask why I'd have any interest in this space, especially coming from a data and analytics point of view. My interest stems from the learning perspective: data helps create and ignite the connection to learning, and we need to take advantage of that. It means not using the same yardsticks to measure all our kids, but using data to better identify the ways each of them can be successful.

We have to be ruthless about this and do something now; the future is on the line.

Blogs I follow

Been having issues getting the blogs I follow to show up on the right-hand side panel, so I'll post about them here for now.

Below is a list of blogs and websites I highly recommend especially if you're interested in technology, government or both.

eGovAu Blog - Craig Thomler, a well-renowned voice in the government space, is behind this blog and it's a great source of what's going on in and around technology and social media within government.

Asian Efficiency - Aaron my good friend from university set this business up with a mate of his. Great at helping you get control of your time again.

Delimiter - Great blog on Government Technology news. Former ZDNet Editor, Renai Lemay is behind this one.

Lifehacker Australia - check these guys out as well as their sister sites for all things efficiency, gaming, technology etc

AGIMO Industry Consultations Open



AGIMO (the Australian Government Information Management Office, the agency that takes care of www.data.gov.au) has opened up a survey on the establishment of a panel for cloud computing services.

This is a great chance to get your ideas out there on what the government panel should look like, especially in light of the 'digital first' agenda promoted throughout the CeBIT eGovernment conference as well as the Commission of Audit findings.

Inside were questions on things like what obstacles stand in the way of providing cloud computing to government and what lessons can be shared from the provision of these services.

I see a number of areas that need attention, especially around better systems with self-service capabilities, shared platforms to alleviate costs and agreement on common standards. There are a few key things that my company, Space-Time Research, can offer around protection of data, so hopefully we get a further say in the matter.

If you're interested in responding the link is here: Click here

Friday 16 May 2014

Art of Knowledge Series - Episode 2

Body Language... it affects more than you think.

Here Amy Cuddy raises the hypothesis that our outer self affects our inner self. This 'fake it till you make it' exercise might seem like the wrong thing to do, but put in the right context and the right circumstances, I believe it can be of utmost help.

There are a number of social cues we look out for that relate to our sense of who has power and who does not. Whoever can master and control these, no matter what they feel inside, can affect and manipulate the thoughts of others.

The most amazing thing here is how she had to overcome her own adversity to get to where she is (her personal story starts at 16:10).

So if you ask does it work? She's the proof in the pudding.




Thursday 15 May 2014

Higher Education Business Intelligence Conference Recap

So the conference is over and cards exchanged (photos to come later).

It was really great to see how universities are all going on their journey towards bettering their institutions with better systems.

A few key themes came out of the event which I'll share here:

  • Universities are all at different points on their path towards BI systems, with some more advanced than others. However, the overall industry is definitely further back than the commercial or public sectors.
  • Building a Business Intelligence capability isn't easy. It's even less so with rigid, traditional systems in place and involves creating buy-in from other departments.
  • The process of building a practice of pervasive analytics across universities should not just be the job of one team. It takes a concerted effort to change and reap the rewards.
  • The embracing of the cloud by some and keenness to do so by others is a positive step - it will mean less time worrying about infrastructure and resources and more time spent analysing.
  • Universities will need easy to use tools that are intuitive by design and powerful.
  • Predictive Analytics will be needed in this sector, especially as it is the next level on the planning pyramid. It is no longer enough to just report, or even to report intelligently (business intelligence); to actively compete with the growing international market for students, our Australian universities need tools to better forecast their actions and bring more students to our shores.
  • The Federal Budget occurred smack bang in the middle of the conference and it's going to be ever more important for universities to understand their data and utilise it to get the best results for their students.
  • All of these tools, ideas and actions should be geared towards improving the work of administrators, academics and, most importantly, the students. The economy of the future will be better than its predecessors if we treat these kids like the investments that they are.


I'll leave you with this talk from the esteemed Sir Ken Robinson on Education's death valley.


After watching that you'll see it's not exactly the same thing going on here in Australia, but it's good to be aware of the path not to take. The most important thing out of all this is the learning that takes place at each institution, and the institutions that can do so unencumbered, because they have systems that save them time and money and give them better insights, are the ones that will succeed.


EDIT: The following has been added afterwards since I feel it is highly important and relevant to the changes that will take place in the university education sector going forward.



This talk from Ken comes from 2010 and whilst it seems dated, there still is a need to change mindsets, take action and bring on the revolution.

Monday 12 May 2014

Official CeBIT photos - eGovernment sessions

They've put up some photos on the official website - good to see myself in a few below.

That's me in the light red shirt



Some of the speakers:






For all eGovernment photos: Click here

For all other albums: Click here

All photos shown here are copied under Creative Commons license 2.0 Click here

Sunday 11 May 2014

Higher Education Business Intelligence Conference



So tomorrow I'm jetting off again - this time to Sydney for the Higher Education Business Intelligence Conference. It runs from Tuesday to Wednesday at the Sheraton on the Park.

The schedule is here: Click here

Details from Liquid Learning (hosts) are here: Click here

I'm looking forward to hearing from the various speakers about their tools and strategies. I think education is a huge potential sector of growth in terms of data, especially as there are more courses being offered in fields like Business Analytics or Business Intelligence, but also because of the wealth of data that the education sector takes care of. It's already been proven in many other areas that better management of that data can lead to better performance of an organisation, so if we help improve educational data management then one could deduce that there will be better outcomes for teachers and students.

Anyway, my company Space-Time Research is partly sponsoring the event and we'll have a booth down there, so if you're in the area feel free to pop by and say hello.





Saturday 10 May 2014

ACO Virtual and Virgin Australia collaborate

In my position as a Business Development Manager I get to travel around Australia quite a bit. Unfortunately, as it's for work, I don't always get time to relax.

I was hanging out in the Virgin Lounge at Melbourne Airport recently and noticed they have a new installation. It's a collaboration between the Australian Chamber Orchestra and Virgin Australia, an interactive piece of art that's in place for a short period of time.


Check out the video and images below:





Feeling refreshed it's time to head off to Canberra!


Thanks to Emily @ Virgin for her help showing me the details of this room. Hopefully we'll see more and more things like this helping the weary traveller relax and reinvigorate before the endless meetings and conferences they attend.




Friday 9 May 2014

Art of Knowledge Series - Episode 1

I'm going to aim to put aside a section of my posts for the art of knowledge, expanding on thoughts and ideas. There is a bevy of information out there and not much time to consume it all. This is both troubling and pleasing at the same time: troubling in that I cannot in my lifetime ever know as much as I want to, but pleasing in that I can help enhance the collective knowledge of mankind by sharing my thoughts and embracing the very thing that organisations like TED stand for, "ideas worth spreading".

This talk by renowned economist Larry Smith is one that resonates with me quite well. I'm the kind of guy that didn't quite understand where I was going in life early on and needed to dive into things and experience them before realising where my passion lies.

The title of the talk might be a little deceiving but please view this anyway. For the parents out there you'll most certainly be challenged by his talk at the 10:00 minute mark.

Anyway, feel free to share thoughts and ideas on this in the comments section.

We'll continue this next Saturday.

Larry Smith - Why you will fail to have a great career

Open Data strategies in government

1.      Goals of Government ICT Strategy

The strategy and goals of government ICT strategies are usually quite well developed. Understanding the changing ICT environment, and adopting the web and cloud, will enable delivery of increased services at lower cost. Increased collaboration with industry specialists will enable each party to play to their strengths and provide opportunities for faster, lower-cost delivery of services to the public. Unlocking government data is a core element of any government strategy and should be a key area of focus for any reputable specialist in this area.

2.      Driving more value from government data

Some of the more common goals in relation to engaging consumers of government data are as follows:
  • Provision of easier access to government services and information
  • Unlocking government data and working to innovate and build new solutions
  • Providing interoperable systems allowing easier interaction with government
A number of methods to do this should be considered.

2.1  Access to unit record data, rather than aggregated data or predefined reports

In general, data made available both within government and to citizens is either aggregated data or pre-defined reports. Certainly, these reports and summary data are highly valuable.

However, as statistics providers around the world know only too well, the same piece of data can mean something different to each of their varied stakeholders. To the subject, it is a detail about their private life; to the individual, a key to learning about their community and beyond; to the media, it is a small component of a story; for analysts, it combines with millions of similar records to provide insight about our community; and to government agencies, it helps plan for future services.

For government agencies collecting enormous quantities of data, the mission is to get information to those varied audiences in a way that gives the most value.

To fully unlock government data and harness its full potential, it is important to enable end-users to have access to the unit record data (the individual records of information) via the web.  This enables end-users anywhere to ask any question of the data through browser-based ad-hoc self-service query and answer, rather than being limited to predefined summary information and views.

2.1.1. Confidentiality

In order to provide access to unit record data, government agencies need to protect subjects of that data by preventing breaches of confidentiality.  Merely anonymising unit record data is insufficient, leaving open the possibility of calculating identity from aggregated data and failing to entirely protect the privacy of individuals.

In our view meeting the challenge of confidentiality should never be an optional extra; rather, we see it as an essential that is integrated in the software platform. The critical task of protecting privacy is far too important to leave to manual control when human error can be catastrophic both for the government agencies and their end-users. Aside from the financially costly legal outcome, the damage to public confidence and credibility can be near-impossible to repair.

Important methods of confidentialisation include cell disturbance via random rounding or perturbation, or suppression of sensitive data cells.  In implementing this confidentialisation, it is important to ensure that sensitive information remains private, while retaining data utility.
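As an illustration of cell disturbance, the sketch below applies random rounding to base 3, a technique commonly used by statistical agencies, to a small frequency table; the counts are made up and this is not any particular vendor's routine:

```python
# Sketch of random rounding to base 3, a common cell-disturbance technique.
# Each count is rounded up or down to a multiple of 3, with probabilities
# chosen so the expected value equals the original count. Counts are made up.
import random

def random_round(count, base=3, rng=random.Random(42)):
    remainder = count % base
    if remainder == 0:
        return count
    # Round up with probability remainder/base, down otherwise (unbiased on average).
    if rng.random() < remainder / base:
        return count + (base - remainder)
    return count - remainder

table = {"Suburb A": 7, "Suburb B": 1, "Suburb C": 12, "Suburb D": 2}
print({cell: random_round(n) for cell, n in table.items()})
# Small counts like 1 or 2 can no longer be distinguished reliably from 0 or 3.
```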

Other controls that need to be considered include group and individual level user access controls that allow for easily administered and robust disclosure control options.

To maximise value of the data, permission-based access control, confidentialised tabulation on the fly and customised confidentiality routines should be considered to provide the breadth of functionality required of government agencies sharing their data.

2.1.2. Interactive, intuitive, programming-free and shareable analysis

In order to make informed decisions from vast volumes of data, it is important that end-users have access to powerful interactive, programming-free analytics software tools that are intuitive and require little to no training to use. 

These tools should support motion charts, maps and other interactive visualisations enabling users to access and compare a wealth of information that is sourced from unit record data and aggregated to provide readily understood insights. The tools should also allow users to provide commentary within the tool and group together these images, charts, maps and interactive visualisations into dynamic and interactive reports that can be embedded into websites easily or shared via web links.

For government agencies and other data providers, these tools should provide the option to have cloud-based management and control and automate much of the data production process so that there is more likelihood of data published being up-to-date and requiring less effort to produce. This allows agencies to create pre-loaded government datasets that are easy to access for their users.

Furthermore, consideration should also be given to allowing users to provide feedback on the tool, the data or the insights they create via a community where users, researchers, data owners and developers can share information. Coupling this with usage-tracking software will ensure that a well-monitored and highly valuable feedback loop is created, allowing a community-driven approach to the continual improvement of government agencies' online data dissemination portals.

3.1. Inter-agency sharing

The considerations identified above in relation to sharing of unit record data apply equally to sharing between government agencies as they do to sharing with citizens.

However, it is important to consider the specific needs of inter-government agency data sharing. Creating a software platform that supports easy exchange of unit record information between agencies avoids the costs associated with multiple handling of the same data by different agencies, with the added advantage of access to the full unit record data.

To achieve this, the government ICT strategy should give consideration to:

  • An inter-operable system that provides a robust, secure and high performance environment
  • Business processes underpinning the data sharing, and automation of these processes; and
  • The ease with which agencies can publish data for sharing with other agencies

3.1.1. Inter-operable Platform

In order to optimise the value of data use across government agencies, it is important to have an inter-operable platform that allows robust, secure, high-performance data exchange and communication between agencies, allowing data and insights to be easily shared across various government agencies.

End user Benefits

An inter-operable platform, as identified above, gives end users the following benefits:
  • Common interfaces and tools provide users with a familiarity with how to obtain and extract relevant information;
  • Users will be able to obtain more data on a more timely basis because inter-operable systems reduce the time taken by agencies to release data;
  • Feedback mechanisms with role based security allows citizens, other government agencies and government data providers to interact with the data and share their insights;
  • Improved accessibility to a broader range of data without duplication from different sources leads to more informed decision-making.
Agency Benefits

From the agency perspective, an inter-operable platform:
  • Reduces duplication in data collection across agencies and enables implementation of a single source of truth;
  • Provides easier transfer of data between government agencies whether federal, state or local, enabling better utilisation of existing data;
  • Enables reduced costs of maintenance, improved security and easier knowledge transfer from agency to agency through common platform;
  • Provides economies of scale and bargaining power through multi-agency deals.



4.      Conclusion

 “Knowledge is power.” Sir Francis Bacon, 1597
Government agencies should, if they aren't already doing so, recognise the value of open data, and the importance of unlocking data and working with citizens and business to innovate and build new solutions.

Execution of this strategy will deliver numerous benefits:
  • With robust confidentiality systems in place, agencies will have the confidence to open up their unit record data, to other government agencies and citizens.  There are a few notable absentees from current Data.Gov website lists who could be encouraged to open their data if privacy could be assured.
  • Easy, intuitive, self-service ad-hoc query and answer of the vast volumes of data, together with shared feedback, provides the opportunity for new insights to be unearthed which in turn could deliver untold benefits to society.
  • A common platform with standard data exchange mechanisms and a best-practice data management methodology will enable implementation of solutions across different agencies quickly and at lower cost, and avoid duplication of effort and costs associated with data collection.


Government agencies that can deliver this strategy and create a data-driven economy in their jurisdictions will be the envy of the world.





Wednesday 7 May 2014

THE EVOLUTION OF ANALYTICS - PART 2


An analysis of the Business Intelligence Industry; past, present and predictions for the future.                          


Limitations with Current Generation technologies and options to solve them

Even with all the software advances that have improved modern analytics tools, there are still limits on the level of interaction available to users in this community. Barriers are in place that, in varying ways, prohibit greater take-up of self-service analytics and BI solutions. We look at a few here in detail and offer potential solutions based on existing products and current-day practices that may not have been self-evident:

  • Problem: Insights have been mainly the domain of specially trained staff

There is still the notion that data analytics is the domain of only a handful of specially trained staff coming from either specific educational backgrounds or having received specialised training to operate the tools needed to gain insights from the data. This notion can be traced back to the fact that a lot of the current technologies require extensive technical expertise to operate and even the simpler ones still require some coding knowledge.                                                                   

The belief is that not enough of these types of people exist so there is high demand for them but not enough supply.

A report by E-Skills (UK) and SAS sees the need for more big data specialists over the coming years. The demand for big data specialists will grow over the next 5 years by 243% to 69,000 in the UK alone.[xii]


The above graphic comes from Gartner's 2012 report on how to deliver Self-Service BI[xiii]. It sees a divide between information consumers and power users. This line in the sand propagates the image that there are distinct differences between users in terms of their involvement with data and data tools and the skill level necessary to interact with data. This leads to problems like an imbalance between the supply of, and demand for, users with the necessary knowledge to do research on the data a company holds.

”The first mistake we made was in the organisational model.  Centralised, IT-dominated BI teams are not conducive to empowering end users.”[xiv]

A team that blends IT and business skills is in a much better position to service this need than a strictly IT focused one.

Solution:

Instead of waiting for users to mature via traditional methods (i.e. creating more and more specialised users), with the new suite of analytics applications there will no longer be a need to divide users into distinct groups of power users and information consumers. With the right tools, the majority of staff members become empowered enough to call themselves power users too.

However, it is important to have the right tools in place before this can happen. If the tools are still complex then users will still need to be trained to find the answers they seek. On the other hand, if the tools are intuitive, easy to use and require little training then users are more likely to become involved and start getting real benefits from data analytics.

Whilst providing technical training to users is in some way beneficial, doing so takes time and with software technologies constantly evolving, going down this path means that ongoing training and development is likely required. This will ultimately prove costly to organisations in terms of their time, money and efforts, all of which could be better spent elsewhere.

It is both more efficient and cost-effective to give users access to tools that require little to no training because they are intuitive and simple to use allowing the user to focus on more value adding tasks in the day to day running of the business.

Whenever a user starts working with a new software tool, there is always a divide between their starting state and their ability to use the software, as they would typically need training, reading and course materials, and practice time before they can even start using it and finding the answers they want or need:


However, self-service models that are intuitive and easy to use can help reduce the gap and the time to reach benefits and make that divide much smaller: 



The full move to a much smaller complexity divide is an ongoing piece of work and involves the use of further techniques like smarter predictive analytics and augmented intelligence (topics I will discuss in other blogs).

Search-based BI tools with “Google-like” interfaces allow users to get started right away exploring data with little training. Analysts do not spend substantial amounts of time preparing reports but rather, can create reports with a few clicks and provide value by spending more time providing insights into the data.

This type of environment also means changes for the typical IT service staff. They are no longer required to be heavily involved in the report-building process when the right tools are in place. The important thing to remember here is that whilst business users may start to do what was traditionally the role of IT resources, there is still going to be a need for IT resources. It's just that their role will evolve from being report providers or creators to being focused solely on data management: custodianship, security, privacy and efficiency in getting the data to the right people.


  •  Problem: Aggregated datasets to answer organisational questions


Another issue is that although the current tools may have started to move towards a self-service model, they are only doing so over limited datasets. A lot of solutions in this space have to aggregate the information available, either for security reasons or due to the amount of time it takes to prepare large datasets for self-service dissemination.

These tools will serve up analytics to the limit of what can be achieved within hardware and software capacity. Tools like this do not necessarily connect to all the data to begin with and may require a lot of configuration in the build process rather than just being able to plug and play. Hardware, software, implementation or timing constraints mean that even with all the right accessibility and authorisations in place, a user might still be limited to looking at only a portion of the available data.

This is especially true where a program claims to be self-service for its end users but is really looking at a small sample of the data or a pre-aggregated report. The user can explore the information in the view, but if the view is limited it cannot really be considered a full self-service option. Others have already made a decision about what data is made available and what is not. And the end user might not even be aware that information is missing.

Of course an organisation needs to control what it can show but doing this in collaboration with users allows the users to set the agenda. This leads to a better user experience and less time iteratively creating and updating reports. By giving users access to most of the data available rather than a small amount, the organisation can also ensure that there is future proofing against having to create new reports when additional information is needed.

Solution:

To solve this problem, we need a software tool that automatically gives the full set of data to users so they can decide what is important. The tools must be capable of looking at entire datasets and have the ability to give this power to all users not just a select few. Any restrictions on who can see what data should only be imposed due to business rules, not hardware or software limitations.

In this system, end users create the reports they want to see. The data providers can build a few pre-packaged reports as a guide if they want, but they no longer need to handle all the report building, freeing them up for work that provides other value added benefits to the data.

The tool also needs a feedback loop for end users, to understand their data needs and ensure those needs are met in building any future self-service capabilities. 


  • Problem: Privacy Concerns


There are serious privacy challenges faced by organisations that collect and disseminate personal and business information.

While statistical information can lead to insights into trends, growth and demographics, organisations dealing with this information must be careful not to disclose private information.

In the past official statistics providers have given external researchers and analysts limited and tightly controlled access to the microdata from their censuses and surveys because of their duty to protect the privacy of their survey respondents.

Typically this controlled access takes the form of in-house or remotely accessed data laboratories or research centres, or the provision of pre-confidentialised sample files. All of these scenarios typically involve a statistics provider’s staff having to do some form of manual review and vetting of the information generated in response to a data query before it is delivered back to the researcher.

The demands to release greater volumes of data with increasing levels of detail are becoming more and more the norm, especially in light of open data policies of federal and state governments.

Experiences that were usually felt by National Statistical Organisations (NSOs) are now being felt by a lot more private and public organisations.

Ensuring confidentiality of the data gathered by an organisation is a necessity to ensure that individuals and organisations are not reluctant to provide information, and to maintain their trust.

Solution: 

There are a variety of disclosure control methods that play an important role in helping companies achieve a certain level of confidentiality. For example:

-          Aggregation - creating summary tables ("cubes")
-          Confidentialisation of microdata - sampling or perturbing the values of data records so that an anonymous set can be safely released
-          Confidentialisation of tabular data - concealing or adjusting values in aggregate data before being released
-          Business rules - controlling the level of detail in queries using pre-defined rules
-          Trust and access control - providing more detailed information to trusted groups
-          Monitoring - recording and reviewing the types of queries executed by users

When selecting the appropriate disclosure control methodology, organisations need to strike the right balance between making information available and meeting their privacy obligations. The ideal solution will be one that conceals just enough data to meet those obligations. Perturbation is typically the best method for achieving this.
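A minimal sketch of perturbation on tabular counts, assuming simple bounded additive noise; production confidentiality routines are considerably more sophisticated than this:

```python
# Minimal sketch of perturbing tabular counts with small random noise before
# release. Production confidentiality routines are considerably more involved;
# the noise range and counts here are illustrative only.
import random

rng = random.Random(7)

def perturb(count, max_noise=2):
    if count == 0:
        return 0                      # keep structural zeros as zeros
    return max(0, count + rng.randint(-max_noise, max_noise))

original = {"Under 25": 14, "25-44": 3, "45+": 1}
released = {cell: perturb(n) for cell, n in original.items()}
print(released)
```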

More on perturbation can be found here: http://www.spacetimeresearch.com/s=perturbation&Submit=Search

This is a topic I will discuss in further detail in later blogs.

  • Problem: Information overload and over reliance on machine based rules

Information overload problem

Current generation technology can now capture data faster than ever before. There is a danger that users might become overwhelmed by all this information. If the ability of users to understand the abundance of reports and data out there cannot keep up with the amount of information collected, then there is as much chance of burying the useful information as of uncovering it.[xv] However this is not necessarily a problem of too much information, as long as the right tools are in place to help users manage the information.

Machine based rules problem
Additionally, in the current climate, there are a limited number of users with the capability and know-how to traverse these huge databases. This leads to another part of the information overload problem: an over-reliance on machine driven analysis.

For example, the National Security Agency in the US has a separation step for its Big Data repository that strips out “noise”. But it’s possible that what the software perceives as noise is in fact a signal; a signal that could have been seen if there was human intervention in the process.

Solution:
Software becomes part of the solution here – but it is vital that the software is easy to use. If it is easy to use it can help to create an information economy, where all members of a company have the potential to mine data. They can all add value by becoming managers of information, data miners, data analysers, and data explorers.

It simply becomes a numbers game. In the past, users struggled to understand the wealth of information because there were not enough users. By adopting tools that are easy enough for all employees to use, user numbers can increase dramatically.

Whilst it appears to be useful to create smarter systems and algorithms that can automatically find the relevant correlations in data, an over reliance on software algorithms can bring its own problems.

By increasing the numbers of competent users we can create an environment where the rules written into any data dissemination engines are reviewed and re-reviewed by many human eyes. This vastly reduces the chances that important data will fall through the cracks.


  •  Problem: BI User discussion

This last section showcases the problems noted from the “2014 Analytics, BI, and Information Management Survey”. This survey was conducted with 248 respondents answering questions on organisations using or planning to deploy data analytics, BI or statistical analysis software[xvi].

  1. 59% said data quality problems are the biggest barrier to successful analytics or BI initiatives
  2. 44% said "predicting customer behaviour" is the biggest factor driving interest in big data analysis
  3. 47% listed "expertise being scarce and expensive" as the primary concern about using big data software
  4. 58% listed "accessing relevant, timely or reliable data" as their organisation's biggest impediment to success regarding information management


Solutions:

Data quality problems

Organisations can go a long way towards eliminating data quality problems by implementing a single source of truth, and by applying and maintaining proper metadata practices.

To ensure a single source of truth, data captured by the business must be recorded only once and held in a single area, accessible by the different enterprise software systems. Whether those software systems are across geographic areas or not, reading from the one system ensures that everyone looks at the same consistent data and issues.

Metadata is data that serves to provide context or additional information about other data, such as information about the title, subject, or author of a document. It may also describe the conditions under which the data stored in a database was acquired, such as its accuracy, date, time, and method of compilation and processing.

Proper metadata practices mean that users of the data know exactly where it came from, as well as understanding any contextual information that is necessary for analysing it.
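As a small sketch of what recording such metadata might look like in code (the fields and the dataset are hypothetical), a structured record can travel with every published table:

```python
# Sketch of attaching descriptive metadata to a dataset so downstream users know
# where it came from and how it was compiled. The fields and values are hypothetical.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class DatasetMetadata:
    title: str
    source: str
    reference_period: str
    compiled_on: date
    method_notes: str = ""
    caveats: List[str] = field(default_factory=list)

enrolments_meta = DatasetMetadata(
    title="International student enrolments by institution",
    source="Hypothetical student administration system extract",
    reference_period="2013 academic year",
    compiled_on=date(2014, 5, 7),
    method_notes="Counts deduplicated on student ID; part-year enrolments included.",
    caveats=["Excludes exchange students", "Provisional until census date"],
)
print(enrolments_meta)
```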

Furthermore, usage of the Generic Statistical Business Process Model (GSBPM), an international best-practice model, will help ensure that the data lifecycle management process guides organisations to get the most from their data and metadata while determining appropriate use and retention to safely navigate the various legislative minefields.[xvii]


Predicting behaviour

In order to use data to predict customer behaviour, users need to have access to appropriate analytical tools. 

For example, they need statistical methods and functions that have traditionally only been available in specialised statistical software packages. As BI tools develop, these techniques are becoming more mainstream.

Of course, it is important to involve the users of the data in this process, to understand how they currently use and manipulate data to gain insights and then work out ways that software can automate all or part of that process.

Later posts will talk about the exciting world that is predictive analytics so stay tuned to those.


Expertise limitations

We have already looked at the problem of expertise limitations. As the tools become easier to use, more and more users will be able to take advantage of the power of data analytics, no longer having to rely on a select few individuals with the software expertise.


Accessing relevant, timely or reliable data

Software can help to solve the problem of reliable and up to date information. It is vitally important that the analytical tools make it as easy as possible to update the data. It must become an easily automated process, as opposed to a time consuming and highly manual exercise.

It is also important that data only needs to be updated once, in a single source of truth, rather than having to update many different databases.


Conclusion

Being aware of the gaps and limitations of current generation technology allows software developers to look at creating the capabilities that end users will start demanding in the future.

It is clear that disclosure control will become increasingly important: data must only be made available to users with the right credentials, and the system must automate this process as much as possible, making it easier to protect data and distribute it.

Finally, the tools must be easy to use and intuitive, helping to build a smarter user base from the ground up, and increasing the number of insights that can be gained as the number of users looking at the data grows.


References 

NB: Numbering of references to be fixed up later.

Davenport, Thomas H.; Harris, Jeanne G. (2007). Competing on Analytics: The New Science of Winning. Boston, Mass.: Harvard Business School Press. ISBN 978-1-4221-0332-6.

Analytics 3.0, Harvard Business Review - http://hbr.org/2013/12/analytics-30/ar/1



SuperDataHub – www.superdatahub.com










http://spacetimeresearch.com/products/superstar-platform/