Confessions of an Analytics junkie: 2014

Sunday, 9 November 2014

New Job

So I've moved into a new role and will be leading up the Business Intelligence practice for an Equities Research firm here in Melbourne.

It's pretty exciting as I get to once again use my finance skills (more like brush the dust off of...) and get involved in the wonderful world of investments

For those of you who aren't sure, Equities Researchers have started forming up over the last couple of years since the GFC (Global Financial Crisis) as the rates of pay typically given in basis points has been in decline. It's only really those who offer great insights and ideas to fund managers who will be able to make any money in this current climate.

As for the company, QMG, we've got some pretty good insights into sector level analysis and we couple our information with data mashups which is what I've seen happening in other industries to get unique insights. Our insights will prove valuable to those not just in the investments sector but anyone really looking to make either strategic or market based moves.

Right now we are focusing on overseas markets as the ASX is not really the largest of all the global players (though we might add them on eventually however we do have some Technical Analysis provided on the local stocks).

In the meantime if anyone is interested in having a look at our products check us out here.

I'll be focused on creating an interactive and easy to use experience for anyone interested in the research we do.

It's going to be fun being on the buyer side for once!

Sunday, 17 August 2014

Graduate unemployment and salary information

Saw a great article in the Australian Financial Review today all about the decline of graduate employment in Australia (original article here). Edmund Tadros has pulled together some great data on the salaries of students and percentage of those still seeking jobs after finishing university.

The presentation of data in news is something that is always intriguing to me and another opportunity to find ways to visualise the data and make it interactive for consumers.

Here's my quick 5 minute hack-up of the data pulled together using SuperDataHub.

Enjoy!

Saturday, 9 August 2014

Surviving the Challenge

So I'm coming up to my last day as the inaugural IT Survivor for VMWare and it's definitely been a very interesting experience. Despite buggy internet at times, beautiful and sometimes distracting scenery and curious locals, we were able to film all the sequences required for the challenges set for me during the week. I even managed to get my own work sorted out at the same time which was definitely the more important of the 2 things.

In any case I definitely learned a lot whilst there and would do it again if given this kind of opportunity. Maybe next time they can do a 'versus' kind of challenge where hardened IT veterans do battle against each other where the winner gets fed and the loser gets fed.... to the sharks.

Anyway, remote working is something that in itself is a challenge because a lot of traditional business is done face to face. I've learned this week that more and more, the walls of being able to do so are being broken down and the level of engagement that you can have with customers in non face-to-face environments is ever increasing.

Check out the rough mobile phone videos that the guys from VMWare took:

https://www.youtube.com/watch?v=4g9sJS78Aqk

https://www.youtube.com/watch?v=KHRI_xa9crE

The actual film done by the professional crew is going to be cut and put together to coincide with VMWare's vForum across Asia Pac - http://info.vmware.com/content/virtualizationforum_WW

Can't wait to see how it turns out!

Photo courtesy of Allure Media / Lifehacker - http://www.lifehacker.com.au/2014/08/it-survivor-the-video-wrap-up/

Wednesday, 23 July 2014

MapBox

Interested in creating your own maps? MapBox have a set of tools online where you can sign up for free and create your own visualistions.

Here's a simple one I've done using polygons.

It's a little bit crude but at least there's stuff like this out there.

At Space-Time Research, we have mapping capabilities that work alongside our core products of data dissemination so if anyone out there is interested in improving their mapping capabilities then feel free to reach out to me.

Tuesday, 22 July 2014

AUSTRALIAN TRADE COMMISSION - AUSTRALIA: BENCHMARK REPORT part 2

Carrying on from part 1 of my look at the data available the Australian Trade Commission's Australia Benchmark Report, this time we build a report from the Merchandise Exports by Industry report. The original report is shown (here).

The great thing with SuperDataHub is that you can not only just embed single charts like you saw in the previous post but you can also create full reports like you see here and even add commentary that relates to your data.

For more on SuperDataHub you can register for free here.

You can also see more about the Australian Trade Commission here.

AUSTRALIAN TRADE COMMISSION - AUSTRALIA: BENCHMARK REPORT part 1

One of the clients I get to work with is Tourism Research Australia and being part of the whole Department of Foreign Affairs and Trade portfolio means they get to work with great agencies like the Australian Trade Commission (ATC) or Austrade.

The ATC have a great dataset of Benchmark Reports to showcase key investor Indicators in Growth, Innovation, Talent, Location and Business and then they compare how Australia performs in these areas with other countries. You can see the Australia Benchmark Reports (here).

I thought it would be a great idea to have look at some of these in our SuperDataHub tool to interact with the data.

This first example looks at the productivity of Australian industry sectors compared with the global average.

The original report is shown (here):

Using SuperDataHub, I can group some of these together for further analysis:

For more on SuperDataHub you can register for free here.

You can also see more about the Australian Trade Commission here.

Monday, 21 July 2014

The extremes of Remote Working?

So a few years ago a colleague of mine put me onto this website called Lifehacker. It's a weblog about lifehacks and software that covers a varying range of topics from general life tips and tricks to business related ones and has helped me in both personal and work related instances.

They recently ran a competition with VMWare who provide various remote working software products. The idea of it was to see how you could take remote working to the extreme and what better way than to channel your Castaway/Lost fantasy than to be working on an island off the coast of Australia.

In any case I never really think much of these competitions and will apply for a few here and there, but imagine my surprise when I find out that not only have I been short listed, but that I've actually won the major prize. Well that is exactly what's happened and later at the start of August I get to fly off to Townsville, get a boat to Magnetic Island and work with the guy at VMWare to write and video blog about my experiences.

Whilst it may seem like fun and games I'm actually pretty keen to see how much I can get done. I've done the remote working type gig in the past and actually can do that for my current role if I need to. I'm very much customer facing and am either on the phones a lot or emailing and web conferencing with clients a lot. I'm a big believer in creativity in the workplace and what better way than sitting on an island, sand in my feet and laptop in hand coming with new and innovative ways our products can deliver value to our customers. If we actually get some big sales out of it I may even recommend it at work as a regular thing (kidding....maybe).

The great thing is that you guys will get to see some more about the type of solutions we offer our clients and who knows, maybe it's something that can help you too whether you're looking for self-service analytics solutions or trying to find ways to get more value from your data. In the meantime check out my company Space-Time Research.

Anyway if you'd like to check out more about the competition you can see it (here).

The details about my win are (here).

I also did an interview with VMWare (here).

Anyone got any tips or tricks to surviving away from the office and keeping productive then please advise. I actually have some friends in this space, 1 of whom I went to uni with. Check out their company Asian Efficiency and blog (here).

So I'll be off soon but no doubt this blog, and the details about the experience will be up on VMWare and Lifehacker soon. I'll keep you all posted.

Saturday, 12 July 2014

The prevalence and use of R statistical software

The rise in use of R as a statistical product has become more and more prevalent and it is especially important to note for anyone interested in Analytics. Not only is it a free software tool with a great support community but despite the learning hurdles that might need to be overcome, it turns out to be quite a lot more robust and flexible compared to its more established rivals.

Bob Muenchen of http://r4stats.com has forecast that in the next few years R will overtake SPSS and SAS in this game and he has devised a simple measure of doing it. Simply looking at the use of statistical software in Google Scholar articles.

The table below shows the overall state of play

There is a distinctly high use of the more traditional products in SAS and SPSS however the number are clearly declining. The break even point is likely to be soon if not this year then definitely in the next few and Bob offers up a few reasons for this including:

The continued rapid growth in add-on packages
The attraction of R’s powerful language
The near monopoly R has on the latest analytic methods
Its free price
The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (IBM is loosening up on this a bit)

Additionally, I see R as a robust and flexible tool compared to the others. Sure they are quite powerful and easy to use but not every problem can be solved by the same hammer. Additionally, the R online user support community is one of the most powerful I've ever seen with users willing to offer up their insights for others to share quite freely.

All of this leads to the need for more and more software packages to integrate with R to find and solve real world problems. The flexibility of the package is key and organisations will know that their problems cannot just be solved by the software with the best name. Organisations that can work towards a solution that is fit for purpose will find themselves with less headaches and better short term and long terms outcomes.

Monday, 7 July 2014

Is Tourism important to your business? Check this out

These charts look at the ABS recorded travel movements of persons arriving in, and departing from, Australia across 2013 and 2014.

These statistics are important because they provide the input to wide range of other statistical collections which include the following:

Australia's official population estimates, through quality estimates of NOM;
the Australian Migration Planning Framework;
key national economic and tourism indicators;
forecasting NOM into the future;
International Trade & Balance of Payments statistics;
compiling the International Accounts and the Tourism Satellite Account;
estimating National Income and Consumption; and
creating benchmarks for the International Visitors Survey.

The data is available as an Excel download but I've loaded up some of the statistics here for you to play around with.

This first one compares the 3 main types of short terms visitor across that period. You can click on the drop-down to compare departures versus arrivals:

The next one looks at comparing the Arrivals and Departure in the same graph with the drop down being for different types of visitor:

Anyone involved in the tourism industry or dealing with seasonality fluctuations in their business models should take note of the movement. The ABS has a massive amount of data available online and if you need tools to visualise that there are plenty. I use SuperDataHub because it let's me publish that but you can definitely use what's easiest for you.

For more information on these ABS datasets look here.

For information information and to try for yourself the cloud analytics in SuperDataHub click here.

Saturday, 5 July 2014

2014-15 Federal Budget Analytics

The Data.Gov website has quite a few interesting bits of data on there especially and as it's the other side of budget season it's important to get an understanding of how the money is being spent.

They have the 2014-15 Federal budget details listed click here. However it is not very on the main website. Great if you have a tool to load this data into, you can use their API, however unless you want to download it as a CSV then read and make your own charts in Excel it's not very usable for searching.

I've created a chart in SuperDataHub (www.superdatahub.com) that allows you to filter and see what the budget spend was projected be like for the next few years.

Will update this once more information comes along but enjoy the ease of use of interaction on this one:

Friday, 4 July 2014

THE EVOLUTION OF ANALYTICS

THE EVOLUTION OF ANALYTICS - PART 1

An analysis of the Business Intelligence Industry; past, present and predictions for the future.

History of analytics

There are many examples throughout history that showcase how others have presented their data and findings to the world. The old adage that a picture is worth a thousand words rings true. For as long as people have been analysing data, they have been using visualisations to share their findings. Before the dawn of the computer age, these hand-drawn graphs were highly influential in the political and national arenas of their time. Florence Nightingale’s graphical illustration of the key causes of mortality during war, for example, showed at a glance the deaths of soldiers from preventable causes and led directly to improvements in military hospitals:

Charles Joseph Mindard’s graphic depicting the Russian campaign of 1812 showed the relationship between number of soldiers, falling temperatures and distance by soldiers, allowing for better planning on the part of military advisers:

William Playfair’s chart of 1821 compared weekly wages of a good mechanic with the price of a quarter of wheat, showing the decline in buying power of the labour force as part of his book showcasing the causes of the fall of powerful and wealthy nations:

With the onset of the industrial age, data analytics became a vital tool for business. From the first time management exercises conducted by Frederick Winslow Taylor in the late 19^th century to the analytics utilised by Henry Ford’s assembly line to measure pacing of production, this field began to command more and more attention.

As computers became more prevalent in business, further developments here led to the creation of systems that would capture and make use of business data such as Enterprise Resource Planning Systems, Customer Relationship Management systems, data warehouses and a variety of hardware and software tools to further aid the cause.[i]

The Evolution of Analytics

Analytics has since grown more and more prominent. Today nearly all organisations have some sort of methodology to track and utilise their data as well as dedicated roles responsible for sustaining and managing this growth.

Analytics 1.0: Highly Scientific and only for the larger players

As more and more businesses embraced the power and competitive advantages that analytics could bring, it became obvious that a deep understanding of important business phenomena gave management a better ability to make decisions about the various processes in the organisation. It was during this era that the Enterprise Data Warehouse began to be used to capture information, with BI software developed to query and report on it.[ii]

BI has been the mainstream word used to describe the organisational intelligence software packages that are used by many companies to connect to their data. These packages come in all sorts of varieties from the simple to configure and implement to the more complicated and powerful. Out of this space a few market leaders emerged including IBM, SAS, SAP, Cognos and Qlikview.

Traditionally, the characteristics of BI environments in organisations were such that:

The software was configured, maintained, and administered by IT
Few users had broad flexibility to customize or create their own reports (most users were generally limited to pre-defined reports and prompts)
The vast majority of the reports generated contained some combination of grid-style data points and basic visuals such as line graphs, bar charts, and pie charts
The logic behind the reports was limited to what could be generated through standard SQL programming language constructs

By today’s standards, these limitations would not be acceptable to most users, but the restrictions were mainly a consequence of the power and availability of technology of the time. As the available technology has become more powerful, the demand for the outputs of BI systems has also grown, increasing the need to change how the environment works.

Analytics 2.0: The Social Media revolution and more break-throughs

The next evolution came with the advent of what is commonly referred to as “Big Data”. Firms began to amass large amounts of internet-based social media information in addition to their own internal data,[iii] giving them further insight into their customers. For the organisations that were equipped to properly analyse this data, it proved to be a valuable source of competitive advantage. During this time innovative technologies were created, acquired and mastered and revolutionary ways to handle the data volumes came about in both hardware and software technology.[iv] This includes the creation of Hadoop open software framework, cloud-based software environments, in-memory engines and NoSQL databases to name a few.

Today, the BI system has changed and is no longer governed by IT. Now there is more flexibility, and more users than ever are able to explore the data, discover new insights and share the results.

· New, powerful BI tools breaking the restriction that users could only access the reports they were given

o In terms of being able to experiment with new metrics and views of data, users are no longer constrained as they once were. Users now have the ability to explore their data (the term “data discovery” was coined to describe this process). All the while there are controllable limits on what data is accessible due to hardware and security limitations.

· Advanced visualisations and interactive dashboards

o No longer just bar and pie charts with static drill-downs – charts, graphs and maps are now explorable and can be linked to real-time data updates enabling faster insights

· BI tools no longer constrained to just standard SQL programming language logic

o More and more tools now have advanced analytic techniques including predictive analysis and are not restricted to standard SQL logic. The use platforms like Hadoop and Teradata expanded the types of processing that can be applied and utilised to expand the avenues that can be explored with data.

This means the modern BI environment is no longer limited to just standard reports provided by a small team (usually connected to an IT department) but rather BI is becoming more of a self-service space, where visuals and interactivity are the norm. It leads to a blurring of the lines between IT and business users.[v]

Analytics 3.0: Empowering more users

Analytics 3.0 is seen as the next stage of the evolutionary chart for this industry and comes about when analytics becomes ingrained in almost all of an organisation’s actions. Regardless of whether that organisation makes or moves or consumes things, or produces or provides services, it will have access to information and data to report and analyse. With Analytics 3.0, the organisation can use the power of data analytics to create more valuable products and services.[vi]

In this era, the concepts of data discovery and exploration become even more important factors for organisational success, leading to greater empowerment for both internal and external users who can now “quickly plug-in, model, and analyse new data sources while still leveraging enterprise metadata and data”.[vii]

Much of the progression has been a consequence of better and faster hardware support systems, the advent of cloud computing and the move towards the availability of highly capable infrastructure via offerings such as infrastructure as a service (IAAS) and platform as a service (PAAS). With the new cloud hosted, browser-based software model, users are no longer reliant on the responsiveness of their organisation’s IT department, something that has traditionally been a source of great consternation for many users.[viii]

BI tools in general have improved and now feature self-service capabilities. The potential benefits of a self-service model include:

Analysts have more time to concentrate on analysing reports as opposed to preparing them
Users are empowered to discover data themselves, rather than relying on a reports team who might not have full understanding of the data
Usability of reports has improved, especially for non-traditional BI users
IT workload is reduced so they can concentrate on addressing any data requests more quickly and efficiently

Examples of self-service BI tools from tools like SuperDataHub, Tableau and Qlikview are shown below:

Data volumes increase dramatically

Recently there has been not only a staggering uptick in the volume of data produced and collected by businesses, but also a steady increase in the awareness of the power of data analytics.

The combined effect is that users are increasingly dissatisfied with prescriptive reports and dashboards that are handed down to them and that do not evolve. This is especially evident where the reports raise new questions, and users are then unable to obtain the answers quickly enough to take advantage of a market condition or situation.[ix]

Self-service analytics can thrive if it can keep up with user demand and is the key towards changing the perception of software from being a cost centre to being a fundamental underpinning to organisational success.

Lessons from the National Statistics Offices and the Open Data Movement

Before moving on it is important to take a look at how the providers of the biggest self-service platforms in the world have handled the growing demand for data and the lessons that can be learned. Governments around the world have been leading the way in self-service analytics for some time, with the data they release via their National Statistical Organisations (NSOs).

Many of these NSOs have been using some sort of self-service portal to serve up the statistics they collect. This includes the Australian Bureau of Statistics, US Census, Office of National Statistics (UK) and many others. These portals mostly provide large amounts of data for dissemination purposes and use by researchers, statisticians and the public, who can access datasets if they have the appropriate accreditation.

Lessons can be learned from the way these organisations organise the protection of private information and balance that responsibility with their mandate to ensure that enough data is released so that valuable insights can be gained by researchers and the like.

A newcomer to the discussion is the Open Data Movement which, at its very essence, is about the release and dissemination of government datasets. It stems from the ideas of “Government 2.0”, which is defined as “the use of technology to encourage a more open, transparent and engaging form of government, where the public has a greater role in forming policy and has improved access to government information.”[x]

The Open Data Movement encourages the notion that government data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the Open Data Movement are similar to those of other "open" movements such as open source, open hardware, open content, and open access.

As John Wilbanks, VP Science at Creative Commons, says “numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery, we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.”[xi]

Open data sites are beginning to become more and more prominent around the world. They include the Data.Gov websites seen in many countries including the USA, Australia and UK.

There are great benefits from this model of openness for society and government. These include economic benefits from innovations made using the data as well as social benefits from more transparent governments.

The ability for data to help empower users is important but the potential for inappropriate use must be also considered. This is especially true where the release of confidential data can lead to severe repercussions.

To combat this, companies will usually prepare aggregated data to hide any confidential data but often this aggregated data cannot deliver the required abundance of information needed to gain insights.

In most environments, the time taken to protect and confidentialise data is time consuming, so a balance must be struck between what can safely be released and the time and costs required to prepare it.

Back to Analytics: What does all this mean?

Bearing in mind the trend towards self-service analytics, and the lessons learned from the Open Data Movement the next question we might ask is “where do we go from here?”

To attempt to answer this, we might ask what are the concerns of those involved in deploying analytical or BI solutions, and what limitations still exist with current generation software? How would these need to be addressed before it really does become the norm in all businesses and all industries?

Limitations with Current Generation technologies and options to solve them

Even with all the software advances that have improved modern analytics tools, there are still limits on the level of interaction available to users in this community.

Barriers are in place that, in varying ways, prohibit greater take up of self-service analytics and BI solutions. We look at a few here in detail and offer potential solutions based on existing products and current day practices that may not have been self-evident:

Problem: Insights have been mainly the domain of specially trained staff

There is still the notion that data analytics is the domain of only a handful of specially trained staff coming from either specific educational backgrounds or having received specialised training to operate the tools needed to gain insights from the data. This notion can be traced back to the fact that a lot of the current technologies require extensive technical expertise to operate and even the simpler ones still require some coding knowledge.

The belief is that not enough of these types of people exist so there is high demand for them but not enough supply.

A report by E-Skills (UK) and SAS sees the need for more big data specialists over the coming years. The demand for big data specialists will grow over the next 5 years by 243% to 69,000 in the UK alone.[xii]

The above graphic comes from Gartner’s 2012 report on how to deliver Self-Service BI[xiii]. It sees a divide between information consumers and power users. This line in the sand propagates the image that there are distinct differences between users in terms of involvement with data and data tools and the skill level necessary to interact with data. This leads to problems like the supply and demand of users with the necessary knowledge to do research on the data a company holds.

”The first mistake we made was in the organisational model. Centralised, IT-dominated BI teams are not conducive to empowering end users.”[xiv]

A team that blends IT and business skills is in a much better position to service this need than a strictly IT focused one.

Solution:

Instead of waiting for users to mature via traditional methods (ie create more and more specialised users) in the new suite of analytics applications there will no longer be the need to divide users into distinct groups of power users and information consumers. With the right tools, the majority of staff members become empowered enough to be able to call themselves power users too.

However, it is important to have the right tools in place before this can happen. If the tools are still complex then users will still need to be trained to find the answers they seek. On the other hand, if the tools are intuitive, easy to use and require little training then users are more likely to become involved and start getting real benefits from data analytics.

Whilst providing technical training to users is in some way beneficial, doing so takes time and with software technologies constantly evolving, going down this path means that ongoing training and development is likely required. This will ultimately prove costly to organisations in terms of their time, money and efforts, all of which could be better spent elsewhere.

It is both more efficient and cost-effective to give users access to tools that require little to no training because they are intuitive and simple to use allowing the user to focus on more value adding tasks in the day to day running of the business.

Whenever a user starts working with a new software tool, there is always a divide between their starting state and their ability to use the software, as they would typically need training, reading and course materials and practice time to even be able to start using and finding the answers they want or need:

However, self-service models that are intuitive and easy to use can help reduce the gap and the time to reach benefits and make that divide much smaller:

The full move to having the complexity divide much smaller is an ongoing piece of work and involves the use of further techniques like smarter predictive analytics and augmented intelligence (topics I will discuss in other blogs).

Search-based BI tools with “Google-like” interfaces allow users to get started right away exploring data with little training. Analysts do not spend substantial amounts of time preparing reports but rather, can create reports with a few clicks and provide value by spending more time providing insights into the data.

This type of environment also means changes for the typical IT service staff. They are no longer required to be heavily involved in the report building process when the right tools are in place. The important thing to remember here is that whilst business users may start to do what traditionally was the role of IT resources, there is still going to be a need for IT resources. It’s just that their role will evolve from being report providers or creators to being solely focused on data management, from custodianship, security and privacy and efficiency in getting the data to the right people.

Problem: Aggregated datasets to answer organisational questions

Another issue is that although the current tools may have started to move towards a self-service model, they are only doing so over limited datasets. A lot of solutions in this space have to aggregate the information available, either for security reasons or due to the amount of time it takes to prepare large datasets for self-service dissemination.

These tools will serve up analytics to the limit of what can be achieved within hardware and software capacity. Tools like this do not necessarily connect to all the data to begin with and may require a lot of configuration in the build process rather than just being able to plug and play. Hardware, software, implementation or timing constraints mean that even with all the right accessibility and authorisations in place, a user might still be limited to looking at only a portion of the available data.

This is especially true where a program claims to be self-service for its end users but is really looking at a small sample of the data or a pre-aggregated report. The user can explore the information in the view, but if the view is limited it cannot really be considered a full self-service option. Others have already made a decision about what data is made available and what is not. And the end user might not even be aware that information is missing.

Of course an organisation needs to control what it can show but doing this in collaboration with users allows the users to set the agenda. This leads to a better user experience and less time iteratively creating and updating reports. By giving users access to most of the data available rather than a small amount, the organisation can also ensure that there is future proofing against having to create new reports when additional information is needed.

Solution:

To solve this problem, we need a software tool that automatically gives the full set of data to users so they can decide what is important. The tools must be capable of looking at entire datasets and have the ability to give this power to all users not just a select few. Any restrictions on who can see what data should only be imposed due to business rules, not hardware or software limitations.

In this system, end users create the reports they want to see. The data providers can build a few pre-packaged reports as a guide if they want, but they no longer need to handle all the report building, freeing them up for work that provides other value added benefits to the data.

The tool also needs a feedback loop for end users, to understand their data needs and ensure those needs are met in building any future self-service capabilities.

Problem: Privacy Concerns

There are serious privacy challenges faced by organisations that collect and disseminate personal and business information.

While statistical information can lead to insights into trends, growth and demographics, organisations dealing with this information must be careful not to disclose private information.

In the past official statistics providers have given external researchers and analysts limited and tightly controlled access to the microdata from their censuses and surveys because of their duty to protect the privacy of their survey respondents.

Typically this controlled access takes the form of in-house or remotely accessed data laboratories or research centres, or the provision of pre-confidentialised sample files. All of these scenarios typically involve a statistics provider’s staff having to do some form of manual review and vetting of the information generated in response to a data query before it is delivered back to the researcher.

The demands to release greater volumes of data with increasing levels of detail are becoming more and more the norm, especially in light of open data policies of federal and state governments.

Experiences that were usually felt by National Statistical Organisations (NSOs) are now being felt by a lot more private and public organisations.

Ensuring confidentiality of the data gathered by an organisation is a necessity to ensure that individuals and organisations are not reluctant to provide information, and to maintain their trust.

Solution:

There are a variety of disclosure control methods that play an important role in helping companies achieve a certain level of confidentiality. For example:

- Aggregation – creating summary tables (“cubes”)

- Confidentialisation of microdata - sampling or perturbing the values of data records so that an anonymous set can be safely released

- Confidentialisation of tabular data - concealing or adjusting values in aggregate data before being released

- Business rules - controlling the level of detail in queries using pre-defined rules

- Trust and access control - providing more detailed information to trusted groups

- Monitoring - recording and reviewing the types of queries executed by users

When selecting the appropriate disclosure control methodology, organisations need to strike the right balance between making information available and meeting their privacy obligations. The ideal solution will be one that conceals just enough data to meet those obligations. Perturbation is typically the best method for achieving this.

More on perturbation can be found here: http://www.spacetimeresearch.com/s=perturbation&Submit=Search

This is a topic I will discuss in further detail in later blogs.

Problem: Information overload and over reliance on machine based rules

Information overload problem

Current generation technology can now capture data faster than ever before. There is a danger that users might become overwhelmed by all this information. If the ability of users to understand the abundance of reports and data out there cannot keep up with the amount of information collected, then there is as much chance of burying the useful information as of uncovering it.[xv] However this is not necessarily a problem of too much information, as long as the right tools are in place to help users manage the information.

Machine based rules problem

Additionally, in the current climate, there are a limited number of users with the capability and know-how to traverse these huge databases. This leads to another part of the information overload problem: an over-reliance on machine driven analysis.

For example, the National Security Agency in the US has a separation step for its Big Data repository that strips out “noise”. But it’s possible that what the software perceives as noise is in fact a signal; a signal that could have been seen if there was human intervention in the process.

Solution:

Software becomes part of the solution here – but it is vital that the software is easy to use. If it is easy to use it can help to create an information economy, where all members of a company have the potential to mine data. They can all add value by becoming managers of information, data miners, data analysers, and data explorers.

It simply becomes a numbers game. In the past, users struggled to understand the wealth of information because there were not enough users. By adopting tools that are easy enough for all employees to use, user numbers can increase dramatically.

Whilst it appears to be useful to create smarter systems and algorithms that can automatically find the relevant correlations in data, an over reliance on software algorithms can bring its own problems.

By increasing the numbers of competent users we can create an environment where the rules written into any data dissemination engines are reviewed and re-reviewed by many human eyes. This vastly reduces the chances that important data will fall through the cracks.

Problem: BI User discussion

This last section showcases the problems noted from the “2014 Analytics, BI, and Information Management Survey”. This survey was conducted with 248 respondents answering questions on organisations using or planning to deploy data analytics, BI or statistical analysis software[xvi].

59% said data quality problems are the biggest barrier to successful analytics or BI initiatives
44% said "predicting customer behaviour" is the biggest factor driving interest in big data analysis
47% listed "expertise being scarce and expensive" as the primary concern about using big data software
58% listed "accessing relevant, timely or reliable data" as their organisation's biggest impediment to success regarding information management

Solutions:

Data quality problems

Organisations can go a long way towards eliminating data quality problems by implementing a single source of truth, and apply and maintain proper metadata practices.

To ensure a single source of truth, data captured by the business must be recorded only once and held in a single area, accessible by the different enterprise software systems. Whether those software systems are across geographic areas or not, reading from the one system ensures that everyone looks at the same consistent data and issues.

Metadata is data that serves to provide context or additional information about other data, such as information about the title, subject, or author of a document. It may also describe the conditions under which the data stored in a database was acquired, such as its accuracy, data, time, method of compilation and processing.

Proper metadata practices mean that users of the data know exactly where it is from as well as any understanding any contextual information that is necessary for analysing the data.

Furthermore, usage of the Generic Statistical Business Process Model (GSBPM), an international best practice model, will help ensure that data lifecycle management process guides organisations to get the most from their data and metadata while determining appropriate use and retention to safely navigate the various legislative minefields.[xvii]

Predicting behaviour

In order to use data to predict customer behaviour, users need to have access to appropriate analytical tools.

For example, statistical methods and functions that have traditionally been solely available in specialised statistical software packages. As BI tools develop, these techniques are becoming more mainstream.

Of course, it is important to involve the users of the data in this process, to understand how they currently use and manipulate data to gain insights and then work out ways that software can automate all or part of that process.

Later posts will talk about the exciting world that is predictive analytics so stay tuned to those.

Expertise limitations

We have already looked at the problem of expertise limitations. As the tools become easier to use, more and more users will be able to take advantage of the power of data analytics, no longer having to rely on a select few individuals with the software expertise.

Accessing relevant, timely or reliable data

Software can help to solve the problem of reliable and up to date information. It is vitally important that the analytical tools make it as easy as possible to update the data. It must become an easily automated process, as opposed to a time consuming and highly manual exercise.

It is also important that data only needs to be updated once, in a single source of truth, rather than having to update many different databases.

Conclusion

Being aware of the gaps and limitations of current generation technology allows software developers to look at creating the capabilities that end users will start demanding in the future.

It is clear that disclosure control will become increasingly important: data must only be made available to users with the right credentials, and the system must automate this process as much as possible, making it easier to protect data and distribute it.

Finally, the tools must be easy to use and intuitive, helping to build a smarter user base from the ground up, and increasing the number of insights that can be gained as the number of users looking at the data grows.

References

Davenport, Thomas H.; Harris, Jeanne G. (2007). Competing on analytics : the new science of winning. Boston, Mass.: Harvard Business School Press. ISBN 978-1-4221-0332-6.

Analytics 3.0 Article Harvard Business Review - http://hbr.org/2013/12/analytics-30/ar/1

Bill Franks, CAO Teradata http://tdwi.org/articles/2013/09/24/evolving-scope-of-business-intelligence.aspx

http://blogs.perficient.com/businessintelligence/2013/08/09/business-intelligence-future-trends/#!

SuperDataHub – www.superdatahub.com

http://www.tableausoftware.com/learn/webinars/verizon-gains-critical-insights-self-service-analytics?signin=ea7b19c48b0d3e8c1a6a50a7b71eacf5

http://www.information-age.com/technology/information-management/123457131/self-service-business-intelligence#sthash.EuK7laTy.dpuf

http://www.slideshare.net/AlexandrePerrot/qlik-view-how-to-deliver-self-service-bi

http://www.finance.gov.au/policy-guides-procurement/gov20/

http://en.wikipedia.org/wiki/Open_data

http://www.information-age.com/it-management/skills-training-and-leadership/123457434/demand-for-big-data-specialists-to-grow-by-243--in-next-5-years

http://www.slideshare.net/AlexandrePerrot/qlik-view-how-to-deliver-self-service-bi

http://www.techdirt.com/articles/20130909/12361124455/

http://spacetimeresearch.com/2013/07/spotlight-on-metadata/

http://spacetimeresearch.com/products/superstar-platform/

Tuesday, 1 July 2014

Average Income where you live

This post relates to an older article I saw on Lifehacker about the Average income in certain Australian regions. It's interesting to see how certain regions

Thanks to SuperDataHub I can chart those figures pretty easily.

The data is from the ABS and the original article is here: Click here

This table could be enhanced by either adding an interactive chart or making the table searchable.

I’ve create a link to an example of a single chart I’ve created using our SuperDataHub product with the data you had in your article. This chart shows the average income over time for various regions. The regions can be filtered on the top left hand side.

See here:

A more detailed breakdown is shown here with some commentary:

To get your own data and create charts like this in seconds, check out www.superdatahub.com

Monday, 30 June 2014

Game of Charts

Here's one for all those Game of Thrones fans. Gizmodo had an article (click here) which talked about the rise and fall of popularity of the various groups that readers can assign themselves to. There are about 214 of these noble houses, characters or groups in the series.

The popularity is shown here from prior to Episode 1 through the week 7-8 Break and also the week after the final episode. Basically it's a popularity contest. But definitely a fun one to plot.

Enjoy!

This first chart compares the 7 Noble Houses in the books against one another:

Also, here is a breakdown of all the 214 Groups, Characters or Noble Houses you can explore via the dropdown:

To get your own data and create charts like this in seconds, check out www.superdatahub.com

Tuesday, 24 June 2014

Lightning never strikes the same place twice.... or does it?

This is pretty amazing. It's real-time lightning strikes all over the globe. The website Blitzortung has been created by a community of people around the world using sensors to triangulate the occurences of lightning all across the globe. I suggest you check it out with your volume on. The 'clicks' you hear are all the lightning strikes occurring and with a latency of 3-6 seconds it's pretty impressive.

I've seen documentaries with images of the earth from space that show the occurrences of lightning and it is quite mesmerising. To get this kind of information normally is a paid service but the group of volunteers part of the Blitzortung community have pulled together to do something quite unique too. Whilst it does rely on sensors actually being available the capabilities of these are pretty powerful.

More importantly they keep data on these lightning strikes and this can prove useful in certain applications.