BI – Very Big Data

Oracle BI – Thoughts (10) – Cadran publishes a series of articles about the ideas surrounding Oracle Business Intelligence in combination with Oracle JD Edwards. These articles discuss various considerations and reflections that can help you make the right decisions when implementing and using both systems. In this article we take a closer look at Big Data.

Big Data

Some time ago I came across a very interesting TED talk about Big Data. In 15 minutes, speaker Kenneth Cukier explains this phenomenon, discusses its benefits as well as its dangers, and gives his vision of the future.

A nice part of this talk is about the development of a computer program that plays checkers. After a developer had written a first version and played against it, he won time after time. He then decided to let the computer “learn” from every game they played. He kept winning. The developer then reasoned that the number of games was too small for the computer to always draw the best conclusions and make the best decisions. So he had the computer play against itself, adding each game to its database. After a while the man started playing against the computer again, and he never beat it again. Add the records of every official game ever played, and there is virtually no chance of winning against the computer. Big Data (and a form of Artificial Intelligence) was born.

With Big Data we enter the world of large numbers. The broader and deeper the set of information, the more reliable the conclusions drawn from it. The algorithms that bring these conclusions and correlations in the data to light have improved rapidly in recent years, and there is no end in sight.

Unstructured Data

Big Data usually starts with Unstructured Data: data that has not (yet) been categorized or classified. Examples are the content of an email or a document. There are no metadata, labels, keywords, classification or indexing. Structure can be introduced through forms. A well-known example is a customer complaining about a product. As an email, this complaint is completely unstructured. Structure only arises once someone reads the email and assigns it to the product the complaint is about and to a certain type of complaint. If the customer is instead presented with a form with clear input fields, including selection lists, the complaint immediately receives structure in the form of metadata.

The strength of Big Data is that correlations can be discovered in a huge amount of Unstructured Data. The software generates the metadata itself, and as long as there is enough data, the dots can be connected. For example, algorithms can recognize dates in the content and thereby add a time dimension. They can also recognize addresses and places, providing a geographical structure. These are just two very simple examples; the technique goes much further.
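To make this a little more concrete, here is a minimal sketch in Python of how software might derive time and place metadata from an unstructured complaint email. It is deliberately simple and rests on assumptions: the short list of place names (KNOWN_PLACES) is hypothetical and stands in for a real geographic database, and only two date notations are recognized. Real Big Data tooling uses far more sophisticated techniques.

```python
import re
from datetime import date

# Assumption: a tiny, hypothetical list of known places. A real system
# would consult a full geographic database (gazetteer).
KNOWN_PLACES = {"Amsterdam", "Rotterdam", "Utrecht", "Eindhoven"}

# Two common date notations: ISO (2019-03-14) and day-month-year (14-03-2019).
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DMY_DATE = re.compile(r"\b(\d{2})-(\d{2})-(\d{4})\b")


def extract_metadata(text: str) -> dict:
    """Derive simple time and place metadata from free text."""
    candidates = [(y, m, d) for y, m, d in ISO_DATE.findall(text)]
    candidates += [(y, m, d) for d, m, y in DMY_DATE.findall(text)]
    dates = []
    for y, m, d in candidates:
        try:
            dates.append(date(int(y), int(m), int(d)))
        except ValueError:
            pass  # looks like a date but is not a valid calendar date
    # Keep every word that matches a known place name.
    words = set(re.findall(r"[A-Za-z]+", text))
    return {"dates": sorted(dates), "places": sorted(words & KNOWN_PLACES)}


email = ("My kettle broke on 14-03-2019, two weeks after I bought it "
         "in your Utrecht store.")
print(extract_metadata(email))
```

The email itself never says "date" or "location"; the structure comes entirely from recognizing patterns in the text.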

Technology

Conventional relational databases fall short when it comes to these volumes of data and to the logic needed to find connections and apply structure to them. The rapid growth of Facebook and Google contributed to new techniques that can do this. This is where terms such as NoSQL and Hadoop come into play: NoSQL databases and the Hadoop framework are intended for storing and processing quantities of unstructured data that exceed our imagination. These systems are not designed to tell you exactly how much turnover was made last year, but they can give an approximate answer to a question such as which nice restaurants can be found within a radius of 10 kilometers.
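Hadoop processes such volumes with the MapReduce model: the work is split across many machines, each machine maps its share of the data to key/value pairs, the pairs are grouped by key, and a reduce step combines them. The sketch below imitates those three phases in plain Python on a single machine, counting words in a few hypothetical complaint texts. It is not the Hadoop API, only an illustration of the idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word. On a cluster, each node would run
    # this on its own share of the documents, independently of the others.
    for word in document.lower().split():
        word = word.strip(".,!?")
        if word:
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key; the framework does this between
    # the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Combine the values per key; here a simple sum.
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


complaints = ["Late delivery, broken lid.", "Broken handle.", "Late again!"]
counts = map_reduce(complaints)
print(counts["broken"], counts["late"])
```

Because the map step looks at one document at a time, adding machines lets the same job run over far more data, which is what makes this model suitable for volumes a single relational database struggles with.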

The technology is also self-learning. It produces the metadata itself, based on all kinds of similarities found in the data. The algorithms are good at recognizing a date and/or time (and thus structuring in time) and a location (and thus structuring in geography), and these are just two very obvious examples. As the volume of data increases, these common threads become increasingly reliable, especially when a human occasionally confirms them.

Data Discovery

Having Big (Unstructured) Data is of no value when no analysts are using it. The technology assists and supports, but it will not draw a final conclusion, at least not yet. For now, a human is still needed to find the connections, to draw conclusions from the analyses, and to help the algorithms learn. This kind of analytical work is called Data Discovery. Based on sample sets, the technology can continue structuring the data, but human analysis is needed to reach real conclusions. The resulting rules can then be fed back into the tooling, so that future data is structured and analyzed increasingly well. We call this the self-learning ability of the algorithms.
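The feedback loop between analyst and tooling can be sketched as follows. This is a hypothetical, deliberately naive tagger (ComplaintTagger is not an existing product or library): every time an analyst confirms or corrects the type of a complaint, the words of that email are counted for that type, and a new email gets the type whose confirmed words it shares most. When nothing learned so far applies, the email is left to a human.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class ComplaintTagger:
    """Hypothetical, naive tagger that learns from analyst-confirmed labels."""

    def __init__(self) -> None:
        # For each complaint type: how often each word occurred in emails
        # that an analyst confirmed as that type.
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, text: str, label: str) -> None:
        # An analyst confirms or corrects a label; feed it back to the tool.
        self.word_counts[label].update(text.lower().split())

    def suggest(self, text: str) -> Optional[str]:
        words = text.lower().split()
        scores = {label: sum(counts[w] for w in words)
                  for label, counts in self.word_counts.items()}
        if not scores or max(scores.values()) == 0:
            return None  # nothing learned applies yet: leave it to a human
        return max(scores, key=scores.get)


tagger = ComplaintTagger()
tagger.learn("the parcel arrived two weeks late", "delivery")
tagger.learn("the lid cracked after one use", "quality")
print(tagger.suggest("my parcel is late again"))
```

Each confirmed or corrected label makes the next suggestion a little better, which is the self-learning loop in its simplest form; real tooling would use proper statistical models rather than raw word counts.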

From Big Data to Small Data

At a high level we are rarely interested in the underlying detail. Let’s use the temperature on earth as an example. Suppose that in every country a few dozen thermometers measure the temperature every hour. When we put all those measurements into a database, we create Big Data. Each measurement has a location and a time. After ten years we have a nice set of information from which conclusions and correlations can be drawn. Ultimately, we are interested in the main trends, or in the exceptions and large deviations. If we enrich this data with many more measurements (such as CO2 levels), we can draw increasingly reliable conclusions and make better predictions. The end result may be just a single answer to a single question: what will the temperature on earth be in 50 years? A lot of data provides a single answer.
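Going from big data to small data is essentially aggregation. The Python sketch below assumes a simple, hypothetical record layout (station, year, temperature) and made-up values; it reduces the individual readings to one average per year and flags the stations whose average deviates strongly from the overall average, the "exceptions and large deviations" we are after.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    station: str    # hypothetical station identifier
    year: int       # year of measurement (a real feed would carry a timestamp)
    celsius: float


def yearly_means(readings: List[Reading]) -> Dict[int, float]:
    # Many readings per year collapse into one number per year.
    by_year: Dict[int, List[float]] = defaultdict(list)
    for r in readings:
        by_year[r.year].append(r.celsius)
    return {year: mean(temps) for year, temps in sorted(by_year.items())}


def deviating_stations(readings: List[Reading], threshold: float) -> Dict[str, float]:
    # Stations whose own average differs from the overall average by more
    # than `threshold` degrees: the exceptions worth a closer look.
    overall = mean(r.celsius for r in readings)
    by_station: Dict[str, List[float]] = defaultdict(list)
    for r in readings:
        by_station[r.station].append(r.celsius)
    result = {}
    for station, temps in by_station.items():
        diff = mean(temps) - overall
        if abs(diff) > threshold:
            result[station] = diff
    return result


readings = [
    Reading("A", 2010, 10.0), Reading("A", 2011, 12.0),
    Reading("B", 2010, 14.0), Reading("B", 2011, 16.0),
]
print(yearly_means(readings))
print(deviating_stations(readings, threshold=1.5))
```

With ten years of hourly data from thousands of thermometers, the input runs into the millions of rows, yet the output stays a handful of numbers.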

From Unstructured Data to Structured Data

Big Data can also guide us from Unstructured Data to Structured Data. Going back to the earlier complaint example, those emails can serve as a starting point. Once we have enough of them, technology can analyze these emails and find labels such as the product, the type of complaint, the date and the location (the metadata). This enables us to structure the information and convert it into a clear complaint form, with the correct fields and choice lists, whose contents are stored in the tables and columns of a relational database. This structured information can then be analyzed with Oracle Business Intelligence.
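Once the metadata has been extracted, storing it in a structured way is ordinary relational work. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; in practice this would be the relational database that Oracle Business Intelligence reports on. The table and column names, and the rows, are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# In-memory SQLite only so the sketch runs anywhere; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE complaint (
        id             INTEGER PRIMARY KEY,
        product        TEXT NOT NULL,
        complaint_type TEXT NOT NULL,
        complaint_date TEXT NOT NULL,  -- ISO 8601 date
        location       TEXT
    )
""")

# Metadata as it might come out of the analysis of the complaint emails.
extracted = [
    ("kettle", "quality", "2019-03-14", "Utrecht"),
    ("kettle", "delivery", "2019-03-20", "Amsterdam"),
    ("toaster", "quality", "2019-04-02", "Utrecht"),
]
conn.executemany(
    "INSERT INTO complaint (product, complaint_type, complaint_date, location) "
    "VALUES (?, ?, ?, ?)",
    extracted,
)
conn.commit()


def complaints_per_type(connection: sqlite3.Connection) -> List[Tuple[str, int]]:
    # A typical BI question, now answerable with plain SQL.
    return connection.execute(
        "SELECT complaint_type, COUNT(*) FROM complaint "
        "GROUP BY complaint_type ORDER BY complaint_type"
    ).fetchall()


print(complaints_per_type(conn))
```

The same fields a complaint form would ask for (product, type, date, location) become columns, so the question "how many complaints of each type?" no longer requires reading a single email.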

Practical applications

A well-known example is that of a passenger waiting for his luggage at the airport. He complains on Twitter about the very long wait. Ten minutes later an airport employee arrives with the man’s luggage and apologizes for the inconvenience. The man then posts a compliment to the airline on Twitter, and the negative immediately turns into a positive.

Recently I heard about a supplier of digital weather stations: a fun device for the home that shows the outside temperature as well as the humidity and air pressure. The device is connected to the supplier’s servers for software updates. At a certain frequency, however, the data the device measures is also sent to the supplier’s servers. So many people now own such a device that this data has become truly big. Thanks to this large number of measuring points, a very good and reliable picture of weather movements across the country can be created. The supplier now also sells abroad … Anyway, one might wonder how long the KNMI (the Dutch national weather service) will still exist …
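How could a supplier turn thousands of home readings into a reliable picture? One simple approach, sketched below under our own assumptions, is to group the readings by region and take the median, so that a single device standing in direct sunlight barely affects the result, and to leave out regions with too few devices. The region names, values and the minimum of three devices are hypothetical; we do not know how this supplier actually does it.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def regional_picture(
    measurements: Iterable[Tuple[str, float]], min_devices: int = 3
) -> Dict[str, float]:
    """Median temperature per region from (region, celsius) readings of home devices."""
    by_region: Dict[str, List[float]] = defaultdict(list)
    for region, celsius in measurements:
        by_region[region].append(celsius)
    # The median barely moves when one device stands in direct sunlight;
    # regions with too few devices are left out rather than guessed.
    return {
        region: median(temps)
        for region, temps in by_region.items()
        if len(temps) >= min_devices
    }


readings = [
    ("Noord", 8.0), ("Noord", 8.5), ("Noord", 9.0), ("Noord", 25.0),
    ("Zuid", 11.0), ("Zuid", 12.0),
]
print(regional_picture(readings))
```

The more devices are sold, the more regions pass the minimum and the less any single faulty device matters, which is exactly why the sheer number of measuring points makes the picture reliable.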

Big Data is gathered and provides the right information at the right time. That sounds like a holy grail, but Big Data makes it possible. With the rise of the Internet of Things (IoT), the possibilities for fitting everything with sensors and connecting it to the internet are endless. The Big Data stream is growing exponentially in size and speed. The possibilities of Big Data are there. For now, it is still up to us to give this data value and turn it into information. But for how long?

Want to read more about the role of BI in Machine Learning and Artificial Intelligence? Read my blog: Do it Yourself