Text analysis – effective utilization of unstructured data

As the world becomes increasingly digital, the amount of unstructured data has exploded. It comes in the form of news feeds from LinkedIn, Twitter or other social media, debate forums, or open comments in a survey. There is usually great value in the insight hidden in open comments, whether the topic is employee satisfaction, customer satisfaction or market analysis.

Analyzing open comments used to be very time-consuming, because each comment had to be read and grouped before an overall picture could emerge. Comments are worded differently, and two comments in a data set are rarely identical. This creates a need for a method that extracts useful information from unstructured data such as open comments. Text analysis examines all comments and maps the patterns and structures in them.

ag analytics uses the following method for text analysis:

1. Word cloud technology creates a general view

Starting with a word cloud is the best way to begin your text analysis. The program goes through all the open comments and creates a visual presentation of the most frequently used words and/or phrases in the data. The larger the font, the more often the word is used in the data. This gives us a quick picture of the most used words and phrases. After that, we classify the comments into groups and analyze the trends.
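The counting behind a word cloud can be sketched in a few lines of Python. This is only an illustration, not the program ag analytics uses: the stop-word list, the sample comments, the function names and the font-size range are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "and", "to", "it", "i", "was", "very", "but", "too"}

def word_frequencies(comments):
    """Count how often each non-stop word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

def font_sizes(counts, min_size=10, max_size=48):
    """Scale each word's count linearly to a font size (assumed range)."""
    if not counts:
        return {}
    top = max(counts.values())
    return {w: min_size + (max_size - min_size) * c / top
            for w, c in counts.items()}

# Made-up sample comments, for illustration only.
comments = [
    "The service was fast and friendly",
    "Friendly staff, but the price is too high",
    "Price is fair and the service is great",
]
frequencies = word_frequencies(comments)
sizes = font_sizes(frequencies)
print(frequencies.most_common(3))
```

The most frequent words get the largest font, which is what makes them stand out in the cloud. Drawing the cloud itself is left to a plotting tool.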

Figure 1: Example of a word cloud based on 3000 open comments made to a car dealer.

2. Classification into relevant categories

The open comments now need to be classified into relevant categories, defined by the questions asked and the most frequent answers. If, for example, we ask how a company can improve a product and the most used words are price, size and colour, we would classify the open comments into those three categories. A comment that fits more than one category will appear in each of them.
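A simple keyword-based version of this classification could look like the sketch below. The category keywords, sample comments and function names are hypothetical, and real classification may rely on more than keyword matching. The point is only to show how one comment can land in several categories.

```python
import re

# Hypothetical categories and keywords, following the price/size/colour example.
CATEGORIES = {
    "price": {"price", "expensive", "cheap", "cost"},
    "size": {"size", "big", "small", "large"},
    "colour": {"colour", "color", "red", "gray"},
}

def classify(comment, categories=CATEGORIES):
    """Return every category whose keywords occur in the comment."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return [name for name, keywords in categories.items() if words & keywords]

def group_comments(comments, categories=CATEGORIES):
    """Group comments by category; a comment may appear in several groups."""
    groups = {name: [] for name in categories}
    for comment in comments:
        for name in classify(comment, categories):
            groups[name].append(comment)
    return groups

# Made-up sample comments, for illustration only.
comments = [
    "The price is too high",
    "Too small and the colour is boring",
    "I like it",
]
groups = group_comments(comments)
for name, members in groups.items():
    print(name, len(members))
```

Comments that match no keyword, such as the third one here, end up in no category and can be reviewed separately.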

3. Correlation shows connections and patterns

Now it is time to analyze the texts in connection with the rest of the survey. Open comments are often attached to a scale question. If, for example, a respondent has given a low score, he or she will be asked to explain it in an open comment. By including cross tables based on, for example, demographics, we can identify patterns within the groups. Furthermore, we establish which words correlate with each other. If colour is mentioned as a suggestion for product improvement, we will determine which colours are mentioned and how many times. Open answers could be, for example, “the colour should be red”, “I prefer a red colour” or “the colour gray is boring – it should be red”. The text analysis will automatically identify the connection between “colour” and “red”.
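One simple way to find such connections is to count how often two words occur in the same comment. The sketch below uses co-occurrence counting as a stand-in for whatever correlation measure the actual tool applies. The function names are assumptions, and the comments are the three examples from the text.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(comments):
    """Count, for each pair of words, how many comments contain both."""
    pairs = Counter()
    for comment in comments:
        words = sorted(set(re.findall(r"[a-z]+", comment.lower())))
        pairs.update(combinations(words, 2))
    return pairs

def partners(pairs, target):
    """Return how many comments mention each word together with target."""
    result = Counter()
    for (a, b), n in pairs.items():
        if a == target:
            result[b] += n
        elif b == target:
            result[a] += n
    return result

comments = [
    "the colour should be red",
    "I prefer a red colour",
    "the colour gray is boring - it should be red",
]
colour_partners = partners(cooccurrence(comments), "colour")
print(colour_partners["red"], colour_partners["gray"])
```

Here "red" appears together with "colour" in all three comments and "gray" in one, which is the kind of link described above. In practice common filler words would be filtered out before counting.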

Unstructured data can give your company valuable insight and will usually be a really good supplement to a quantitative analysis.
