AI for Customer Satisfaction and Happiness Tracking
A statistician's view on positive and negative sentiments
Let me ask you: how often do you happen to say something like "I got enough sleep today!" or "My tooth is perfectly healthy"? How about "Drivers on the road were very friendly and attentive this morning", "The Internet works pretty stably", "Hey, boss, I'd like to say I'm satisfied with my salary"? And how often do we say the exact opposite? I put a lot of effort into composing the positive statements above. But they still sound slightly unnatural, don't they? At the same time, each of us could easily express any of the opposite, negative messages in a dozen different ways.
For some reason, our vocabulary perfectly reflects the many shades of all kinds of negative phenomena, but suddenly shrinks when it comes to describing normal and/or desirable situations. For some reason, when we start learning a foreign language, we very soon memorize a number of abusive or obscene expressions, but as for positive evaluative lexicon, we remain satisfied for a long time with the single word "good".
Is a positive mental attitude really so unusual for our society? Let me share my thoughts on this issue. Optimism, a positive mental attitude and wishful thinking are, for better or for worse, quite typical of most mentally healthy people. And the richness of the negative sentiment lexicon is just the dark side of these traits.
While the situation is normal and all goes well, we calm down and do not want to bother ourselves or anyone else with any thoughts and words on the subject. But if something gets out of balance, then the time comes for action! And then we first need to express very accurately what we are not satisfied with. That's it: when we are well, we switch off our brains and tongues; when we are in trouble, we speak loudly and elaborately, and force others to listen to us. Any other behavior, like exaltation or euphoria, raises a reasonable suspicion of mental instability.
At this point you may well be wondering whether the author remembers that he is writing for an IT blog. Well, let us see how all these effects manifest themselves in the professional reality of a programmer and a statistician.
Measuring client satisfaction in an IT company
Any large IT company is forced to communicate with its clients on a number of issues and on a regular basis. So regular, in fact, that a company typically has a special department and employs dedicated staff to handle this kind of communication. Some of them are excellent specialists who produce crowds of satisfied customers. Others will only scare and annoy even the most benevolent and loyal clients. How can such competence be measured?
One can survive without measuring it at all. But then, sooner or later, companies with more competent staff will squeeze their less courteous rivals out of the market. Not the best idea, apparently, particularly if the competitors are determined to improve their abilities in this area.
One can keep track of which staff and teams make customers continue and develop cooperation, and which employees make them scale down collaborative activity. This looks better already, but this method may still cost the company quite a lot.
One can organize regular polls asking customers about their satisfaction or dissatisfaction with the customer support teams, before the communication makes them really furious. Perhaps this way is even better. However, it can be a challenge, and customers can get tired of frequent surveys in which they have to search for euphemisms to express their irritation. Many of them will answer evasively, or lie a little, or simply redirect their money flows to less troublesome competitors.
One can listen in on how well the communication with the client runs, and draw conclusions. This method seems even more customer-friendly. But then the company will need yet another department for listening to and evaluating the quality of the customer support service. Sounds expensive? Fortunately, modern technologies allow us to automate this task.
How to manage the whole spectrum of customer emotions
Imagine that you have the full log of correspondence between the service staff and the customers. Some customers write thank-you letters, praising and glorifying the quality of service, while others send emails full of desperation, bitter complaints and angry claims to the support team. Most of them, of course, say something calmer and more moderate, and only small slips and unintentional phrases can tell us something about their real mood. Catching such vague fragments can be quite a daunting task: psychotherapists have been professionally engaged in solving exactly this problem for many years, and even the best of them sometimes make mistakes. What can we expect, then, from a not too perceptive and empathic artificial intelligence…
Wrestling with any problem should always begin at the end that seems simplest and most straightforward. In our case, it is natural to start by paying attention to the evidently positive and evidently negative messages. It doesn't look too complicated. Here and there we see phrases like "great job", "many thanks", "very well", sometimes accompanied by positive emoticons, and they tell us: "I am in a very good mood, guys, and it's all thanks to you!". There are also other letters, where we see words and collocations like "error", "too long", "missing", "ASAP". Well, the world is full of imperfections, but at least we are now able to solve a small part of our task.
By the way, at this point we already face the positive-negative imbalance of evaluative lexicon. Our words of approval are standard, well-established and known to everyone. "Everything is alright, no need to search for sophisticated words, just repeat some common thank-you template," says the client's hypothalamus. On the other hand, our vocabulary for describing abnormal situations is much richer. But that makes them even harder to identify within the body of a message! When things don't go the way we want, we cannot relax and get away with pronouncing some common idiom. Instead, we do our best to describe in detail what disturbs us, what needs to be done, who should do it, and what we shall do otherwise! But there are not so many template phrases for expressing such apparent discontent.
Okay, here go the numbers: I managed to find 30 template words and collocations that indicate a good mood, and only a dozen that witness a negative mood. And I must say, some of that dozen are rare and others are controversial. For example, "too long" and "catastrophic" are likely to mean that something is wrong, but they are not very frequent. The words "why" and "problem" are more common, but can we really rely on such witnesses? And here is the result: in our text corpus of 7500 emails, we found positive words and phrases in 3500 cases, and negative witnesses in only 800 cases. Hereinafter these messages will be referred to as "positive" and "negative", although this notation is not fully correct.
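As an illustration, this naive first-pass labelling can be sketched in a few lines of Python. The template lists below are short placeholders, not the actual 30 positive and 12 negative templates from the corpus:

```python
# A minimal sketch of the naive keyword labelling described above.
# The template word lists are illustrative placeholders only.
POSITIVE = {"great job", "many thanks", "very well"}
NEGATIVE = {"error", "too long", "missing", "asap"}

def naive_label(message):
    """Return 'positive', 'negative', or None when evidence is absent or mixed."""
    text = message.lower()
    pos = any(p in text for p in POSITIVE)
    neg = any(n in text for n in NEGATIVE)
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return None  # no evidence, or contradictory evidence

print(naive_label("Many thanks for the quick fix!"))   # positive
print(naive_label("The page is still missing, ASAP"))  # negative
print(naive_label("Please see the attached file"))     # None
```

Note how often the function returns None: on the real corpus, roughly half of the 7500 emails matched no template at all.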
But what should one do with the messages that were left without an unambiguous evaluation? And, by the way, is our primitive unambiguous evaluation always adequate? For example, the positive idiom "thank you" is also often used hypocritically by unhappy people… It is contextual statistics that can help answer both questions!
Building statistics and database
In order to start building any kind of statistics, it would be great to know which words and collocations the messages generally contain. For example, the collocation "many thanks" was found in 120 of our 3500 positive messages and in only two of the 800 negative messages. And the phrase "don't want to" was found in just one of the 3500 positive emails and in 15 of the 800 negative emails.
What does all this mean? Imagine that I'm a bookmaker and I offer bets on a new message that has just come from a client; you bet that it will be evaluated as positive. While there is no additional information, your chances of winning can reasonably be estimated as 3500:800 = 35:8. Now imagine that you managed to peep into the text of the letter and noticed the fragment "…many thanks…". Of course, your chances have now increased significantly! Next, imagine that you noticed yet another collocation there: "…don't want to…". Now you cannot be so sure anymore. How should one evaluate your chance of winning? I'll skip some details here and just provide the final solution: in the light of all available information, your odds are 3500·(120/3500)·(1/3500) : 800·(2/800)·(15/800) = 32:35. That is, after peeping into the text, you should conclude that your chance is less than 50%.
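The arithmetic above is exactly a naive-Bayes odds update, and it is easy to verify in code. A sketch, using the counts quoted in the text (everything else here is illustrative):

```python
from fractions import Fraction

# Prior odds from the corpus: 3500 positive vs 800 negative messages.
pos_total, neg_total = 3500, 800

# Occurrence counts (in positive corpus, in negative corpus) for the
# two collocations spotted in the letter.
evidence = {
    "many thanks":   (120, 2),
    "don't want to": (1, 15),
}

# Multiply the prior odds by each feature's likelihood ratio.
odds = Fraction(pos_total, neg_total)
for pos_n, neg_n in evidence.values():
    odds *= Fraction(pos_n, pos_total) / Fraction(neg_n, neg_total)

print(odds)                      # 32/35, matching the text
print(float(odds / (1 + odds)))  # win probability, about 0.478
```

Using exact fractions rather than floats makes it easy to check that the result really is 32:35 and not a rounding artifact.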
Roughly speaking, this is how the classification algorithm works; the only difference is that the algorithm sequentially analyzes not just a few fragments, but literally all the words and collocations contained in a new message. It learns from the results of a rather clumsy and naive evaluation function, which in half of the cases keeps silent, and in the remaining half returns an incorrect answer in some 10% of cases. But surprisingly, 90% correct ratings for half of the messages is quite enough for learning! One just has to look through the context carefully in order to identify all the important features, and then the result can qualitatively outperform the initial naive evaluation.
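To make the full procedure concrete, here is a hedged sketch of such a classifier: a compact naive-Bayes scorer with add-one smoothing, trained on a toy stand-in for the naively labelled half of the corpus (the real pipeline, feature set and data are of course not shown in the article):

```python
import math
from collections import Counter

# Toy training data standing in for the naively labelled messages.
data = [("many thanks great job", "positive"),
        ("thanks for the update", "positive"),
        ("error again too long", "negative")]

def train(labelled):
    """Count word occurrences and message totals per class."""
    counts = {"positive": Counter(), "negative": Counter()}
    totals = Counter()
    for text, label in labelled:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def log_odds(text, counts, totals):
    """Log odds of 'positive' vs 'negative' with add-one smoothing."""
    vocab = set(counts["positive"]) | set(counts["negative"])
    score = math.log(totals["positive"] / totals["negative"])
    for w in text.lower().split():
        if w not in vocab:
            continue  # unseen words carry no evidence
        p = (counts["positive"][w] + 1) / (totals["positive"] + 2)
        n = (counts["negative"][w] + 1) / (totals["negative"] + 2)
        score += math.log(p / n)
    return score  # > 0 means "positive" is the more likely class

counts, totals = train(data)
print(log_odds("many thanks", counts, totals) > 0)  # True
print(log_odds("error error", counts, totals) > 0)  # False
```

The smoothing term is what keeps a single rare word from dragging the score to plus or minus infinity, which matters precisely because of the outlier words discussed below.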
There are some other nuances associated with this method of classification. One of them is that not all words and collocations should be subjected to such analysis. For example, the word "the" is found in virtually all messages and obviously carries no information about the emotional state of the sender (the positive-to-negative ratio for this word is 3050:700, close to the prior, and thus it barely affects the evaluation of a message). One could also process such words indiscriminately, but the usual practice is to filter them out.
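One possible way to implement that filter, as a sketch: drop any word whose positive-to-negative ratio stays close to the corpus prior of 3500:800 (the tolerance threshold here is invented for illustration):

```python
from fractions import Fraction

# Corpus prior odds: 3500 positive vs 800 negative messages.
PRIOR = Fraction(3500, 800)

def is_informative(pos_count, neg_count, tolerance=0.15):
    """True when a word's positive:negative ratio deviates from the
    prior by more than `tolerance` (an illustrative threshold)."""
    ratio = Fraction(pos_count, neg_count)
    return abs(float(ratio / PRIOR) - 1.0) > tolerance

print(is_informative(3050, 700))  # "the": ratio ~ prior -> False
print(is_informative(120, 2))     # "many thanks" -> True
```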
A more interesting issue occurs with rare words. There are lots of them. Out of 850K different words and collocations overall, 700K occur only once; of the remaining 150K, 90K occur only twice. Statisticians call them outliers and often hate them. Let us consider just two: "to speed them up" and "to minify the time". Both are roughly similar in meaning and both occur only once. But the former occurs in a positive message, while the latter occurs in a negative one. This can hardly be explained by anything other than pure chance. Mainly for this reason, such rare words and collocations are usually excluded from consideration when evaluating a message.
The question is where the borderline between the too-rare words and the fairly frequent ones should lie. Usually, more of the rare words are neglected when the corpus of analyzed messages is large enough, so that the relatively frequent words carry a sufficient amount of information. But if there is not so much data, then we have to take fairly rare words into account as well. There is no generic method for determining this borderline.
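The cut-off itself is a one-liner; the hard part is choosing the threshold. A sketch with toy counts (the real corpus counts run into the hundreds of thousands):

```python
from collections import Counter

# Toy word counts standing in for the real corpus statistics.
word_counts = Counter({
    "thank you": 1200, "problem": 300, "why": 250,
    "to speed them up": 1, "to minify the time": 1,
})

def frequent_words(counts, min_count):
    """Keep only the words seen at least min_count times."""
    return {w: c for w, c in counts.items() if c >= min_count}

print(sorted(frequent_words(word_counts, 2)))
# ['problem', 'thank you', 'why']
```

Raising `min_count` shrinks the vocabulary exactly as in the sequence of experiments described next.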
Here is why all these details interest us: it turns out that rare words are more common in positive messages -- exactly where the most frequent positive clichés are also gathered. Negative messages, on the other hand, are often written in words of average frequency; they are composed exactly the way statisticians love: their words are neither meaninglessly frequent nor unreliably rare. We have just faced, for the second time, the imbalance between our positive and negative vocabularies, and let me show an interesting consequence:
Let us evaluate the average mood by analyzing all words and collocations that occur at least twice (there are 150K of them, if you remember); we get +30% (roughly speaking, this is the difference between the probability of meeting a positive message and the probability of meeting a negative message among all the messages).
Let us do the same, but consider only words that occur at least three times (there are only 60K of them); we get +25%. By removing the words encountered only twice, we neglected the connection between the frequent and positive "thank you"s and the rare, almost unique, statistically unreliable words. The general evaluation of the overall sentiment has, of course, decreased.
Let us perform the same evaluation, but consider only words that occur at least four times (we have only 40K of them); the score is +20%!
Let us consider only words that occur at least five times (only 28K such words); the score is +16%.
Finally, let us consider only words that occur at least six times (only 23K such words); the score is +12%.
In a nutshell, the overall lexical image of communication with the support team looks like this: the overall mood is positive, but the weak and statistically unreliable connections between positive words yield to the force of the tightly woven mesh of claims and complaints. Is it worth making any conclusions about the psychology of communication with customers in the field of IT? Possibly it could be, but I'll be careful and won't try to develop such ideas, at least not in public.
Support team at Scandiweb
Instead, let me provide you with some facts and estimates concerning the support service department at Scandiweb:
The department consists of about 30 employees.
Most of them are divided into 6 teams.
The teams vary in size and in the experience of their members.
Each client is assigned to exactly one of the teams and interacts with it exclusively.
The statistics, even though they could be built with different parameters and result in significantly different evaluations of the overall mood, are fairly consistent in the most important details:
in the difference of evaluations between any two teams or any two employees, so that one can see which team or employee performs better and which does worse;
in the dynamics of the sentiment for any customer or group of customers, so that one can track its progress in time.