4 TYPES of BOSSES WHO DO NOT UNDERSTAND analytics

When I wrote a blog post about Analyst Loneliness Syndrome a few weeks ago, I knew I wasn't talking about isolated cases. However, the magnitude of the readers' responses completely knocked me out. It is a bitter-sweet mix of sadness and joy: you are glad that you nailed something, but you feel sorry that so many people are suffering from this syndrome. So I decided to talk to some of those who contacted me about that blog and to write an extension of the original post. This time about one of the three core factors of analytical loneliness: the management side of things.

The classic HR maxim says: "Employees don't leave companies, they run away from their superiors." Although I do not quite agree with this generalization, I have to admit that in 4 out of 5 cases when I changed my job, it was true (this is my greeting, Rasto, you are the exception). One can leave a boss behind for a variety of reasons, but in most cases it's a combination of some of the following "evergreens": He can't appreciate my work; He does not understand the area and therefore I mostly get nonsense tasks; He does not trust me and lets me know it; I get no inspiration or development from him, I only rot; His/her moral standards and deeds are in deep contrast with my beliefs.

Certainly, managerial superficiality and incompetence can affect you in almost every industry, but I would like to focus on typical examples of this ailment in Data Analytics and Data Science. The traditional managerial characters have gained some additional spicy ingredients in this industry. Judge for yourself; here are 4 TYPES of MANAGERS who don't understand analytics:

1] Don’t drag me into details

Managerial Profile: It's incredible, but even today there are still many companies where data analysts are "stuck" under the Head of Business or Marketing. It is often a tragic consequence of the widely spread belief that data can significantly influence a company's revenue. As a result, analysts are moved under the Head of Business or Marketing to make this influence happen. However, these are usually managers with an aversion to mathematics, developed back in (primary or secondary) school. Anything more complicated than percentages leaves them restless. Simple numbers like sums and averages (of course, they only mean the arithmetic average) are OK, but everything else is already far too complicated. Any statistics beyond correlation are just "academic curls" (or crap). Their phobia of numbers and more sophisticated analyses comes from the fact that they have never understood this area, are not in control of it, and thus are afraid of it. They believe neither in the power of calculations nor in AI; they solve everything intuitively and on the basis of proven approaches (read as: it worked once in the past). They prefer human speech as communication and simplify every schema or spreadsheet into 2-3 sentences. However complicated the analysis is, in the end everything has to end up in Excel, filterable by columns, and must not be more than 50 lines. When you try to "dive into" the results of your work, (s)he will tell you "Let's not get too technical" (just tell me the essence).

Implications for your work: If you work for such a manager, you will probably have a very frustrating working life. Since this kind of manager never did any analytical work, (s)he doesn't know what IS realistic and what's NOT. Neither in terms of procedures and results, nor, especially, in terms of the time you need. So get ready to receive ridiculous tasks with gallows deadlines (what could take so long, right?). (S)he has intuitive expectations about every assignment (s)he gives you. If you fail to match them with real data, a tough week is waiting for you. No matter how thoroughly you prepare your analysis output, (s)he will take one or two of the most obvious (= most primitive) conclusions, thereby gradually discouraging you from coming up with more sophisticated procedures in the first place. Sooner or later, there will also come attempts to censor "illogical" analysis outcomes. If conclusions need to be presented to "seniors", (s)he will let you do it (while throwing half of the slides out of the deck as useless). Because if top management happened not to like it, (s)he would drown you with "it does not make sense even to him/her", even though it just happens to be the result of the calculation. In the area of expert or personal development, you are left to the fate of Robinson Crusoe.

What should you do about it: I hate to be an evil prophet, but if you are serious about your analyst career, run away from there. In fact, it is unrealistic to expect this kind of manager to improve, because (s)he considers more complicated analytics a necessary evil that suits him/her only when it confirms his/her intuitive hypotheses. Otherwise it is unnecessary "trying to look smart" that has no support in (his/her) reality. The only alternative to fleeing would be to attempt a coup d'état (whistle-blow him to a higher level of control and they might replace him). But honestly, this kind of manager has stiffer roots and more "merits" than you have convincing arguments. So sooner or later you will just leave (with great relief).

2] Scared rabbit

Managerial Profile: This type of manager stems from the first type and often represents a generational shift or personality development from the "Don't-drag-me-into-details" type. What remains the same: (s)he never did any analytical work him/herself, so (s)he does not understand things. The "move forward", however, is that (s)he does not reject more sophisticated analysis, because it has dawned on him/her that (s)he cannot do without it anymore. To this "improvement" (s)he was most likely pushed by the CEO's / shareholders' attitude, or by noticing that all the competitors around already use analytics, so we must have it, too. However, as (s)he does not understand things him/herself, (s)he only tries to follow very elementary steps, often mimicked from professional conferences or buzzwords (anybody, Big Data?). (S)he is stiff whenever you enter her/his office, because (s)he knows that the debate with you will revolve around an important subject (s)he doesn't control. Nevertheless, in order to survive, (s)he must feed the levels above him/her (who forced the analytics in the first place) the illusion that (s)he is not only interested in analytics but also well oriented in it.

Implications for your work: The consequences are similar to the situation when you have to get out of a dark room filled with things. (S)he only progresses slowly along familiar outlines, groping everything thoroughly first to make sure we don't bump into something hard. As a result, you will only get elementary assignments, and everything will have to be tested in a small pilot (= no effect anyone could notice). The concept that you train a model first on 1,000 people and then scale it to 100,000 does not ring a bell with her/him. Therefore, most projects will die after the pilot. (S)he'll never fight for better software or a more powerful computing engine: "Let's try first with what we have. When we succeed, then we can ask for more money." (S)he's too soft, because (s)he can't steer you by essence (since (s)he doesn't understand it), and so (s)he will try to do it in a moderate way. You won't get strong decisions or quality feedback from them. Do not expect a vision to follow; you will often be a firefighter of issues that fell from the top (and which (s)he cannot conceptualize and prioritize). Since (s)he is uncertain in your area, (s)he will explain everything from Adam (sometimes repeatedly, as it has been overwritten by other issues in his/her head). Most probably (s)he will never let you present the results of your work, so that leadership never discovers that (s)he does not understand even half of what you do.

What should you do about it: If you do not mind (or even prefer) that this kind of manager isolates you from contact with top management, you can survive in this setting quite comfortably. However, you will have to educate your direct superior continuously (sometimes repeatedly on the same topics). Do not expect any career growth or expert development; at most you will be left with the space to self-tune. As an intermediate station, this is not completely unbearable. But primitive, repetitive tasks and professional stagnation will catch up with you sooner or later. If you have lived with such a manager for more than 3 years, look around at where your peers have moved. Your train might be running away.

3] When we did this in '95 …

Managerial Profile: This is a manager who once worked as an analyst. Of course, back when data analysis meant OLAP and mainly SQL data reporting. (S)he didn't get too wild with predictive models, Monte Carlo simulations, or neural networks. So (s)he did not realize that data analytics is done completely differently today. In addition, his/her abilities are more of a memory-optimism that is often transformed into "When we tried it this way in '95, it worked". In a sense, this type of manager is more dangerous than the first 2 types. When somebody tries to convince you of something that is not true, it is always worse when (s)he believes (s)he is right than when (s)he is merely unsure about the issue. In addition, this type of manager wants to be involved in every detail, because (s)he remembers that it was exciting to reveal new connections (maybe (s)he is even nostalgic about it). In fact, (s)he does not realize that (s)he "is no longer playing in the same league as the young ones".

Implications for your work: Perhaps the biggest risk with this type of manager is micromanagement. Living in the belief that they understand the area, and having nostalgic memories of the times when they did something real with data, they will seize every opportunity to "engage in the project". This can sometimes go as far as "volunteering to help" and taking parts of the project on their shoulders (which is to be avoided by all means, if for nothing else then at least to meet the project deadline). Speaking of deadlines, the second major risk of working with such a manager is unrealistically optimistic time-frames. After all, when we did it back in '95, it took just … The biggest risk in the long run is that (s)he will slow down (or "torpedo" with expert "arguments") your introduction of modern trends (to keep up with you). Maybe (s)he won't even do it consciously, but if you take two steps back, after a few years you'll find that you are more or less spinning in a vicious circle.

What should you do about it: For some people, such a job can be comfortable, and they let themselves be fooled that it might have turned out much worse (see the first and second types of manager). If you are at the end of your career, or are among those who prefer the traditional to the innovative, just enjoy a comfortable life there. However, if most of your working life is still ahead of you, you need to foster space for professional growth. And the pace should at least match the market's growth, to avoid becoming "unnecessary junk on the labor market". Therefore, I recommend that you sit down with such a manager and ask for autonomy: part of your working time (e.g. 1 day a week) to test new trends (which (s)he does not push you towards). If the manager does not agree, (s)he is probably well on his/her way to transforming into Type 1], and so your answer should be in the spirit of the advice for that type (see above).

4] Jules Verne

Managerial Profile: To avoid the wrong impression that a manager is a problem only when (s)he knows less about the issue than you do, there is also the opposite case. I personally hate the principle whereby the best surgeon is nominated to be the hospital director, with the argument that the others appreciate and respect him. Regrettably, even in analytics, the most skillful (or the most powerful) analyst often becomes the team leader or department manager. It happens so often because the levels of control above are some of the first three types, and so they need someone to cover the technical side of things. Jules Verne is a type of manager who once was a Data Scientist, or at least a sophisticated data miner. After (s)he stops officially being responsible for direct performance and is charged with the task of managing other analysts, one of the following usually happens: 1) (S)he becomes lazy and realizes that (s)he no longer wants to return to writing queries or code (resulting in a gradual loss of touch with analysts' work), or 2) (s)he finally takes the chance to do those cool types of analysis that the nobility did not allow him/her to do before. Often both of these transform into an uncritical acceptance of "hype news" in the industry. After all, (s)he also wants to brag over a beer with other data managers about what cool things we do in our company. As a consequence, the journey becomes the goal. Trying this-or-that becomes more important than making something really work. (S)he is no longer responsible for the time spent on individual steps; rather, (s)he already determines the strategy for the future.

Implications for your work: The assignments become increasingly confusing, because "Try to plug in a neural net and let's see what it brings." Of course, half-successes go straight into the drawer to free up the runway for yet more new approaches to try. The result is a frequent change of priorities and a gradual absence of any sense of real effect. The absence of added value soon gets noticed also by "those up there", so as time passes, working in Jules Verne's team also means an increased risk that some organizational change will wipe the entire team off the face of the Earth (read: org chart) without any warning. At the same time, this kind of manager pushes people into the position of generalists rather than specialists, which does not necessarily suit everyone. The projects' track record might look impressive in a CV, but when you get interviewed by someone who really did those things (and did not just try them, as your team did), you will be badly grilled on your own barbecue stick.

What should you do about it: If you are a JUNIOR in this area, it is paradoxically more advantageous for you to stay for a few years. Getting a broad (and shallow) outlook at the beginning of a career is not necessarily a bad choice. However, do not take on too high a mortgage, so that you do not bleed when your team suddenly ceases to exist one nice morning. If you are a SENIOR, confront the manager with the flicker that (s)he shows. Give him/her feedback that you want to finalize the projects and that one new idea a week is probably enough. If (s)he doesn't understand, or laughs at you, go to his/her supervisor, describe the situation and say: either HIM/HER (or YOU). Both answers will be the right choice for you. If you are the first to do this, you will probably save the rest of the team, and you will not regret the possible departure (possibly with a handsome severance pay to get rid of you quickly).

Have you stumbled across one of these 4 types in your workplace? Have you ever experienced yet another type of dysfunctional Data Manager? Share your impressions at info@mocnedata.sk. I keep my fingers crossed for you to avoid these types of people. And if you happen to meet them, try to follow the advice from this blog. Bon voyage!

AI SUPERPOWERS: A book for those WHO THINK about OUR FUTURE

A few weeks back, I came across the book AI SUPERPOWERS by KAI-FU LEE. It is about 250 pages that anyone who works in the field of data analytics should read (or at least think about). It's one of those books that are best when you read them yourself. Therefore, I will try to keep my review at a reasonable balance between teasing and the feeling that you already know everything the book has to tell you.

AI SUPERPOWERS offers many points to think about. I personally counted at least 20 (!) thoughts that I realized I had not thought about yet. However, before outlining some of them, we should explain who the author is. Kai-Fu Lee is a Taiwanese man who has worked for 35 years in the field of artificial intelligence. He started voice analytics for Apple, set up a Microsoft research center in Asia and, as CEO of Google China, faced the dilemma of establishing Google in a country that does not necessarily celebrate its existence. He also manages venture capital funds that develop AI solutions. Kai-Fu Lee is a rare combination of experience with state-of-the-art AI approaches from Silicon Valley and a typically Asian "cautious overview" that does not accept simplifications, nor does it need to adhere to the America-is-Great cult. He praises where he sees real mastery, and he pinpoints hollow pretending and unwarranted stereotypes.

The reason I think you should read this book yourself is that between the official lines of the text you will likely find your own inspirations (as I did). The book is a busy tree where everyone can choose "how long they sit" on each branch. In essence, however, the book is a cocktail of three complementary streams (some of which you would not expect to appear, judging by the book's title):

The first stream (most in line with the name of the book) describes developments in the field of artificial intelligence. It contrasts how different the paths to sophisticated analytics were for the US and China. Taiwan and Hong Kong have a bond with China, but their relationship is not, ehm, optimal. (I have a colleague from Hong Kong who often narrates about it in detail.) So Kai-Fu Lee's position is not a pink ode to the Chinese model. Quite the contrary, it offers a very balanced view of where China stands in the AI area and where it lags behind the US. As he has experienced both environments, his comparison is a valuable counterweight to the general propaganda both for and against China.

The second line is the author's personal account of how (thanks to the cancer he managed to overcome) he changed his view on the direction in which artificial intelligence should go. The story of a seriously ill man who, at the stairway to the possible end of his life, completely alters his way of thinking is almost a cliché in our culture. But if you manage to stay less cynical, close your eyes to this emotional aspect of the story and focus rather on his conclusions while reading this section, it becomes inspirational reading.

The third stream of the book was a bit of a surprise to me. But a pleasant one. Tricked by the title of the book, I did not expect the author to try to extrapolate AI trends and describe what awaits us. The focus of this last part is similar to the (for me hilarious) book SuperIntelligence, so it is a very inspiring read. However, as AI SUPERPOWERS came out later, it already looks at some future aspects of AI in a richer way, with the results of the first experiments (e.g. with UBI) and, hence, in a more specific narrative.

However, in order not to just scratch the surface of this masterpiece, let me offer you a few specific inspirational ideas that this book brought to me. I believe they might be the right "teaser" to get you to actually read the whole book:

Copy-cat China. The book clearly and in detail depicts that China reached its peak in economic significance by copying foreign products. The author bluntly admits that in industrial production and the design of material things, China is certainly not a ruling world power, but rather an "embarrassing copier". However, the development of online services, AI and data analytics has gone through a completely different story. As a result, the latest Chinese advancements in AI and digital services are still a bit in the shadow of the "copy-cat sticker" of the past. But the book clearly explains that it would be foolish and outright dangerous for the external world to keep perceiving China through this mental illusion.

From cash directly to app-pay. Some parts of Africa lagged for a long time in building a fixed-line network, so many areas were cut off from the world. Suddenly, however, with the advent of mobile networks, they could skip the landline stage and get access (to the internet) directly via the mobile network. A similar episode took place in China in the area of payments. In China, credit cards never properly settled in as a form of payment. And when e-commerce was launched, the market jumped directly to in-app payments like WeChat Pay or Alibaba's Alipay.

4 AI development forces. Like any effort, the development of artificial intelligence has its own factors that can accelerate or hinder it. In the case of AI, the following 4 dimensions seem to be the relevant ones: a) computing power in the form of hardware, b) sufficient human talent, c) the volume and quality of data you have for training AI, d) business underpinnings for implementing the developed solutions. At the same time, the extent to which a particular country fulfills these 4 factors predetermines what role that country should take in applying AI. I also used this knowledge when preparing the AI strategy for Slovakia, in whose construction I had the honor to participate.

The status of your phone's battery. When predicting phenomena, you should use all the available inputs and be aware of whether you are limiting the possibilities of AI with your very own prejudices. The book gives some great examples on this subject; the one I liked most is how the usual state of your mobile battery relates to your discipline in paying off financial obligations. My regular readers know I'm a strong promoter of feature engineering and data riddles, so I really enjoyed this part.
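To illustrate the kind of feature the book hints at, here is a minimal sketch (with entirely made-up column names and data, not the actual model from the book) of turning raw battery readings into a per-user feature that could be joined to repayment labels:

```python
import pandas as pd

# Hypothetical telemetry: one row per battery reading (all data invented)
readings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3],
    "battery_pct": [80, 60, 15, 5, 95, 90],
})

# Hypothetical repayment labels: 1 = paid installments on time
labels = pd.DataFrame({
    "user_id": [1, 2, 3],
    "paid_on_time": [1, 0, 1],
})

# Feature engineering: the typical battery level each user keeps
feature = (readings.groupby("user_id")["battery_pct"]
           .mean()
           .rename("avg_battery_pct")
           .reset_index())

# Training table: unconventional signal joined to the target
train = feature.merge(labels, on="user_id")
print(train)
```

A real lender would of course use far richer telemetry and a proper model; the point is only that an unconventional signal becomes a feature like any other.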

Probably this much, Your Honor. In many areas, AI will serve as a counselor to humans. Medicine is often discussed, but Justice is still more of a taboo. Artificial intelligence can be helpful even in this sensitive area, without machines deciding about us. There are already systems that search historical court records to detect false testimonies of witnesses, contrasting them with the information given in previous legal litigation. Moreover, AI can provide inputs to calibrate the severity of penalties for the same criminal acts (using scatter-plots between particular aggravating/attenuating circumstances and the length of the sentence to see if a proposed sentence is too strict or too moderate).
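The calibration idea can be sketched in a few lines (all data and thresholds below are invented for illustration): compare a proposed sentence against historical sentences for cases with a similar aggravating-circumstances score.

```python
import statistics

# Invented historical records: (aggravating_score, sentence_months)
history = [(1, 6), (1, 8), (2, 12), (2, 14), (3, 24), (3, 30), (2, 10)]

def calibrate(proposed_months, aggravating_score, tolerance=0):
    """Flag a proposed sentence as 'strict', 'moderate' or 'typical'
    relative to historical cases with a similar score."""
    similar = [m for s, m in history if abs(s - aggravating_score) <= tolerance]
    if not similar:
        return "no comparable cases"
    median = statistics.median(similar)
    if proposed_months > 1.25 * median:
        return "strict"
    if proposed_months < 0.75 * median:
        return "moderate"
    return "typical"

print(calibrate(20, 2))  # median for score 2 is 12 months, so 20 is 'strict'
```

A production system would model many circumstances jointly rather than a single score, but the principle of anchoring a proposal in the historical distribution is the same.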

Autonomous car(t)s. The discussion of self-driving vehicles zooms in primarily on autonomous cars. However, there are much simpler implementations that are both less dangerous and show much more immediate, mass-use potential. These include, for example, shopping carts. They could be programmed to follow you (and stop whenever you turn to fetch something), or even to plot the fastest route through the supermarket themselves, depending on where the items from your shopping list are located in the store.

Hold on, I'm sending you a drone. The second implementation of self-driving vehicles that is simpler than cars is flying things. No, this is not hype about drones; there really is more space in the air and a lower chance of collisions than on the road. We might not realize it, but planes were equipped with autopilots sooner than cars. We also have pilot-less attack aircraft, but not unmanned tanks or warships. Therefore, one of the nearest AI uses will be unmanned rescue units able to extinguish fires or rescue people even in exposed terrain, without compromising the lives of a helicopter or aircraft crew.

O2O, the key to platform success. Online-To-Offline (O2O) is a concept where you start a service in the online environment, but at its end there is material fulfillment in the physical world. Examples of such services are e-commerce, Uber or Booking.com. Marketplaces that offer O2O products are more tangible to people than purely virtual services (self-learning courses, online software businesses). People feel the physical dimension of such a service. Therefore, we are also more willing to pay for it (such as pizza delivery), while services such as online tax advice are only slowly collecting their enthusiasts.

What is different this time? Past industrial revolutions are often used as an example of how mankind has dealt with harsh changes in the labor market. Thus, the optimists say that not even AI will be a disaster for jobs (by the way, the book has a few words on why Kai-Fu Lee is not so optimistic on this note). The book brings one interesting twist to this issue: the Deskilling paradigm. When we read history attentively, we find out that the jobs that sprang up after the industrial revolution required shallower knowledge of the matter from workers than their pre-revolution alternatives (weaver vs. weaving mill operator, mathematician vs. man with a calculator, …). This phenomenon is called Deskilling. The important question remains whether we are ready to admit such a development for healthcare professionals or teachers. To put it in one sentence: In the AI industrial revolution, the vocations at stake are those where the credibility of the profession is linked to the human factor.

Bigger surveillance, not weaker. Due to the accumulation of data and some other factors, AI services have a greater tendency toward monopoly than other sectors of the economy (anybody, Google?). It is, thus, important that the AI industries be subject to stronger rather than weaker (antitrust) regulation than traditional industries. However, the states are lagging behind, both in legislation and in the competency to steer them. There is no clear idea of how to regulate services such as Facebook, and public authorities, at the same time, lack educated employees to supervise them in the first place. It feels almost as if there were no one with a medical education in the Health Care Supervision Office.

If you are interested in any of these topics, I encourage you to read the entire book. It's really worth it. If you're still wondering whether it's a good (time) investment, check out Kai-Fu Lee's video, where he talks about some parts of this book.

Two Big Macs and one Big Data on the side, please.

Finally, it was our turn in the queue. "Two Big Macs and one Big Data on the side, please," says my colleague nonchalantly. The girl behind the counter is obviously stunned; her glance jumps alternately from one of us to the other. She is balancing somewhere between having misheard the second part of the order and worrying that she has not yet studied the entire menu of the restaurant properly. Then she gently blushes and nods in confirmation. With huge effort, we fight back laughter to keep from revealing ourselves.

– – –

This was a joke that we tried a few years back on one of our McDonald's visits. Putting Big Mac and Big Data into the same context was really a prank back then. How it finally worked out, you will find out at the very end of this blog. However, what served as a teasing joke back then is no longer laughable today. Even as straightforward a business as fast food undoubtedly is starts to discover the nooks of data analytics and applications of artificial intelligence.

According to WIRED, the fast-food giant has decided to buy the Israeli firm Dynamic Yield, which specializes in machine learning algorithms supporting sales and customer service. If the bare essence of that message has not raised your eyebrows, let me add that this is the biggest acquisition McDonald's has made in the past 20 years. Backstage expert information suggests that the Dynamic Yield price was north of $300 million, or about 7% of McDonald's worldwide cash flow, or 5% of its global annual revenue for the past year! For comparison, it's about as much as it costs them to build the restaurants for all the Scandinavian countries combined. Perhaps some of you will shake your heads: What does McDonald's see in artificial intelligence that makes it willing to "throw" such huge money at it?

While McDonald's products are so standardized that they are often perceived as the cornerstone of simplicity, you might wonder what there is in McDonald's business to analyze in such depth. Some would fall for the usual suspects of optimizing inventory logistics or the efficiency of frying hamburgers and fries. At least this is how we, the customers, see McDonald's from our side of the counter. Therefore, I bet you may be surprised that the real reason for a fast-food chain to chew into sophisticated analytics is bare customer data. As McDonald's is still severely limited by the physical number of different products it can offer you (contrary to Amazon, for example), client data is not primarily helpful for making yet more upgrades or new versions of burgers. The way they reported using the Dynamic Yield technology is surprisingly different.

McDonald's seeming data analytics bonanza is the drive-thru process. (…which is not the focus of sales in most of Europe, but is an important share of the overall market in the core Western markets. Thus, Big Data might not find immediate use in our region for some months, but in the US it has dire consequences.) You may have noticed that most of the offers and promo banners in the McDonald's restaurant chain have lately been transformed into digital displays. This change not only speeds up the exchange of new offers for old ones, but also allows the offer to be personalized for a particular customer. Certainly, with standard ordering directly at the counter in the restaurant, it doesn't make sense, because the "personalized" offer would confuse the other clients waiting in the queue. But in the drive-thru it is all possible. So how will it all work?

Upon your approaching the drive-thru, their system attempts to recognize which customer it actually is. There are, as a matter of fact, already several alternatives for doing so: recognizing the car's number-plate, beacons fishing for your mobile device, credit card details, or at least a unique enough combination of the products you order. Having identified (or estimated) your identity, the system then takes advantage of the time you spend waiting (for your order to be fulfilled) and, using local weather data, info on nearby events and the current popularity of menu items you have not spontaneously added to your order, offers you personalized coupons (which can be turned into an order extension via one-touch purchase). What is more, if the system recognizes you before you place the order, it can even incorporate factors like the duration of meal preparation (to match your (in)patience profile), or the relative length of the queue versus the standard you experienced at a similar time of day in the past.
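To make the mechanics concrete, here is a toy sketch of such a coupon scorer (the items, signals and weights are all my own illustrative assumptions; a real system would use a trained model over far more inputs):

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Signals of the kind the article mentions (all fields are illustrative)."""
    temperature_c: float   # local weather
    queue_length: int      # cars currently waiting
    item_popularity: dict  # current popularity per menu item, 0..1

def score_upsell(context, candidate_item, already_ordered):
    """Toy scoring of a personalized coupon candidate with hand-picked weights."""
    if candidate_item in already_ordered:
        return 0.0  # never offer what the customer already has
    score = context.item_popularity.get(candidate_item, 0.0)
    if candidate_item == "McFlurry" and context.temperature_c > 25:
        score += 0.3  # hot day boosts cold desserts
    if context.queue_length > 5:
        score -= 0.2  # long queue: avoid slowing service with extras
    return max(score, 0.0)

ctx = Context(temperature_c=28, queue_length=2,
              item_popularity={"McFlurry": 0.4, "Apple Pie": 0.2})
best = max(["McFlurry", "Apple Pie"],
           key=lambda item: score_upsell(ctx, item, {"Big Mac"}))
print(best)  # → McFlurry
```

The coupon shown on the display would simply be the highest-scoring candidate; the learning part of the real system is in estimating those weights from millions of past orders.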

However "banal" this might sound to you, properly personalized up-sell offers normally achieve a 3-7% success rate. McDonald's averages 68 million clients served per day. Adding the fact that speeding up the service itself might prevent some customers from dropping out of a slow-moving car queue, one can easily imagine that a 5% uplift in sales can be achieved almost instantaneously. Moreover, as the number of generated client coupons grows, the system learns to be increasingly targeted (and therefore more successful). Any benefits that can be materialized from servicing (with AI help) the "non-motorized" customers inside the restaurant will be an extra joker card of this project. Face recognition systems are already a tool in retail, so it will not be long before they prove their might in regular counter-serving of hungry guests. The greasy fast-food mass production, as McDonald's is often seen, is seemingly moving into a new era. Who would have expected that just a couple of years back?
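A quick back-of-the-envelope calculation shows why those percentages matter at McDonald's scale (the 3-7% success rate and 68 million daily customers come from the figures above; the average coupon value is purely my own assumption):

```python
daily_customers = 68_000_000                        # figure quoted above
success_rate_low, success_rate_high = 0.03, 0.07    # quoted up-sell success range
avg_coupon_value = 1.5                              # USD per redeemed coupon, made up

# Extra revenue per day if every customer sees one personalized coupon
extra_sales_low = daily_customers * success_rate_low * avg_coupon_value
extra_sales_high = daily_customers * success_rate_high * avg_coupon_value
print(f"${extra_sales_low:,.0f} - ${extra_sales_high:,.0f} extra per day")
```

Even under these crude assumptions, the daily figure lands in the millions of dollars, which puts a $300 million acquisition price into perspective.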

– – –

Laughter finally beat us. We had tried our best to stay calm, but when the order tray landed in front of us, we couldn't hold it anymore and burst into laughter. There were large fries lying beside the two big burgers. We quickly grabbed the burgers, leaving the ordering colleague with long Big Potatoes instead of lunch. Well, he deserved it, after all. He ordered Big Data a few years too early.

 

Why should Data Scientists BE SCARED OF AI coming as well?

The debate on whether Artificial Intelligence (AI) will slash some jobs (or entire professions) has transformed from obscure omen reading into a mainstream, heated issue. Truck drivers, financial intermediaries and a few other professions are nervously looking ahead to see whether they are going to join the red-listed endangered species. They certainly have good reasons to be worried …

… but what about Data analytics? Are Data Scientists on the AI replacement to-do list as well? What a silly question, isn't it? After all, Data Scientists are the ones drilling the oil of AI solutions. Thus, they will be the ones eating others' jobs, and they need not worry about their own future. Or should they maybe?

How sure you should (not) be

A few months ago, economic expert commentaries were still shy in their indications that the state of the world economy might deteriorate in the quarters to come. Back then it was one message per week on this issue. In the last 2 weeks the matter got visibly more dramatic; the black omens now pop up on, literally, a daily basis. As we have learned from the past, every economic crisis slashes a substantial number of job opportunities on the market. In this sense the crisis to come will be no different. We have learned not to worry about that too much, as new jobs are recreated when the economy walks back from crisis into good times. The problem is that on this account the nearest crisis will be different. It will kill some jobs that will never be recreated again.

Almost any week you can see some profession striking for salary increases. As the economy is booming, employees push to reap a slice of the victory cake. However, there are some jobs where salaries kept on rising without any push from labor unions. Data Science is one of those areas where annual income has been on a crazy adventure to the north. Driven by over-demand against (weak) supply, companies kept raising pay levels to lure people away from competitors (or to motivate more people to re-qualify as Data Scientists). But no more. Data from the US (the largest free Data Science labor market) indicate that the entry salary of a Data Scientist stagnated in 2017 and corrected a few percentage points down in 2018. The reason is that the price of Data Science talent got above the level at which the business case for their possible impact in a company still justifies their pay. Not many people realize that the high remuneration of these years is the last dance before the DJ calls off the party.

In both cases, the strike- or surge-driven salaries will make the AI replacement scenario more severe. When we come out of the crisis, employers will be facing the dilemma of whether to rehire staff again or to replace some part of it with automation. The higher the annual salary level of employees, the easier the case for AI solutions to be a cost-saver. Especially the area of super expensive (and still scarce) Data Scientists offers a lot of room for rethinking, as the one-year cost of a Data Scientist in the US is, literally, a 7-digit figure.
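To make the cost-saver logic concrete, here is a back-of-the-envelope break-even sketch. All figures below are illustrative assumptions of mine, not market data:

```python
# Illustrative break-even calculation for replacing part of a role with automation.

def payback_years(annual_salary, automated_share, build_cost, annual_run_cost):
    """Years until an automation project pays for itself by saving
    `automated_share` of a role's yearly cost."""
    yearly_saving = annual_salary * automated_share - annual_run_cost
    if yearly_saving <= 0:
        return float("inf")   # the project never pays off
    return build_cost / yearly_saving

# The same project (300k to build, 30k/year to run, automating half the role)
# pays off six times faster against a 300k salary than against a 100k one:
# 2.5 years versus 15 years.
fast = payback_years(300_000, 0.5, 300_000, 30_000)
slow = payback_years(100_000, 0.5, 300_000, 30_000)
```

The asymmetry is the whole point: the pricier the profession, the sooner its own automation becomes the cheaper option.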

The (seemingly strong) peace of mind of the Data community about their job security has its roots in a fatal attribution error. For most manual jobs, the replacement will come with automation, presumably intelligent computers running on data. Therefore, the data processing industry might be perceived as the lubricant of the whole automation process. Hence the strong belief that data scientists are on the right side of this transformation river. While data might, indeed, be the oil of the AI transformation, it is ill-conceived that humans necessarily need to take part in extracting it. If we stick to the analogy, most of the things on an oil rig are not human labor but automation itself. Similarly, the repetitive and easy-to-automate jobs in Data analytics will not be run by humans. If you take two steps back and impartially review the work of most present-day data analysts, their work is much more well-defined and repetitive than driving an autonomous car. Therefore, the data community should not fall into the trap of the illusion that the AI job revolution will take a detour around their domain.

Time for panic?

The omens are out there, so is it time for panic? Well, we as humans had difficulties facing the previous industrial revolutions, and we will probably struggle this time around as well. Almost every time a disrupting technology arose in the past, the first answer was to push back by, literally, beating the machines. However, there are ways we can face the AI job hunt properly. I have been invited to speak about HOW TO SURVIVE the (first) AI ATTACK on DATA SCIENCE JOBS at the DataFestival 2019 in Munich this week. This is a short teaser about the topic, and I offer you an exclusive sneak peek into the

PRESENTATION >>> FILIP_VITEK_TeamViewer_SURVIVAL_TICKET

here, as you are precious members of our TheMighyData.com community. As this topic hits all of us, any comments or views from you are highly welcome in the comments to this blog or at info@mocnedata.sk. Enjoy the reading and see you at some other event soon.

What (should) good Feature engineering look like?

We are living in an era where Data analytics has upgraded from a mere mainstream small-talk topic to a key business driver. Almost everybody tries to get better at Advanced analytics, AI or other data-based activities. Some do so from sheer conviction of its importance; some are forced by group-think or by competitors' moves. Let's admit it, for most organizations and teams this is quite a leap. So they, …

Leaning on methods

… lean on whatever literature or manuals there are around. I feel a strange mixture of sympathy and disdain for them, because almost all self-learning courses and books on analytics primarily focus on the methods of Data Science. You can read about which algorithms are suited for which tasks, how to program them, or how to use open-source packages that calculate them (as black boxes) for you. If you have dwelt in analytics for a bit longer, you probably smile already and know where I am heading with this point. A large part of the novices (and late- or returning-comers) fail to understand that the HOW (which AI method) is becoming more and more of a commodity. Thus, the more you try to specialize in those categories, the less prospective your future in analytics might be. Don't get me wrong, I am far from suggesting that a new Data Scientist should not (or need not) understand the methods behind it all, quite the contrary. My point is that trying to get better at writing logistic regression or training a decision tree is like trying to get better at digging potatoes in the era of tractors. Machines got so good at applying the analytical constructs that it becomes Don Quixote-ish to fight them. Algorithms are becoming commodities, often even commodities freely available in the community. So where should you channel your skill improvement in analytics?

Where to go?

Of all the areas needed for successful analytics (see here), Feature engineering seems to be the most promising bet if you attempt to make a significant impact. Since it is the first step of the analytical funnel, whatever defects (or data loss) you introduce there are carried through the whole analytical process. Kaggle competitions have proven that an informational disadvantage is almost never compensated by a "smarter" algorithm down the road of model training. What surprises me, though, is that in the ever-mushrooming inundation of books on analytics, you find very little on Feature engineering. Ironically, it is difficult to stumble even upon a definition of what feature engineering should and should not do, not to mention best practices in this area.

That is exactly why I decided to sum up shortly in this blog what my experience in Machine Learning suggests good Feature engineering should include:

1] Extending the set of features. Whenever you start to build an advanced analytical model, you start with only some group of raw parameters (= inputs). As features are to models what food is to humans, your life is much livelier if you have both enough of them and enough variety. To achieve that, feature engineering needs to make sure you "chew" the raw inputs and derive aggregated features as well. For example, you might have a history of customer purchases, but it is also important to calculate the lowest, usual and highest amount ever purchased. You would also find it useful to know whether the customer buys more in a certain part of the year, or how much time has passed since his/her last purchase. These are all aggregates. As you can probably smell from the examples, these are often standardized (like MIN, MAX, AVG, …) and foreseeable steps to take, so good Feature engineering should include automated generation of the aggregates. However, besides the obvious aggregates, one also needs to create new pieces of information that are not directly represented in the raw inputs. When you are trying to predict a particular person's interest in buying ice cream, it is certainly interesting to know how long this person has been buying ice cream products and what his/her total consumption is. But if you need to predict consumption in a certain short time frame, the ratio of ice creams per week would be more telling than the total consumption metric. That is where transformations, cross-pollination of features and completely new dimensions come in. Therefore, good feature engineering should extend the set of features with BOTH aggregated features and newly derived features.
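The aggregation step above can be sketched in a few lines of pandas. The column names and toy data are, of course, purely illustrative:

```python
import pandas as pd

# Toy purchase history (illustrative data)
purchases = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "amount":      [12.0, 30.0, 8.0, 55.0, 40.0],
    "weeks_ago":   [1, 6, 20, 2, 3],
})

# Standardized aggregates: MIN / AVG / MAX plus recency
features = purchases.groupby("customer_id").agg(
    min_amount=("amount", "min"),
    avg_amount=("amount", "mean"),
    max_amount=("amount", "max"),
    weeks_since_last=("weeks_ago", "min"),
    n_purchases=("amount", "count"),
)

# Newly derived feature not directly present in the raw inputs:
# purchase intensity per week of observed history
features["purchases_per_week"] = (
    features["n_purchases"] / purchases.groupby("customer_id")["weeks_ago"].max()
)
```

Note the two distinct moves: the `agg` call is the automatable, foreseeable part, while the derived ratio is the creative part that needs a human (for now).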

2] Sorting out the hopeless cases. Plenty is certainly a good start, because from quantity you can breed quality. But being open in step one also means you will have a lot of hopeless predictors that, obviously (from their very design), can have little impact on the prediction. There are many reasons why some features should be weeded out, but let me elaborate on one very common pitfall. Imagine you have 5 different sofas, from a spartan one to really fancy ones. Your model is to decide which of the sofa versions will be most appealing to a customer. If you have a parameter with only two values (think male and female), evenly distributed in the sample, it is difficult for this parameter to classify customers into 5 groups with just 2 values (yes, I said difficult, not impossible). The other way around: if you have the colour scheme of the sofas coded into 10,000 colour hues and only about 5,000 customers in the sample, some colour options will not have even a single interested customer, so their predictive relevance gets very questionable as well. Feature engineering should give you a plethora of inputs, but it should also spare you the pointless cases.
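A minimal sketch of such cardinality-based weeding, assuming the candidate features sit in a pandas DataFrame. The thresholds are illustrative, not universal rules:

```python
import pandas as pd

def weed_out(df, target_classes, max_levels_ratio=0.5):
    """Drop categorical features whose cardinality is hopeless:
    too few distinct values to separate the target classes,
    or so many levels that they rival the number of observations."""
    keep = []
    for col in df.columns:
        n_levels = df[col].nunique()
        # Too coarse: fewer values than classes to separate (2 values vs 5 sofas)
        if n_levels < target_classes:
            continue
        # Too fine: e.g. 10,000 colour hues for 5,000 customers
        if n_levels > max_levels_ratio * len(df):
            continue
        keep.append(col)
    return df[keep]
```

For example, `weed_out(df, target_classes=5)` would drop a two-valued gender column as well as a row-unique colour ID, while keeping a five-level income band.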

3] Prune for more efficient model training and operation. Some methods downright struggle under the burden of too many input parameters (think Logistic regression). Others can swallow the load but take ages to recalculate or retrain the model. So it is strongly encouraged to allow models more breathing space. To do so, one has to care for two dimensions. First, you should not allow clutter from too many distinct values in one variable. For instance, modelling the health of a person based on age is probably a sensible assumption, but nothing serious will come out of it if you count the age meticulously in the number of days the person had at the time of the accident, rather than in years (or even decades). So good Feature engineering should do (automatic) binning of the values. Secondly, the model should not struggle with maintaining parameters that bring little additional value to the precision of the prediction, and it is the role of Feature engineering to prune the input set. A good model is not the most precise one, but the one with the fewest parameters needed to achieve acceptable precision. Because only then do you achieve the quality-through-quantity maxim mentioned before.
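The binning half of this step is essentially a one-liner in pandas; the age-in-days example above then looks like this (the bucket edges are my illustrative choice):

```python
import pandas as pd

# Raw ages counted meticulously in days -- far too granular for the model
ages_in_days = pd.Series([5110, 9125, 14600, 23725, 30660])

# Automatic binning: days -> decade-of-life buckets
age_decade = pd.cut(
    ages_in_days / 365.25,            # convert days to years first
    bins=[0, 20, 40, 60, 80, 120],
    labels=["0-20", "20-40", "40-60", "60-80", "80+"],
)
```

The pruning half is usually handled separately, e.g. by dropping features whose removal does not measurably hurt the model's precision.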

4] Signal if I missed anything relevant. Last but not least, a good feature engineering approach should be your buddy, your guide through the process. It should be able to point you to areas of the information space that are under-served or completely missing from your input set. If you are modelling the probability of buying a certain product, you should have features on how pricy the product is included. If you don't, the feature engineering system should remind you of that. You might scratch your head over how the system would know. Well, there are 2 common approaches to achieving this. You can either define a permanent dictionary of all possible features and create some additional layers/tags (like pricing issues) on top of the raw list. Upon reading the set you are about to consider, the system would detect that there is no representative from the pricing category. If you do not have the resources to maintain such a dictionary, you can use historical knowledge as the challenger: your system reviews similar models done by your organization and collects the areas of features used in those models. Then it can verify whether you have covered all historically covered areas in your next model as well. Though this might sound like a wacky idea, in larger teams or in organizations with "predictive model factories" having such a tool is close to a must.
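The dictionary-based variant can be sketched as a simple coverage check. The tags and feature names below are, of course, made up for illustration:

```python
# Hypothetical feature dictionary: feature name -> information-space tag
FEATURE_TAGS = {
    "product_price": "pricing",
    "discount_depth": "pricing",
    "days_since_last_purchase": "recency",
    "total_spend": "monetary",
    "visits_per_month": "frequency",
}

def missing_areas(candidate_features, feature_tags=FEATURE_TAGS):
    """Return the information-space tags that have no representative
    in the candidate feature set -- the system's 'you forgot X' signal."""
    all_tags = set(feature_tags.values())
    covered = {feature_tags[f] for f in candidate_features if f in feature_tags}
    return sorted(all_tags - covered)
```

Passing a candidate set without any price-related column would return `["pricing"]`, which is exactly the nudge you want before the model gets trained without it.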

We went together through the requirements that good Feature engineering should include. Now that you know them, you can return to the drawing board of your analytical project and think about how to achieve them. If you happen to work in Python, you can get a bit more inspiration on this topic from my recent presentation at the AI in Action Meetup in Berlin.

 

Berlin Meetup: Cool Feature engineering [my slides]

AI_in_ACTION

Dear fellows, 

on Wednesday 20th Feb 2019 I was invited to speak at the AI IN ACTION Meetup organized by ALLDUS in Berlin. The topic was one of my favorite issues, namely Feature engineering. This time we looked at it from the angle of How To Do Cool Feature Engineering In Python. If you had the chance to be in the Meetup crowd and failed to note down some figure, or if you are interested in the ideas discussed even though you were not there, attached you can find the presentation slides from that MeetUp.

slides >>> FILIP_VITEK_TeamViewer_Feature_Selection_IN_PYTHON

If you have any question or a different opinion on some of the debated issues, do not hesitate to drop me a few lines at info@mocnedata.sk.