Why should Data Scientists BE SCARED OF AI coming as well?

The debate on whether Artificial Intelligence (AI) will slash some jobs (or entire professions) has transformed from obscure omen reading into a heated mainstream issue. Truck drivers, financial intermediaries and a few other professions are nervously looking ahead, wondering if they are going to join the red-listed endangered species. They certainly have good reasons to be worried …

… but how about Data analytics? Are Data Scientists on the AI replacement to-do list as well? What a silly question, isn’t it? Ultimately, Data Scientists are the oil of AI solutions. Thus, they will be the ones eating others’ jobs, and they do not need to worry about their own future. Or should they, maybe?

How sure you should (not) be

A few months ago, expert economic commentaries were still shy in indicating that the state of the world economy might deteriorate in the quarters to come. Back then it was one message per week on this issue. In the last 2 weeks the matter got visibly more dramatic; the black omens now pop up on a, literally, daily basis. As we learned from the past, every economic crisis usually slashes a substantial number of job opportunities on the market. In this sense the crisis to come will be no different. We learned not to worry about that too much, as new jobs are created again when the economy walks from crisis back to good times. The problem is that on this account the nearest crisis will be different. It will kill some jobs that will never be recreated again.

Almost every week you can see some profession striking for salary increases. As the economy is booming, employees push to reap a slice of the victory cake. However, there are some jobs where salaries kept on rising without any push from labor unions. Data Science is one of those areas where annual income has been on a crazy adventure to the north. Driven by over-demand meeting (weak) supply, companies were raising the pay level to lure people away from competitors (or to motivate more people to requalify as Data Scientists). But no more. Data from the US (the largest free Data Science labor market) indicate that the entry salary of Data Scientists stagnated in 2017 and corrected a few percentage points down in 2018. The reason is that the price of Data Science talent rose above the level at which the business case for their possible impact in a company can justify their pay. Not many people realize that the higher remuneration of these years is the last dance before the DJ calls off the party.

In both cases, the strike- or surge-driven salaries will make the AI replacement scenario more severe. When we come out of the crisis, employers will be facing the dilemma of whether to rehire staff again or to replace some part of it with automation. The higher the annual salary level of employees, the easier the case for AI solutions to be a cost-saver. Especially the area of super expensive (and still scarce) Data Scientists offers a lot of room for rethinking, as the one-year cost of a Data Scientist in the US is easily a six-digit figure.
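To make the salary argument a bit more tangible, here is a minimal sketch of the payback calculation an employer might run when deciding between rehiring and automating. The function name and all the figures are purely hypothetical assumptions for illustration; they are not market data.

```python
def payback_years(annual_salary_cost, automation_upfront, automation_annual_cost):
    """Years until an automation investment pays for itself compared to rehiring.

    Returns None when automation is not cheaper to run than the employee,
    because then the upfront investment is never recovered.
    """
    annual_saving = annual_salary_cost - automation_annual_cost
    if annual_saving <= 0:
        return None
    return automation_upfront / annual_saving


if __name__ == "__main__":
    # Hypothetical numbers: the same automation project, three salary levels.
    for salary in (50_000, 100_000, 150_000):
        years = payback_years(
            salary,
            automation_upfront=200_000,      # assumed one-off build cost
            automation_annual_cost=30_000,   # assumed yearly running cost
        )
        print(f"salary {salary}: payback in {years:.1f} years")
```

With the upfront and running costs held fixed, the payback period shrinks as the replaced salary grows, which is the whole point of the paragraph above.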

The (seemingly strong) peace of mind of the Data community about their job security has its roots in a fatal attribution error. For most manual jobs the replacement will come with automation, presumably intelligent computers running on data. Therefore, the data processing industry might be perceived as the lubricant of the whole automation process. Hence the strong belief that data scientists are on the right side of this transformation river. While data might, indeed, be the oil of the AI transformation, it is ill-conceived to think that humans necessarily need to take part in extracting it. If we stick to the analogy, most of what happens on an oil rig is not human labor but automation itself. Similarly, the repetitive and easy-to-automate jobs in Data analytics will not be done by humans. If you take two steps back and impartially review the work of most of today’s data analysts, their work is much more well-defined and repetitive than the driving of an autonomous car. Therefore, the data community should not fall into the trap of illusion that the AI job revolution will take a detour around their domain.

Time for panic?

The omens are out there, time for panic? Well, we as humans had difficulties facing the previous industrial revolutions. And we will probably struggle this time around as well. Almost any time a disruptive technology arose in the past, the first answer was to push back by, literally, beating the machines. However, there are ways to face the AI job hunt properly. I have been invited to speak about HOW TO SURVIVE the (first) AI ATTACK on DATA SCIENCE JOBS at the DataFestival 2019 in Munich this week. This is a short teaser about the topic, and I offer you an exclusive sneak peek here, as you are precious members of our TheMightyData.com community. As this topic hits all of us, any comments or views from you are highly welcome in the comments to this blog or at info@mocnedata.sk. Enjoy the reading and see you at some other event soon.

Berlin Meetup: Cool Feature engineering [my slides]












Dear fellows, 

on Wednesday, 20th Feb 2019, I was invited to speak at the AI IN ACTION Meetup organized by ALLDUS in Berlin. The topic was one of my favorite issues, namely Feature engineering. This time we looked at the issue from the angle of How To Do Cool Feature Engineering In Python. If you had the chance to be in the Meetup crowd and failed to note down some figure, or if you are interested to read about the ideas discussed even though you were not there, here you can find attached the presentation slides from that Meetup.

slides >>> FILIP_VITEK_TeamViewer_Feature_Selection_IN_PYTHON

If you have any question or a different opinion on some of the debated issues, do not hesitate to drop me a few lines at info@mocnedata.sk.

Are you “V” or “A”? What type of Data Scientist are you?

Are you more of an ESTJ or rather an INFP? If you ever went through the MBTI personality test, you probably know what I am referring to. If you have not come across the Myers-Briggs test, I strongly suggest you take the free test here; you will learn a lot about yourself (and also better understand the main topic of this blog).

It has been a long time since the first people found their jobs as data analysts (in the 1980’s). Structured Query Language (or SQL as we know it now) gained heavily in popularity with the advent of ANSI standardized rules. For most of the 1990 – 2010 period, you did not have to think about what sort of analyst you were. The only data around (anyway) were the structured data in databases or formatted data files. Processing data for an e-shop was just about the same task as analyzing the utility data of water or gas consumption. That also led to the golden era of the data analyst: if you mastered SQL (and some statistics on top of that), you could easily swim between jobs or even industries.

Admittedly, I joined the labour force in this period as well, so when choosing the career path of a data analyst, I did not rack my brain too hard over what kind of analyst I would like to be. After all, being an analyst felt so generalist-ish. Almost like living in Ford times and being perplexed by the question of whether you can drive a car, a truck or a bus. Hey, man, what’s the difference?!

But after 2010 things took a slightly different twist. As I regularly go to several expert conferences on analytics, I started to notice a strange feeling building somewhere in the back of my mind. I saw fellow speakers present their use cases of analytics, and they felt somewhat different from what my team was mainly working on. I still understood the principles and could have replicated the projects if asked to, but I realized that we deemed those marginal and rarely pursued them in our company.

A few years down the road, I was getting certain: Analytics is not a single continent where you can move freely from one area to another. It is rather an archipelago of torn islands, and tectonic forces push its pieces more and more apart. You can still see there is a party on the other island, but you would need to get really wet to get there and take part in it. I started to investigate how many pieces the analytics continent has disintegrated into. Then, on one of my business flights, the article by Kai-Fu Lee in the Nov 2018 Europe edition of Fortune nudged me. Wow, these are exactly the clusters I have been thinking of! A bit different naming and some thoughts in another order, but it strongly converged with what I believed in.

Thus, I decided to wrap my older thoughts around the ideas seeded by Kai-Fu Lee and came up with VIBA, (most likely) the first attempt at a persona model for data scientists, helping you to orientate where you stand on the scale of analytics. Its usefulness is dwarfed by that of MBTI (or similar) profiling, but it gives you a basic frame of where you are … and, most importantly, where you should go. It consists of 4 different tribes of analytics, each living on its own island, slightly moving apart from the others with every quarter of the year. V-I-B-A namely stands for:

Visual & Voice analytics & Words analytics. Primarily attempting to detect patterns in images, video, voice and text. Building their models on unstructured data that needs to be annotated to help the engine learn the patterns. Trying to make sense of perception-based data and to compete with (or even replace) the human senses. Mostly working on different topologies of neural networks, tapping into tools like Keras, TensorFlow and NLP libraries. If you work in this area, you mostly encounter objects, sound samples or text excerpts. Projects usually have a longer lead-in phase for exploring data and extensive training. Requirements for analytical progress are framed by Innovation teams or R&D departments.

Internet and Social media related. Striving to describe user flow or conversion in online apps, e-commerce or through interactions on social media. Analytics based mainly on streams of user-generated data from clicks & purchases, often monitoring a rather short time window of data. Relying on Time series analysis, some forms of Machine learning or Graph database analytics. Supporting Online and Marketing departments of larger companies or start-ups in these fields. To survive on this island, you need to be familiar with Google Analytics and/or the APIs of major social media networks. Crunching and storing of data often happens in an external cloud; analytical results are needed on a (near) real-time basis. Projects have a short time span; their results often morph into permanent monitoring of the discovered patterns.

Behavioral Analytics. Serving requests coming from Traditional sales channels, CRM departments or Product management teams. Delivers predictive models about crucial business behaviour of the clients/users. Alternatively, clusters the portfolio of users or estimates their lifetime value. Inputs come in the form of numerical or categorical features about clients, often still calculated from underlying relational databases. Features are rather aggregated, human-suggested info about behaviour. Analytical effort on this island requires at least (6+) months of history to arrive at stable results. To operate in this area, you still need to master SQL and some (maybe proprietary) analytical packages for regressions, decision forests or classification algorithms. Analysis rarely happens in the cloud; most of the data is stored on own premises. Projects span days, weeks at most, leading to regular, ongoing scoring of the user base via the developed models.

Automating & Autonomous. This type of analytics aims to generate rule engines and sophisticated models to drive decisions in a certain process, on request of Operations or R&D teams. Takes (streamed) low-level sensory data, coming from experiments or real-life readings. Tries to make sense of them through multiple layers of its own features. Using Deep Learning methods, teaches the machine to decide within a stated error rate allowance. Data are either real-time or historical logs of some process or sequence of motions. Uses open-source Neural network-based packages like TensorFlow, Keras or Caffe. Working in this area, you most likely come across human-driven processes being handed over to machines. Analytics happen on/close to mechanical hardware or centrally in the cloud for a large span of parallel process iterations.
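To illustrate what the aggregated, human-suggested features of the “B” island might look like in practice, here is a minimal sketch in plain Python. The transaction rows, the column layout and the feature names are all hypothetical assumptions made for this example; in real life the rows would typically come from a SQL query against a relational database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (client_id, purchase_date, amount, channel)
TRANSACTIONS = [
    ("c1", date(2019, 1, 5), 40.0, "branch"),
    ("c1", date(2019, 2, 9), 60.0, "online"),
    ("c1", date(2019, 3, 2), 20.0, "branch"),
    ("c2", date(2019, 2, 20), 300.0, "online"),
]


def behavioural_features(rows, as_of):
    """Aggregate raw transactions into one feature record per client."""
    grouped = defaultdict(list)
    for client_id, day, amount, channel in rows:
        grouped[client_id].append((day, amount, channel))

    features = {}
    for client_id, items in grouped.items():
        amounts = [amount for _, amount, _ in items]
        channels = [channel for _, _, channel in items]
        last_day = max(day for day, _, _ in items)
        features[client_id] = {
            "n_purchases": len(items),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            "days_since_last": (as_of - last_day).days,
            # Most frequent channel; sorting makes ties resolve alphabetically.
            "main_channel": max(sorted(set(channels)), key=channels.count),
        }
    return features


if __name__ == "__main__":
    for client, record in behavioural_features(TRANSACTIONS, date(2019, 3, 31)).items():
        print(client, record)
```

Each record mixes numerical features (counts, sums, recency) with a categorical one (the main channel), which is the kind of input a regression or decision forest would then be trained on.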

Do you already know what YOUR PRIMARY letter in analytics is? To support you in this effort, I created a free VIBA test. I suggest you take it right now, to assure (or surprise) yourself about it.

As mentioned before, V-I-B-A profiles are distinct, though you might be on the edge of more than one “letter”. What is important to understand is that each VIBA letter has its own reason to exist. Similarly to MBTI, there is no good or bad profile, just one more suited for some jobs and another for other jobs. You might be pleased with where VIBA profiles you, or less so. As with a personality test, you might even choose to move from one VIBA island to another. Or stick harder to the tribe you belong to now. In both cases it is essential for you to understand what it takes to be a good “V”, “I”, “B” or “A”. For that matter I am preparing (soon to come) a separate blog explaining what each letter should do to improve their mastery and what skills one needs to acquire if (s)he wants to hop to another VIBA island.

I am more than happy to receive at info@mocnedata.sk any feedback or learnings that you might arrive at while working with VIBA. If you want to share VIBA with your friends to compare whether you are part of the same tribe, feel free to pass on this blog post. If you intend to use the methodology in your professional (or academic) work, please pay tribute by properly citing the authorship of this article.

5+1 interesting AI videos

After my reviews of some AI-related books caught quite some interest from the TheMightyData.com community, I decided to elaborate on these points a bit and point you to yet more insights. To prevent turning you into bookworms, I decided to pick a more engaging medium this time, and I would like to inspire you to watch some good AI-related video talks. Enjoy!

How AI can save our humanity | Kai-Fu Lee

If you work in the AI community, the name of Kai-Fu Lee might not be unknown to you, as he stood behind the boost of AI at Apple (and a few other platforms). What is more, his view on AI is rather encouraging. In his life (and in the attached video) he strives to point out areas where people can succeed even after AI fully kicks in. Is it a hope worth following? Well, that conclusion I leave up to you to make after seeing this great video.

A brain in a supercomputer | Henry Markram

There are at least 2 fascinating things about this video. The first one is that it was filmed 8 years (!) ago. That shows how far ahead Markram’s team was in this topic already back then. Even today, some research teams are not on the verge of the same understanding of the matter. The second fascinating aspect of the video is that in less than 16 minutes, it walks the spectator from the basics of Neuroscience up to expert insight on Neural Simulations. And that is certainly worth your 16 minutes.

The Truth Behind Artificial Intelligence | Andrew Zeitler

Andrew was only (hard to believe) 17 years old when he delivered this speech. And if you pardon his youthful enthusiasm (here and there bordering on affectation), he will introduce you to a load of interesting thoughts on what the next milestones for General AI are, as well as how General AI will most likely behave when it comes. For those educated in the matter, some parts of the speech might be, I admit, a bit sluggish. However, did you bring any hall of that size to applause when you were 17?



Where AI is today and where it’s going. | Richard Socher

Richard Socher is a professor at the Stanford Computer Science Department, mainly focusing on Deep Learning. And he is awesome at clearly and humorously explaining what progress has been achieved in neural networks lately, as well as stating where AI still drags its feet (as you will see, sometimes literally). For those facing a master’s or PhD thesis this might be a good short list of highly desired topics. If you are out of school and happen to already work in the AI area, it may be an inspiration to learn more about some areas of AI you have not come across yet. But even if you are only lightly interested in AI topics, Richard is a great entertainer, so it’s worth seeing the video just for fun’s sake.


Music and Art Generation using Machine Learning | Curtis Hawthorne

Creativity is often cited as the “last fortress of human superiority”. Machines cannot be creative, so at least here we are safe to assume human dominance for a longer time, right? Well, to assess to what extent that is really true, I suggest you see this video by Curtis Hawthorne, who reports on how far machines (theirs at Google) have got. Next time you hear this soothing self-defense of humans, you will already have educated arguments to discuss.

The final bonus +1 track is nobody else but Nick Bostrom, the author of the book Superintelligence. In his video he describes the basic principles of his book. So if you have not read my review of this fascinating piece of reading, give the author himself a try to motivate you to read it.

Unconventional methods of Feature engineering

My dear fellows,

after settling down in the Berlin data science community and visiting a few interesting local meet-ups, I agreed to contribute to the community knowledge sharing as well. Thus, if you have a while, you can join Sebastian Meier [Technologiestiftung Berlin], Sandris Murins [Iconiq Lab] and me on June 14th 2018 at 18:00 at AI IN DATA SCIENCE – BERLIN at the Wayfair DE Office, Köpenicker Str. 180, 10997 Berlin, Germany.

To assure you that it really pays off to reserve some time for this meet-up, let me give you a sneak preview of what you might get from this event:

BERLIN AI TALK - Filip Vitek

In my presentation, titled “Feature engineering – your trump card in Machine Learning”, you will find out the 2 key reasons why feature engineering has a future. The first explains how variables can become a competitive advantage of your models. The second one goes even beyond that and hopefully opens your eyes to why Feature engineering might actually be essential for Data Scientists’ future job security.

After getting warmed up on the overall role of features, we shall jump into how variables are actually designed/generated in real life. What are the most used approaches to designing features? But more importantly, what are the common pitfalls that we all often fall into while designing our feature sets for Machine Learning predictions? Are there tricks to prevent us from failing on these?


However, just to demote the traditional ways and solely point out their notorious downsides without naming the alternative(s) would be a bit unfair. Therefore, in the second half of my talk I want to walk you through unconventional ways to design your features. What is more, I will draw on 4 specific case studies where unconventional features were needed. I hope you might get inspired for your own work.

If you cannot make it (for whatever reason) to this talk and you are a member of the MightyData Community, you will find the presentation slides here after the event, in a members-only blog post. If you are not a member of the free MightyData Community yet, you can get your free membership in less than 2 minutes by registering here.

Looking forward to seeing you all at the AI Meet-up event!




Bread for 8 Facebook likes. You can already buy food for personal data.

Data have been labelled a commodity in corporate speak for quite some time already. Monetization of data (e.g. selling it) is not just a trendy concept but has turned into the essence of business for a horde of small and medium companies (let me name Market locator as one of the shiny examples). However, so far, the data trade has mainly been a B2B business. An ordinary person could not pay for his/her regular food supplies with his/her own data. That is over for good. A grocery store has been opened in Germany where you pay for your purchase solely with personal data.


Milk for 10 photos, bread for 8 likes

No, this is not an April Fools’ joke or a hidden prank. In the German city of Hamburg, Datenmarkt (= data market) has opened its first store, where you pay solely with personal data from Facebook. Toast bread will cost you 8 Facebook likes, or you can take a pack of filled dumplings for full disclosure of 5 different Facebook messages. As you would expect of a grocery, Datenmarkt lately ran a special weekly offer on most fruit, with the price cut down to just 5 Facebook photos per kilo.


Payment follows a process similar to the one you know from credit card payment via a POS terminal. But instead of entering a PIN into the POS terminal to authorize the transaction, you have to log into your Facebook account and indicate which Likes given, Facebook photos posted or messages from FB Messenger you decided to pay with. The cashier receipt also shows the data that you decided to sacrifice. The provided data are at the full disposal of the Datenmarkt investors, who may sell them (along with your metadata) to any interested third party. By paying for the goods with your social data, you give Datenmarkt full consent for this trading.


Insights we give away

What might look like an elegant and definitive solution to homeless people starving actually has much higher ambitions. The authors of this project also want to raise awareness that in our age any data point has its value, reminding us that the sheer volume of data we freely give away to Titans like Google, Facebook or YouTube actually has real monetary value attached to it.

In more mature markets the echo of “what is the value of a person’s digital track” resonates more intensively with every quarter of the year. To put it differently, if we hand over data to “attention brokers” (as Tim Wu, a professor at Columbia University, USA, who studies the matter, labelled them), what do we actually get in return for those data? We might come to the point where outstanding debt forces us not only to “smash our piggy bank” but to sacrifice our Facebook assets as well. Most governments have set a monetary equivalent of a human life (often calculated as lifetime tax contribution). But the value of our digital life is yet to be added. Bizarre? Well, not any more in our times.

Food Yes, but not for the services

The opening of the Hamburg Datenmarkt, sadly, confirms the hypotheses researched by behavioural economists for more than 5 years. In monetizing their data, people simply tend to stick to the same principles as with sweets (or other tiny sins). Immediate gratification wins over the promise of some future value. Most citizens comprehend that their personal data might have some value. If given the option of using their personal data to get cheaper or better services (e.g. insurance or the next haircut for free), up to 80% of them turn the offer down. However, if the same people were given the choice of something tangible immediately, they were willing to trade their personal data for things as trivial as one pizza.


Now your thoughts go: “Wait, but the above-mentioned digital giants are not offering anything tangible, all of the Google, Facebook and YouTube business is in the form of services, isn’t it?” And you may be right, but the point here is that all of them offer immediate gratification (immediate likes on Facebook, hunger for quick search on Google or immediate fun with YouTube videos). Quite a few people would see trading their digital data (deemed to be as immaterial as the air we breathe in and out) for something consumable as a “great deal”. If this trend proves to be true, it is likely that services exchanging yogurt or a concert ticket for our personal data will be mushrooming around. If you have a similar business idea in your head, you better act on it fast. Commission for the themightydata.com portal for seeding this idea in your head remains voluntary for you 😊.

Hold your (data) hats …

Immediately after the roll-out of personal data payment for existing goods, there will be someone willing to give you goods for a personal data loan (or data mortgage). In other words, you will hand over your author rights in advance, even before the content is created. For those in financial difficulties this might still seem a more reasonable thing to give away than the fridge or your shelter. But here, we are already crossing the bridge to digital slavery of a kind. Keep in mind that the most vulnerable would be the young lads, still having a long (and thus precious) digital life ahead, but short on funds (e.g. to buy a new motorbike) at this life stage. The whole trend will probably be fueled by some industries (like utilities) that long for data to be their saviour from the recent misery of tiny margins.

Soon we shall expect that, besides the Bitcoin, Ethereum or Blackcoin fever, a person would be able to mine (from their own Facebook account) a currency like FB-photoCoin or FB-LikeCoin. And our age will gain yet another bizarre level on top. Just imagine the headline “Celebrity donated to charity 4 full albums of her Facebook photos”. But nobody can eat from that? Or can they now?

You might be interested to read:

3 WAYS how to TEACH ROBOTS human skills

What PETER SAGAN learned about his sprint from Helicopter?

CHILDREN had it REALLY tough on Titanic