MACHINE LEARNING goes physical [2021 trends]

We have all got used to (though not necessarily appreciative of) machine learning recommending products, sorting our social media feeds or matching us with potential dating partners. However, the common factor of all these AI use cases has been that they try to influence our lives through a software interface. Experts believe 2021 will change this, with machine learning also entering our physical lives. Not convinced? Read on.

Most autonomous machines have been developed for the closed environments of factories, warehouses or other sites insulated from the hum of everyday human life. The reason machines were not allowed "out among humans" is that constrained environments set the bar for sensing and intelligence just low enough for machines to actually clear it. However, recent developments in reinforcement learning, NLP, computer vision and auto-ML have opened room for constructing intelligent machines that manage to operate in open society as well.

The main point of change is that the above-mentioned cocktail of AI approaches gives machines the chance to learn beyond their original scope of responsibilities (should the need arise and their creators grant the opportunity). As a result, we will meet robots moving among us to fulfil their duties. The interesting part of this change is that it will all happen without the consent of the individuals passing by. (Can you object to a street-sweeping machine if it does not hurt anybody and does not destroy anything of value?) One should also admit that the highest goal of the first ML-driven machines "out in the wild" will most likely be to avoid humans in the first place. So, machine learning will not only get physical, but also (somewhat) ignorant of us.


Why should Data Scientists BE SCARED of AI coming as well?

The debate on whether Artificial Intelligence (AI) will slash some jobs (or entire professions) has transformed from obscure omen reading into a heated mainstream issue. Truck drivers, financial intermediaries and a few other professions are nervously wondering whether they are going to join the red list of endangered species. They certainly have good reasons to be worried …

… but what about Data analytics? Are Data Scientists on the AI replacement to-do list as well? What a silly question, isn't it? After all, Data Scientists are the ones fueling AI solutions. Thus, they will be the ones eating others' jobs, and they do not need to worry about their own future. Or should they, maybe?

How sure you should (not) be

A few months ago, economic commentators were still shy in indicating that the state of the world economy might deteriorate in the quarters to come. Back then it was one message per week on the issue. In the last two weeks the matter has become visibly more dramatic; the black omens now pop up literally on a daily basis. As we have learned from the past, every economic crisis slashes a substantial number of job opportunities on the market. In this sense the crisis to come will be no different. We have learned not to worry about that too much, as new jobs are recreated when the economy walks from crisis back to good times. The problem is that on this account the nearest crisis will be different: it will kill some jobs that will never be recreated again.

Almost any week you can see some profession striking for salary increases. As the economy booms, employees push to reap a slice of the victory cake. However, there are some jobs where salaries kept rising without any push from labor unions. Data Science is one of those areas where annual income has been on a crazy adventure to the north. Driven by excess demand against weak supply, companies kept raising pay levels to lure people away from competitors (or to motivate more people to requalify as Data Scientists). But no more. Data from the US (the largest free Data Science labor market) indicate that entry salaries for Data Scientists stagnated in 2017 and corrected a few percentage points down in 2018. The reason is that the price of Data Science talent rose above the level at which the business case for their possible impact in a company still justifies their pay. Not many people realize that the high remuneration of these years is the last dance before the DJ calls off the party.

In both cases, strike- or surge-driven salaries will make the AI replacement scenario more severe. When we come out of the crisis, employers will face the dilemma of whether to rehire staff or to replace some part of it with automation. The higher the annual salary level of employees, the easier it is for AI solutions to be the cost-saver. Especially the area of super-expensive (and still scarce) Data Scientists offers a lot of room for rethinking, as the yearly cost of a Data Scientist in the US is, literally, a seven-digit figure.

The (seemingly strong) peace of mind of the Data community about their job security has its roots in a fatal attribution error. For most manual jobs the replacement will come through automation, presumably intelligent computers running on data. Therefore, the data processing industry might be perceived as the lubricant of the whole automation process; hence the strong belief that data scientists are on the right side of this transformation river. While data might indeed be the oil of the AI transformation, it is ill-conceived to assume that humans necessarily need to take part in extracting it. If we stick to the analogy, most of the work on an oil rig is done not by human labor but by automation itself. Similarly, the repetitive and easy-to-automate jobs in Data analytics will not be run by humans. If you take two steps back and impartially review the work of most data analysts today, their work is much more well-defined and repetitive than driving an autonomous car. Therefore, the data community should not fall into the trap of the illusion that the AI job revolution will take a detour around their domain.

Time for panic?

The omens are out there, so is it time for panic? Well, we humans had difficulties facing the previous industrial revolutions, and we will probably struggle this time around as well. Almost every time a disruptive technology arose in the past, the first answer was to push back by, literally, beating the machines. However, there are ways we can face the AI job hunt properly. I have been invited to speak about HOW TO SURVIVE the (first) AI ATTACK on DATA SCIENCE JOBS at the DataFestival 2019 in Munich this week. This is a short teaser of the topic, and I offer you an exclusive sneak peek into

PRESENTATION >>> FILIP_VITEK_TeamViewer_SURVIVAL_TICKET

here, as you are precious members of our TheMightyData.com community. As this topic hits all of us, any comments or views from you are highly welcome in the comments to this blog or at info@mocnedata.sk. Enjoy the reading and see you at some other event soon.

Berlin Meetup: Cool Feature engineering [my slides]


Dear fellows, 

on Wednesday 20th Feb 2019 I was invited to speak at the AI IN ACTION Meetup organized by ALLDUS in Berlin. The topic was one of my favorite issues, namely Feature engineering. This time we looked at the issue from the angle of How To Do Cool Feature Engineering In Python. If you had the chance to be in the Meetup crowd and failed to note down some figure, or if you are interested in reading about the ideas discussed even though you were not there, you can find the presentation slides from that Meetup attached here.

slides >>> FILIP_VITEK_TeamViewer_Feature_Selection_IN_PYTHON

If you have any question or a different opinion on some of the debated issues, do not hesitate to drop me a few lines at info@mocnedata.sk.

FREE TEST to detect your VIBA profile

You might have heard about VIBA, the attempt to describe different Data Scientist profiles. You may even have heard what the V-I-B-A categories stand for and are wondering which profile is actually yours. Then this short test is aimed exactly at you. By answering the questions listed below, you can get an indication of your most likely VIBA "letter".


Instructions for the FREE VIBA test

Please answer all 8 questions below. For each question pick one answer only. If you feel several answers might apply to you for a given question, try to pick the one closest to your situation at the moment. Each answer is worth a certain number of points, stated in brackets. After answering all the questions, please sum the points earned across the 8 questions. The total score will point you to what the test believes is your Analytical letter.

 

1. What is the prevailing data type used as input for your Data Science work?

a] unstructured data OR sound/video data [12 points]

b] stream or batches of transactional data  [1 p]

c] structured data of longer time periods with many aggregated Features or Proxies    [4 p]

d] sensory data, readings from IoT devices or physical measurement [8p]

2. Which part of the company most often requests (or uses) the data science outputs that you create?

a] Online commerce, Social media or PR  [1 p]

b] Innovation, Research & Development teams [12 points]

c] Traditional (human operated) Sales, Strategy or Business development    [4 p]

d] Operations, Finance or  IT dept. [8p]

3. How does the training of the new data science models that you (and your team) create happen most of the time?

a] through annotated examples, most likely generated by humans [12 points]

b] through data from experiments, observations or simulations of the reality [8p]

c] using short-time-window samples of a long-running process   [1 p]

d] based on Features  that are human selected aggregates or proxies of raw data   [4 p]

4. Which Data Science methods dominate the tasks your team is working on? If several of those listed below are used, which would you keep if allowed only one category?

a] Advanced Machine Learning, Random Forests, Regressions or simple FF neural networks  [4 p]

b] Deep learning (often CNN, DCN, KN, LSTM…)  [12 points]

c] Time series, simple ML classifiers or Graph analytical methods   [1 p]

d] Rule engine generators, Genetic algorithms, or more advanced topologies of Neural Networks [8p]

5. What tools/platforms do you CERTAINLY have to have AVAILABLE for your work?

a] Keras, TensorFlow or similar, NLP or other text mining tools  [10 points]

b] Google Analytics  or APIs to Social media  [1 p]

c] SQL and analytical packages or Opensource ML platforms.   [4 p]

6. What does a typical task that you are asked to deliver in your team look like?

a] Describe and analyze user flow, conversion rates or usage specifics   [1 p]

b] Determine probability to do something OR describe segments of the users  [4 p]

c] Teach systems to decide or replace human role in processes [8p]

d] Detect similarities or patterns in objects or texts [12p]

7. When doing Data Science in your team, what is the most used domain of the data?

a] users/clients preferences or online data  [1 p]

b]  Physical (2D or 3D) objects, art or result of some creative work  [12p]

c] Purchase data, products or off-line customer data  [4 p]

d] Processes and their stages, Motion or Logistics of the things [8p]

8. How long is the typical time window of input data that you need to train your models or prepare your Data Science deliverable?

a] Time does not play a role; many repetitions/variations of the same object(s).  [12p]

b]  Short time windows, usually below 3 months [1 p]

c] (Near) Real time based, often  of many different types or sources from same time window [8p]

d] Longer time windows, usually 6M+ of the analyzed matter/event  [4 p]

So which VIBA letter are you?

If you scored 0–23 points, your most likely letter is I, the Internet and Social media related tribe of Data Science. The closer to 8 points you are, the more evident this is. The closer you came to 23 points, the more inclination towards or overlap with other letters there might be.

If you scored 24–49 points, your most likely letter is B, the Behavioral Analytics group of Data Science. Staying at the lower bound of the interval indicates that you are probably also asked to analyze the online behavior of users. The closer you came to the 49-point upper limit, the more your work might also be used to improve decision making in processes or to automate things.

If you scored 50–72 points, your domain in Analytics is most likely A, the Automating & Autonomous space of Data Science. If you ended up just a few points above the lower limit of 50, we would guess that your automation is still in an area with a strong human aspect. Staying closer to the upper limit of 72 points means that autonomous aspects are paramount and your models probably also rely on reality measurements or sensory inputs.

If you scored 73–94 points, you are living your analytical life as V, in the Visual, Voice & Words analytics arena of Data Science. Scores in the 70s range would indicate that your work is somehow useful or needed for decision machines or for automating things. Scores at the higher end of the interval signal pure sensory orientation, most likely living off Deep Learning algorithms.
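For readers who prefer to let a few lines of code do the bookkeeping, here is a minimal Python sketch of the scoring described above. The point values are copied from the eight questions; the function name and structure are my own illustration, not part of any official VIBA tool.

```python
# A minimal sketch of the VIBA test scoring described above.
# Point values per answer are copied from the eight questions; the function
# name and structure are only an illustration, not an official tool.

POINTS = {
    1: {"a": 12, "b": 1, "c": 4, "d": 8},
    2: {"a": 1, "b": 12, "c": 4, "d": 8},
    3: {"a": 12, "b": 8, "c": 1, "d": 4},
    4: {"a": 4, "b": 12, "c": 1, "d": 8},
    5: {"a": 10, "b": 1, "c": 4},          # question 5 offers three answers only
    6: {"a": 1, "b": 4, "c": 8, "d": 12},
    7: {"a": 1, "b": 12, "c": 4, "d": 8},
    8: {"a": 12, "b": 1, "c": 8, "d": 4},
}

def viba_letter(answers):
    """Sum the points of the eight picked answers and map the total to a letter."""
    total = sum(POINTS[question][choice] for question, choice in answers.items())
    if total <= 23:
        return "I"   # Internet & Social media
    if total <= 49:
        return "B"   # Behavioral analytics
    if total <= 72:
        return "A"   # Automating & Autonomous
    return "V"       # Visual, Voice & Words

# Example: someone who picked answer c] for every question scores 34 points -> "B"
print(viba_letter({question: "c" for question in range(1, 9)}))
```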

 

Happy with your VIBA letter? Surprised? Maybe you want to read more (or read again) about your profile, now that you know which one you are?

Or wait for the upcoming blog discussing what a good analyst of your type should do and know, and how to make the transition to another VIBA letter.

Bread for 8 Facebook likes: you can already buy food with personal data

Data has been labelled a commodity in corporate speak for quite some time already. Monetization (i.e. selling) of data is not just a trendy concept but has become the essence of business for a horde of small and medium companies (let me name Market locator as one of the shiny examples). However, so far the data trade has mainly been a B2B business; an ordinary person could not pay for regular food supplies with his/her own data. That is over for good: a grocery store has opened in Germany where you pay for your purchase solely with personal data.

[Image: Datenmarkt price tag for bread]

Milk for 10 photos, bread for 8 likes

No, this is not an April Fool's joke or a hidden prank. In the German city of Hamburg, Datenmarkt (= data market) has opened its first store, where you pay solely with personal data from Facebook. Toast bread will cost you 8 Facebook likes, or you can take a pack of filled dumplings for the full disclosure of 5 different Facebook messages. As you would expect from a grocery store, Datenmarkt lately ran a special weekly promotion on most of the fruit, with the price cut down to just 5 Facebook photos per kilo.

[Image: Datenmarkt fruit promotion]

Payment is organized as a process similar to the one you know from credit card payments via a POS terminal. But instead of entering a PIN into the POS terminal to authorize the transaction, you log into your Facebook account and indicate which given likes, posted Facebook photos or FB Messenger messages you have decided to pay with. The cashier receipt also lists the data you decided to sacrifice. The provided data are at the full disposal of the Datenmarkt investors to sell (along with your metadata) to any interested third party. By paying for the goods with your social data, you give Datenmarkt full consent for this trading.

[Image: Datenmarkt cashier receipt]

Insights we give away

What might look like an elegant and definitive solution to homelessness and hunger actually has much higher ambitions. The authors of this project also want to raise awareness that in our age any data point has its value, reminding us that the sheer volume of data we freely give away to titans like Google, Facebook or YouTube actually has real monetary value attached to it.

In more mature markets, the echo of "what is the value of a person's digital track" resonates more intensively with every quarter of the year. To put it differently, if we hand over data to "attention brokers" (as Columbia University professor Tim Wu, who studies the matter, labels them), what do we actually get in return for those data? We might come to the point where outstanding debt forces us not only to smash our piggy bank but to sacrifice our Facebook assets as well. Most governments have set a monetary equivalent for a human life (often calculated as lifetime tax contribution). But the value of our digital life has yet to be priced. Bizarre? Well, not any more in our times.

Food yes, but not for services

The opening of the Hamburg Datenmarkt, sadly, confirms the hypotheses researched by behavioural economists for more than 5 years. When monetizing their data, people simply tend to stick to the same principles as with sweets (or other tiny sins): immediate gratification wins over the promise of some future value. Most citizens comprehend that their personal data might have some value. Yet if given the option of using their personal data to get cheaper or better services (e.g. insurance, or the next haircut for free), up to 80% of them turn the offer down. However, if the same people were given the choice of something tangible immediately, they were willing to trade their personal data for things as trivial as one pizza.

[Image: Datenmarkt filled dumplings]

Now your thoughts go: "Wait, but the above-mentioned digital giants are not offering anything tangible; all of the Google, Facebook and YouTube business is in the form of services, isn't it?" And you may be right, but the point here is that all of them offer immediate gratification (immediate likes on Facebook, quick search results on Google or instant fun with YouTube videos). Quite a few people would see trading their digital data (deemed to be as material as the air we breathe in and out) for something consumable as a "great deal". If this trend is confirmed, it is likely that services exchanging yogurt or concert tickets for our personal data will be mushrooming around. If you have a similar business idea in your head, you had better act on it fast. The commission for the themightydata.com portal for seeding this idea into your head remains voluntary 😊.

Hold your (data) hats …

Immediately after the roll-out of personal-data payments for existing goods, there will be someone willing to give you goods for a personal data loan (or data mortgage). In other words, you will hand over your author rights in advance, even before the content is created. For those in financial difficulties, this still might seem a more reasonable thing to give away than the fridge or the roof over their head. But here we are already crossing the bridge to digital slavery of a kind. Keep in mind that the most vulnerable would be the young, still having a long (and thus precious) digital life ahead, but short on funds (e.g. to buy a new motorbike) at this life stage. The whole trend will probably be fuelled by some industries (like utilities) that long for data to be their saviour from the recent misery of tiny margins.

Soon we should expect that besides the Bitcoin, Ethereum or Blackcoin fever, a person would be able to mine (from their own Facebook account) currencies like FB-photoCoin or FB-LikeCoin. And our age will gain yet another bizarre level on top. Just imagine the headline: "Celebrity donated 4 full albums of her Facebook photos to charity." But nobody can eat from that, can they? Or can they now?

You might be interested to read:

3 WAYS how to TEACH ROBOTS human skills

What PETER SAGAN learned about his sprint from Helicopter?

CHILDREN had it REALLY tough on Titanic

What ALTERNATIVE do we have to AI?

Artificial intelligence (AI) and its applications are mushrooming in more and more areas of our lives. Reading an economic magazine without stumbling across an AI article has become almost impossible. As my Grandma used to say: "I open the fridge and I fear AI jumping out of it" (which, by the way, happens too). If you possess a bit of critical thinking, you may be revisiting the thought: "But why, for God's sake?!"

You may have found out yourself that professional blindness is a strong phenomenon. It has not spared me either. On a daily basis, it is my job to think about how to improve machine learning and predictive analytics so that they bring value to our company. Therefore the "poke" about why we are embracing AI so heavily came from an unexpected source. Our talk had been flowing continuously when a striking and well-aimed question arose: "Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress to, if all those billions were actually aimed at developing the human intellect?" I took a breath before launching into a tirade on the clear benefits of AI and … then I swallowed the sentence. There is a point in this thought. Is there actually an alternative to AI?

It was a beautiful autumn afternoon and we were walking our dogs in one of the parks of Bratislava, Slovakia. My wife is a respected, seasoned soft-skills trainer for a large global company. I was explaining to her the fascinating essence of AlphaGo's victory over the Go world champion. She silently reflected on the story, then stopped for a while and turned to me: "Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress to, if all those billions were actually aimed at developing the human intellect?" As any husband knows, unthinkable ideas are something one gets used to from our beloved ones. But just before I spat out the answer, I had to admit to myself that I had never thought this way. The existence and development of AI seemed natural to me, the same way a lumberjack probably does not think about paper recycling.

But the provocative idea kept itching me. Consequently, I started to think about what alternative to AI we REALLY have. Why are we so eager to advance with AI anyway? So, let's do a deep dive together:

The original motivation

At the very beginning of computers (and their usefulness) was the desire to compute things where humans make mistakes. Mainly complex calculations with high precision (too many digits) were the primary targets, while back then the only alternative was paper and a column of figures to add up. To be fair, it is difficult to object to this motivation, as we all know that humans are indeed error prone. But to keep the dispute entirely fair, one should add "humans without any training are error prone". One team taught me this lesson during my time working for Postal Bank, where I had the privilege to lead the Client and processing centre. A group of a dozen middle-aged and senior ladies faced, every single day, the staggering task of retyping 10,000 hand-written paper money orders per employee into the transaction system. Even though this was a very monotonous (maybe even dull) task – imagine retyping digits from paper to screen 8 hours a day, every day – their error rate was a hard-to-believe 1 in 100,000. If you apply a 4-eyes check to this process, you are at the level of 1 in 10 million. That beats most of the computer-managed processes I have ever seen. So, the error rate is certainly not the ultimate excuse for AI.

Paradoxically, the second motivation for a strong boost in computer intelligence was the effort to hide something from others. Encryption and breaking ciphers were a strong pull for computer science; the entire story of Turing is a great testimony to it. What we should note here is that this was also the first attempt to use machine intelligence against humans. In the current security situation, it is difficult to argue against privacy of communication or against the risk of interception. More idealistic souls would probably stand by "more human trust would breed less need for encryption". But obviously this is a more difficult issue, as already decades ago we needed sealed envelopes to send even the banal news from our lives. The need for encryption springs from the ultimate human longing to manipulate others, and that is something humanity has not been keen (for centuries) to give up.

That brings us to the third motivation for the human race to massively introduce computer science into their lives: comfort. Like other machines, computers tuned in to the human passion for comfort seeking. Obviously, a human can also try to solve a numeric optimization by plugging values into a set of equations 20,000 times. But why would a mortal bother with this if the computer can dully take over the task? The sad part of the story is that computers were not cleverer at solving the equations; it was pure brute force (the speed at which computers could try many wrong solutions before stumbling across a correct one). Meanwhile, I have met several genius humans who were talented enough to snap computations in their heads. The younger crowd may be puzzled by the info that there is even a "paper-based" way to calculate any square root, so one can really do it without reaching for a calculator.
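To make the brute-force point concrete, here is a small toy sketch (my own illustration, nothing from the original argument): it finds the square root of 2 once by blindly scanning candidate values, and once with Newton's method, an iterative shortcut that is similarly economical in spirit to the "paper-based" digit-by-digit calculation.

```python
import math

# Toy illustration of brute force versus a cleverer iterative method:
# computing the square root of 2.

def sqrt_brute_force(x, step=1e-6):
    """Scan candidates upwards from zero until the square overshoots x."""
    candidate, tries = 0.0, 0
    while candidate * candidate < x:
        candidate += step
        tries += 1
    return candidate, tries            # roughly 1.4 million tries for six decimals

def sqrt_newton(x, iterations=6):
    """Newton's method: each iteration roughly doubles the number of correct digits."""
    guess = x
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

print(sqrt_brute_force(2))             # (~1.414214, ~1,414,214 tries)
print(sqrt_newton(2), math.sqrt(2))    # ~1.4142135... after just 6 steps
```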

True, even with supreme training, no human would match the millions of calculations per second done by modern central processing units (CPUs). But to stay on fair ground, even the most advanced AI has come to the point where more computational power is gained not mainly by getting better and better CPUs, but rather by tapping into parallelization. Thus, if you realize that only a fraction of a percent of humans earn their bread by calculating something daily, it is safe to say we have never really tried the full power of human parallelism before (somewhat eagerly) embracing artificial intelligence. In this sense my wife's cheeky question holds: why did humans decide to build better and better electron boxes rather than try to improve the intelligence of our fellow humans?

The last nail in the coffin

All of the above trends would still be reversible if not for one more important human decision. For decades, engineers and scientists put their brainpower into the question of how to replace as much human labour with robots as possible. At the peak of the 90s they succeeded in this effort and – well – started to look left and right for the next mission to conquer. Research centres all around the world jumped on simulating human senses and the human line of thought.

However, to succeed in this, the machines first needed to copy our cognitive functions. So, step by step, we taught computers to detect voices and images the same way we humans do. Once they mastered that, we instructed them how to carry out basic tasks (like continuously checking the temperature in a room and, if it falls below a threshold, turning the heating on). But as we do not entirely understand our own thinking processes, we soon ran into the problem that more complex tasks cannot be rewritten into a chain of what-if instructions. So we let the computers try (and fail) as many times as needed until the machine (repeatedly) mimicked the desired result. Thus, machines derived rules for things we ourselves were unsure of (and which we therefore struggle to validate as general principles). Areas where humans were not able to generate a reasonable number of examples for the machines' try-and-fail learning are still unmastered by computers.
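As a minimal sketch of the contrast described above (my own toy example, with a made-up threshold value), the thermostat-style task fits into a single hand-written what-if rule, while more complex tasks only yield to learning from annotated examples:

```python
# The kind of task that CAN be written as a plain what-if rule
# (the threshold value is hypothetical, purely for illustration):
HEATING_THRESHOLD_C = 20.0

def heating_should_run(room_temperature_c):
    """Explicit, human-written rule: heat whenever the room is below the threshold."""
    return room_temperature_c < HEATING_THRESHOLD_C

print(heating_should_run(18.5))   # True  -> switch the heating on
print(heating_should_run(22.0))   # False -> leave it off

# The kind of task that CANNOT be written that way (say, "is there a cat in this photo?"):
# instead of hand-coded what-ifs, we show the machine many labelled examples and let it
# derive the rule itself through trial and error, along the lines of
#
#   model.fit(example_inputs, desired_outputs)   # learn from annotated examples
#   model.predict(new_input)                     # apply the self-derived rule
```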

Where are we (doomed) to go

From the facts stated above, it is clear that the rise of AI was not an inevitable consequence of history and that this train has been set on its track (and cleared to go) by us, mortals: first impatience with our own mistakes, later suspicion, and finally laziness towards repeated mental tasks. Without being pathetic, I am not so sure we always used all the options to "match" the AI.

Now we have progressed one step further and launched the development of universal artificial intelligence (UAI) that can zoom out to consider whether it thinks properly. There are already several ways to teach computers to copy human approaches. Yes, in some areas it is still not clear how to teach computers to beat us. But all it takes is a few research projects that compile a large enough set (read: millions) of annotated examples, and machines will tediously crack their way through these obstacles. Whichever angle you look from, the genie is out of the bottle and the machines' feat of rivalling humans is imminent. So what options are we left with?

The most naïve option would be to try to stop fully. There are still some things machines cannot do, and if we humans do not show them what is right and wrong in those areas, they will not master the skill (e.g. human empathy). But reading this line back to myself, it sounds just as silly as the rallies to smash the machines that were taking human labour at the outbreak of the industrial revolution. Therefore, I deem this scenario highly unlikely.

The second option is to use artificial intelligence to reversely strengthen human development. If you remember 1996, when Garry Kasparov was playing chess against the Deep Blue computer, he lost the first game of the six-game match. Garry, back then, had no human rival who could bring him to the brink of losing a game. However, under the extra stimulus of the super-powerful computer opponent, he managed to top his skill yet further and defeated the computer in the overall match score. Therefore, if we use AI to stimulate human development, in some areas we could (at least temporarily) reclaim the "throne" of intelligence. However, that would require adding an extra layer of AI that explains to humans the principles it has "learned itself".

The third alternative to AI dominating humans is to merge AI into human intelligence; in other words, to find a biologically sustainable way for our brain (maybe via implants) to be extended with an additional layer of all that AI has learned. At substantial risk of overgeneralization, this can be compared to extending a camera's memory with a plug-in memory card. This way, any AI advancement would immediately translate into human potential as well. The principle certainly carries some moral risks (who decides what will be programmed into the heads of others?), but it also prevents the doom scenario of machines taking over rule of the humans.

One way or the other

Despite the encouragement of the previous paragraphs, I am afraid I do not foresee a happy-end scenario in this AI story. To tip the scale of intelligence dominance back a bit towards the human race, we would need two essential ingredients: A] a source of information that is independent of computer infrastructure (otherwise, later in time, we would not be able to tell the difference between computer-generated info and reality merely chronicled into a computer); and B] a quick way of replicating all knowledge among fellow humans, even across generations. Contrary to machines, which need to learn all our knowledge just once, we humans need to relearn it with every new generation. In both required dimensions theoretical approaches exist, but their implementation is nowhere near to be seen in the years to come.

Therefore, as AI development runs at full speed, we should be ready to face the singularity scenario (machines surpassing our entire intelligence) as the most probable one, including all the social implications arising from it (mainly the wave of unemployment and career progression crises that we aim to discuss in more detail on this blog soon). Because, all in all, with AI taking over, there is much more to swallow than our intelligence pride.