Unconventional methods of Feature engineering

My dear fellows,

After settling into the Berlin data science community and visiting a few interesting local meet-ups, I agreed to contribute to the community knowledge sharing as well. Thus, if you have a while, you can join Sebastian Meier [Technologiestiftung Berlin], Sandris Murins [Iconiq Lab] and me on June 14th 2018, 18:00, at AI IN DATA SCIENCE – BERLIN at the Wayfair DE Office, Köpenicker Str. 180, 10997 Berlin, Germany.

To assure you that it really pays off to reserve some time for this meet-up, let me give you a sneak preview of what you might get from this event:

BERLIN AI TALK - Filip Vitek

In my presentation, titled “Feature engineering – your trump card in Machine Learning”, you will find out the 2 key reasons why feature engineering has a future. The first explains how variables can become the competitive advantage of your models. The second goes even beyond that and hopefully opens your eyes to why feature engineering might actually be essential for data scientists’ future job security.

After warming up on the overall role of features, we shall jump into how variables are actually designed/generated in real life. What are the most-used approaches to designing features? But more importantly, what are the common pitfalls that we all often fall into while designing our feature sets for machine learning predictions? Are there tricks to prevent us from failing on these?


However, just to demote the traditional ways and solely point out their notorious downsides without naming the alternative(s) would be a bit unfair. Therefore, in the second half of my talk I want to walk you through unconventional ways to design your features. What is more, I will take the help of 4 specific case studies where unconventional features were needed. I hope you might get inspired for your own work.

If you cannot make it to this talk (for whatever reason) and you are a member of the MightyData Community, you will find the presentation slides here after the event, in a member-only blog post. If you are not a member of the free MightyData Community yet, you can get your free membership in less than 2 minutes by registering here.

Looking forward to seeing you all at the AI Meet-up event!


Bread for 8 Facebook likes: you can already buy food with personal data

Data has been labelled a commodity in corporate speak for quite some time now. Monetization (i.e. selling) of data is not just a trendy concept but has become the essence of business for a horde of smaller and medium companies (let me name Market locator as one of the shiny examples). However, so far, data trade has mainly been a B2B business. An ordinary person could not pay for regular food supplies with his/her own data. That is over for good. A grocery store has opened in Germany where you pay for your purchase solely with personal data.


Milk for 10 photos, bread for 8 likes

No, this is not an April Fools’ joke or a hidden prank. In the German city of Hamburg, Datenmarkt (= data market) has opened its first store, where you pay solely with personal data from Facebook. Toast bread will cost you 8 Facebook likes, or you can take a pack of filled dumplings for the full disclosure of 5 different Facebook messages. And as you would expect from a grocery store, Datenmarkt lately ran a weekly special on most fruit, with the price cut down to just 5 Facebook photos per kilo.


Payment is organized in a process similar to the one you know from credit card payment via a POS terminal. But instead of entering a PIN into the POS terminal to authorize the transaction, you log into your Facebook account and indicate which likes given, Facebook photos posted or messages from FB Messenger you have decided to pay with. The cashier receipt also shows the data you decided to sacrifice. The provided data is at the full disposal of the Datenmarkt investors, who may sell it (along with your metadata) to any interested third party. By paying for goods with your social data, you give Datenmarkt full consent for this trading.


Insights we give away

What might look like an elegant and definitive solution to homeless people starving actually has much higher ambitions. The authors of this project also want to raise awareness that in our age any data point has its value, reminding us that the sheer volume of data we freely give away to titans like Google, Facebook or YouTube has real monetary value attached to it.

In more mature markets, the echo of “what is the value of a person’s digital track” resonates more intensively with every quarter of the year. To put it differently, if we hand over data to “attention brokers” (as Tim Wu, the Columbia University professor studying the matter, labelled them), what do we actually get in return for that data? We might come to the point where an outstanding debt forces us not only to “smash our piggy bank” but to sacrifice our Facebook assets as well. Most governments have set a monetary equivalent to human life (often calculated as lifetime tax contribution). But the value of our digital life remained to be priced. Bizarre? Well, not any more in our times.

Food yes, but not for services

The opening of the Hamburg Datenmarkt, sadly, confirms hypotheses researched by behavioural economists for more than 5 years. In monetizing their data, people simply tend to stick to the same principles as with sweets (or other tiny sins): immediate gratification wins over the promise of some future value. Most citizens comprehend that their personal data might have some value. Yet if given the option of using personal data to get cheaper or better services (e.g. insurance, or the next haircut for free), up to 80% of them turn the offer down. However, if the same people were given the choice of something tangible immediately, they were willing to trade their personal data for things as trivial as one pizza.


Now your thoughts go: “Wait, but the above-mentioned digital giants are not offering anything tangible; all of the Google, Facebook and YouTube business comes in the form of services, doesn’t it?” And you may be right, but the point here is that all of them offer immediate gratification (immediate likes on Facebook, a quick search on Google or immediate fun with YouTube videos). Quite a few people would see turning their digital data (deemed to be as immaterial as the air we breathe in and out) into something consumable as a “great deal”. If this trend proves true, it is likely that services exchanging a yogurt or a concert ticket for our personal data will be mushrooming around. If you have a similar business idea in your head, you had better act on it fast. The commission for the themightydata.com portal for seeding this idea into your head remains voluntary 😊.

Hold your (data) hats …

Immediately after the roll-out of personal data payments for existing goods, there will be someone willing to give you goods for a personal data loan (or a data mortgage). In other words, you will hand over your author rights in advance, even before the content is created. For those in financial difficulties this might still seem a more reasonable thing to give away than the fridge or one’s shelter. But here we are already crossing the bridge to digital slavery of a kind. Keep in mind that the most vulnerable would be the young lads, still having a long (and thus precious) digital life ahead, but short on funds (e.g. to buy a new motorbike) at this life stage. The whole trend will probably be infused by some industries (like utilities) that long for data to be their saviour from the recent misery of tiny margins.

Soon we should expect that besides the Bitcoin, Ethereum or Blackcoin fever, a person would be able to mine (from their own Facebook account) a currency like FB-photoCoin or FB-LikeCoin. And our age will gain yet another bizarre level on top. Just imagine the headlines: “Celebrity donated 4 full albums of her Facebook photos to charity.” But nobody can eat from that. Or can they now?

You might be interested to read:

3 WAYS how to TEACH ROBOTS human skills

What PETER SAGAN learned about his sprint from Helicopter?

CHILDREN had it REALLY tough on Titanic

What ALTERNATIVE do we have to AI?

Artificial intelligence (AI) and its applications are mushrooming in more and more areas of our lives. To read an economic magazine without stumbling across an AI article is becoming almost impossible. As my grandma used to say: “I open the fridge and I fear AI jumping out of it” (which, by the way, happens too). If you possess a bit of critical thinking, you may be revisiting the thought: “But why, for God’s sake?!”

You may have found out yourself that professional blindness is a strong phenomenon. It has not spared me either. On a daily basis, it is my job to think about how to improve machine learning and predictive analytics so that it brings value to our company. Therefore, the “poke” about why we are embracing AI so heavily came from an unexpected source. Our talk had been flowing continuously when a striking and well-aimed question arose: “Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress if all those billions were actually aimed at developing the human intellect?” I took a breath before launching into a tirade on the clear benefits of AI and … then I swallowed the sentence. There is a point in this thought. Is there actually an alternative to AI?

It was a beautiful autumn afternoon and we were walking our dogs in one of the parks of Bratislava, Slovakia. My wife is a respected, seasoned soft-skills trainer for a large global company. I had been explaining to her the fascinating essence of AlphaGo’s victory over the Go world champion. She silently reflected on the story, then stopped for a while and turned to me: “Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress if all those billions were actually aimed at developing the human intellect?” As any husband knows, unthinkable ideas are something one gets used to from our beloved ones. But just before I spat out the answer, I had to admit to myself that I had never thought this way. The existence and development of AI seemed natural to me, the same way a lumberjack probably does not think about paper recycling.

But the provocative idea kept itching me. Consequently, I started to think about what alternative to AI we REALLY have. Why are we so eager to advance with AI anyway? So, let’s do a deep-dive together:

The original motivation

At the very beginning of computers (and their usefulness) was the desire to compute things where humans make mistakes, mainly complex calculations with high precision (too many digits). Back then, the only alternative had been paper and a column of figures to add together. To be fair, it is difficult to object to this motivation, as we all know that humans are indeed error-prone. But to keep the dispute entirely fair, one should add “humans without any training are error-prone”. One team taught me this lesson during my time at Postal Bank, where I had the privilege to lead the client and processing centre. A group of a dozen middle-aged and senior ladies faced, every single day, the staggering task of retyping 10,000 hand-written paper money orders per employee into the transaction system. Even though this was a very monotonous (maybe even dull) task – imagine retyping digits from paper to screen 8 hours a day, every day – their error rate was a hard-to-believe 1 in 100,000. If you apply a 4-eyes check to this process, you are at the level of 1 in 10 million. That beats most of the computer-managed processes I have ever seen. So, the error rate is certainly not the ultimate excuse for AI.
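The arithmetic behind that 4-eyes figure can be sketched in a few lines. This is a back-of-the-envelope model, not data from the bank: it assumes (hypothetically) that the second pair of eyes catches about 99% of the first typist’s mistakes.

```python
# Back-of-the-envelope error-rate model for the money-order retyping story.
single_typist_error = 1 / 100_000   # 1 error per 100,000 retyped orders

# Assumed (hypothetical) share of errors the 4-eyes reviewer misses: ~1%.
reviewer_miss_rate = 0.01

four_eyes_error = single_typist_error * reviewer_miss_rate
print(f"1 in {round(1 / four_eyes_error):,}")  # → 1 in 10,000,000
```

Under full independence of the two checkers the combined rate would be even lower, so the 1-in-10-million figure is, if anything, a conservative reading of the story.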

Paradoxically, the second motivation for the strong boost in computer intelligence was the effort to hide something from others. Encryption and breaking ciphers were a strong pull for computer science; the entire story of Turing is great testimony to it. What we should note here is that this was also the first attempt to use machine intelligence against humans. In the current security situation, it is difficult to argue against the privacy of communication or the risk of interception. More idealistic souls would probably stand by “more human trust would breed less need for encryption.” But obviously this is a more difficult issue, as already decades ago we needed sealed envelopes to send even the banal news of our lives. The need for encryption springs from the ultimate human longing to manipulate others. And that is something humanity has not been keen (for centuries) to give up.

That brings us to the third motivation for the human race to massively introduce computer science into their lives: comfort. Similarly to other machines, computers tuned in to the human passion for comfort-seeking. Obviously, a human can also try to solve a numeric optimization by plugging values into a set of equations 20,000 times. But why would a mortal bother with this if a computer can dully take over the task? The sad part of the story is that computers were not cleverer at solving the equations; it was pure brute force (the speed with which a computer could try many wrong solutions before stumbling across a correct one). Meanwhile, I have met several genius humans who had the talent to snap computations in their heads. The younger crowd may be puzzled by the info that there is even a “paper-based” way to calculate any square root, so one can really do it without reaching for a calculator.
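One such pencil-and-paper method is Heron’s (Babylonian) iteration: guess, average the guess with the number divided by the guess, repeat. The sketch below is a minimal illustration of that idea, not the exact long-division layout taught in schools:

```python
def heron_sqrt(a, tolerance=1e-12):
    """Approximate sqrt(a) by Heron's rule: replace x with the average of x and a/x."""
    if a < 0:
        raise ValueError("square root of a negative number")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0          # any positive starting guess works
    while abs(x * x - a) > tolerance * a:
        x = (x + a / x) / 2           # one "paper" iteration step
    return x

print(heron_sqrt(2))  # ≈ 1.4142135623730951
```

Each step roughly doubles the number of correct digits, which is why a patient human with paper gets a usable square root in just a handful of iterations.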

True, even with supreme training, none of us humans would match the millions of calculations per second done by modern computer processing units (CPUs). But to stay on fair ground, even the most advanced AI has come to the point where more computational power is gained not mainly by getting better and better CPUs, but rather by tapping into parallelization. Thus, if you realize that only a fraction of a percent of humans earn their bread by calculating something daily, it is safe to say we never really tried the full power of human parallelism before (somewhat eagerly) embracing artificial intelligence. In this sense, my wife’s cheeky question holds: why did humans decide to build a better and better electron box rather than try to improve the intelligence of our fellow humans?

The last nail in the coffin

All of the above trends would still be reversible if not for one more important human decision. For decades, engineers and scientists were putting their brainpower into the question of how to replace as much human labour with robots as possible. At the peak of the 90s they succeeded in this effort and – well – started to look left and right for the next mission to conquer. Research centres all around the world jumped on simulating human senses and the human line of thought.

However, to succeed in this, the machines first needed to copy our cognitive functions. So, step by step, we taught computers to detect voices and images the same way we humans do. Once they mastered that, we instructed them how to carry out basic tasks (like continuously checking the temperature in a room and, if it is below a threshold, turning the heating on). But as we do not have a complete understanding of our own thinking processes, we soon ran into the problem that more complex tasks cannot be rewritten into a chain of what-if instructions. So, we let the computers try (and fail) as many times as needed until the machine (repeatedly) mimicked the desired result. Thus, machines derived rules for things we ourselves were unsure of (and thus struggle to validate as general principles). Areas where humans were not able to generate a reasonable number of examples for the machines’ try-and-fail learning are still unmastered by computers.
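The “chain of what-if instructions” mentioned above can be as small as this toy thermostat rule (the threshold value and labels are made up for illustration, not any real control API):

```python
def thermostat_step(temperature_c: float, threshold_c: float = 20.0) -> str:
    """Explicit rule-based control: a task simple enough to spell out fully for a machine."""
    if temperature_c < threshold_c:
        return "HEATING_ON"
    return "HEATING_OFF"

print(thermostat_step(18.5))  # → HEATING_ON
```

More complex behaviours (recognizing a face, judging empathy) resist being spelled out this way, which is exactly why try-and-fail learning took over.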

Where are we (doomed) to go

From the facts stated above, it is clear that the rise of AI was not an inevitable consequence of history, and that this train was set on its track (and cleared to go) by us, mortals. First, impatience with our own mistakes; later, suspicion; and finally, laziness about repeated mental tasks. Without being pathetic, I am not that sure we always used all the options to “match” the AI.

Now we have progressed even one step further and launched the development of universal artificial intelligence (UAI) that can zoom out to consider whether it thinks properly. There are already several ways to teach computers to copy human approaches. Yes, in some areas it is still not clear how to teach computers to beat us. But all it takes is a few research projects that compile a large enough set (read: millions) of annotated examples, and machines will tediously crack their way through these obstacles. From whatever angle you look, “the genie is out of the bottle” and the machines’ feat of rivalling humans is imminent. So what options are we left with?

The most naïve option would be to try to fully stop. There are still some areas machines cannot handle, and if we humans do not show them what is right and wrong there, they will not master the skill (e.g. human empathy). But reading this line back to myself, it sounds as silly as the rallies to damage the machines taking human labour at the outbreak of the industrial revolution. Therefore, I deem this scenario highly unlikely.

The second option is to use artificial intelligence to reversely strengthen human development. If you remember 1996, when Garry Kasparov played chess against the Deep Blue computer, he lost the first game of the match. Garry, back then, had no human rival who could bring him to the brink of losing a game. However, under the extra stimulus of a super-powerful computer opponent, he managed to top his skill yet further and defeated the computer in the overall match score. Therefore, if we use AI to stimulate human development, in some areas we could (at least temporarily) reclaim the “throne” of intelligence. However, that would require adding an extra layer of AI that explains to humans the principles it has “learned itself”.

The third alternative to AI dominating humans is to merge AI into human intelligence. In other words, to find a biologically sustainable way for our brain (maybe via implants) to be extended with an additional layer of all that AI has learned. With a substantial risk of overgeneralization, this is parallel to extending a camera’s memory with a plug-in memory card. This way, any AI advancement would immediately be translated into human potential as well. The principle certainly carries some moral risks (who decides what will be programmed into the heads of others?), but it also prevents the doom scenario of machines taking over rule of the humans.

One way or the other

Despite the encouragement of the previous paragraphs, I am afraid I do not foresee a happy-end scenario in this AI story. To tip the scale of intelligence dominance back a bit towards the human race, we would need 2 essential ingredients: A] a source of information that is independent of computer infrastructure (otherwise, later in time, we would not be able to tell the difference between information a computer generated and reality merely chronicled into a computer); and B] a quick way of replicating all knowledge among fellow humans, even across generations. Contrary to machines, which need to learn all our knowledge just once, we humans need to relearn it with every new generation again. In both required dimensions theoretical approaches exist, but their implementation is nowhere in sight in the years to come.

Therefore, as AI development proceeds at full speed, we should be ready to face the singularity scenario (machines surpassing our entire intelligence) as the most probable one, including all the social implications arising from it (mainly the wave of unemployment and career-progression crises that we aim to discuss in more detail on this blog soon). Because, all in all, with AI taking over there is much more to swallow than our intelligence pride.

Elections are like Hadoop

In my mother tongue, when one does not know (or want to share) the reason for his actions, he says “Lebo medveď” (= because of the bear). That throws actual bears off balance, because they also do not have a clue why you would do that. Having gone through several elections in the past few months, I realized that elections are “Lebo Hadoop” = a bit like Hadoop. Don’t you get why? Well, then let both you and the yellow elephants understand why:

Hadoop

Hadoop and other technologies of distributed computing and data analytics are still less common in our companies than progress would suggest. When explaining the concept to managers, I quite commonly have to argue that implementing Hadoop (which they heard of at some business conference lately) is not like switching Fords for Volkswagens, nor like migrating from MS SQL to Oracle. Really embracing distributed technologies is a rather massive change to a company’s business processes. At this stage I usually run short of managerial understanding and start to hear statements like “After all, Hadoop is a piece of software, so it can be installed to take over from the current SW, can’t it?” You roll your eyes and start to wonder whether it is actually worth investing further clarification effort when the chances of success are so small. And exactly for these situations, the following explanation comes in handy:


In most European countries, elections are organized by means of voting districts. Every voter is registered to exactly one polling station, usually the closest to his/her home address. That ensures voters do not have to travel far to the polling station, minimizing the time a single voter needs to cast a vote. Immediately after voting time is over, the local polling station (through the hands of the local electoral committee) starts to count the votes cast for the different candidates. When they finish their work, they create an election protocol summarizing all the results of the vote in the given polling station. This protocol then travels to the county electoral committee to sum up all of its polling stations, and then via the regional and national electoral committees to produce the overall election result. We feel this process is somewhat natural, as we have become used to it over the decades. But let’s imagine it were all done differently …

Imagine the whole country had to vote at the very same place. That would mean some people travelling very long distances (which would certainly influence election turnout). What is more, imagine how big the polling station would have to be to actually host those millions of voters, and how long the queue to cast a vote would be. A polling station that big would have to be a dedicated building, built just for that purpose (and of no other use outside elections). Elections organized this way would only be possible with a flawless voter register; any small discrepancy in recognizing or registering a voter would clutter the queue of those waiting to vote. This election would be ultimate hell for the local electoral committee: they would need literally weeks, if not months, for a single committee to count and verify several million ballots. And imagine some of the committee members getting ill or going on strike. As there is only one electoral committee in the whole country, there is no replacement at hand. By this moment, you are probably getting irritated and thinking: “Who on Earth would commit such downright insanity?!”


Well, you might be surprised to find out this is exactly how our legacy relational SQL databases operate. They try to store all data in the same table(s). As a result, a computer that can handle that much information has to possess immense capacity, and is (like the single polling station for a whole country) both too robust and too expensive. What is more, these cannot be regular PCs used by normal users; they have to be dedicated machines (usually not utilized outside of database tasks). All data points have to be written sequentially, which automatically creates long queues if the data load is massive. God save you if a single line of the intended data input is incorrect: the writing process is aborted, with the rest of the data still waiting to be injected. If the main database corrupts or some calculation gets stuck, the whole business process freezes. Any slightly more complicated count takes enormous time. And as more and more data flows in, it only gets worse over time.

After seeing all this, Hadoop said: enough! Its operation and data storage are similar to the way we organize our elections. In Hadoop, the total data is split into a great number of smaller chunks (polling stations), each hosting a limited sub-group of (voter) records. Data is stored close to its origin (the same way a polling station is close to your home), and thus reading and writing the data is much faster. When the (election result) sum is called for, several (polling) places work in parallel to count the votes; there is no single electoral committee calculating everything. After the calculation is completed on the individual places (Hadoop calls them “nodes”), the nodes pass their results to a higher level of aggregation, where the node results are summed into the total. While the nodes calculate rather small and simple amounts, common PCs can serve as hosts for the nodes; no single supercomputer (one mega election station) is needed. These common PCs can have other uses outside the Hadoop calculation (the same way polling stations turn back into schools and community centres once the elections are over). The analogy goes deep, as Hadoop nodes have an autonomy similar to that of local electoral committees in the voting process. The same resources available to a node both steer the read/write process and take part in the calculations (similarly to local electoral committee members both casting their own votes and taking care of counting all the votes cast). And finally, if one node (local polling station) fails, a neighbouring node can take over its task. This way, the system is much more resilient to failure or overload (imagine all voters having a portable electoral ID allowing them to vote at another polling station if their home one went up in flames).
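The whole counting analogy fits into a few lines of code. Below is a toy map-reduce sketch in plain Python, with a thread pool standing in for the Hadoop nodes; the station ballots and candidate names are made up for illustration:

```python
from collections import Counter
from multiprocessing.dummy import Pool  # thread pool standing in for Hadoop nodes

# Hypothetical ballots, grouped by polling station ("data chunks on nodes")
stations = [
    ["Alice", "Bob", "Alice"],
    ["Bob", "Bob", "Alice"],
    ["Alice", "Carol", "Alice"],
]

def count_station(ballots):
    # "Local electoral committee": each node counts only its own chunk
    return Counter(ballots)

# Map phase: all stations count in parallel
with Pool() as pool:
    partial_counts = pool.map(count_station, stations)

# Reduce phase: the "national committee" sums the per-station protocols
total = sum(partial_counts, Counter())
print(total)  # → Counter({'Alice': 5, 'Bob': 3, 'Carol': 1})
```

Real Hadoop MapReduce adds shuffling, fault tolerance and distributed storage on top, but the map (count locally) and reduce (sum the protocols) steps are exactly this shape.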

Hadoop geeks may object that the Hadoop–elections analogy is not exact in two points. First, in order to prevent data loss, Hadoop intentionally stores copies of the same data chunk on more than one node. If this were replicated in the voting process, people would have to cast the same vote at several polling stations “just in case one of them goes up in flames”, which is not really a desired phenomenon of proper elections (leaving aside the fact that there has been one legendary case of this actually happening in Slovakia). Second, there is no fixed, appointed state electoral committee overseeing the work of the local committees; in Hadoop, any local electoral committee can take the role of the national one. However, leaving these two tiny details aside, the Hadoop–elections analogy fits nicely.
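The replication point can also be sketched in code. This is not HDFS’s actual (rack-aware) placement policy, just a naive round-robin illustration of “each chunk gets copies on several stations”, with hypothetical block and node names:

```python
def place_replicas(chunks, nodes, replication=3):
    """Round-robin placement: copy each data chunk onto `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for i, chunk in enumerate(chunks):
        # Pick `replication` consecutive nodes, wrapping around the node list
        placement[chunk] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

print(place_replicas(["block-1", "block-2"], ["node-A", "node-B", "node-C"], replication=2))
# → {'block-1': ['node-A', 'node-B'], 'block-2': ['node-B', 'node-C']}
```

Losing any single node here still leaves a full copy of every chunk elsewhere, which is exactly the resilience the paragraph describes.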

So next time you need to explain to your boss (or anybody else) how Hadoop really works, remind them of elections. Or simply say “Lebo medveď” …

LONELINESS of data analyst

A higher salary. New challenges. A misfit with the current boss. The desire to get one’s hands on much larger data sets. Or pure world peace. None of the above is the main reason for data analysts to change jobs. But what is, then?

Not just American beauty contests, but also job interviews feature quotes like “I wish for world-wide peace” – ridiculous statements about why you decided to change jobs. I have interviewed more than 350 people in my career, so there is no shortage of bizarre statements, like “My father greets you; he thinks Allianz is still a very good company and hopes you are going to get me the job.” Since 2017, I have been doing interviews in a new country. I thought I would be learning some new “world peace” equivalents of this market. However, my surprise was much bigger.

Loneliness as the denominator

I felt sorry for the guy when it appeared in the first interview. But when the second, third and then fourth candidate came up with the same reason for leaving their current job, I got on my toes. I wondered if this market was some kind of fairy-tale kingdom cursed by a mighty wizard. I could not comprehend why so many people have loneliness as the common denominator of their lives. This was not a relationship- or friendship-kind of loneliness. Rather, this was all about work: analytical loneliness.

How the story goes …

You graduated from some technical or IT-savvy economics university. You love puzzles and quizzes; you loved Math competitions and finding new, interesting links between things. Thus, you look around the job market for a place to put your talent to use. You attend several interviews and find out that your potential employer has serious data conundrums ahead and that analysts will be measured against demanding criteria. You are thrilled. This is exactly what you are longing for: let some task beat the f**k out of you, so you can learn something new.

The first weeks on the job are really fun, indeed. You discover new data that nobody analyzed before you (or after you, but this you will only find out later on). The demanding analytical tasks turn out to be rather trivial data queries, often falling into the reporting-only category. But the work is fun, so never mind. Your boss does not really understand the nitty-gritty of your work; he would not be up to finding a mistake in your queries. But he is really nice. Time flows; you finish your first quarter, year one, year two … oh, man, how time flies.

For your second Christmas in the company, you beg Santa for a new boss, a new project, or at least a new version of the software you are working with – just to get some progress in your work life. You try switching jobs: a start-up, a large corporation, a family-driven business … always a few moments of thrill, and then the yo-yo effect of frustration hits, as with a diet. In the turmoil of the internal fight, you sign up for an expert conference. After all, one can get inspired externally as well. There you meet a lot of stand-ins with the same spark fading in their eyes.


 Loneliness of data analysts

With a bit of alteration, this was the common plot of all 4 candidate stories. None of them asked about salary or company benefits. None of them investigated the chances of future career growth. All they were interested to hear was: WHAT will I work on, and UNDER WHOSE LEADERSHIP? The usual comment went like: “In my current job I am the only one doing data science. Nobody understands the matter; I have no one to consult about my approaches. I only know what I googled on stackoverflow.com or similar portals. I feel lonely!”

It is interesting. When I gave it deep thought, I realized I can find stories like that in my own circle of friends, too. The analyst’s job is now going through a strange wave. There is enormous demand for data-analyzing people. However, since data science is a very young branch of analytics, there are very few managers who have actually done data science themselves. Analysts often report to managers who are not experts in the area. Thus, young data scientists are doomed to the path of the bonsai: nobody expects you to grow big, they water you from time to time, but you are just a caricature of the real tree. The limitations on the expert growth of analysts (subject of another blog post coming soon) are dire. These candidate stories only reminded me of that again.

The same chorus 3 times again

Now it is clear to me that these were not isolated outliers. Sadly, the analysts’ loneliness is indeed a bitter interplay of three factors. Firstly, there are still very few data scientists. Therefore, it is unlikely that several people with the same job description bump into each other in the same company. Controlling, reporting, data engineering: these are all jobs with multiple incumbents in the same firm. But sophisticated analytics is often rather lonely. As a result, deserted data scientists do not experience team spirit; they have nobody with whom to consult their assumptions in moments of uncertainty.

The second factor fueling the loneliness is the absence of managerial leadership. Sophisticated analysts often find themselves within teams that deal with data only marginally (e.g. sales or marketing). Or the teams work with data, but just to report on structured databases. As a result, the Data Scientists do not receive proper feedback on their work and are left to learn only from their own mistakes (which they have to detect themselves in the first place). Some torture themselves with self-study portals and courses, but few have the iron self-discipline to keep at it for several years in a row. Sooner or later, the majority just throws in the towel.

The two above-mentioned forces are joined by a lack of external know-how as the third factor. No offence here, but if you want to experience a conference where every speaker of the day contributes to your growth, you probably have to organize one yourself. No joke, trust me, I speak from my own experience. Expert conferences are rarely organized by people who can tell whether a speaker is beneficial or not. Most Data Science events are vendor-brainwashing or people-headhunting traps. That is why meetups (ever rising in popularity) are often the only remedy for the expert-conference hunger.


How to break the vicious circle?

Even though our discussion might not suggest so, there is a happy end to the 4 lonely candidate stories. Although they represent a sad probe into the soul of the contemporary data analyst, they also show a way to break the vicious circle. Based on their accounts and my own expert experience, I suggest the following 3 steps out of loneliness:

Step 1: Self-diagnostics. Real change can happen only if the participant admits there is an issue to address. This makes the change necessary and unavoidable. Thus, please ask yourself the following 3 questions:  1] Do I have a boss who understands my job to the extent that (s)he could temporarily step in during my absence?  2] Is there somebody else in our company whom I can ask whether I am doing my Data Science tasks correctly, or who can give me tips when I get stuck?  3] Have I had a chance, over the last 6-9 months, to get my hands dirty on at least 2 new analytical approaches I had never tried before?  If at least one “NO” emerged from these three questions, you should seriously think about your analytical future.

Step 2: Deep breath before the dive. If your self-diagnosis points to a move, do not run to the job portals to look for new job ads. Before you embrace the switch, get ready for the leap. If you start looking for a new job right away, you will most likely fail to get one. Summon a bit of self-discipline. Take some online courses, watch YouTube videos on your newly desired expertise. But foremost, force yourself to actually try, with your own hands, the new skills you will need in the new job. Maybe start here.

Step 3: Look for a real Data Science leader in a real Data Science team. To escape analytical loneliness for real, one has to address all three underlying factors. Therefore, (a) you have to find a job where you will be (b) part of a larger team working on Data Science, and (c) the team will be led by a person with their own, real analytical experience. A team that has the ambition to work on plenty of new projects relying on analytics. Don’t get fooled by sexy offers from companies where only one of the 3 aspects is hyped. Start-ups often look like a cool place to work, but this often comes with an inexperienced founder who “just” had a great idea, or dire DIY conditions that leave you at the mercy of unreliable systems or unskilled neighboring teams. I advise you to start the search by nailing down interesting projects and checking who is leading them. Alternatively, you can start your search from a respected analytics manager and check whether his or her team works on something that would make you dream big again. When hunting for high-quality managers, do not hesitate to revisit your former bosses’ profiles as well. As they say in airplane safety instructions: “Look around, as your nearest emergency exit may be located behind you.”

On a final note, let me emphasize that analytical loneliness may be a cyclic phenomenon. As the Data Science industry gains traction, teams will grow and finally form a generation of mature Data Science leaders. However, in our region it may well take 5-7 years before that happens. That is probably too long a period to “shelter against the storm” in your current hiding place. Thus, if you identified yourself with some aspects of the analysts’ loneliness, do not soothe yourself that it will get better sometime. Repeatedly doing the same stuff in a stable, well-paid job, where your boss does not know enough to meddle in your work or to fire you, can sound like a recipe for a nice life. So rather than galvanizing you into action, I ask you to give it a thought: How will the whole industry shift in the meantime? How will my chances to switch to more sophisticated analytics improve or worsen while I enjoy the “invincible” times? No matter whether you decide to stay or get ready for the leap, let me wish you months (or years) without analytical loneliness.

CHILDREN had it REALLY tough on the Titanic

If you had been a child on the Titanic, you would have died with 46% probability. Damn sad. Especially considering that the total survival rate on the Titanic was 39%. Contrary to the (proclaimed) women-and-children-first evacuation rule, there is only a 15-percentage-point difference between the death incidence of children and that of all ages combined. Did they disregard the evacuation mantra back then?

Maybe you think: hold on, it was a shipwreck on the open sea, so maybe the children were not able to swim. After all, we usually learn to swim a bit later in life. Well, the whole truth is revealed only if you drill down into the children’s survival rate based on the class they had been traveling in on the Titanic:
[Chart: children’s survival rate by travel class]

Children from the first and second class had an almost 3 times higher chance to survive than those from the 3rd class. Thus, swimming skills were not the deciding factor there, even if you had been Michael Phelps’ son (or rather his grandfather, given the year of the catastrophe).
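The drill-down behind that chart is a simple grouped aggregation. Here is a minimal sketch in plain Python, assuming Titanic-style records with `Pclass`, `Age` and `Survived` fields; the toy rows below are made up for illustration, not the real passenger list:

```python
from collections import defaultdict

# Toy records mimicking the Kaggle Titanic columns (illustrative values only).
passengers = [
    {"Pclass": 1, "Age": 8,  "Survived": 1},
    {"Pclass": 1, "Age": 4,  "Survived": 1},
    {"Pclass": 2, "Age": 10, "Survived": 1},
    {"Pclass": 2, "Age": 6,  "Survived": 0},
    {"Pclass": 3, "Age": 9,  "Survived": 0},
    {"Pclass": 3, "Age": 2,  "Survived": 0},
    {"Pclass": 3, "Age": 12, "Survived": 1},
    {"Pclass": 3, "Age": 40, "Survived": 0},  # adult, excluded from the child drill-down
]

def child_survival_by_class(rows, age_cutoff=16):
    """Survival rate of passengers younger than age_cutoff, grouped by class."""
    counts = defaultdict(lambda: [0, 0])  # class -> [survived, total]
    for r in rows:
        if r["Age"] < age_cutoff:
            counts[r["Pclass"]][0] += r["Survived"]
            counts[r["Pclass"]][1] += 1
    return {cls: s / n for cls, (s, n) in sorted(counts.items())}

rates = child_survival_by_class(passengers)
```

Note the `age_cutoff` parameter is an assumption of this sketch; where exactly you draw the line between “child” and “adult” is itself a small feature-engineering decision.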

The following graph shows that the gentlemen from the upper class were not that much of gentlemen at all:

[Chart: survival rate by travel class and gender]

Men from the first class survived at a similar rate as women and children from the 3rd class. Quite the contrary, the most heroic approach was taken by the men from the 2nd class, of whom only 8% survived the voyage of the legendary steamship. But maybe you wonder: how would we know all that?

Kaggle is a web portal that organizes data-analysis competitions. What is more, on the portal one can find several training datasets intended for practicing analytical skills. One of these is the Titanic Survival dataset (you can download it HERE, but you have to be a registered Kaggle member). This attractive dataset compiles the data of the passengers on the infamous Titanic voyage, adding a parameter indicating whether they survived the incident or perished.
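Once you have downloaded the file, getting a first look takes only the standard library. A minimal sketch, assuming the Kaggle `train.csv` column layout; the three inline sample rows are made up for illustration:

```python
import csv
import io

# A made-up sample in the Kaggle train.csv column layout (illustration only).
SAMPLE = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,26,7.92
"""

def load_titanic(source):
    """Parse a Titanic-style CSV into a list of dicts with typed fields."""
    rows = []
    for r in csv.DictReader(source):
        r["Survived"] = int(r["Survived"])
        r["Pclass"] = int(r["Pclass"])
        r["Age"] = float(r["Age"]) if r["Age"] else None  # Age is missing for some passengers
        rows.append(r)
    return rows

# With the real file you would pass open("train.csv", newline="") instead.
rows = load_titanic(io.StringIO(SAMPLE))
overall_survival = sum(r["Survived"] for r in rows) / len(rows)
```

The guard on `Age` matters with the real dataset, since a fair share of the passengers have no recorded age at all.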

Using Machine Learning methods (which is what the dataset is destined for), you can discover how the location of your cabin within the ship, or how much you paid for your ticket, increased or decreased your chances of survival. One can also analyze whether having more relatives on the ship was a competitive advantage or a disadvantage in the fight for life. I will not rob you of the pleasure of finding all these “sunken treasures” yourself, but let me indicate that some of the conclusions are pretty sinister (as the above-mentioned ones are).

So do not hesitate and try some easy Machine Learning predictions with this super interesting dataset. Feel free to share additional findings with the MightyData community in the discussion under this blog. On a last note: “Near, far, wherever you are…”, may your Machine Learning attempts be as convincing as Kate Winslet and Leonardo DiCaprio were on the railing 🙂