Unconventional methods of Feature engineering

My dear fellows,

After settling into the Berlin data science community and visiting a few interesting local meet-ups, I agreed to contribute to the community knowledge sharing as well. Thus, if you have a while, you can join Sebastian Meier [Technologiestiftung Berlin], Sandris Murins [Iconiq Lab] and me on June 14th 2018, 18:00, at AI IN DATA SCIENCE – BERLIN at the Wayfair DE Office, Köpenicker Str. 180, 10997 Berlin, Germany.

To assure you that it really pays off to reserve some time for this meet-up, let me give you a sneak preview of what you might get from this event:

BERLIN AI TALK - Filip Vitek

In my presentation, titled “Feature engineering – your trump card in Machine Learning”, you will find out the 2 key reasons why feature engineering has a future. The first explains how variables can become the competitive advantage of your models. The second goes even beyond that and hopefully opens your eyes to why feature engineering might actually be essential for data scientists’ future job security.

After warming up on the overall role of features, we shall jump into how variables are actually designed/generated in real life. What are the most-used approaches to designing features? But more importantly, what are the common pitfalls that we all often fall into while designing our feature sets for machine learning predictions? Are there tricks to prevent us from failing on these?


However, just to demote the traditional ways and solely point out their notorious downsides without naming the alternative(s) would be a bit unfair. Therefore, in the second half of my talk I want to walk you through unconventional ways to design your features. What is more, I will take the help of 4 specific case studies where unconventional features were needed. I hope you might get inspired for your own work.

If you cannot make it to this talk (for whatever reason) and you are a member of the MightyData Community, you will find the presentation slides here after the event, in a member-only blog post. If you are not a member of the free MightyData Community yet, you can get your free membership in less than 2 minutes by registering here.

Looking forward to seeing you all at the AI Meet-up event!


Bread for 8 Facebook likes: you can already buy food with personal data

Data has been labelled a commodity in corporate speak for quite some time now. Monetization (i.e. selling) of data is not just a trendy concept but has become the essence of business for a horde of smaller and medium companies (let me name Market locator as one of the shiny examples). However, so far, data trade has mainly been a B2B business. An ordinary person could not pay for regular food supplies with his/her own data. That is over for good. A grocery store has opened in Germany where you pay for your purchase solely with personal data.


Milk for 10 photos, bread for 8 likes

No, this is not an April Fools’ joke or a hidden prank. In the German city of Hamburg, Datenmarkt (= data market) has opened its first store, where you pay solely with personal data from Facebook. Toast bread will cost you 8 Facebook likes, or you can take a pack of filled dumplings for the full disclosure of 5 different Facebook messages. And as you would expect from a grocery store, Datenmarkt lately ran a weekly special on most fruit, with the price cut down to just 5 Facebook photos per kilo.


Payment is organized in a process similar to the one you know from credit card payment via a POS terminal. But instead of entering a PIN into the POS terminal to authorize the transaction, you log into your Facebook account and indicate which likes given, Facebook photos posted or messages from FB Messenger you have decided to pay with. The cashier receipt also shows the data you decided to sacrifice. The provided data is at the full disposal of the Datenmarkt investors, who may sell it (along with your metadata) to any interested third party. By paying for goods with your social data, you give Datenmarkt full consent for this trading.


Insights we give away

What might look like an elegant and definitive solution to homeless people starving actually has much higher ambitions. The authors of this project also want to raise awareness that in our age any data point has its value, reminding us that the sheer volume of data we freely give away to titans like Google, Facebook or YouTube has real monetary value attached to it.

In more mature markets, the echo of “what is the value of a person’s digital track” resonates more intensively with every quarter of the year. To put it differently, if we hand over data to “attention brokers” (as Tim Wu, the Columbia University professor studying the matter, labelled them), what do we actually get in return for that data? We might come to the point where an outstanding debt forces us not only to “smash our piggy bank” but to sacrifice our Facebook assets as well. Most governments have set a monetary equivalent to human life (often calculated as lifetime tax contribution). But the value of our digital life remained to be priced. Bizarre? Well, not any more in our times.

Food yes, but not for services

The opening of the Hamburg Datenmarkt, sadly, confirms hypotheses researched by behavioural economists for more than 5 years. In monetizing their data, people simply tend to stick to the same principles as with sweets (or other tiny sins): immediate gratification wins over the promise of some future value. Most citizens comprehend that their personal data might have some value. Yet if given the option of using personal data to get cheaper or better services (e.g. insurance, or the next haircut for free), up to 80% of them turn the offer down. However, if the same people were given the choice of something tangible immediately, they were willing to trade their personal data for things as trivial as one pizza.


Now your thoughts go: “Wait, but the above-mentioned digital giants are not offering anything tangible; all of the Google, Facebook and YouTube business comes in the form of services, doesn’t it?” And you may be right, but the point here is that all of them offer immediate gratification (immediate likes on Facebook, a quick search on Google or immediate fun with YouTube videos). Quite a few people would see turning their digital data (deemed to be as immaterial as the air we breathe in and out) into something consumable as a “great deal”. If this trend proves true, it is likely that services exchanging a yogurt or a concert ticket for our personal data will be mushrooming around. If you have a similar business idea in your head, you had better act on it fast. The commission for the themightydata.com portal for seeding this idea into your head remains voluntary 😊.

Hold your (data) hats …

Immediately after the roll-out of personal data payments for existing goods, there will be someone willing to give you goods for a personal data loan (or a data mortgage). In other words, you will hand over your author rights in advance, even before the content is created. For those in financial difficulties this might still seem a more reasonable thing to give away than the fridge or one’s shelter. But here we are already crossing the bridge to digital slavery of a kind. Keep in mind that the most vulnerable would be the young lads, still having a long (and thus precious) digital life ahead, but short on funds (e.g. to buy a new motorbike) at this life stage. The whole trend will probably be infused by some industries (like utilities) that long for data to be their saviour from the recent misery of tiny margins.

Soon we should expect that besides the Bitcoin, Ethereum or Blackcoin fever, a person would be able to mine (from their own Facebook account) a currency like FB-photoCoin or FB-LikeCoin. And our age will gain yet another bizarre level on top. Just imagine the headlines: “Celebrity donated 4 full albums of her Facebook photos to charity.” But nobody can eat from that. Or can they now?

You might be interested to read:

3 WAYS how to TEACH ROBOTS human skills

What PETER SAGAN learned about his sprint from Helicopter?

CHILDREN had it REALLY tough on Titanic

What ALTERNATIVE do we have to AI?

Artificial intelligence (AI) and its applications are mushrooming in more and more areas of our lives. To read an economic magazine without stumbling across an AI article is becoming almost impossible. As my grandma used to say: “I open the fridge and I fear AI jumping out of it” (which, by the way, happens too). If you possess a bit of critical thinking, you may be revisiting the thought: “But why, for God’s sake?!”

You may have found out yourself that professional blindness is a strong phenomenon. It has not spared me either. On a daily basis, it is my job to think about how to improve machine learning and predictive analytics so that it brings value to our company. Therefore, the “poke” about why we are embracing AI so heavily came from an unexpected source. Our talk had been flowing continuously when a striking and well-aimed question arose: “Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress if all those billions were actually aimed at developing the human intellect?” I took a breath before launching into a tirade on the clear benefits of AI and … then I swallowed the sentence. There is a point in this thought. Is there actually an alternative to AI?

It was a beautiful autumn afternoon and we were walking our dogs in one of the parks of Bratislava, Slovakia. My wife is a respected, seasoned soft-skills trainer for a large global company. I had been explaining to her the fascinating essence of AlphaGo’s victory over the Go world champion. She silently reflected on the story, then stopped for a while and turned to me: “Why, on earth, do we humans invest so much money to create something that can dwarf us? Where would the human race progress if all those billions were actually aimed at developing the human intellect?” As any husband knows, unthinkable ideas are something one gets used to from our beloved ones. But just before I spat out the answer, I had to admit to myself that I had never thought this way. The existence and development of AI seemed natural to me, the same way a lumberjack probably does not think about paper recycling.

But the provocative idea kept itching me. Consequently, I started to think about what alternative to AI we REALLY have. Why are we so eager to advance with AI anyway? So, let’s do a deep-dive together:

The original motivation

At the very beginning of computers (and their usefulness) was the desire to compute things where humans make mistakes, mainly complex calculations with high precision (too many digits). Back then, the only alternative had been paper and a column of figures to add together. To be fair, it is difficult to object to this motivation, as we all know that humans are indeed error-prone. But to keep the dispute entirely fair, one should add “humans without any training are error-prone”. One team taught me this lesson during my time at Postal Bank, where I had the privilege to lead the client and processing centre. A group of a dozen middle-aged and senior ladies faced, every single day, the staggering task of retyping 10,000 hand-written paper money orders per employee into the transaction system. Even though this was a very monotonous (maybe even dull) task – imagine retyping digits from paper to screen 8 hours a day, every day – their error rate was a hard-to-believe 1 in 100,000. If you apply a 4-eyes check to this process, you are at the level of 1 in 10 million. That beats most of the computer-managed processes I have ever seen. So, the error rate is certainly not the ultimate excuse for AI.
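The arithmetic behind that 4-eyes figure can be sketched in a few lines. This is a back-of-the-envelope model, not data from the bank: it assumes (hypothetically) that the second pair of eyes catches about 99% of the first typist’s mistakes.

```python
# Back-of-the-envelope error-rate model for the money-order retyping story.
single_typist_error = 1 / 100_000   # 1 error per 100,000 retyped orders

# Assumed (hypothetical) share of errors the 4-eyes reviewer misses: ~1%.
reviewer_miss_rate = 0.01

four_eyes_error = single_typist_error * reviewer_miss_rate
print(f"1 in {round(1 / four_eyes_error):,}")  # → 1 in 10,000,000
```

Under full independence of the two checkers the combined rate would be even lower, so the 1-in-10-million figure is, if anything, a conservative reading of the story.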

Paradoxically, the second motivation for the strong boost in computer intelligence was the effort to hide something from others. Encryption and breaking ciphers were a strong pull for computer science; the entire story of Turing is great testimony to it. What we should note here is that this was also the first attempt to use machine intelligence against humans. In the current security situation, it is difficult to argue against the privacy of communication or the risk of interception. More idealistic souls would probably stand by “more human trust would breed less need for encryption.” But obviously this is a more difficult issue, as already decades ago we needed sealed envelopes to send even the banal news of our lives. The need for encryption springs from the ultimate human longing to manipulate others. And that is something humanity has not been keen (for centuries) to give up.

That brings us to the third motivation for the human race to massively introduce computer science into their lives: comfort. Similarly to other machines, computers tuned in to the human passion for comfort-seeking. Obviously, a human can also try to solve a numeric optimization by plugging values into a set of equations 20,000 times. But why would a mortal bother with this if a computer can dully take over the task? The sad part of the story is that computers were not cleverer at solving the equations; it was pure brute force (the speed with which a computer could try many wrong solutions before stumbling across a correct one). Meanwhile, I have met several genius humans who had the talent to snap computations in their heads. The younger crowd may be puzzled by the info that there is even a “paper-based” way to calculate any square root, so one can really do it without reaching for a calculator.
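One such pencil-and-paper method is Heron’s (Babylonian) iteration: guess, average the guess with the number divided by the guess, repeat. The sketch below is a minimal illustration of that idea, not the exact long-division layout taught in schools:

```python
def heron_sqrt(a, tolerance=1e-12):
    """Approximate sqrt(a) by Heron's rule: replace x with the average of x and a/x."""
    if a < 0:
        raise ValueError("square root of a negative number")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0          # any positive starting guess works
    while abs(x * x - a) > tolerance * a:
        x = (x + a / x) / 2           # one "paper" iteration step
    return x

print(heron_sqrt(2))  # ≈ 1.4142135623730951
```

Each step roughly doubles the number of correct digits, which is why a patient human with paper gets a usable square root in just a handful of iterations.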

True, even with supreme training, none of us humans would match the millions of calculations per second done by modern computer processing units (CPUs). But to stay on fair ground, even the most advanced AI has come to the point where more computational power is gained not mainly by getting better and better CPUs, but rather by tapping into parallelization. Thus, if you realize that only a fraction of a percent of humans earn their bread by calculating something daily, it is safe to say we never really tried the full power of human parallelism before (somewhat eagerly) embracing artificial intelligence. In this sense, my wife’s cheeky question holds: why did humans decide to build a better and better electron box rather than try to improve the intelligence of our fellow humans?

The last nail in the coffin

All of the above trends would still be reversible if not for one more important human decision. For decades, engineers and scientists were putting their brainpower into the question of how to replace as much human labour with robots as possible. At the peak of the 90s they succeeded in this effort and – well – started to look left and right for the next mission to conquer. Research centres all around the world jumped on simulating human senses and the human line of thought.

However, to succeed in this, the machines first needed to copy our cognitive functions. So, step by step, we taught computers to detect voices and images the same way we humans do. Once they mastered that, we instructed them how to carry out basic tasks (like continuously checking the temperature in a room and, if it is below a threshold, turning the heating on). But as we do not have a complete understanding of our own thinking processes, we soon ran into the problem that more complex tasks cannot be rewritten into a chain of what-if instructions. So, we let the computers try (and fail) as many times as needed until the machine (repeatedly) mimicked the desired result. Thus, machines derived rules for things we ourselves were unsure of (and thus struggle to validate as general principles). Areas where humans were not able to generate a reasonable number of examples for the machines’ try-and-fail learning are still unmastered by computers.
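The “chain of what-if instructions” mentioned above can be as small as this toy thermostat rule (the threshold value and labels are made up for illustration, not any real control API):

```python
def thermostat_step(temperature_c: float, threshold_c: float = 20.0) -> str:
    """Explicit rule-based control: a task simple enough to spell out fully for a machine."""
    if temperature_c < threshold_c:
        return "HEATING_ON"
    return "HEATING_OFF"

print(thermostat_step(18.5))  # → HEATING_ON
```

More complex behaviours (recognizing a face, judging empathy) resist being spelled out this way, which is exactly why try-and-fail learning took over.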

Where are we (doomed) to go

From the facts stated above, it is clear that the rise of AI was not an inevitable consequence of history, and that this train was set on its track (and cleared to go) by us, mortals. First, impatience with our own mistakes; later, suspicion; and finally, laziness about repeated mental tasks. Without being pathetic, I am not that sure we always used all the options to “match” the AI.

Now we have progressed even one step further and launched the development of universal artificial intelligence (UAI) that can zoom out to consider whether it thinks properly. There are already several ways to teach computers to copy human approaches. Yes, in some areas it is still not clear how to teach computers to beat us. But all it takes is a few research projects that compile a large enough set (read: millions) of annotated examples, and machines will tediously crack their way through these obstacles. From whatever angle you look, “the genie is out of the bottle” and the machines’ feat of rivalling humans is imminent. So what options are we left with?

The most naïve option would be to try to fully stop. There are still some areas machines cannot handle, and if we humans do not show them what is right and wrong there, they will not master the skill (e.g. human empathy). But reading this line back to myself, it sounds as silly as the rallies to damage the machines taking human labour at the outbreak of the industrial revolution. Therefore, I deem this scenario highly unlikely.

The second option is to use artificial intelligence to reversely strengthen human development. If you remember 1996, when Garry Kasparov played chess against the Deep Blue computer, he lost the first game of the match. Garry, back then, had no human rival who could bring him to the brink of losing a game. However, under the extra stimulus of a super-powerful computer opponent, he managed to top his skill yet further and defeated the computer in the overall match score. Therefore, if we use AI to stimulate human development, in some areas we could (at least temporarily) reclaim the “throne” of intelligence. However, that would require adding an extra layer of AI that explains to humans the principles it has “learned itself”.

The third alternative to AI dominating humans is to merge AI into human intelligence. In other words, to find a biologically sustainable way for our brain (maybe via implants) to be extended with an additional layer of all that AI has learned. With a substantial risk of overgeneralization, this is parallel to extending a camera’s memory with a plug-in memory card. This way, any AI advancement would immediately be translated into human potential as well. The principle certainly carries some moral risks (who decides what will be programmed into the heads of others?), but it also prevents the doom scenario of machines taking over rule of the humans.

One way or the other

Despite the encouragement of the previous paragraphs, I am afraid I do not foresee a happy-end scenario in this AI story. To tip the scale of intelligence dominance back a bit towards the human race, we would need 2 essential ingredients: A] a source of information that is independent of computer infrastructure (otherwise, later in time, we would not be able to tell the difference between information a computer generated and reality merely chronicled into a computer); and B] a quick way of replicating all knowledge among fellow humans, even across generations. Contrary to machines, which need to learn all our knowledge just once, we humans need to relearn it with every new generation again. In both required dimensions theoretical approaches exist, but their implementation is nowhere in sight in the years to come.

Therefore, as AI development proceeds at full speed, we should be ready to face the singularity scenario (machines surpassing our entire intelligence) as the most probable one, including all the social implications arising from it (mainly the wave of unemployment and career-progression crises that we aim to discuss in more detail on this blog soon). Because, all in all, with AI taking over there is much more to swallow than our intelligence pride.

Elections are like Hadoop

In my mother tongue, when one does not know (or want to share) the reason for his actions, he says “Lebo medveď” (= because of the bear). That throws actual bears off balance, because they also do not have a clue why you would do that. Having gone through several elections in the past few months, I realized that elections are “Lebo Hadoop” = a bit like Hadoop. Don’t you get why? Well, then let both you and the yellow elephants understand why:

Hadoop

Hadoop and other technologies of distributed computing and data analytics are still less common in our companies than progress would suggest. When explaining the concept to managers, I quite commonly have to argue that implementing Hadoop (which they heard of at some business conference lately) is not like switching Fords for Volkswagens, nor like migrating from MS SQL to Oracle. Really embracing distributed technologies is a rather massive change to a company’s business processes. At this stage I usually run short of managerial understanding and start to hear statements like “After all, Hadoop is a piece of software, so it can be installed to take over from the current SW, can’t it?” You roll your eyes and start to wonder whether it is actually worth investing further clarification effort when the chances of success are so small. And exactly for these situations, the following explanation comes in handy:


In most European countries, elections are organized by means of voting districts. Every voter is registered to exactly one polling station, usually the closest to his/her home address. That ensures voters do not have to travel far to the polling station, minimizing the time a single voter needs to cast a vote. Immediately after voting time is over, the local polling station (through the hands of the local electoral committee) starts to count the votes cast for the different candidates. When they finish their work, they create an election protocol summarizing all the results of the vote in the given polling station. This protocol then travels to the county electoral committee to sum up all of its polling stations, and then via the regional and national electoral committees to produce the overall election result. We feel this process is somewhat natural, as we have become used to it over the decades. But let’s imagine it were all done differently …

Imagine the whole country had to vote at the very same place. That would mean some people travelling very long distances (which would certainly influence election turnout). What is more, imagine how big the polling station would have to be to actually host those millions of voters, and how long the queue to cast a vote would be. A polling station that big would have to be a dedicated building, built just for that purpose (and of no other use outside elections). Elections organized this way would only be possible with a flawless voter register; any small discrepancy in recognizing or registering a voter would clutter the queue of those waiting to vote. This election would be ultimate hell for the local electoral committee: they would need literally weeks, if not months, for a single committee to count and verify several million ballots. And imagine some of the committee members getting ill or going on strike. As there is only one electoral committee in the whole country, there is no replacement at hand. By this moment, you are probably getting irritated and thinking: “Who on Earth would commit such downright insanity?!”


Well, you might be surprised to find out this is exactly how our legacy relational SQL databases operate. They try to store all data in the same table(s). As a result, a computer that can handle that much information has to possess immense capacity, and is (like the single polling station for a whole country) both too robust and too expensive. What is more, these cannot be regular PCs used by normal users; they have to be dedicated machines (usually not utilized outside of database tasks). All data points have to be written sequentially, which automatically creates long queues if the data load is massive. God save you if a single line of the intended data input is incorrect: the writing process is aborted, with the rest of the data still waiting to be injected. If the main database corrupts or some calculation gets stuck, the whole business process freezes. Any slightly more complicated count takes enormous time. And as more and more data flows in, it only gets worse over time.

After seeing all this, Hadoop said: enough! Its operation and data storage are similar to the way we organize our elections. In Hadoop, the total data is split into a great number of smaller chunks (polling stations), each hosting a limited sub-group of (voter) records. Data is stored close to its origin (the same way a polling station is close to your home), and thus reading and writing the data is much faster. When the (election result) sum is called for, several (polling) places work in parallel to count the votes; there is no single electoral committee calculating everything. After the calculation is completed on the individual places (Hadoop calls them “nodes”), the nodes pass their results to a higher level of aggregation, where the node results are summed into the total. While the nodes calculate rather small and simple amounts, common PCs can serve as hosts for the nodes; no single supercomputer (one mega election station) is needed. These common PCs can have other uses outside the Hadoop calculation (the same way polling stations turn back into schools and community centres once the elections are over). The analogy goes deep, as Hadoop nodes have an autonomy similar to that of local electoral committees in the voting process. The same resources available to a node both steer the read/write process and take part in the calculations (similarly to local electoral committee members both casting their own votes and taking care of counting all the votes cast). And finally, if one node (local polling station) fails, a neighbouring node can take over its task. This way, the system is much more resilient to failure or overload (imagine all voters having a portable electoral ID allowing them to vote at another polling station if their home one went up in flames).
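The whole counting analogy fits into a few lines of code. Below is a toy map-reduce sketch in plain Python, with a thread pool standing in for the Hadoop nodes; the station ballots and candidate names are made up for illustration:

```python
from collections import Counter
from multiprocessing.dummy import Pool  # thread pool standing in for Hadoop nodes

# Hypothetical ballots, grouped by polling station ("data chunks on nodes")
stations = [
    ["Alice", "Bob", "Alice"],
    ["Bob", "Bob", "Alice"],
    ["Alice", "Carol", "Alice"],
]

def count_station(ballots):
    # "Local electoral committee": each node counts only its own chunk
    return Counter(ballots)

# Map phase: all stations count in parallel
with Pool() as pool:
    partial_counts = pool.map(count_station, stations)

# Reduce phase: the "national committee" sums the per-station protocols
total = sum(partial_counts, Counter())
print(total)  # → Counter({'Alice': 5, 'Bob': 3, 'Carol': 1})
```

Real Hadoop MapReduce adds shuffling, fault tolerance and distributed storage on top, but the map (count locally) and reduce (sum the protocols) steps are exactly this shape.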

Hadoop geeks may object that the Hadoop–elections analogy is not exact in two points. First, in order to prevent data loss, Hadoop intentionally stores copies of the same data chunk on more than one node. If this were replicated in the voting process, people would have to cast the same vote at several polling stations “just in case one of them goes up in flames”, which is not really a desired phenomenon of proper elections (leaving aside the fact that there has been one legendary case of this actually happening in Slovakia). Second, there is no fixed, appointed state electoral committee overseeing the work of the local committees; in Hadoop, any local electoral committee can take the role of the national one. However, leaving these two tiny details aside, the Hadoop–elections analogy fits nicely.
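The replication point can also be sketched in code. This is not HDFS’s actual (rack-aware) placement policy, just a naive round-robin illustration of “each chunk gets copies on several stations”, with hypothetical block and node names:

```python
def place_replicas(chunks, nodes, replication=3):
    """Round-robin placement: copy each data chunk onto `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for i, chunk in enumerate(chunks):
        # Pick `replication` consecutive nodes, wrapping around the node list
        placement[chunk] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

print(place_replicas(["block-1", "block-2"], ["node-A", "node-B", "node-C"], replication=2))
# → {'block-1': ['node-A', 'node-B'], 'block-2': ['node-B', 'node-C']}
```

Losing any single node here still leaves a full copy of every chunk elsewhere, which is exactly the resilience the paragraph describes.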

So next time you need to explain to your boss (or anybody else) how Hadoop really works, remind them of elections. Or simply say “Lebo medveď” …

LONELINESS of data analyst

A higher salary. New challenges. A misfit with the current boss. The desire to get one’s hands on much larger data sets. Or pure world peace. None of the above is the main reason for data analysts to change jobs. But what is, then?

Not just American beauty contests, but also job interviews feature quotes like “I wish for world-wide peace” – ridiculous statements about why you decided to change jobs. I have interviewed more than 350 people in my career, so there is no shortage of bizarre statements, like “My father greets you; he thinks Allianz is still a very good company and hopes you are going to get me the job.” Since 2017, I have been doing interviews in a new country. I thought I would be learning some new “world peace” equivalents of this market. However, my surprise was much bigger.

Loneliness as the denominator

I felt sorry for the guy when it appeared in the first interview. But when the second, third and then fourth candidate came up with the same reason for leaving their current job, I got on my toes. I wondered if this market was some kind of fairy-tale kingdom cursed by a mighty wizard. I could not comprehend why so many people have loneliness as the common denominator of their lives. This was not a relationship- or friendship-kind of loneliness. Rather, this was all about work: analytical loneliness.

How the story goes …

You graduated from some technical or IT-savvy economics university. You love puzzles and quizzes; you loved Math competitions and finding new, interesting links between things. Thus, you look around the job market for a place to put your talent to use. You attend several interviews and find out that your potential employer has serious data conundrums ahead and that analysts will be measured against demanding criteria. You are thrilled. This is exactly what you are longing for: let some task beat the f**k out of you, so you can learn something new.

The first weeks on the job are really fun, indeed. You discover new data that nobody analyzed before you (or after you, but this you will only find out later on). The demanding analytical tasks turn out to be rather trivial data queries, often falling into the reporting-only category. But the work is fun, so never mind. Your boss does not really understand the nitty-gritty of your work; he would not be up to finding a mistake in your queries. But he is really nice. Time flows; you finish your first quarter, year one, year two … oh, man, how time flies.

For your second Christmas in the company, you beg Santa for a new boss, a new project, or at least a new version of the software you are working with – just to get some progress in your work life. You try switching jobs: a start-up, a large corporation, a family-driven business … always a few moments of thrill, and then the yo-yo effect of frustration hits, as with a diet. In the turmoil of the internal fight, you sign up for an expert conference. After all, one can get inspired externally as well. There you meet a lot of stand-ins with the same spark fading in their eyes.


 Loneliness of data analysts

With a bit of alteration, this was the common plot of all 4 candidate stories. None of them asked about salary or company benefits. None of them investigated the chances of future career growth. All they were interested to hear was: WHAT will I work on, and UNDER WHOSE LEADERSHIP? The usual comment went like: “In my current job I am the only one doing data science. Nobody understands the matter; I have no one to consult about my approaches. I only know what I googled on stackoverflow.com or similar portals. I feel lonely!”

It is interesting. When I gave it deep thought, I realized I can find stories like that in my own circle of friends, too. The analyst’s job is now going through a strange wave. There is enormous demand for data-analyzing people. However, since data science is a very young branch of analytics, there are very few managers who have actually done data science themselves. Analysts often report to managers who are not experts in the area. Thus, young data scientists are doomed to the path of the bonsai: nobody expects you to grow big, they water you from time to time, but you are just a caricature of the real tree. The limitations on the expert growth of analysts (subject of another blog post coming soon) are dire. These candidate stories only reminded me of that again.

The same chorus 3 times again

Now it is clear to me that these were not isolated outliers. Sadly, the analysts’ loneliness is indeed a bitter interplay of three factors. Firstly, there are still very few data scientists. Therefore, it is unlikely that several people with the same job description bump into each other in the same company. Controlling, reporting, data engineering: these are all jobs with multiple incumbents in the same firm. But sophisticated analytics is often rather lonely. As a result, deserted data scientists do not experience team spirit; they have nobody with whom to consult their assumptions in moments of uncertainty.

The second factor fueling the loneliness is the absence of managerial leadership. Sophisticated analysts often find themselves within teams that deal with data only marginally (e.g. sales or marketing). Or the teams work with data, but just to report on structured databases. As a result, the Data Scientists do not receive proper feedback on their work and are left to learn only from their own mistakes (which they have to detect themselves in the first place). Some torture themselves with self-study portals and courses, but few have the iron self-discipline to keep at it for several years in a row. Sooner or later, the majority just throws in the towel.

The two above-mentioned forces are joined by a lack of external know-how as the third factor. No offence here, but if you want to experience a conference where every speaker of the day contributes to your growth, you probably have to organize one yourself. No joke, trust me, I speak from my own experience. Expert conferences are rarely organized by people who can tell whether a speaker is beneficial or not. Most Data Science events are vendor-brainwashing or people-headhunting traps. That is why meetups (ever rising in popularity) are often the only remedy for the expert-conference hunger.


How to break the vicious circle?

Even though our discussion might not suggest so, there is a happy end to the 4 lonely candidate stories. Although they represent a sad probe into the soul of the contemporary data analyst, they also show a way to break the vicious circle. Based on their accounts and my own expert experience, I suggest the following 3 steps out of loneliness:

Step 1: Self-diagnostics. Real change can happen only if the participant admits there is an issue to address. This makes the change necessary and unavoidable. Thus, please ask yourself the following 3 questions:  1] Do I have a boss who understands my job to the extent that (s)he could temporarily step in during my absence?  2] Is there somebody else in our company whom I can ask whether I am doing my Data Science tasks correctly, or who can give me tips when I get stuck?  3] Have I had a chance, over the last 6-9 months, to get my hands dirty on at least 2 new analytical approaches I had never tried before?  If at least one “NO” emerged from these three questions, you should seriously think about your analytical future.

Step 2: Deep breath before the dive. If your self-diagnosis points to a move, do not run to the job portals to look for new job ads. Before you embrace the switch, get ready for the leap. If you start looking for a new job right away, you will most likely fail to get one. Summon a bit of self-discipline. Take some online courses, watch YouTube videos on your newly desired expertise. But foremost, force yourself to actually try, with your own hands, the new skills you will need in the new job. Maybe start here.

Step 3: Look for a real Data Science leader in a real Data Science team. To escape analytical loneliness for real, one has to address all three underlying factors. Therefore, (a) you have to find a job where you will be (b) part of a larger team working on Data Science, and (c) the team will be led by a person with their own, real analytical experience. A team that has the ambition to work on plenty of new projects relying on analytics. Don’t get fooled by sexy offers from companies where only one of the 3 aspects is hyped. Start-ups often look like a cool place to work, but this often comes with an inexperienced founder who “just” had a great idea, or dire DIY conditions that leave you at the mercy of unreliable systems or unskilled neighboring teams. I advise you to start the search by nailing down interesting projects and checking who is leading them. Alternatively, you can start your search from a respected analytics manager and check whether his or her team works on something that would make you dream big again. When hunting for high-quality managers, do not hesitate to revisit your former bosses’ profiles as well. As they say in airplane safety instructions: “Look around, as your nearest emergency exit may be located behind you.”

On a final note, let me emphasize that analytical loneliness may be a cyclic phenomenon. As the Data Science industry gains traction, teams will grow and finally form a generation of mature Data Science leaders. However, in our region it may well take 5-7 years before that happens. That is probably too long a period to “shelter against the storm” in your current hiding place. Thus, if you identified yourself with some aspects of the analysts’ loneliness, do not soothe yourself that it will get better sometime. Repeatedly doing the same stuff in a stable, well-paid job, where your boss does not know enough to meddle in your work or to fire you, can sound like a recipe for a nice life. So rather than galvanizing you into action, I ask you to give it a thought: How will the whole industry shift in the meantime? How will my chances to switch to more sophisticated analytics improve or worsen while I enjoy the “invincible” times? No matter whether you decide to stay or get ready for the leap, let me wish you months (or years) without analytical loneliness.

CHILDREN had it REALLY tough on the Titanic

If you had been a child on the Titanic, you would have died with 46% probability. Damn sad. Especially considering that the total survival rate on the Titanic was 39%. Contrary to the (proclaimed) women-and-children-first evacuation rule, there is only a 15-percentage-point difference between the death incidence of children and that of all ages combined. Did they disregard the evacuation mantra back then?

Maybe you think: hold on, it was a shipwreck on the open sea, so maybe the children were not able to swim. After all, we usually learn to swim a bit later in life. Well, the whole truth is revealed only if you drill down into the children’s survival rate based on the class they had been traveling in on the Titanic:
[Chart: children’s survival rate by travel class]

Children from the first and second class had an almost 3 times higher chance to survive than those from the 3rd class. Thus, swimming skills were not the deciding factor there, even if you had been Michael Phelps’ son (or rather his grandfather, given the year of the catastrophe).
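The drill-down behind that chart is a simple grouped aggregation. Here is a minimal sketch in plain Python, assuming Titanic-style records with `Pclass`, `Age` and `Survived` fields; the toy rows below are made up for illustration, not the real passenger list:

```python
from collections import defaultdict

# Toy records mimicking the Kaggle Titanic columns (illustrative values only).
passengers = [
    {"Pclass": 1, "Age": 8,  "Survived": 1},
    {"Pclass": 1, "Age": 4,  "Survived": 1},
    {"Pclass": 2, "Age": 10, "Survived": 1},
    {"Pclass": 2, "Age": 6,  "Survived": 0},
    {"Pclass": 3, "Age": 9,  "Survived": 0},
    {"Pclass": 3, "Age": 2,  "Survived": 0},
    {"Pclass": 3, "Age": 12, "Survived": 1},
    {"Pclass": 3, "Age": 40, "Survived": 0},  # adult, excluded from the child drill-down
]

def child_survival_by_class(rows, age_cutoff=16):
    """Survival rate of passengers younger than age_cutoff, grouped by class."""
    counts = defaultdict(lambda: [0, 0])  # class -> [survived, total]
    for r in rows:
        if r["Age"] < age_cutoff:
            counts[r["Pclass"]][0] += r["Survived"]
            counts[r["Pclass"]][1] += 1
    return {cls: s / n for cls, (s, n) in sorted(counts.items())}

rates = child_survival_by_class(passengers)
```

Note the `age_cutoff` parameter is an assumption of this sketch; where exactly you draw the line between “child” and “adult” is itself a small feature-engineering decision.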

The following graph shows that the gentlemen from the upper class were not that much of gentlemen at all:

[Chart: survival rate by travel class and gender]

Men from the first class survived at a similar rate as women and children from the 3rd class. Quite the contrary, the most heroic approach was taken by the men from the 2nd class, of whom only 8% survived the voyage of the legendary steamship. But maybe you wonder: how would we know all that?

Kaggle is a web portal that organizes data-analysis competitions. What is more, on the portal one can find several training datasets intended for practicing analytical skills. One of these is the Titanic Survival dataset (you can download it HERE, but you have to be a registered Kaggle member). This attractive dataset compiles the data of the passengers on the infamous Titanic voyage, adding a parameter indicating whether they survived the incident or perished.
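Once you have downloaded the file, getting a first look takes only the standard library. A minimal sketch, assuming the Kaggle `train.csv` column layout; the three inline sample rows are made up for illustration:

```python
import csv
import io

# A made-up sample in the Kaggle train.csv column layout (illustration only).
SAMPLE = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,26,7.92
"""

def load_titanic(source):
    """Parse a Titanic-style CSV into a list of dicts with typed fields."""
    rows = []
    for r in csv.DictReader(source):
        r["Survived"] = int(r["Survived"])
        r["Pclass"] = int(r["Pclass"])
        r["Age"] = float(r["Age"]) if r["Age"] else None  # Age is missing for some passengers
        rows.append(r)
    return rows

# With the real file you would pass open("train.csv", newline="") instead.
rows = load_titanic(io.StringIO(SAMPLE))
overall_survival = sum(r["Survived"] for r in rows) / len(rows)
```

The guard on `Age` matters with the real dataset, since a fair share of the passengers have no recorded age at all.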

Using Machine Learning methods (which is what the dataset is destined for), you can discover how the location of your cabin within the ship, or how much you paid for your ticket, increased or decreased your chances of survival. One can also analyze whether having more relatives on the ship was a competitive advantage or a disadvantage in the fight for life. I will not rob you of the pleasure of finding all these “sunken treasures” yourself, but let me indicate that some of the conclusions are pretty sinister (as the above-mentioned ones are).

So do not hesitate and try some easy Machine Learning predictions with this super interesting dataset. Feel free to share additional findings with the MightyData community in the discussion under this blog. On a last note: “Near, far, wherever you are…”, may your Machine Learning attempts be as convincing as Kate Winslet and Leonardo DiCaprio were on the railing 🙂