Machine learning – THE MIGHTY DATA

Data Executive’s Read 2023 | Book suggestions

Staying sharp in the data realm is like juggling flaming laptops: challenging and a tad risky. To keep my executive skills from going the way of the floppy disk, I’ve committed to tackling a whopping 10,000 pages of books annually. It's like a private brain gym, but with more words and fewer sweaty towels. For an executive (and not only for one), reading a 300+ page book is a large time investment, so you had better pick a worthy one. Therefore, below I offer a list of my best reads of 2023, curated to inspire, educate, and maybe even give you a chuckle. Think of the books listed below as potential beacons in the maze of staying tuned to data wizardry!

Blue Ocean SHIFT

Topic | Innovation, Strategy

If you have ever gone through strategic management training, this name might ring a bell. You might also roll your eyes, as Renée Mauborgne and W. Chan Kim published their first introduction to Blue Ocean in 2004, a whopping (nearly) 20 years ago. But wait, I am not that ignorant; there is more to this suggestion.

Blue Ocean Strategy (BOS) is one of the major strategy concepts for differentiating your business from (bloodthirsty, cutthroat) competition. It is a framework that enables you to innovate no matter how good, bad, or unique your products or services are. If you have not read this book before, close the gap immediately. I have used it on several assignments in my career, and the methodology has always yielded interesting new business strategies.

However, even if you did read the original 2004 Blue Ocean Strategy book, this one is different. The authors of the original concept bring additional insights not only on how to design a differentiating strategy but, foremost, on how to implement it. They extended and rewrote the original scope of BOS based on lessons from nearly 20 years of implementing it in industries and public organizations. Hence the reference to “SHIFT” in the title. I honestly think this is a must-read for any middle or top manager.

Link | https://www.amazon.com/Blue-Ocean-Shift-Competing-Confidence/dp/0316314048

AI 2041

Topic | Sci-Fi + AI commentary

Many authors and books try to explain the major shift in Artificial Intelligence (AI) in recent times. A few writers also dare to predict or speculate about where it might take us from here.

However, the book by Kai-Fu Lee and Chen Qiufan is very special and different. Kai-Fu is a former executive at Google, Apple, and the like, responsible for implementing AI solutions. When he talks about AI methods, he most likely led the implementation of their early pilots. A real well of AI knowledge and experience.

He teamed up with a sci-fi author to write a unique piece narrated through ten stories (all set around the year 2041). In each story/chapter, they first introduce a future use of AI in real life, then close the chapter with facts and details on how it will be implemented and what stage of AI we can realistically expect by 2041.

The book is somewhat thick, but absolutely worth it and easy to read, as you can dig through it one story at a time. I think it is an especially good gift for somebody who wants to understand the (future of) AI but lacks the technical background to read white papers.

Link | https://www.amazon.com/AI-2041-Ten-Visions-Future/dp/059323829X

Becoming a Data Head

Topic | Data-driven, Management, Data literacy

My decision to put this book on my reading list stemmed from curiosity. Reviews suggest that this book is a good entry book for executives trying to become data-driven or AI-ready. Being an SVP of Data & Analytics (and a seasoned data scientist) myself, I am hardly the target reader at this career phase. But I have seen so many books claim (and fail) to guide you into the Data Science bushes that I was tempted to see how this one would do. Yet another flat-falling promise?

No, quite the contrary! This book really walks the talk. Namely, it walks you as a user smoothly through the different stages of data analytics and data science. Even the basic concepts are explained in a no-nonsense style that requires no previous knowledge, yet it neither insults your intelligence nor bores you when you read things already obvious to you. You can also decide how “far into the woods” you want to go and stop reading any time you feel you have reached exactly the level of understanding you need. Or maybe you will look even deeper to understand the principles of what you just read?

I strongly recommend this book to anybody trying to switch careers into data jobs. I also find it a great present for any manager or executive you want to enlighten about data.

Link | https://www.amazon.com/Becoming-Data-Head-Understand-Statistics/dp/1119741742/

COLORWISE

Topic | Data Visualization, Storytelling

As somebody shaping (literally) thousands of visualizations year after year, I welcome books describing the rules and good (and bad) practices for creating visualizations. I have a few in my library (and suggested them in my previous reading lists), but they often talk more about which kind of graph to choose and how to shape the composition. Many of them take the use of color for granted (or touch the issue only tangentially).

ColorWise is a book that gives “color choice” and “color coding” in graphs and visualizations the full spotlight. It explains the background of colors in a very non-academic way and will surely take you beyond your previous knowledge of color usage. It also gives clear guidance on how to create color schemes for your graphs if you are anchored to some must-have brand colors. What is more, it goes deeper into the psychology of different color schemes and warns you about cultural and color-vision-deficiency pitfalls in your graphs. If you are already a pro, you will often nod your head with “Exactly!” on your lips … and you will still learn a few new aspects to think about. If you are a “regular” color user, your color-coding skills will get a significant boost. I strongly recommend it to anybody who needs to produce dashboards or presentations regularly in their work.

Link | https://www.amazon.com/ColorWise-Storytellers-Guide-Intentional-Color/dp/1492097845

BUILD

Topic | Strategy, Data, Product management

Many admire Tony Fadell for what he achieved. He built the iPod for Apple and basically saved Apple from falling. And then, humbly, he built the iPhone on top of it. And if that were not enough, he then founded a brand-new company, Nest, which started the whole smart-home category of technology, and sold it to Google for a few billion dollars. So he is certainly an inspiring person. But if you are not a tech geek, you probably had not heard his name before, or did not care much. Nor did I. And I regret it.

His book BUILD is an interesting mixture of advice and guidance for people who want to take their life (and career) a bit more into their own hands. He narrates the story from adolescence through the early years on the job up to the CEO part of your life. And yes, maybe you will never (want to) be a CEO, but the story is still good guidance. It might sound fluffy, but whoever you are in business, I am quite sure you can benefit from some chapter of this book. Yes, occasionally you have to pardon Tony’s American optics, but it smells more like a fragrance you know but would not wear yourself, not a sensory disgust.

I especially admire a chapter on how data plays a different role in building the individual phases of a product. It gives you clear guidance on where data is the horse and where it is (still needed, but rather) the cart. Having gone through three layers of management (Team Lead to SVP) myself, I can confirm that his views on how to perceive your role are very accurate, and I was amazed how he can compress the essence into (often just) a few pages of text.

All in all, this book is a masterpiece (uh, I told you that already, right?). And I strongly suggest you read it. The earlier the better, because some of the lessons he gives I had to learn the hard way, and I only wish he had written this book earlier. Have a great read!

Link | https://www.amazon.com/Build-Unorthodox-Guide-Making-Things/dp/B09CF2YB6Z/

All in on AI

Topic | AI, Growth, Strategy

I have read most of the 15 books that Tom Davenport has authored and was mostly happy with them. Therefore, when I saw his newest piece, ALL IN ON AI, I was full of anticipation.

The author introduces a group of businesses that decided to make artificial intelligence the centerpiece of their business strategy and operations. They really went ALL-IN on it. The book first walks you through what such an AI-ALL-IN company looks like: what the common denominators are, but also the industry-specific aspects. You quickly understand how to spot the markers.

But that’s only the start. In the remainder of the book, Davenport (and his co-author) provide examples of how to transform your existing business into the AI-ALL-IN state. They do it cleverly, picking real companies' stories from different maturity levels and industries. The authors also methodically link the needed AI markers to the developments in the stories, showing that the common denominators are actually fitting and well chosen.

Who is this book for?
Well, for anybody who envisions or dreams about benefiting from progressive technologies in their work. For those wanting to step up or future-proof their business.
It’s also a good gift idea for employees trying to pitch the AI change to top manager(s).

Link | https://www.amazon.com/All-AI-Companies-Artificial-Intelligence/dp/1647824699/

Good Data

Topic | Data, Ethics, Search data

Reading Sam Gilbert’s book Good Data is stimulating and entertaining at the same time (you just need to see through the author’s masked humor). Sam is a seasoned data professional who does not fall into the clichés and mental shortcuts of today’s data speak.

I did not always agree with his opinions, but all the questions he raised in the book made me really (re)think what I considered the role of data to be in different corners of business and society. Thus, if you are asking “What questions should we have about the future of data?”, this book will get you there.

Just for the answers to those questions, please think a bit more critically than the author suggests. All in all, it is quick and fun to read and opens new horizons. Worth a few days of reading.

Link | https://www.amazon.com/Good-Data-Optimists-Digital-Future/dp/1787396339

Don’t Make Me Think (Revisited)

Topic | UX, Product management, Web design

Web and apps have become our window to everyday activities, social interaction, shopping, and most of our work (certainly so during COVID). In the 1990s and 2000s, institutions and businesses tried to impress us with physical real estate. But how do digital institutions treat us now?

This book is for everyone who wants to grasp the basics (yes, it starts from the ground up) of how to design a digital interface for the web or an app. Even though this might sound like a UX designer's guideline (and I would be a happy user even if it were), it is really served in down-to-earth language and does not require any design domain knowledge from you (but it leaves you with some after you read it through).

It is not a long read, and I strongly encourage anybody interacting with web and apps (or having a say in their design) to at least skim through it. A no-regret move!

Link | https://www.amazon.com/Dont-Make-Think-Revisited-Usability/dp/0321965515

Extremely ONLINE

Topic | Creators, Social Media

At first glance, the subject of online influencers might not seem like a page-turner. However, a friend’s recommendation led me to Taylor Lorenz’s exploration of the hidden layers behind social media’s evolution, and I was instantly captivated.

This book isn’t just a timeline of social media from the late 90s; it’s a narrative that weaves through the changing social dynamics influenced by online platforms. It provides an intriguing mix of statistical data and storytelling, revealing how various online communities engage with social media.

The book also offers surprising insights into questions like:

What was the first major topic that sparked the blogging revolution?
How did the requirement for influencers to disclose sponsorships impact the effectiveness of advertisements?
What truly contributes to societal polarization if not social media algorithms?
Which other social networks suffered at the hands of Twitter?

For those in marketing or content creation, this book is an essential read from start to finish. It’s equally crucial for parents or soon-to-be parents to understand the evolving relationship between kids and social media.

For me, the book has a special twist that is likely to work for you as well if you are in your late 30s or 40s. It maps the development of internet consumption for our generation, as blogs hit the internet exactly when our generation started to interact with it.

Link | https://www.amazon.com/Extremely-Online-Untold-Influence-Internet/dp/1982146869

Machine Learning Design Patterns

Topic | Machine Learning, Data Science

This book feels like the Swiss Army knife for machine learning enthusiasts. It’s the first of its kind as it dives into the wild world of ML design patterns. Forget about dry, technical jargon; this book is like a treasure map, guiding you through 30 quirky, yet ingenious design patterns, each one a secret weapon against those head-scratching ML problems. It’s like finding cheat codes for a video game, but for machine learning!

Imagine a cookbook, but instead of recipes for apple pie, it’s chock-full of solutions for when your AI project decides to go on a coffee break. Whether you’re a seasoned data scientist or just someone who accidentally wandered into the machine learning aisle, this book is your trusty sidekick. It’s the kind of read that makes you think, “Ah, so this is what Google’s brainiacs do for fun!” – solving problems and making ML as approachable as a friendly robot assistant.

Link | https://www.amazon.com/Machine-Learning-Design-Patterns-Preparation/dp/1098115783

CRUX

Topic | Strategy, Business Analysis

As someone with a background in Strategic Management, I’ve devoured nearly every strategy book available. Through my extensive reading, I’ve discovered two authors who consistently deliver valuable strategic insights: Gary Hamel and Richard Rumelt.

Therefore, unsurprisingly, Richard Rumelt’s The Crux stands out as a masterpiece (again). It skillfully guides you in crafting authentic strategies for your business or team and shatters common executive misconceptions, like the necessity of a mission statement, mistaking international expansion for strategy, or overvaluing shareholder interests. It’s also an excellent resource for learning to spearhead genuine strategic development.

I strongly recommend this book to all executives. Be prepared for a reflective and sometimes uncomfortable journey through your previous strategy endeavors. It’s equally insightful for middle managers, equipping them with the knowledge to challenge and refine the strategies proposed by their higher-ups. Overall, it’s a perfect read to gift yourself or others during a vacation.

Link | https://www.amazon.com/Crux-Richard-Rumelt/dp/1788169514

The Choice Factory

Topic | Marketing, Psychology, Feature engineering

“The Choice Factory” by Richard Shotton is an exceptional read, especially recommended for data analysts focused on modeling and predicting human behavior, as well as for marketers seeking to boost conversions by leveraging (or riding the tailwind of) natural human tendencies.

What sets this book apart is its reliance on proven real-world best practices, presented not as isolated case studies but as principles backed by comprehensive research. Another key strength lies in its concise, easily digestible chapters, each ending with practical, actionable advice on how to implement these insights.

I strongly endorse this book for anyone looking to gain a deeper understanding of human behavior in feature engineering for ML prediction models or for marketing optimization context.

Link | https://www.amazon.com/Choice-Factory-behavioural-biases-influence/dp/085719609X

The Ruthless Elimination of Hurry

Topic | Work-Life balance, Mental health

“The Ruthless Elimination of Hurry,” as the title aptly indicates, is more than just a book; it’s a compelling manifesto advocating for a deliberate shift away from the relentless pursuit of speed for its own sake.

In our fast-paced world, where speed is often synonymous with efficiency and success, this book presents a refreshing perspective. It acknowledges that while speed can be beneficial (except when it leads to a speeding ticket!), it shouldn’t be the primary objective. Speed should be a tool, employed judiciously and only when truly necessary. The book emphasizes the importance of intentionality in our actions, encouraging us not to rush mindlessly but to consider the purpose and value of our speed.

Authored by John M. Comer, a U.S. pastor, the book is understandably infused with religious references and teachings, particularly focusing on Jesus and other Christian elements. For some readers, this religious aspect might seem predominant, but the book’s core message transcends religious boundaries. If one can look past the religious overtones, or perhaps even draw insight from them, “The Ruthless Elimination of Hurry” reveals itself as a deeply thought-provoking and intriguing read.

It’s a book that challenges the status quo of our hurried lives. It invites readers to pause, reflect, and reconsider the pace at which we live. The author’s insights offer a unique perspective on how slowing down can lead to a more fulfilled, purpose-driven life. This makes the book an essential read for anyone feeling overwhelmed by the ceaseless rush of modern life and seeking a path to a more balanced, intentional existence.

Link | https://www.amazon.com/Ruthless-Elimination-Hurry-Emotionally-Spiritually/dp/0525653090

Data Science on AWS

Topic | ML operations, Data Science, Data engineering

Ah, the wild ride of prototyping machine learning models! Many of us have built fast prototypes (or toy examples) of machine learning clustering or prediction models in notebooks or sandbox environments. It’s like building a Lego castle in your living room: fun, easy, and oh-so-satisfying. But then you decide to move that castle to the real world, and suddenly it’s like trying to assemble it in a windstorm. Surprise! Porting your perfect little prototype into the jungle of a live environment is like herding cats while juggling.

Most of today’s implementations have no choice but to run in a cloud, virtual-machine setup, which requires additional complexity and care just to deliver even the bare functionality of the easy local-machine PoC. This book is about thinking through the machine learning aspects of a live solution in advance: understanding which combination of tools you should expect to deploy to keep your machine learning train running properly on its rails. It is a must-read not because you will ever be coding the components and connectors covered in the material. It is essential rather because you need to understand everything your teams have to go through to make it all happen for you.

Link | https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391

Text As Data

Topic | NLP, Machine Learning

As the title of the book rightly suggests, text has long been perceived as a special “animal”: on the edge of data analytics, much more obscure than analysis of relational data with SQL or predictive analytics. Text analytics was also handled by dedicated (Python) packages and often by staff specializing solely in NLP. If you were not one of them, you would probably just reach for (simplified) predefined functions in NLTK (or a similar library).
Those times are over. Text is mainstream. If you were not convinced before the ChatGPT burst, there is now no way to deny it. But text analytics still finds an audience (and practitioners) stuck in the pre-text era, with only a rough idea of how to address data stored in troves of text.

Therefore, this book comes as a kind of gift. If you admit to being one of those with only a general (read: limited) understanding of extracting insight from text and of how to set up text analytics in your team, if you have not been treating text as seriously as ML or reinforcement learning, this book helps you close that gap. It’s well written and always illustrated with telling examples. If you missed buying a ticket for the departing text analytics “train”, this is your fast track to get on it.

Link | https://www.amazon.com/Text-Data-Framework-Learning-Sciences/dp/0691207550

The Coming Wave

Topic | AI, Philosophy

Hold onto your hats, folks! Mustafa Suleyman’s “The Coming Wave” isn’t just a book; it’s like a roller coaster ride into the future, where your coffee maker might be plotting world domination. Suleyman, the AI whiz-kid and DeepMind co-founder, is dishing out a buffet of mind-boggling predictions. Imagine a world where your vacuum cleaner is judging your music taste and your fridge is gossiping about your late-night snack habits. That’s the kind of AI party Suleyman is inviting us to.

But wait, there’s a catch. It’s not all about tech wizardry and gadgets having a mind of their own. Suleyman waves a big, bright warning flag about AI’s dark side. Picture a world where AI is like that one overachieving cousin who’s great at everything but sometimes scares the living daylights out of you. Suleyman is like the cool uncle of the tech world, telling us to enjoy the party but maybe hide the fine china just in case.

So, whether you’re a tech-head, a skeptic, or just someone who’s curious if your phone is silently laughing at your TikTok attempts, “The Coming Wave” is your handbook for the AI age. It’s like a survival guide for the digital jungle, complete with a map, a flashlight, and a slightly ominous warning about the creatures lurking in the shadows. Buckle up and get ready for a wild ride into the future, where your toaster might just be the smartest thing in your house!

Link | https://www.amazon.com/The-Coming-Wave/dp/1847927491

Julia High Performance

Topic | Data engineering, Data Science

No, this is not a mashup of Shakespeare’s famous love tragedy and a performance marketing guide. Julia might still be the new kid on the block in the programming world, especially compared to Python, the reigning “lingua franca” of data science. But don’t be fooled: this emerging language packs a punch with its speed and efficiency. “Julia High Performance” by Avik Sengupta and Alan Edelman is like the ultimate guidebook for this speedster of a language.

Think of this book as your go-to manual for making your code run like a sprinter on a caffeine high. It’s like a masterclass in getting the most out of Julia, from understanding its high-speed capabilities to avoiding performance roadblocks. While some readers might wish for a deeper dive into the more intricate examples, the book remains an eye-opener, proving its worth by empowering users to supercharge their projects and leave Python in the dust. Some users even boasted of a tenfold performance boost after switching from Python/NumPy to Julia, so think about leaving your comfort zone and heading for a coding glow-up!

This book, admittedly, is a bit of a joker card, but if you did not pick anything above and you are reasonably fluent in Python, maybe give it a try.

Link | https://www.amazon.com/Julia-High-Performance-Avik-Sengupta/dp/178829811X


Did ChatGPT pass Data Science technical interview?

On the last day of November 2022, somewhat in the shadow of the Cyber Week craze, the OpenAI team released the new ChatGPT for free testing. It is a chatbot built on the powerful GPT-3.5 language model, capable not only of casual conversation but also of answering real (even tough) expert questions and creatively writing texts, poetry, or even whole stories.

As the features (and performance) of the model are a pretty awesome step up from what we have seen so far, its launch immediately set a snowball rolling of people testing it across a plethora of domains. The craze seems to be so intense that it is believed to be the first digital tool or service to reach 1 million users within 5 days of its official release. (To be fair, I think it is only the first recorded one; I am quite sure that in countries like India or China it is not unheard of to gain 1 million users fast for something really catchy 😊)

But back to the core story. The ChatGPT use case that caused the most havoc on LinkedIn and many blogs and news portals is the fact that it can produce real snippets of code from a very simple specification of what the code should do. You can literally go “Show me code to predict survival rate on Titanic”, and in a snap it returns Python code to fetch the data, create a predictive model, and run it, all in gleaming, well-commented Python. Or so it looks.

To form my own opinion, I tried (and collected others’) coding inquiries to investigate the real quality of the code. I made a short summary of this early investigation in this LinkedIn post. TL;DR: it was not flawless code; if you try to run it, you will still often stumble upon errors, BUT … for somebody without a clue how to attack the problem, it might be more than an inspiration.

A few days later, my dear friend (and former colleague) Nikhil Kumar Jha came up with the idea of asking ChatGPT one of the technical interview questions he remembered from when I was hiring him into my team. He passed me the question and answer in a message. And I have to say, the answer was pretty solid. That got my mind twisting. So we quickly agreed to take the whole battery of tests that I use for technical interviews of data scientists and put the ChatGPT “candidate” through the whole interview hassle. The rest of this blog summarizes how the robot did and what the implications are. But before we get there, what do you think: did ChatGPT pass the technical round to be hired?

Technical interview to pass

Before jumping into the (obviously most) juicy answer to the question at the end of the previous paragraph, let me give you a bit of context about my interview as such. The market for data scientists and machine learning engineers is full of “aspirational data scientists” (a euphemism for pretenders). They rely on the fact that it is difficult to screen a candidate technically in detail. Also, hiring managers' creativity in designing their very own interview questions is relatively low, so if you keep going to interview after interview, over several tens of rounds you may be lucky enough to brute-force some of them (simply by piggybacking on the answers from failed past interviews).

To fight this, I have several sets of uniquely designed questions that I rotate through (and secret follow-up questions ready for those who answer the basic questions surprisingly fast). In general, the technical round needs to separate the average from the great, and the genius from the great. Thus, it is pretty challenging in its entirety. A candidate can earn 0–100 points, and the highest score in my history was 96 points. (And that happened only once; only a single-digit number of candidates out of more than 300 have scored over 90 points.) The average lady or gentleman ends up in the 40–50 point range; the weak ones don't even make it past the 35-point mark. I don’t have a hard cut-off, but as a rule of thumb, I don’t hire candidates below 70 points. (And I hope to reach the 85+ mark with the candidates who are given an offer.) So now it is time for the big revelation…

Did the ChatGPT get hired?

Let me unbox the most interesting piece first and then support it with a bit of detail. So, dear real human candidates, ChatGPT did not get hired. BUT it scored 61 points. Therefore, if OpenAI keeps improving it version by version, it might get over the minimal threshold (soon). Even the tested November 2022 version would beat the majority of candidates applying for a Senior Data Scientist position. Yes, you read that right: it would beat them!

That is pretty eye-opening and just confirms what I have been suggesting for 2–3 years already: junior coding (and data science) positions are really endangered. The level of coding skill needed for entry positions is, indeed, already within the realm of generative AI (such as ChatGPT). So if you plan to enter a data science or software engineering career, you had better aim for higher sophistication. The lower-level chairs might not be for humans any more (in the years to come).

What did the robot get right, and what stood the test?

Besides the (somewhat shallow) concern about passing the interview as such, more interesting for me was: on what kinds of questions can and can't it provide correct answers? In general, the bot did fine on broader technical questions (e.g. about different methods, choosing among alternative algorithms, or data transformations).

It also did more than fine on actual coding questions, certainly to the point that I would be willing to turn a blind eye to minor technical imperfections. After all, in real-life interviews, too, it is not about being nitty-gritty with syntax, as long as the candidate provides the right methods and sound coding patterns and gears them together. The bot was also good at answering straightforward expert “How to” and “Why so” questions for particular areas of data science or engineering.

Where does the robot still fall short?

One surprising shortcoming showed up, for example, when it was prompted on how to solve the missing-data problem in a dataset. It provided the usual identification of missing values (like “n/a”, NULL, …), but it failed to say what should be done about them, i.e. how to replace the missing values. It also failed some detailed questions (like the difference between clustered and non-clustered indexes in SQL); funnily enough, it returned the same definition for both, even though it was explicitly prompted for their difference.
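The missing-data question has several textbook answers. Below is a minimal Python sketch of one of them, assuming a numeric column stored as raw strings: first normalize the usual placeholder markers to "missing", then fill the gaps with the median of the observed values. The marker set and the helper names `is_missing` and `impute_median` are hypothetical illustrations, not the answer key of my interview.

```python
from statistics import median

# Placeholder strings treated as "missing" (an assumption for this sketch;
# real datasets use many other markers).
MISSING_MARKERS = {"", "n/a", "na", "null", "none", "nan"}


def is_missing(value):
    """Return True if a raw cell value should be treated as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def impute_median(column):
    """Replace missing entries in a numeric column with the column median.

    Median imputation is just one simple strategy; dropping rows, mean/mode
    imputation, or model-based imputation may suit other cases better.
    """
    observed = [float(v) for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    return [fill if is_missing(v) else float(v) for v in column]


ages = ["22", "n/a", "38", None, "NULL", "26"]
print(impute_median(ages))  # [22.0, 26.0, 38.0, 26.0, 26.0, 26.0]
```

The median is used here because it is robust to outliers, but any single-value fill shrinks the column's variance; in practice, the right choice (drop, impute, or flag) depends on why the values are missing.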

A second interesting failure came when I tried to steer the discussion toward the most recent breakthroughs in data science. ChatGPT just beat around the bush, not really revealing anything sensible (or citing trends from a decade ago). I later realized that these GPT models still take months to train and validate, so GPT's training data is seemingly limited to the 2021 state of affairs. (You can try asking it why Her Majesty the Queen died this year or what the 2022 Nobel Prize in Physics was awarded for 😉)

To calm the enthusiasts, ChatGPT also (deservedly and soothingly) failed on more complicated questions that require abstract thinking. In one of my interview questions, you need to collect the hints given in the text to frame a certain understanding and then use it to pivot to another level of aggregation within that domain. Hence, to succeed, you need to grasp the essence of the question and then reuse the answer for a second thought. Here the robot obviously got only to level 1 and failed to answer the second part of the question. But to be honest, that is exactly what most weak human candidates do when failing on this question. Thus, in a sense, it is again on par with less-skilled humans.

How good was ChatGPT in the coding, really?

I was specifically interested in the coding questions, which form the core of the technical screening for a data science role. The tasks a candidate has to go through in our interview are a mix of “show me how you would do …” and “specific challenge/exercise to complete”. They also test both the usual numerical data science tasks and more NLP-ish exercises.

The bot did really great on “show me how you would do …” questions. It produced code that (based on the descriptors) often scored close to full points. However, it struggled quite a bit on the specific tasks. In other words, it can do the “theoretical principles” but fails to cater for specific cases. But again, where it failed, the solutions ChatGPT produced were the usual wrong solutions that weak candidates come up with. Interestingly, it was never gibberish or pointless nonsense. It was code that really ran and did something (even well commented), just failing to do the task. Why am I saying so? The scary part is that in all aspects, the answers ChatGPT provided, even the wrong ones, looked like humanly wrong answers. If there were a Turing test for passing the interview, nothing would make me suspect that a non-human was going through it. Yes, maybe sometimes just a weaker candidate (as so often happens in real life as well), but a perfectly credible human interview effort.

Conclusions of this experiment

As already mentioned, the first concern is that ChatGPT can already perform as well as an average candidate in an interview for a Senior Data Scientist position (and thus would be able to pass many Junior Data Scientist interviews outright). So, if you work in data analysis (or even plan to enter the field), this experiment suggests you had better climb to the upper rungs of sophistication, because low-level coding will soon be flooded by GPT-like tools. You can ignore this omen at your own peril.

For me personally, there is also a second conclusion from this experiment: it points out which areas of our interview set need to be rebuilt. The performance of ChatGPT on the coding exercises (in the November 2022 version) correlated well with the performance of human (even if less skilled) candidates. Therefore, areas in which the robot could ace an interview question cleanly signal that the topic is probably well described somewhere “out in the wild internet” (as it must have been trained on something similar). I am not worried that a candidate would be able to GPT it (yes, we might soon replace “google it” with “GPT it”) live during the interview. But the mere fact that GPT had enough training material to learn the answer flawlessly signals that one can study that type of question well in advance. And that is concern enough to revisit those tasks.

Hence, I went back to redrafting the interview test battery. And, of course, once the new version is complete, I will use the “ChatGPT candidate” as its guinea pig, so that our interview test can stand its ground even as Generative AI grows mighty. Stay tuned; I might share more on how it develops here.

Older articles on AI topics:

AI tries to capture YET another human sense

Want to learn AI? Break shopping-window in Finland

REMOTE LEARNING now on AI steroids

5+1 interesting AI videos


MACHINE LEARNING goes physical [2021 trends]

We have all got used to (though not necessarily come to appreciate) machine learning recommending products, sorting our social media feeds or matching us with potential dating partners. However, the common factor of all these AI use cases has been that they influence our lives through a software interface. Experts believe 2021 will change this, with machine learning also entering our physical lives. Not convinced? Read on.

Most autonomous machines have been developed for the closed environments of factories, warehouses or other sites that are insulated from the real hum of human activity. The reason machines were not allowed “out among humans” is that constrained environments set the bar for sensing and intelligence just low enough for machines to actually clear it. However, recent developments in reinforcement learning, NLP, computer vision and AutoML have opened room for building intelligent machines that can also operate in open society.

The main point of change is that the above-mentioned cocktail of AI approaches gives machines a chance to learn beyond their original scope of responsibilities (should the need arise and their creators grant the opportunity). As a result, we will meet robots moving among us to fulfill their duties. The interesting part of this change is that it will all happen without the consent of the individuals passing by. (Can you object to a street-sweeping machine if it does not hurt anybody and does not destroy anything of value?) One should also admit that the highest goal of the first ML-driven moving machines “out in the wild” will most likely be to avoid humans in the first place. So machine learning will not only get physical, but also (somewhat) ignorant.
