Uncategorized – THE MIGHTY DATA

Data Executive’s Read 2023 | Book suggestions

Staying sharp in the data realm is like juggling flaming laptops: challenging and a tad risky. To keep my executive skills from going the way of the floppy disk, I’ve committed to tackling a whopping 10,000 pages of books annually. It is like a private brain gym, but with more words and fewer sweaty towels. Reading a 300+ page book is a large time investment (and not only for an executive), so you had better pick a worthy one. Therefore, below I offer a list of my best reads of 2023, curated to inspire, educate, and maybe even give you a chuckle. Think of the books listed below as a potential beacon in the maze of keeping up with data wizardry!

Blue Ocean SHIFT

Topic | Innovation, Strategy

If you have ever gone through strategic management training, this name might ring a bell. You might also roll your eyes, as Renée Mauborgne and W. Chan Kim published their first introduction to Blue Ocean in 2004, a whopping 20 years ago. But wait, I am not that ignorant; there is more to this suggestion.

Blue Ocean Strategy (BOS) is one of the major concepts in strategy for differentiating your business from (bloodthirsty, neck-breaking) competition. It is a framework that enables you to innovate no matter how good, bad or unique your products or services are. If you have not read this book before, close the gap immediately. I used it in several assignments during my career, and the methodology always yielded interesting new business strategies.

However, even if you did read the original 2004 Blue Ocean Strategy book, this one is different. The authors of the original concept bring additional insights on how not only to design a differentiating strategy but, above all, how to implement it. They extended and rewrote the original scope of BOS based on what they learned from 20 years of implementing it in industries and public organizations. Hence the “SHIFT” in the title. I honestly think this is a must-read for any middle or top manager.

Link | https://www.amazon.com/Blue-Ocean-Shift-Competing-Confidence/dp/0316314048

AI 2041

Topic | Sci-Fi Fiction + AI commentary

Many authors and books try to explain the major recent shift in artificial intelligence (AI). A few writers also dare to predict or speculate about where it might take us from here.

However, the book by Kai-Fu Lee and Chen Qiufan is very special and different. Kai-Fu is a former executive at Google, Apple and the like, where he was responsible for implementing AI solutions. When he talks about an AI method, he most likely headed the implementation of its early pilots. A real well of AI knowledge and experience.

He teamed up with a sci-fi author to write a unique piece narrated through ten stories (all set around the year 2041). In each chapter they first introduce a future use of AI in real life through a story, and then close the chapter with facts and details on how it would be implemented and what stage of AI can realistically be expected by 2041.

The book is somewhat thick, but absolutely worth it and easy to read, as you can work through it one story at a time. I think it is an especially good gift for somebody who wants to understand the future of AI but does not have the technical background to read white papers.

Link | https://www.amazon.com/AI-2041-Ten-Visions-Future/dp/059323829X

Becoming a Data Head

Topic | Data-driven, Management, Data literacy

The decision to put this book on my reading list stemmed from curiosity. The reviews suggest that it is a good entry-level book for executives trying to become data-driven or AI-ready. Being an SVP Data & Analytics (and a seasoned data scientist) myself, I am hardly the target audience at this phase of my career. But I have seen so many books claim (and fail) to introduce you to the Data Science bushes that I was tempted to see how this one would do. Yet another promise falling flat?

No, quite the contrary! This book really walks its talk. Namely, it walks you smoothly through the different stages of data analytics and data science. Even the basic concepts are explained in a no-nonsense style that requires no previous knowledge, yet neither insults your intelligence nor bores you when you are reading things already obvious to you. You can also decide how “far into the woods” you want to go and stop reading whenever you reach the level of understanding that is enough for you. Or you can look even deeper to understand the principles behind what you just read.

I strongly recommend this book to anybody trying to change career into data jobs. I also find it a great present for any manager or executive you want to enlighten about data.

Link | https://www.amazon.com/Becoming-Data-Head-Understand-Statistics/dp/1119741742/

COLORWISE

Topic | Data Visualization, Storytelling

As somebody shaping (literally) thousands of visualizations year after year, I welcome books describing the rules and good (and bad) practices for creating visualizations. I have a few in my library (and suggested them in my previous reading lists), but they mostly talk about what kind of graph to choose and how to shape the composition. Many of them take the use of color for granted (or touch on the issue only in passing).

ColorWise is a book that puts “color choice” and “color coding” in graphs and visualizations in the full spotlight. It explains the background of colors in a very non-academic way and will surely take you beyond your previous knowledge of color usage. It also gives clear guidance on how to create graph color schemes when you are anchored to some (must-have) brand colors. What is more, it goes deeper into the psychology of different color schemes and warns you about cultural and color-deficiency pitfalls in your graphs. If you are already a pro, you will often nod your head with “Exactly!” on your lips … and you will still learn a few new aspects to think about. If you are a “regular” color user, your color coding skills will get a significant boost. I strongly recommend it to anybody who needs to produce dashboards or presentations regularly in their work.

Link | https://www.amazon.com/ColorWise-Storytellers-Guide-Intentional-Color/dp/1492097845

BUILD

Topic | Strategy, Data, Product management

Many admire Tony Fadell for what he achieved. He built the iPod for Apple and basically saved Apple from falling. And then, humbly, he built the iPhone on top. And if that were not enough for you, he then built a brand new company, Nest, which started the whole smart home category of technology, and sold it to Google for a few billion. So certainly an inspiring person. But if you are not a tech geek, you probably have not heard his name before or cared too much. Neither had I. And I regret that.

I especially admire the chapter on how data plays a different role in the individual phases of building a product. It gives you clear guidance on where data is the horse and where it is (still needed, but rather) the cart. Having gone through 3 layers of management (Team Lead to SVP) myself, I can confirm that his views on how to perceive your role are very accurate, and I was amazed at how he can compress the essence into (often just) a few pages of text.

All in all, this book is a masterpiece (uh, I told you that already, right?). And I strongly suggest you read it. The earlier the better. Some of the lessons he gives I had to learn the hard way, and I only wish he had written the book earlier. Have a great read!

Link | https://www.amazon.com/Build-Unorthodox-Guide-Making-Things/dp/B09CF2YB6Z/

All in on AI

Topic | AI, Growth, Strategy

I have read most of the 15 books that Tom Davenport has authored and was mostly happy with them. Therefore, when I saw his newest piece, ALL IN ON AI, I was full of anticipation.

The authors introduce a group of businesses that decided to make artificial intelligence the centerpiece of their business strategy and operations. They really went ALL-IN on it. The book first walks you through what such an AI-ALL-IN company looks like: what the common denominators are, but also the industry-specific aspects. You quickly understand how to spot the markers.

But that’s only the start of it. In the remainder of the book Davenport (and his co-author) provide examples of how to move your existing business into the AI-ALL-IN state. They do it cleverly, picking real companies’ stories from different maturity levels and industries. The authors also methodically link the needed AI markers to the developments in the stories, proving that the common denominators are fitting and well chosen.

Who is this book for?
Well, for anybody who envisions or dreams about benefiting from progressive technologies in their work. For those wanting to step up or future-proof their business.
It’s also a good gift idea for employees trying to pitch the AI change to top manager(s).

Link | https://www.amazon.com/All-AI-Companies-Artificial-Intelligence/dp/1647824699/

Good Data

Topic | Data, Ethics, Search data

Reading Sam Gilbert’s book Good Data is stimulating and entertaining at the same time (you just need to see through the author’s masked humor). Sam is a seasoned data professional who does not fall into the clichés and mental shortcuts of today’s data speak.

I did not always agree with his opinions, but all the questions he raised in the book made me really (re)think what I considered the role of data to be in different corners of business and our society. Thus, if you ask “What questions should we have about the future of data?”, this book will get you there.

As for the answers to those questions, please think a bit more critically than the author suggests. All in all, it is quick and fun to read and opens new horizons. Worth a few days of reading.

Link | https://www.amazon.com/Good-Data-Optimists-Digital-Future/dp/1787396339

Don’t Make Me Think (Revisited)

Topic | UX, Product management, Web design

Websites and apps have become our window to everyday activities, social interaction, shopping and most of our work (certainly so during COVID). In the 1990s and 2000s, institutions and businesses tried to impress us with physical real estate. But how do digital institutions treat us now?

This book is for everyone who wants to grasp the basics (yes, it starts from the ground up) of how to design a digital interface for the web or an app. Even though this might sound like a UX designer’s guideline (which I would happily use if it were one), it is served in down-to-earth language and does not require any design domain knowledge from you (but it leaves you with some after you read it through).

It is not a long read, and I strongly encourage anybody who interacts with websites and apps (or has a say in their design) to at least skim through it. A no-regret move!

Link | https://www.amazon.com/Dont-Make-Think-Revisited-Usability/dp/0321965515

Extremely ONLINE

Topic | Creators, Social Media

At first glance, the subject of online influencers might not seem like a page-turner. However, a friend’s recommendation led me to Taylor Lorenz’s exploration of the hidden layers behind social media’s evolution, and I was instantly captivated.

This book isn’t just a timeline of social media from the late 90s; it’s a narrative that weaves through the changing social dynamics influenced by online platforms. It provides an intriguing mix of statistical data and storytelling, revealing how various online communities engage with social media.

The book also offers surprising insights into questions like:

What was the first major topic that sparked the blogging revolution?
How did the requirement for influencers to disclose sponsorships impact the effectiveness of advertisements?
What truly contributes to societal polarization if not social media algorithms?
Which other social networks suffered at the hands of Twitter?

For those in marketing or content creation, this book is an essential read from start to finish. It’s equally crucial for parents or soon-to-be parents to understand the evolving relationship between kids and social media.

For me the book has a somewhat special twist that is likely to work for you as well if you are in your late 30s or 40s. It maps the development of internet consumption for our generation, since the moment blogs hit the internet was exactly when our generation started to interact with it.

Link | https://www.amazon.com/Extremely-Online-Untold-Influence-Internet/dp/1982146869

Machine Learning Design Patterns

Topic | Machine Learning, Data Science

This book feels like the Swiss Army knife for machine learning enthusiasts. It’s the first of its kind as it dives into the wild world of ML design patterns. Forget about dry, technical jargon; this book is like a treasure map, guiding you through 30 quirky, yet ingenious design patterns, each one a secret weapon against those head-scratching ML problems. It’s like finding cheat codes for a video game, but for machine learning!

Imagine a cookbook, but instead of recipes for apple pie, it’s chock-full of solutions for when your AI project decides to go on a coffee break. Whether you’re a seasoned data scientist or just someone who accidentally wandered into the machine learning aisle, this book is your trusty sidekick. It’s the kind of read that makes you think, “Ah, so this is what Google’s brainiacs do for fun!” – solving problems and making ML as approachable as a friendly robot assistant.

Link | https://www.amazon.com/Machine-Learning-Design-Patterns-Preparation/dp/1098115783

CRUX

Topic | Strategy, Business Analysis

As someone with a background in Strategic Management, I’ve devoured nearly every strategy book available. Through my extensive reading, I’ve discovered two authors who consistently deliver valuable strategic insights: Gary Hamel and Richard Rumelt.

Therefore, to no surprise, Richard Rumelt’s CRUX stands out as a masterpiece (again). It skillfully guides you in crafting authentic strategies for your business or team and shatters common executive misconceptions, like the necessity of a mission statement, misconstruing international expansion as strategy, or overvaluing shareholder interests. It’s also an excellent resource for learning to spearhead genuine strategic development.

I strongly recommend this book to all executives. Be prepared for a reflective and sometimes uncomfortable journey through your previous strategy endeavors. It’s equally insightful for middle managers, equipping them with the knowledge to challenge and refine the strategies proposed by their higher-ups. Overall, it’s a perfect read to gift yourself or others during a vacation.

Link | https://www.amazon.com/Crux-Richard-Rumelt/dp/1788169514

The Choice Factory

Topic | Marketing, Psychology, Feature engineering

“The Choice Factory” by Richard Shotton is an exceptional read, especially recommended for data analysts focused on modeling and predicting human behavior, as well as marketers seeking to boost their conversions by leveraging (or catching the tailwind of) natural human tendencies.

What sets this book apart is its reliance on proven real-world best practices, presented not as isolated case studies, but as principles backed by comprehensive research. Another key strength of the book lies in its concise, easily digestible chapters, each ending with practical, actionable advice on how to implement these insights.

I strongly endorse this book for anyone looking to gain a deeper understanding of human behavior, whether for feature engineering in ML prediction models or in a marketing optimization context.

Link | https://www.amazon.com/Choice-Factory-behavioural-biases-influence/dp/085719609X

The Ruthless Elimination of Hurry

Topic | Work-Life balance, Mental health

“The Ruthless Elimination of Hurry,” as the title aptly indicates, is more than just a book; it’s a compelling manifesto advocating for a deliberate shift away from the relentless pursuit of speed for its own sake.

In our fast-paced world, where speed is often synonymous with efficiency and success, this book presents a refreshing perspective. It acknowledges that while speed can be beneficial (except when it leads to a speeding ticket!), it shouldn’t be the primary objective. Speed should be a tool, employed judiciously and only when truly necessary. The book emphasizes the importance of intentionality in our actions, encouraging us not to rush mindlessly but to consider the purpose and value of our speed.

Authored by John M. Comer, a U.S. pastor, the book is understandably infused with religious references and teachings, particularly focusing on Jesus and other Christian elements. For some readers, this religious aspect might seem predominant, but the book’s core message transcends religious boundaries. If one can look past the religious overtones, or perhaps even draw insight from them, “The Ruthless Elimination of Hurry” reveals itself as a deeply thought-provoking and intriguing read.

It’s a book that challenges the status quo of our hurried lives. It invites readers to pause, reflect, and reconsider the pace at which we live. The author’s insights offer a unique perspective on how slowing down can lead to a more fulfilled, purpose-driven life. This makes the book an essential read for anyone feeling overwhelmed by the ceaseless rush of modern life and seeking a path to a more balanced, intentional existence.

Link | https://www.amazon.com/Ruthless-Elimination-Hurry-Emotionally-Spiritually/dp/0525653090

Data Science on AWS

Topic | ML operations, Data Science, Data engineering

Ah, the wild ride of prototyping machine learning models! Many of us have gone through fast prototyping (or toy examples) of the Machine learning clustering or prediction models in notebooks or sand-box environments. It’s like building a Lego castle in your living room – fun, easy, and oh-so-satisfying. But then, you decide to move that castle to the real world, and suddenly, it’s like trying to assemble it in a windstorm. Surprise! Porting your perfect little prototype into the jungle of a live environment is like herding cats while juggling.

Most of today’s implementations are left with no choice but to run in a cloud, virtual-machine set-up, which requires additional complexity and care to deliver even the bare functionality of the easy local-machine PoC. This book is about how to think through the machine learning aspects of a live solution in advance, and to understand what combination of tools you should expect to deploy to keep your machine learning train running properly on the rails. It is a must-read not because you will ever be coding the components and connectors mentioned in the material. Rather, it is essential because you need to understand everything your teams have to go through to make it all happen for you.

Link | https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391

Text As Data

Topic | NLP, Machine Learning

As the title of the book rightly suggests, text has long been perceived as a special “animal”: on the edge of data analytics, much more obscure than the analysis of relational data with SQL or predictive analytics. Text analytics was also handled by dedicated (Python) packages and often by NLP-only specialists. If you were not one, you would probably just reach for (simplified) predefined functions in NLTK (or a similar code library).

Those times are over. Text is mainstream. If you were not convinced before the ChatGPT boom, now there is no way to deny it. Yet much of the text analytics audience (and many practitioners) is still stuck in the pre-text era, with only a rough idea of how to address data that is stored in troves of text.

Therefore, this book comes as a kind of gift. If you admit to having only a general (read: limited) understanding of how to extract insight from text and how to set up text analytics in your team, or if you have not been treating text as seriously as ML or reinforcement learning, this book helps you close that gap. It is well written and always illustrated with telling examples. If you forgot to buy a ticket for the departing text analytics “train”, this is your fast track to get on it.

Link | https://www.amazon.com/Text-Data-Framework-Learning-Sciences/dp/0691207550

The Coming Wave

Topic | AI, Philosophy

Hold onto your hats, folks! Mustafa Suleyman’s “The Coming Wave” isn’t just a book; it’s like a roller coaster ride into the future, where your coffee maker might be plotting world domination. Suleyman, the AI whiz-kid and DeepMind co-founder, is dishing out a buffet of mind-boggling predictions. Imagine a world where your vacuum cleaner is judging your music taste and your fridge is gossiping about your late-night snack habits. That’s the kind of AI party Suleyman’s inviting us to.

But wait, there’s a catch. It’s not all about tech wizardry and gadgets having a mind of their own. Suleyman waves a big, bright warning flag about AI’s dark side. Picture a world where AI is like that one overachieving cousin who’s great at everything but sometimes scares the living daylights out of you. He’s like the cool uncle of the tech world, telling us to enjoy the party but maybe hide the fine china just in case.

So, whether you’re a tech-head, a skeptic, or just someone who’s curious if your phone is silently laughing at your TikTok attempts, “The Coming Wave” is your handbook for the AI age. It’s like a survival guide for the digital jungle, complete with a map, a flashlight, and a slightly ominous warning about the creatures lurking in the shadows. Buckle up and get ready for a wild ride into the future, where your toaster might just be the smartest thing in your house!

Link | https://www.amazon.com/The-Coming-Wave/dp/1847927491

Julia High Performance

Topic | Data engineering, Data Science

No, this is not a mash-up of Shakespeare’s famous love story and a performance marketing guide. Julia might still be the new kid on the block in the programming world, especially compared to Python, the reigning “lingua franca” of data science. But don’t be fooled: this emerging language packs a punch with its speed and efficiency. “Julia High Performance” by Avik Sengupta and Alan Edelman is like the ultimate guidebook for this speedster of a language.

Think of this book as your go-to manual for making your code run like a sprinter on a caffeine high. It’s like a masterclass in getting the most out of Julia, from understanding its high-speed capabilities to avoiding performance roadblocks. While some readers might wish for a deeper dive into the more intricate examples, the book remains an eye-opener, proving its worth by empowering users to supercharge their projects, leaving Python in the dust. Some users even boasted a tenfold performance boost after switching from Python/NumPy to Julia. So think about leaving your comfort zone and heading towards a coding glow-up!

This book, admittedly, is a bit of a joker card, but if you did not pick anything above and you are reasonably fluent in Python coding, maybe give it a try.

Link | https://www.amazon.com/Julia-High-Performance-Avik-Sengupta/dp/178829811X


DATA JOBS MARKET GERMANY | 2023-06 update

Data job market continuously shrinking | Even Data Engineering dropping, though least of all data jobs | Stuttgart overtaking Cologne in most data job categories | Pricing analyst (as a separate category) almost evaporated | Smaller German cities still on the hunt for Data analysts

Every month I try to bring an update on the German labor market in the area of data professions. Feel free to use this overview for your own orientation or for scanning market opportunities (whichever side of the job interview table you plan to sit on 😊). This report is by no means intended to replace official job market statistics, so please note that it comments on developments in monthly batches, and there may be other sources that describe the job market dynamics in more granular form.

Data Engineers – drop into decline as well

Though Data Engineering was the “last fort standing” in the German data jobs market, in June 2023 it slipped into a 1.1% decline as well. This is the net result of different dynamics in Berlin (where demand dropped by 19% MoM) and Munich, which is still hungry for new Data Engineers (the number of open roles went up by 12% MoM). An interesting change is also happening in west and south-west Germany, where demand in Cologne dropped so drastically (-20%) that it fell behind Stuttgart (growing by 7%) for the first time in the measurement history.

When it comes to demand for different seniority levels, the vast majority of open positions still carry no seniority indication (and go with just a generic Data Engineer title). Among those explicitly looking for Senior Data Engineers, demand increased by ~170 positions, taking the share of open Senior positions to 17.6% of all vacancies. At the other end of the spectrum, the number of open Junior Data Engineer job ads stayed about the same, accounting for 6.2% of all open data engineering vacancies.

Demand by #worklocation:

#BERLIN 16.2% [1 792 open positions]

#MUNICH 12.4% [1 372]

#HAMBURG 6.3% [ 697]

#FRANKFURT 5.6% [ 620]

#COLOGNE 3.4% [ 376]

#STUTTGART 4.2% [ 465]

Data engineering jobs by the #SENIORITY:

#Junior 6.2% [ 686]

#Midlevel (or unspecified) 76.2% [8 431]

#Senior 17.6% [1 947]
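
The shares above follow directly from the raw counts. A minimal Python sketch, assuming the three seniority buckets listed here add up to the full set of open data engineering vacancies (the variable and function names are illustrative and not part of the report’s own tooling):

```python
# Open Data Engineering positions by seniority, June 2023 (counts from the list above).
open_positions = {
    "Junior": 686,
    "Midlevel (or unspecified)": 8431,
    "Senior": 1947,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each bucket's share of the total in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


seniority_shares = shares(open_positions)
for level, pct in seniority_shares.items():
    print(f"{level}: {pct}% [{open_positions[level]}]")
```

Running the same helper on the per-city counts is a quick sanity check whenever a percentage and the number in brackets do not seem to match.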

Data Scientists – Falling through a hole

Jobs in the area of Data Science had already been slowly declining before (by -6.5%), and the dynamics now add up to a pretty bleak overall picture. Though there are still 40K data scientists wanted in Germany, demand has dropped by a whopping 19% in the last 3 months. The cut is most visible at the Junior end of the spectrum, where demand dropped by 37% MoM. Generic positions were also in decline (they dropped by 9%), but companies’ demand for explicit Senior roles still grows (+13%). Not sure whether this is already attributable to a “GPT effect”, but being a Senior DS certainly puts you on the more promising side of the job market “river”. At least for now.

Geographically, some interesting moves are happening as well. While the top 3 German Data Science hubs (Berlin, Munich and Hamburg) are already on the brakes (~ -20% MoM), the south-west (Frankfurt, Stuttgart and Cologne) still has not got what it was looking for (stable demand with even +1% growth). Among these secondary hubs, the market has almost frozen in Cologne, to the point that Stuttgart overtook Cologne in Data Science positions as well. Other 2nd-tier cities like Frankfurt and the already mentioned Stuttgart keep their Data Science appetite high, so hopes of getting an interesting job offer there are still alive.

Demand by #worklocation:

#BERLIN 14.2% [5 716 open positions]

#MUNICH 11.5% [4 629]

#HAMBURG 5.7% [2 295]

#FRANKFURT 6.7% [2 697]

#COLOGNE 2.0% [ 805]

#STUTTGART 3.3% [1 328]

Data science jobs by the #SENIORITY:

#Junior 6.9% [ 2 778]

#Midlevel (or unspecified) 67.0% [26 972]

#Senior 26.1% [10 057]

Data Analysts – Only midlevel BI keeping somewhat afloat

The market for Data Analysts is also on a falling streak lasting several months. In June the drop is -6.8%. The only sub-group of analytical jobs that holds its line of demand is Business Intelligence analysts, who recorded +2%; all other analytical positions show shrinking numbers of open positions. The trends are not positive at the edges of the seniority spectrum: only the mid-tier was able to keep itself afloat. Within the last month the market has lost appetite for both very Senior and Junior positions. Interestingly enough, the Pricing analyst market is almost non-existent. In the whole of Germany, there are fewer than 30 open Pricing analyst positions in total.

Geographically, the development has its own branches as well. Most big German cities (Munich, Hamburg, Cologne, Dusseldorf) are deep in the declining trend for Data Analyst positions. In contrast to the development in Data Science and Data Engineering, where Stuttgart is booming, in Data Analytics it records the hardest percentage drop (-33%). On the other hand, the two German hubs where demand is still on the rise are Berlin and Frankfurt, where there were more open positions MoM despite the general decline at the federal level. So where does the drop really happen? Well, it is smaller cities and rural areas that dropped the ball in the last month. You can see that from the fact that while in May the top 7 cities together held 45% of all open Data Analyst offers, in June their share is up to nearly 49%, signaling a greater absence of smaller cities in the jobs mix.

Demand by #role:

#BI 25.4% [10 770 open positions]

#CONSULTANT 17.4% [7 372]

#MARKETING 2.2% [ 916]

#SALES 0.5% [ 226]

#PRODUCT MNG. 0.6% [ 254]

#PRICING 0.1% [ 21]

Demand by #worklocation:

#BERLIN 11.7% [4 951 open positions]

#MUNICH 9.8% [4 137]

#HAMBURG 8.2% [3 494]

#FRANKFURT 6.9% [2 936]

#COLOGNE 4.7% [2 005]

#DUESSELDORF 4.0% [2 697]

#STUTTGART 3.2% [1 338]
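
The “nearly 49%” concentration figure mentioned above can be re-derived by adding up the city shares in this list. A minimal Python sketch (the names are illustrative; only the percentages come from the report):

```python
# June 2023 share of open Data Analyst positions per city, in percent (from the list above).
city_share = {
    "Berlin": 11.7,
    "Munich": 9.8,
    "Hamburg": 8.2,
    "Frankfurt": 6.9,
    "Cologne": 4.7,
    "Duesseldorf": 4.0,
    "Stuttgart": 3.2,
}

# Combined share of the seven hubs, and what is left for the rest of Germany.
top7_share = round(sum(city_share.values()), 1)
rest_share = round(100 - top7_share, 1)

print(f"Top 7 cities: {top7_share}% | rest of Germany: {rest_share}%")
```

Tracking this single number month by month is a simple way to see whether demand is concentrating in the big hubs or spreading out to smaller cities.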

Data analyst jobs by the #SENIORITY:

#Intern 0.4% [ 196]

#Junior 9.1% [ 4 144]

#Midlevel (or unspecified) 80.9% [36 831]

#Senior 9.6% [ 4 367]

In general, after a somewhat cloudy spring, the market of open positions in data jobs is in full decline across all three important verticals (Data Engineering, Data Science and Data Analytics). If you live in a big city it might not feel like that, because there are usually 500+ positions to choose from (which sounds like a plethora of choice without any need to relocate). But one should realize that fewer and fewer open positions signal that companies are not on hiring sprees. That also means that budgets will be tighter and salary ceilings not as high as before. From my own experience as a hiring manager for data roles at www.flaconi.de, I can also add that international candidates (mainly from outside the EU) are still eager to take their chance to shine. Thus, the competition is getting tighter as well. When you plan your next career move on the German data jobs market, do a bit of research before “jumping into the water”. Good luck and see you in the next edition of this regular report.


DATA JOBS MARKET in GERMANY | 2023-05 overview

Data job market slowly shrinking | Most stable in Data Engineering, but leaning rather towards mid-spectrum | Munich still desperate for Data Scientists, while Data Science demand dropped in Hamburg and Cologne | Data Consulting jobs evaporated | Smaller German cities still on the hunt for Data analysts

Data Engineers – close to stagnating

The gradual cool-down of data jobs shows itself in the Data Engineering space as well, but the drops in demand for this profession are the mildest, and Data Engineering is less than 1% below stagnation. Interestingly, Berlin and Munich are still hungry for new Data Engineers (they have more open positions than last month), but secondary hubs (like Hamburg or Frankfurt) have already filled many positions (or withdrawn their hiring intentions).

When it comes to demand for different seniority levels, the vast majority of open positions do not indicate any seniority requirement (and go with just a generic Data Engineer title). Among those explicitly looking for Senior Data Engineers, demand has dropped by ~300 positions, taking the share of open Senior positions to 15.9% of all vacancies. At the other end of the spectrum, there are 200 fewer open Junior Data Engineer job ads, accounting for 5.7% of all open data engineering positions.

Demand by #worklocation:

#BERLIN 19.7% [2 203 open positions]

#MUNICH 10.9% [1 219]

#HAMBURG 7.7% [ 861]

#FRANKFURT 6.0% [ 671]

#COLOGNE 4.2% [ 470]
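
Setting these May counts against the counts published in the June edition of this report gives the month-on-month (MoM) change per city. A minimal Python sketch, assuming MoM is simply the relative change in the number of open positions (the report does not spell out its formula, and the names below are illustrative):

```python
# Open Data Engineering positions per city, taken from the May and June 2023 lists.
may = {"Berlin": 2203, "Munich": 1219, "Hamburg": 861, "Frankfurt": 671, "Cologne": 470}
june = {"Berlin": 1792, "Munich": 1372, "Hamburg": 697, "Frankfurt": 620, "Cologne": 376}


def mom_change(previous: int, current: int) -> float:
    """Relative month-on-month change in percent."""
    return 100 * (current - previous) / previous


mom = {city: mom_change(may[city], june[city]) for city in may}
for city, pct in mom.items():
    print(f"{city}: {pct:+.1f}% MoM")
```

Under this assumption the results land close to the figures quoted in the June text (a drop of roughly 19% in Berlin and 20% in Cologne, growth in Munich); small differences can come from rounding.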

Data engineering jobs by the #SENIORITY:

#Junior 5.7% [ 637]

#Midlevel (or unspecified) 78.4% [8 768]

#Senior 15.9% [1 778]

Data Scientists – Some cities getting into desperate mode

Jobs in the area of Data Science are also slowly declining in the number of open positions (by -6.5%), though the base is still well above 40 000 vacancies. Generic positions became less prominent (their share dropped below 70%); companies’ demand is growing rather in explicit Senior or Junior roles. That usually signals that companies with clearer projects in mind have spearheaded the development in recent weeks.

Geographically, an interesting play unfolds. While Berlin (and Hamburg) are slowly, step by step, saturating their Data Science needs, Frankfurt and Munich can’t get enough of what they want. The situation seems to be getting desperate mainly in Munich, which is the only larger German city where demand for Data Scientists is still growing significantly (+21% vs. the overall -7% drop in Germany). If the situation persists, this might overheat the local market, with compensation bands spiking steeply. By contrast, the market has almost frozen in Cologne, where within 1 month there are 1300 fewer open Data Science positions. With such a strong tempo of decline, this cannot possibly be just positions being filled so fast; it rather signals a lot of companies withdrawing their original requisitions.

Demand by #worklocation:

#BERLIN 17.2% [7 438 open positions]

#MUNICH 13.0% [5 622]

#HAMBURG 6.5% [2 811]

#FRANKFURT 6.2% [2 681]

#COLOGNE 1.9% [ 822]

Data science jobs by #SENIORITY:

#Junior 10.3% [ 4 454]

#Midlevel (or unspecified) 68.3% [29 538]

#Senior 21.4% [ 9 255]

Data Analysts – Consulting jobs evaporated month on month

The Data Analyst market is saturating the fastest, with demand dropping by -7.3%. The main driver is a sudden drop in demand for consultants in data analytical roles: almost 40% of last month’s consultant roles are no longer advertised. The outlook of consulting companies (and units) is pretty distressed and, hence, hiring for these roles has understandably stepped “on the brakes”. More positive trends among specific analytical roles are in Marketing and Pricing, where the number of open positions stagnates (or even slowly grows).

All major German cities (Berlin, Hamburg, Munich, Frankfurt, Cologne) seem to be joining the declining trend in Data Analyst positions. The decline was fastest in Frankfurt, where demand dropped by almost 38%. We see a very different picture in the lower tiers of analytical hubs (like Stuttgart and Düsseldorf), where a similar number of open Data Analyst positions persists. So if you are willing to move to (or work remotely for) a smaller city, your chances of being a premium (and wanted) candidate are significantly better there.

Another interesting trend: Data Analyst positions are dropping so fast that, if they sustain this trajectory, next month (in June 2023) there might be more open Data Science positions than Data Analyst ones. This would also confirm that, with (generative) AI booming, companies rather seek talent from the more sophisticated tiers of data skills. We will watch the development closely and debate it in more detail in the next edition of the job market scan.

Demand by #role:

#BI 23.1% [10 497 open positions]

#CONSULTANT 16.9% [7 673]

#MARKETING 2.0% [ 920]

#SALES 0.5% [ 214]

#PRODUCT MNG. 0.4% [ 173]

#PRICING 0.1% [ 48]

Demand by #worklocation:

#BERLIN 10.4% [4 714 open positions]

#MUNICH 9.6% [4 368]

#HAMBURG 7.7% [3 515]

#FRANKFURT 4.5% [2 045]

#COLOGNE 4.5% [2 039]

#DUESSELDORF 4.6% [2 088]

#STUTTGART 4.4% [1 983]

Data analyst jobs by #SENIORITY:

#Intern 0.4% [ 196]

#Junior 9.1% [ 4 144]

#Midlevel (or unspecified) 80.9% [36 831]

#Senior 9.6% [ 4 367]


How ChatGPT really works in SIMPLE WORDS (and pictures)

Many of us have probably already played with the new kid on the block of the artificial intelligence space, ChatGPT from OpenAI. Typing in any question and getting a solid, no-gibberish answer, very often even factually precise, is a fascinating experience. But after a few moments of awe at getting an answer to your “question of all questions”, you may have wondered how ChatGPT really works.

If you are a top-notch Data Scientist, you could probably go into the documentation (and related white papers) and simulate (or even write your own) transformer to see what is going on under the hood. However, besides those privileged few, the usual person is probably deprived of this, ehm, joy. 😊 Therefore, let me walk you through the mechanics of ChatGPT in a robust, but still human-speak explanation over the next few paragraphs (and schemas). (Disclaimer: I compiled this overview based on publicly available documentation for version 3.0 of GPT. The newer versions (like 4.0) work on the same principles but have different sizes of neural nets, look-up dictionaries and context vectors, so if you are super-interested in how the most recent version works, please extend your research beyond this article.)

6 main steps

Even though our interaction with ChatGPT looks seamless, every query to it triggers 6 steps (in real time). Media label ChatGPT with the single phrase “artificial intelligence”, but it is worth mentioning that of these 6 steps, only two and a half are real artificial intelligence components. A significant part of a ChatGPT run is actually relatively simple math of manipulating vectors and matrices. And that makes the details of ChatGPT even more fascinating, even for a “lay” audience.

It starts with compressing the world into 2048 numbers

The first step of ChatGPT’s work is that it reads through the whole query you provided and scans for what you are actually asking. It analyzes the words used and their mutual relationships and encodes the context (not yet the query itself, just the topic) of the question. You might be amazed by the fact that ChatGPT converts the whole world, and all the possible questions you might ask, into a combination of 2048 topics (represented by decimal numbers). As a very simplifying statement, you can say that ChatGPT compresses the Internet world into a 2048-dimensional cube.

Context first, then come tokens

As outlined in the previous paragraph, when answering our prompt ChatGPT first takes some time (milliseconds) to understand the context of the query before actually parsing the query itself. So after it decides which area(s) of “reality” you are interested in, it then meticulously inspects your entire question. And it literally does so piece by piece, as it decomposes the given question into tokens. In English a token is usually a whole word or a fragment of one (GPT splits text by byte-pair encoding, so nothing is stemmed and no stop-words are dropped). In other languages the split works out differently, but as a rule of thumb: number of tokens >= number of words in the question.

For every token, the GPT engine makes a look-up in a predefined dictionary of roughly 50K entries. Using hash tables (to make the search super fast), it retrieves a vector (again 2048 elements long) for each token. This way each piece of the query is linked to the topic dimensions. As the system does not know in advance how many words your request will have, there needs to be a mechanism to accommodate any (allowed) length of query. To stay flexible, ChatGPT forms an extremely long vector (2048 * number of tokens), in which the sub-vectors coming from the dictionary look-up for each token are arranged one after another in sequence. Therefore a query of 100 tokens already has 204 800 vector elements, and an even larger request of 500 tokens has more than 1 million numbers. This vector is then processed, but first we need to make one more important change.
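To make the look-up-and-concatenate step tangible, here is a toy sketch of it. It follows the simplified description in this article, not OpenAI's actual code: the four-word vocabulary, the random numbers standing in for learned values, and the names `build_toy_dictionary` and `encode_query` are all assumptions made for illustration.

```python
import random

EMBEDDING_SIZE = 2048  # vector length per token, as described in the text


def build_toy_dictionary(vocabulary, size=EMBEDDING_SIZE, seed=0):
    """Map every known token to a fixed vector of `size` decimal numbers.

    Random numbers stand in for the trained values (assumption for the demo).
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(size)] for tok in vocabulary}


def encode_query(tokens, dictionary):
    """Look up each token and lay the retrieved vectors one after another."""
    long_vector = []
    for tok in tokens:
        # A Python dict is a hash table, so each look-up takes constant time.
        long_vector.extend(dictionary[tok])
    return long_vector


vocabulary = ["how", "does", "chatgpt", "work"]  # hypothetical mini-vocabulary
dictionary = build_toy_dictionary(vocabulary)
query_vector = encode_query(["how", "does", "chatgpt", "work"], dictionary)
print(len(query_vector))  # 4 tokens * 2048 numbers each = 8192
```

The length of the result grows linearly with the number of tokens, which is exactly why a long prompt turns into an "ocean" of numbers.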

Where to look (or How to swim in this ocean of data)

As we learned, a 500-word request to ChatGPT might end up as more than 1 million numbers encoding it. That is a real ocean of data. If you as a human received such a long prompt to answer, I guess you would struggle even with where to focus your attention in the first place. But no worries, so would GPT, if it were not for the Attention mechanism. This AI technique, researched only in the last 10 years (papers from 2014 and 2017), is the real breakthrough behind GPT and is also the reason why language models were able to achieve the major step-up in “intelligence” of communication.

The way the Attention mechanism works is that it calculates (still through linear-algebra matrices) a pair of (relatively short) vectors for each token. These vectors are labeled KEY and VALUE. They are a representation of what is really important (and why) in the text. This way the engine does not force the neural network to put equal weight (= focus) on all million input numbers, but selects which subsections of the query vector are crucial for answering the question. When then combined into a transformed SUM of the elements, they provide the recipe for how to “cook” the answer to the question. What might sound like (yet another) complication is actually the key simplifier and energy saver. Past approaches to language models assumed a “memory” holding each word of the query text as equally important (or assigned the same gradual loss of attention to earlier words). That was prohibitively expensive and hence limited the development of better models. Therefore, jumping over the attention hurdle unlocked the training potential of AI models.
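The weighted SUM at the heart of attention fits into a few lines of code. The sketch below is a bare-bones version of scaled dot-product attention as formulated in the 2017 paper; note that it also uses a QUERY vector, which the standard formulation needs but the text above does not name. The tiny 2-number vectors are invented for the demo, and real models learn these vectors instead of hard-coding them.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return the attention weights and the weighted sum of `values`.

    Each weight reflects how well `query` matches the corresponding key.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, mixed


# Three tokens, each with a made-up KEY and VALUE vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # "asks" mostly about the first dimension

weights, mixed = attend(query, keys, values)
print(weights)  # the second token gets the smallest weight
print(mixed)    # the blended VALUE vector
```

The point of the design: instead of treating every input number as equally important, the weights decide how much of each token's VALUE flows into the result.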

Finally, the AI part

It might be counter-intuitive for many, but the first 3 steps of GPT actually have nothing to do with Artificial Intelligence. It is only in step 4 that the real AI magic can be spotted. The essence of the 4th step is the Transformer core. It is a deep neural network with 96 layers of neurons, a bit more than 3000 neurons in each layer. The transformer part could also be called the Brain of GPT, because it is exactly the transformer layers that store the coefficients trained by running large amounts of text through the neural network. Each text used for training the AI potentially leaves a trace in the massive number of synapses between the GPT “neurons”, in the form of the weights assigned to the given connections.

As unimaginable as a net of hundreds of thousands (or millions) of neurons is to us humans, so is the actual result of the Transformer part of GPT: a probability distribution. No, not a sequence of words or tokens, not a programmed set of answer-generating rules, just a probability distribution.

Word by word, bit by bit …

Finally, in step 5 of ChatGPT we are ready to generate the textual form of the answer. GPT does that by taking the probability distribution (from step 4) and running the decoder part of the Transformer. This decoder takes the distribution and finds the most probable word to start the answer with. Then it takes the probability distribution again and generates the second word of the answer, and the third, then the fourth and so on, until the distribution of probabilities calls for the special End-of-request token. Interestingly enough, the generation does not prescribe how many words the answer will have, nor does it define some kind of satisfaction score (on how much of the query the so-far generated sequence of words has already answered). Still, ChatGPT does not just hallucinate the answer or bet on a single horse. During the creation of the answer there are (secretly) at least 4 different versions in play (generated using the beam search algorithm). The application finally chooses the one that it deems most satisfactory for the probability distribution.
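Beam search is easier to grasp on a toy example. The sketch below keeps the 4 most promising partial answers alive at every step and stops when all of them have reached the end token. The miniature "language model" (`TOY_MODEL`, which only looks at the previous word) and all names are hypothetical stand-ins; a real model conditions on the whole text so far.

```python
import math

END = "<end>"

# Hypothetical toy model: probability of the next word given the last word.
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"sleeps": 0.7, END: 0.3},
    "dog": {"sleeps": 0.4, END: 0.6},
    "sleeps": {END: 1.0},
}


def next_word_distribution(words):
    """Stand-in for the Transformer: distribution over the next word."""
    return TOY_MODEL[words[-1]]


def beam_search(beam_width=4, max_words=10):
    """Return the most probable word sequence found with `beam_width` beams."""
    beams = [(0.0, ["<start>"])]  # (log-probability, words so far)
    finished = []
    for _ in range(max_words):
        candidates = []
        for logp, words in beams:
            for word, p in next_word_distribution(words).items():
                candidates.append((logp + math.log(p), words + [word]))
        # Keep only the best `beam_width` continuations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            (finished if cand[1][-1] == END else beams).append(cand)
        if not beams:
            break
    pool = finished or beams  # fall back to unfinished beams if none ended
    best = max(pool, key=lambda c: c[0])
    return [w for w in best[1] if w not in ("<start>", END)]


print(beam_search(beam_width=4))  # several versions kept in parallel
print(beam_search(beam_width=1))  # greedy: always takes the single best word
```

In this toy setup the greedy run starts with the locally most probable word "the" and ends up with a less probable sentence overall than the 4-beam run, which is the whole reason for not betting on a single horse.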

Last (nail) polish

As humans, we might consider the job done by step 5 already, so what on Earth is the sixth step needed for? Well, anybody thinking so forgets that a human speaker naturally formulates a grammatically correct sequence (or at least most of us do). AI needs a bit of help here. The answer generated by the Decoder still needs to undergo several checks. This step is also the place where filtering or suppressing of undesirable requests is applied. There are several layers on top of the raw text generated in the previous stage. This is also (presumably) the place where translation from language to language happens (e.g. you enter your question in English, but you ask GPT to answer in Spanish). With that, the final answer is delivered and the user can read through it. And ask the next question 🙂

The flow of questions in the same conversation thread can actually lead to updating or tweaking the context parameters (Step 1) of the given conversation. The answering context thus gets more and more precise. Strikingly, OpenAI’s GPT models actually store each conversation, so if you need to refer back to some past reply in a conversation, GPT will still hold the original questions and answers of that talk branch. Your questions (and answers) thus remain archived and in full recall any time in the future. Fascinating, given the number of users and the queries they file.

Steps Summarized

The steps of GPT answer building described above have been neatly summarized into the following slide, which provides additional details and also indicates the transformations made in individual steps to enable the total answer flow. So if you want to internalize the flow or simply repeat the key training architecture/principles, please read through the following summary:

A few side notes to realize …

Though the actual mission of this blog post is to walk the reader through the (details of the) process of generating the answer to a GPT query prompt, there are a few notable side facts stemming from the way GPT is internally organized. So if you want to collect a few “fun fact” morsels that make you a more entertaining dinner buddy for your next getaway with friends (or for Sunday family lunch), here are a few more interesting facts to be aware of (in the GPT realm):

And a bit of a zoom-out view

Besides the fascination with HOW ChatGPT actually works, I often also receive questions about its future or the speed of past progress. I summarized the most common questions (I received) into the 1-pager showcased below. So if your curiosity is still at a high level, feel free to charge yourself up with these FAQs:


FILIP’s 2023 READING SUGGESTIONS

As you probably know, I am a heavy, heavy reader. To the point that for the last 14 years in a row, I have committed in my New Year’s resolution to read at least 10 000 pages of books in the given calendar year. So far, I have failed only once in those 14 years. So, cross your fingers for my 2023 reading endeavors, please.

In order to really read 10K pages in a year, one needs several things. But a good reading list is indispensable. With the average book running to 300-400 pages, you can do the math: one needs about 30 different books per year. In most years I tried to publish the book suggestions afterwards; you can go back to them Here, or there. This year I turned the tables and am sharing my reading queue in advance. Thus, feel free to get inspired:

Alex J. Gutman, Jordan Goldmeier : BECOMING A DATA HEAD

https://www.amazon.de/-/en/Alex-J-Gutman/dp/1119741742

No, quite the contrary! This book really walks its talk. Namely, it walks you as a reader smoothly through the different stages of Data Analytics and Data Science. Even the basic concepts are explained in a no-nonsense style that does not require any previous knowledge from you, but also neither insults your intelligence nor bores you if you are reading things already obvious to you. You can also decide how “far into the woods” you want to go and stop reading any time you think this is exactly the level of understanding that is enough for you. Or maybe you look even deeper to understand the principles of what you just read?

I strongly recommend this book to anybody trying to change career into data jobs. I also find it a great present for any manager or executive you want to enlighten about data.

Daniel Vaughan: ANALYTICAL SKILLS for AI & DATA SCIENCE

Topic | Analytics, Data-driven, Decision making

Let me start the review by saying this book is really special for at least 2 reasons: 1] Throughout 20 years of my analytical endeavors I have seen people entering the DS arena either from business acumen, leveling up their tech skills, OR from a solid tech university major, trying to close the business understanding gap. I have read many books that serve the technical ingredients of DS to a business audience, but I had long been looking for a book that upskills the already technically savvy Data Scientist (or Analyst) with an extra toolbox on how to be really useful (in business) with their supreme skills. Daniel Vaughan manages to deliver one. Therefore, this book is a really good read primarily for those who are friends with algorithms but frustrated about not making a DS impact.

2] I stumbled on this book during one of my regular searches through titles still missing from my library. I read the reviews and got intrigued (for the reasons stated in paragraph 1] above). But it was a bet. A good one. How did I realize that it is a natural extension of my reading line? Do you know that feeling when you come to a party and get introduced to a new person, and after talking the whole evening (as you can’t stop talking to that person) you feel they could have been (or will be) your best friend, because you have so much in common from the past? Well, this is the kind of feeling I got when reading through this Analytical Skills for AI & DS book. Most of the arguments mentioned got me nodding (in agreement), and whenever the author suggests further reading, I found 8/10 of these suggestions already on my private bookshelf.

Long story short: this is a great book for teaching/cementing skills in how to apply analytical skills (or even heavy-duty Data Science, if you dare) to business problems. If you are a technical person, this might erase “understand the business” from your up-skilling to-do list. If you are already business-savvy, this gives you a clear idea of what to ask from your more technical colleagues. Here and there the author drops in heavier math notation or a snippet of Python code, but they are more of an illustration, and skipping them does not take your learning even an inch lower. It is indeed amazing how smoothly Daniel Vaughan walks you through important concepts and principles, no matter what your starting literacy in Business, Econometrics, Data Analytics or DS algorithms is. Certainly worth the bucks.

Link | https://www.amazon.de/-/en/Daniel-Vaughan/dp/1492060941

Peter Pru : ECOMMERCE EMPIRE

Topic | E-commerce, Marketing strategies

This book got onto my reading list through an E-commerce community suggestion as one of the must-reads.
After reading it in full, I honestly remain unimpressed. To be fair, the book really tries to encourage people to start/boost their e-commerce venture and does so in an easily explained and structured manner. It also has an interesting twist on #upsell strategies that is not of the basic or first-thought type. It also has a good chapter on competition research and initial product #assortment focus.

But the overly “American” style of pushing for things, overselling one’s own story as general truth and setting #revenue (not CMx or profit) milestones as measures of success leaves a bitter feeling. Also, the last chapter’s sudden pivot towards self-care and mental balance (as a lever for doing e-commerce) is a mismatch with the rest of the book’s content.

All in all, that makes it a controversial choice. I was probably not the target group of this book (and it will not become my favorite book-present either). But if you still want to start a small e-shop, it’s probably worth a few hours of reading.

Link | https://www.amazon.de/-/en/Peter-Pru/dp/1736230905

Justin Grimmer, Margaret E. Roberts , Brandon M. Stewart: TEXT AS DATA

Topic | Data, NLP, Analytics

Link | https://www.amazon.de/-/en/Justin-Grimmer/dp/0691207550

Edosa Odaro: MAKING DATA WORK

Topic | Transformation, Data Engineering

Some books are an easy read; you literally swing through them. Others take more sweat to finish. Edosa Odaro created a compelling data transformation backbone in his Making Data Work. He really tries to explain how to approach moving from legacy data stacks to a more contemporary mode of data operations and analytics. But …

The language chosen is very eloquent, but hard to read. As if the book were meant to impress the reader with the author’s command of language rather than help you finish the book and take away the lessons learned. Each chapter is narrated like a story and should be fun to read. In theory. However, Edosa also starts the discussion from the obscure incident of the Lehman Brothers fall, which leaves you puzzled for the first 2-3 chapters about whether the title of the book has not (actually) been mistakenly swapped with another book’s. The build-up of the argument is slow in the first parts of the book, and you might feel several times like dropping the read without making it to the last page.

And there is another aspect to overcome. The author spent significant time in Finance, and his views (on data innovation) match that environment well. However, they also indirectly reveal how much financial institutions missed the (data) train in the last decade. Many “issues” thoroughly debated over several pages have often been smoothly resolved by other industries, and thus the drama in the stories might seem a bit of a “storm in a teacup”. I was thinking how hard (and awkward) the story might be for people not having the same financial background (which I fortunately did have), often probably confused about why this or that is such a big deal.

But Making Data Work has some true thought gems (like a well-reasoned proof that Silo thinking is actually desirable in Data & Analytics). They are just well hidden in the haystack of the author’s memories and his striving for noticeable linguistic figure-skating. Finishing this book is not an easy task. As much as I think it was worth the time, I also have to admit this is not a book for everyone. So read at your own risk, and be ready for not seeing the end of the tunnel during the first 30-40 pages.

Link | https://www.amazon.com/Making-Data-Work-Edosa-Odaro/dp/1032224436

Steve Krug : DON’T MAKE ME THINK (Revisited)

Topic | Design, Product analytics, UX

It is not a long read, and I strongly encourage anybody working on or with websites and apps (or having a say in their design) to at least skim through it. A no-regret move!

Link | https://www.amazon.de/-/en/Steve-Krug/dp/0321965515

Kate Strachnyi: ColorWise

Topic | Visualization & Business Intelligence

Link | https://www.amazon.com/ColorWise-Storytellers-Guide-Intentional-Color/dp/1492097845

Tony Fadell: BUILD

Topic | Career (in Tech) , Product building, Management

The shortest version of the review for this book would be “Masterpiece!”. It is really transformational, like Gary Hamel’s or Jim Collins’ books. But if you need a few more reasons to buy/read this, here is why:

Many admire Tony Fadell for what he achieved. He built the iPod for Apple and basically saved Apple from falling. And then, humbly, he built the iPhone on top. And if that were not enough for you, he then built (from scratch) the brand-new company Nest, which started the whole SmartHome category of technology, and sold it to Google for a few billion. So certainly an inspiring enough person. But if you are not a tech geek, you probably have not heard his name before or cared too much. Nor had I. And I regret that.

His book BUILD is an interesting mixture of advice and guidance for people who want to have their life (and career) a bit more in their hands. He narrates the story from adolescence through the early years on the job up to the CEO part of your life. And yes, maybe you will never (want to) be a CEO, but the story is still good guidance. He tells how to think about first job(s), what to look for, what to avoid. Are you transitioning from expert to your first managerial role? There is a great chapter (or two) for it. Do you dream of launching a start-up? There is a solid story on how to make it more likely to happen. Did you start the company? Well, here is the chapter on how to organizationally survive growth from 5 to 1000 employees. Your company has a first version of the product built and does not know what versions 2 and 3 (and beyond) should be about? Fadell tells clearly how to look for that pathway. It might sound fluffy, but whoever you are in business, I am quite sure you can take some benefit from some chapter of this book. Yes, occasionally you have to pardon Tony his American optics, but the smell of it is more like a fragrance you know but would not wear yourself, not a sensory disgust.

I especially admire the chapter on how data plays a different role in the individual phases of building a product. It gives you clear guidance on where data is the horse and where it is (still needed, but rather) the cart. Having gone through 3 layers of management (Team Lead to SVP) myself, I can confirm that his views on how to perceive your role(s) are very accurate, and I was amazed how he can compress the essence into (often just) a few pages of text. All in all, this book is a Masterpiece (uh, I told you that already, right? 🙄). And I strongly suggest you read it. The earlier the better. Because some of the lessons he gives I had to learn the hard way, and I only wish he had written that book earlier. Have a great read!

Link | https://www.amazon.de/Build-Unorthodox-Guide-Making-Things/dp/1787634108

Ludmila Filipova: THE PARCHMENT MAZE

https://www.amazon.de/-/en/Ludmila-Filipova/dp/1483969444

Juan Enriquez: RIGHT/WRONG | HOW TECHNOLOGY TRANSFORMS OUR ETHICS

https://www.amazon.de/-/en/Juan-Enriquez/dp/0262044420

Robert M. Sapolsky: BEHAVE | The bestselling exploration of why humans behave as they do

https://www.amazon.de/-/en/Robert-M-Sapolsky/dp/009957506X

Patrick Gilbert : JOIN OR DIE | Digital Advertising in the Age of Automation

https://www.amazon.de/-/en/dp/1632217686

Neil Hoyne: CONVERTED | The Data-Driven Way to Win Customers’ Hearts

Topic | Data-driven, Business, Marketing

Many progressive companies try to be (or declare themselves to be) Data-driven. As much as it is music to the Data team’s ears, in all honesty, it is often more of an aspirational “badge” than reality.

Because, as with many other phenomena, being data-driven is more about what you do than what you declare. Neil Hoyne’s book is, in this regard, a nice mirror to look into. He takes the process of running a company (from client acquisition to profit cash-in) and, topic by topic, challenges you on whether you really do it based on data. In full disclosure, sometimes I don’t agree with the arguments he uses to illustrate that, but his down-to-earth, no-bullshit zoom-out on whether business processes use data or not is admirable and appreciated.

This book is a great gift for the mid- and top-managers you want to inspire to take their business steering to a (sustainably) higher level. It’s a short read that any leader can squeeze in. It is also a great read for data professionals who want to (finally) win the trust of their business leaders for plugging data into crucial decision making. I would not be surprised if this became one of the “must reads” for managers in years to come. Hence my suggestion for you to take the lead 😉

Link | https://www.amazon.de/-/en/dp/0593420659

Chuck Hemann, Ken Burbary : DIGITAL MARKETING ANALYTICS

https://www.amazon.de/gp/product/0789759608

Once I finish the books, I will write a short review of each, so that you can be doubly sure whether it is worth your reading time. For all the not-yet-reviewed suggestions, don’t be shy to take a bit of the reader’s risk together with me 🙂

Disclaimer: Please note that the links to Amazon are without any referral ID, and I am not receiving any kind of commission or kick-back for whatever you choose to purchase. I am attaching the links here just to arm you with a place to research more about the book.


Did ChatGPT pass Data Science technical interview?

On the last day of November 2022, a bit in the shadow of the Cyber Week craze, the OpenAI team released the new ChatGPT for free testing. It is meant to be a chat-bot built on the strong GPT 3.5 natural language model, capable of not only casual conversation but also of answering real (even tough) expert questions, or creatively writing texts, poetry or even whole stories.

As the features (and performance) of the model are a pretty awesome step up from what we have seen so far, its launch immediately set off a snowball of testing it in a plethora of domains. The craze seems to be so intense that it is believed to be the first digital tool/service to reach 1 million new users within 5 days of its official release. (To be fair, I think it is only the first recorded one; I am quite sure that in countries like India or China, gaining 1 million users fast is not unheard of for something really catchy 😊)

But back to the core story. The ChatGPT use case that wreaked the most havoc on LinkedIn and on many blogs and news portals is the fact that it can produce real snippets of code based on a very simple specification of what the code should do. You can really go like “Show me code to predict survival rate on Titanic”, and in a snap it returns the code to fetch the data, create a predictive model and run it, all in gleaming, well-commented Python. Or so it looks.

In an effort to form my own opinion, I tried my own (and collected others’) attempts at coding inquiries to investigate the real quality of the code. I made a short summary of this early investigation in this LinkedIn post. TL;DR = it was not flawless code; if you try to run it, you will still often stumble upon errors, BUT … for somebody who does not have a clue how to attack the problem, it might be more than an inspiration.

A few days later, my dear friend (and former colleague) Nikhil Kumar Jha came up with the idea of asking ChatGPT one of the technical interview questions he remembered from the time I was hiring him into my team. He passed me the question and answer in a message. And I have to say, the answer was pretty solid. That set my mind spinning. So, we quickly agreed to take the whole battery of tests that I use for the Data Scientist technical interview and put the ChatGPT “candidate” through the whole interview hassle. The rest of this blog summarizes how the robot did and what the implications are. But before we get there, what do you think: has ChatGPT passed the technical round to be hired?

Technical interview to pass

Before jumping into the (obviously most) juicy answer to the question at the end of the previous paragraph, let me give you a bit of context about my interview as such. The market of Data Scientists and Machine Learning engineers is full of “aspirational Data Scientists” (= a euphemism for pretenders). They rely on the fact that it is difficult to technically screen a candidate in detail. Also, the creativity of hiring managers in designing their very own interview questions is relatively low, so if you keep going to interview after interview, over several tens of rounds you can get lucky and brute-force some of them (simply by piggybacking on the answers from failed past interviews).

To fight this, I have several sets of uniquely designed questions that I rotate through (and secret follow-up questions ready for those answering the basic questions surprisingly fast). In general, the technical round needs to separate the average from the great, and then the genius from the great. Thus, it is pretty challenging in its entirety. A candidate can earn 0-100 points, and the highest score in my history was 96 points. (And that only happened once; a single-digit number of candidates got over 90 points out of more than 300 people subjected to it.) The average lady or gentleman would end up in the 40-50 points range; the weak ones don’t even make it past the 35-point mark. I don’t have a hard cut-off point, but as a rule of thumb, I don’t hire candidates below 70 points. (And I hope to get to the 85+ mark with candidates to be given an offer.) So now is the time for the big revelation…

Did the ChatGPT get hired?

Let me unbox the most interesting piece here first and then support it with a bit of the details. So, dear real human candidates, the ChatGPT did not get hired. BUT it scored 61 points. Therefore, if OpenAI keeps on improving it version by version, it might get over the minimal threshold (soon). Even in tested November 2022 version, it would beat majority of the candidates applying for Senior data science position. Yes, you read right, it would beat them!

That is pretty eye-opening and confirms what I have been suggesting for 2-3 years already: junior coding (and Data Science) positions are really endangered. The level of coding skill needed for entry positions is, indeed, already within the realm of Generative AI (like ChatGPT). So, if you plan to start a Data Science or Software Engineering career, you better aim for higher sophistication. The lower-level chairs might not be for humans any more (in the years to come).

What did the robot get right and what stood the test?

Besides the (somewhat shallow) concern of passing the interview as such, the more interesting question for me was: what kinds of questions can it answer correctly, and which can it not? In general, the bot did fine on broader technical questions (e.g. questions about different methods, picking among alternative algorithms, or data transformations).

It also did more than fine on actual coding questions, certainly to the point that I would be willing to turn a blind eye to minor technical slips. After all, real-life interviews are not about being nitty-gritty with syntax, as long as the candidate picks the right methods and sound coding patterns and gears them together. The bot was also good at answering straightforward expert “How to” and “Why so” questions on particular areas of Data Science or Engineering.

Where does the robot still fall short?

One of the surprising shortcomings appeared, for example, when it was asked how to solve the missing data problem in a data set. It gave the usual ways of identifying missing values (like “n/a”, NULL, …), but it failed to say what should be done about them, i.e. how to replace the missing values. It also failed on some detailed questions (like the difference between a clustered and a non-clustered index in SQL): funnily enough, it returned the same definition for both, even though it was prompted explicitly for their difference.
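To make the missing data question a bit more tangible, here is a minimal sketch of one common answer: first normalize the various “missing” markers, then replace them with the median of the observed values. This is only an illustration and not necessarily the answer expected in my interview; the marker set and the helper names `is_missing` and `impute_median` are made up for this example, and median imputation is just one of several reasonable strategies.

```python
from statistics import median

# Markers treated as "missing"; this set is an assumption for illustration.
MISSING_MARKERS = {"", "n/a", "na", "null", "none", "nan"}


def is_missing(value):
    """Return True when a raw cell should count as a missing value."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def impute_median(column):
    """Replace missing cells in a numeric column with the median of the observed cells."""
    observed = [float(v) for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if is_missing(v) else float(v) for v in column]


# Hypothetical column mixing real values with several missing markers.
ages = [34, "n/a", 29, None, "NULL", 41]
print(impute_median(ages))
```

In a real project the choice between dropping rows, filling with the median or mean, or a model-based imputation depends on why the values are missing in the first place, which is exactly the kind of reasoning the question is after.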

The second interesting failure came when I tried to steer the discussion toward the most recent breakthroughs in Data Science. ChatGPT was just beating around the bush, not revealing anything sensible (or citing trends from a decade ago). I later realized that these GPT models still take months to train and validate, so the training data of GPT is seemingly limited to the 2021 state of affairs. (You can try asking it why Her Majesty the Queen died this year or what the 2022 Nobel Prize in Physics was awarded for 😉.)

To calm the enthusiasts, ChatGPT also (deservedly and soothingly) failed on more complicated questions that need abstract thinking. In one of my interview questions, you need to collect the hints given in the text to build a certain understanding and then use it to pivot to another level of aggregation within that domain. Hence, to succeed, you need to grasp the essence of the question and then reuse the answer in a second step of reasoning. Here the robot only got to level 1 and failed to answer the second part of the question. But to be honest, that is exactly what most weak human candidates do when they fail this question. Thus, in a sense, it is again on par with less skilled humans.

How good was ChatGPT at coding, really?

I was specifically interested in the coding questions, which form the core of the technical screening for a Data Science role. The tasks a candidate has to go through in our interview are a mix of “show me how you would do …” and “a specific challenge/exercise to complete”. They also cover both the usual numerical Data Science tasks and more NLP-ish exercises.

The bot did really great on the “show me how you would do …” questions. It produced code that (based on the scoring descriptors) often earned close to full points. However, it struggled quite a bit on the specific tasks. In other words, it can do “theoretical principles”, but it fails to cater for specific cases. But again, where it failed, the solutions ChatGPT produced were the usual wrong solutions that weak candidates come up with. Interestingly, it was never gibberish or pointless nonsense. It was code that really ran and did something (and was even well commented), it just failed to do the task. Why am I saying so? The scary part is that in all aspects, the answers ChatGPT provided, even the wrong ones, looked like humanly wrong answers. If there were a Turing test for passing the interview, it would not make me suspect that a non-human was going through it. Yes, maybe sometimes a weaker candidate (as so often happens in real life as well), but a perfectly credible human interview effort.

Conclusions of this experiment

As already mentioned, the first concern is that ChatGPT can already do as well as an average candidate in an interview for Senior Data Scientist (and thus would be able to pass many Junior Data Scientist interviews outright). Thus, if you are in the data analysis industry (or even plan to enter it), this experiment suggests that you better climb to the upper rungs of the sophistication ladder, as low-level coding will soon be flooded by GPT-like tools. You can choose to ignore this omen at your own peril.

For me personally, there is also a second conclusion from this experiment: it points out which areas of our interview set need to be rebuilt. The performance of ChatGPT in the coding exercises (in the November 2022 version) correlated well with the performance of human (even if less skilled) candidates. Therefore, areas in which the robot could ace the interview question cleanly are probably well described somewhere “out in the wild internet” (as it had to be trained on something similar). I am not worried that a candidate would be able to GPT it (yes, we might soon replace “google it” with “GPT it”) live in the interview. But the mere fact that GPT had enough training material to learn the answer flawlessly signals that one can study that type of question well in advance. And that is enough of a concern to revisit the tasks.

Hence, I went back to redrafting the interview test battery. And, of course, I will use the “ChatGPT candidate” as a guinea pig for the new version once it is completed, so that our interview test can stand its ground even in an era of Generative AI getting mighty. Stay tuned, I might share more on the development here.

Older articles on AI topics:

AI tries to capture YET another human sense

Want to learn AI? Break shopping-window in Finland

REMOTE LEARNING now on AI steroids

5+1 interesting AI videos
