How ChatGPT really works in SIMPLE WORDS (and pictures)

Many of us have probably already played with the new kid on the block of the Artificial Intelligence space, ChatGPT from OpenAI. Providing a prompt with any question and getting a solid, no-gibberish answer, very often even factually precise, is a fascinating experience. But after a few moments of awe at getting an answer to your “question of the questions”, you may have wondered: how does ChatGPT actually work?

If you are a top-notch Data Scientist, you could probably go into the documentation (and related white papers) and simulate (or even write your own) transformer to see what is going on under the hood. However, besides those few privileged, the usual person is probably deprived of this, ehm, joy. 😊 Therefore, let me walk you through the mechanics of ChatGPT in a robust, but still human-speak explanation in the next few paragraphs (and schemas). Disclaimer: I compiled this overview based on publicly available documentation for version 3.0 of GPT. The newer versions (like 4.0) work on the same principles but have different sizes of neural nets, look-up dictionaries and context vectors, so if you are super-interested in how the most recent version works, please extend your research beyond this article.

 

6 main steps

Even though our interaction with ChatGPT looks seamless, every query to it triggers 6 steps (in real time). Media label ChatGPT in a single phrase as “artificial intelligence”, but it is worth mentioning that of these 6 steps, only 2 and a half are actually real artificial intelligence components. A significant part of the ChatGPT run is actually relatively simple math of manipulating vectors and matrices. And that makes the details of ChatGPT even more fascinating, even for the lay audience.

 

It starts with compressing the world into 2048 numbers

The first step of ChatGPT’s work is that it reads through the whole query you provided and scans for what you are actually asking. It analyzes the words used and their mutual relationships and encodes the context (not yet the query itself, just the topic) of the question. You might be amazed by the fact that ChatGPT converts the whole world (and any possible question you may ask) into a combination of 2048 topics (represented by decimal numbers). In a very simplifying statement, you can say that ChatGPT compresses the Internet world into a 2048-dimensional cube.
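To make the idea tangible, here is a minimal Python sketch of what “a query compressed into 2048 numbers” means in practice. The encode_context function is purely hypothetical (the real encoder uses trained weights, not pseudo-random ones); only the vector size matches the public GPT-3 documentation:

```python
import zlib
import numpy as np

CONTEXT_DIM = 2048  # context vector size per the public GPT-3 documentation

def encode_context(query: str) -> np.ndarray:
    """Toy stand-in for the real, trained context encoder."""
    seed = zlib.crc32(query.encode("utf-8"))       # deterministic per query
    return np.random.default_rng(seed).normal(size=CONTEXT_DIM)

vec = encode_context("How does ChatGPT really work?")
print(vec.shape)  # (2048,): the topic of the query squeezed into 2048 numbers
```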

 

Context first, then come tokens

As outlined in the previous paragraph, in the process of answering our prompt, ChatGPT first takes some (milliseconds of) time to understand the context of the query before actually parsing through the query itself. So after it decides which area(s) of “reality” you are interested in, it meticulously inspects your entire question. And it literally does so piece by piece, as it decomposes the given question into tokens. In English, a token is typically a whole common word or a sub-word piece of a rarer word (GPT’s tokenizer uses byte-pair encoding rather than stemming), so as a rule of thumb: number of tokens >= number of words in the question.

For every token, the GPT engine makes a look-up into a predefined dictionary of roughly 50K tokens. Using hashed tables (to make the search super fast), it retrieves a vector (again 2048 elements long) for each token. This way each piece of the query is linked to the topic dimensions. As the system does not know in advance how many words your request will have, there needs to be a mechanism to accommodate any (allowed) length of the query. To be flexible with this, ChatGPT forms an extremely long vector (2048 * number of tokens), in which the sub-vectors coming from the dictionary look-up for each token are arranged one after another into a sequence. Therefore a 100-word query might have up to 204,800 vector elements; an even larger, 500-word request might have more than 1 million numbers. This vector is then processed, but first we need to make one more important change.
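If you want to picture these two steps in code, here is a minimal sketch. The tokenize function is a hypothetical stand-in (the real tokenizer uses byte-pair encoding), and the vocabulary is shrunk from ~50K to 1K rows so the toy runs instantly:

```python
import zlib
import numpy as np

# Real GPT-3: ~50K-token dictionary, 2048-long vectors; shrunk vocab here.
VOCAB_SIZE, EMB_DIM = 1_000, 2048

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(VOCAB_SIZE, EMB_DIM))  # trained weights in reality

def tokenize(text: str) -> list[int]:
    """Hypothetical toy tokenizer; the real one uses byte-pair encoding."""
    return [zlib.crc32(word.encode("utf-8")) % VOCAB_SIZE
            for word in text.lower().split()]

token_ids = tokenize("how does chatgpt really work")
token_vectors = embedding_table[token_ids]     # (5, 2048): one vector per token
flat = token_vectors.reshape(-1)               # sub-vectors laid out in sequence

print(len(token_ids), flat.shape)              # 5 tokens -> 5 * 2048 = 10240 numbers
```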

Where to look (or How to swim in this ocean of data)

As we learned, a 500-word request to ChatGPT might arrive as more than 1 million numbers encoding that request. That is a real ocean of data. If you as a human received such a long prompt to answer, I guess you would struggle even with where to focus your attention in the first place. But no worries here; so would GPT, if it were not for the Attention mechanism. This AI technique, researched only in the last 10 years (papers from 2014 and 2017), is the real breakthrough behind GPT and is also the reason why language models were able to achieve the major step-up in the “intelligence” of communication.

The way the Attention mechanism works, it calculates (still through linear-algebra matrices) a pair of (relatively short) vectors for each token. These vectors are labeled KEY and VALUE. They are a representation of what is really important (and why) in the text. This way the engine does not force the neural network to put equal weight (= focus) on all million input numbers, but selects which subsections of the query vector are crucial for answering the question. When these are then combined into a transformed, weighted SUM of the elements, it provides the recipe for how to “cook” the answer to the question. What might sound like (yet another) complication is actually a key simplifier and energy saver. Past approaches to language models assumed a “memory” holding each word of the query text as equally important (or assigned the same, gradual loss of attention towards previous words). That was prohibitively expensive and hence limited the development of better models. Therefore, jumping over the attention hurdle unlocked the training potential of AI models.
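For the technically curious, below is a toy numpy sketch of single-head scaled dot-product attention from the 2017 “Attention Is All You Need” paper. In the full mechanism a QUERY vector accompanies the KEY and VALUE vectors described above, and all three matrices are trained rather than random; the sizes here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n_tokens, d_model, d_head = 5, 2048, 64

x = rng.normal(size=(n_tokens, d_model))   # per-token vectors from step 2
W_q = rng.normal(size=(d_model, d_head))   # trained matrices in reality
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = x @ W_q, x @ W_k, x @ W_v        # queries, keys, values
scores = Q @ K.T / np.sqrt(d_head)         # how much token i "looks at" token j
weights = softmax(scores, axis=-1)         # focus: each row sums to 1
attended = weights @ V                     # the weighted SUM = the "recipe"

print(weights.round(2))  # unequal weights: not all tokens matter equally
```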

 

Finally, the AI part

It might be counter-intuitive for many, but the first 3 steps of GPT actually have nothing to do with Artificial Intelligence. It is only step 4 where the real AI magic can be spotted. The essence of the 4th step is the Transformer core. It is a deep neural network with 96 layers of neurons and a bit more than 3000 neurons in each of the layers. The transformer part can actually also be called the Brain of GPT, because it is exactly the transformer layers that store the coefficients trained by running large amounts of text through the neural network. Each text used for training the AI potentially leaves a trace in the massive number of synapses between the GPT “neurons”, in the form of the weights assigned to the given connections.

As unimaginable as a net of hundreds of thousands (or millions) of neurons is to us humans, so is the actual result of the Transformer part of GPT: a probability distribution. No, not a sequence of words or tokens, not a programmed answer-generating set of rules, just a probability distribution.
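A tiny sketch of that final hand-over makes the point: the output is literally just one probability per dictionary token, nothing more. (Vocabulary shrunk to 1K entries and weights randomized so the toy runs instantly; in the real model both are trained.)

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

VOCAB_SIZE = 1_000
rng = np.random.default_rng(2)

hidden_state = rng.normal(size=2048)                # output of the layer stack
W_out = rng.normal(size=(2048, VOCAB_SIZE)) * 0.02  # trained weights in reality

probs = softmax(hidden_state @ W_out)               # one probability per token

print(probs.shape, round(probs.sum(), 6))  # (1000,) 1.0: just a distribution
```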

 

Word by word, bit by bit …

Finally, in step 5 of ChatGPT, we are ready to generate the textual form of the answer. GPT does that by taking the probability distribution (from step 4) and running the decoder part of the Transformer. This decoder takes the distribution and finds the most probable word to start the answer with. Then it takes the probability distribution again and tries to generate the second word of the answer, then the third, then the fourth, and so on, until the distribution of probabilities calls a special End-of-Sequence token. Interestingly enough, the generation does not prescribe how many words the answer will have, nor does it define some kind of satisfaction score (on how much the so-far generated sequence of words has already answered the query). Though ChatGPT does not hallucinate the answer or bet on a single horse only: during the process of creating the answer, there are (secretly) at least 4 different versions (generated using the beam search algorithm). The application finally chooses the one it deems most satisfactory given the probability distribution.
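Here is a minimal, hypothetical sketch of that word-by-word loop. Real ChatGPT keeps several candidate sequences alive via beam search; a greedy, single-candidate version with a made-up 6-token vocabulary is shown for brevity:

```python
import zlib
import numpy as np

VOCAB = ["<eos>", "chatgpt", "is", "a", "language", "model"]  # made-up mini dictionary

def next_token_distribution(prefix: list[str]) -> np.ndarray:
    """Hypothetical stand-in for the trained decoder: P(next token | prefix)."""
    seed = zlib.crc32(" ".join(prefix).encode("utf-8"))
    logits = np.random.default_rng(seed).normal(size=len(VOCAB))
    logits[0] += len(prefix) - 4        # make <eos> ever more likely over time
    e = np.exp(logits - logits.max())
    return e / e.sum()

answer: list[str] = []
while True:
    probs = next_token_distribution(answer)    # fresh distribution each step
    token = VOCAB[int(probs.argmax())]         # greedy: pick most probable token
    if token == "<eos>" or len(answer) > 20:   # stop on End-of-Sequence
        break
    answer.append(token)

print(" ".join(answer))
```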

 

Last (nail) polish

As humans, we might consider the job done by step 5 already, so what on Earth is the sixth step needed for? Well, anybody thinking so forgets that a human speaker formulates grammatically correct sequences (or at least most of us do). But AI needs a bit of help here. The answer generated by the Decoder still needs to undergo several checks. This step is also the place where filtering or suppressing of undesirable requests is applied. There are several such layers on top of the raw text generated in the previous stage. This is also (presumably) the place where translation from language to language happens (e.g. you enter your question in English, but you ask GPT to answer in Spanish). The final result of the query has been delivered, and the user can read through it. And ask the next question 🙂

The flow of questions in the same conversation thread can actually lead to updating or tweaking the context parameters (Step 1) of the given conversation. The answering context thus gets more and more precise. Strikingly, OpenAI’s GPT models actually store each of the conversations, so if you need to refer back to some past reply in a conversation, GPT will still hold the original questions and answers of that branch of the talk. Your answers (and questions) thus remain historized and in full recall at any time in the future. Fascinating, given the number of users and the queries they file.

 

Steps Summarized 

The above-described steps of the GPT answer building are neatly summarized in the following slide, which provides additional details and also indicates the transformations made in individual steps to enable the total answer flow. So if you want to internalize the flow or simply revisit the key training architecture/principles, please read through the following summary:

 

A few side notes to realize …

Though the actual mission of this blog post is to walk the reader through the (details of the) process of generating the answer to a query prompt for GPT, there are a few notable side facts stemming from the way GPT is internally organized. So if you want to collect a few “fun fact” morsels that will make you a more entertaining dinner buddy for your next get-away with friends (or for Sunday family lunch), here are a few more interesting facts to be aware of (in the GPT realm):

And a bit of a zoom-out view

Besides the fascination with HOW ChatGPT actually works, I often also receive questions about its future or the speed of past progress. I summarized the most common questions (I received) into the 1-pager show-cased below. So if your curiosity is still running high, feel free to charge yourself with these FAQs:

FILIP’s 2023 READING SUGGESTIONS

As you probably know, I am a heavy, heavy reader. To the point that for the last 14 years in a row, I have committed in my New Year’s resolution to read at least 10,000 pages of books in the given calendar year. So far, I have failed only once in those 14 years. So, cross your fingers for my 2023 reading endeavors, please.

 

In order to really read 10K pages a year, one needs several things. But a good reading list is unavoidable. With the average book volume enclosing 300-400 pages (you can do the math), one needs about 30 different books per year. In most of the past years I published the book suggestions afterwards; you can revert back to them Here, or there. This year I turned the tables and I am sharing my reading queue in advance. Thus, feel free to get inspired:

 

Alex J. Gutman, Jordan Goldmeier: BECOMING A DATA HEAD

https://www.amazon.de/-/en/Alex-J-Gutman/dp/1119741742

The decision to put this book on my reading list stemmed from curiosity. The book reviews suggest that this is a good entry book for an executive trying to be data-driven or AI-ready. Being an SVP of Data & Analytics (and a seasoned Data Scientist) myself, I am hardly the fit at my career phase. But I have seen so many books claim (and fail) to introduce you to the Data Science bushes that I was tempted to see how this book would do. Yet another flat-falling promise?

No, quite the contrary! This book really walks its talk. Namely, it smoothly walks you as a user through the different stages of Data Analytics and Data Science. Even the basic concepts are explained in a no-nonsense style that does not require any previous knowledge from you, but also neither insults (your intelligence) nor gets you bored if you are reading things already obvious to you. You can also decide how “far into the woods” you want to dive and stop reading any time you think you have reached exactly the level of understanding that is enough for you. Or maybe you look even deeper to understand the principles of what you just read.

I strongly recommend this book for anybody trying to change careers into data jobs. I also find it a great present for any manager or executive whom you want to enlighten in data.

 

Daniel Vaughan: ANALYTICAL SKILLS for AI & DATA SCIENCE

Topic | Analytics, Data-driven, Decision making

Let me start the review by saying this book is really special for at least 2 reasons: 1] Throughout 20 years of my analytical endeavors I have seen people entering the DS arena either from the business side, bubbling up their tech skills, OR from a solid tech university major, trying to close the business-understanding gap. I have read many books that try to serve the technical part of the DS ingredients (to a business audience), but I have long been looking for a book that tries to upskill the already technically savvy Data Scientist (or Analyst) with an extra toolbox on how to be really useful with their supreme skills (in business). Daniel Vaughan manages to deliver one. Therefore, this book is a really good read primarily for those friendly with algorithms but frustrated about not making a DS impact.

2] I stumbled on this book during (one of my regular) searches through titles still missing from my library. I read the reviews and got intrigued (for the reasons stated in paragraph 1] above). But it was a bet. A good one. How did I realize that it is a natural extension of my reading line? Do you know that feeling when you come to a party and get introduced to a new person, and after talking the whole evening with them (as you can’t stop talking to that person), you feel like they would have been (or will be) your best friend, because you have so much in common? Well, that is the kind of feeling I got when reading through this Analytical Skills for AI & DS book. Most of the arguments got me nodding (in consent), and whenever the author provided suggested reading, I found 8 out of 10 of the suggestions already on my private book-shelf.

Long story short: this is a great book for teaching/cementing skills in how to apply analytical thinking (or even heavy-duty Data Science, if you dare) to business problems. If you are a technical person, this might erase “understand the business” from your up-skilling to-do list. If you are already business savvy, it gives you a clear idea of what to ask from your more technical colleagues. Here and there the author drops a heavier math annotation or a snippet of Python code, but they are more of an illustration, and skipping them does not take your learning even an inch lower. It is indeed amazing how smoothly Daniel Vaughan walks you through important concepts and principles, no matter what your starting literacy in Business, Econometrics, Data Analytics or DS algorithms is. Certainly worth the bucks.

Link | https://www.amazon.de/-/en/Daniel-Vaughan/dp/1492060941

 

Peter Pru: ECOMMERCE EMPIRE

Topic | E-commerce, Marketing strategies

This book got onto my reading list through an E-commerce community suggestion as one of the must-reads.
After reading fully through it, I honestly remain unimpressed. To be fair, the book really tries to encourage people to start/boost their e-commerce venture, and does so in an easily explained and structured manner. It also has some interesting twists on #upsell strategies that are not of the basic, first-thought type. It also has a good chapter on competition research and initial product #assortment focus.

But the overly “American” style of pushing for things, overselling one’s own story as general truth, and setting #revenue (not CMx or profit) milestones as measures of success leaves a bitter feeling. Also, the last chapter’s sudden pivot towards self-care and mental balance (as a lever to do e-commerce) is a miss compared to the rest of the book’s content.

All in all, it makes for a controversial choice. I was probably not the target group of this book (and it will not become my book-present liebling either). But if you still want to start a small e-shop, it’s probably worth a few hours of reading.

Link | https://www.amazon.de/-/en/Peter-Pru/dp/1736230905

 

 

Justin Grimmer, Margaret E. Roberts, Brandon M. Stewart: TEXT AS DATA

Topic | Data, NLP, Analytics

As the title of the book rightly suggests, text has long been perceived as a special “animal”. On the edge of data analytics, much more obscure than analysis of relational data by SQL or by predictive analytics. Text analytics was also handled by dedicated (Python) packages and often by NLP-specializing-only staff. If you were not one of them, you would probably just reach for the (simplified) predefined functions in NLTK (or a similar code library).
Those times are over. Text is mainstream. If you were not convinced before ChatGPT burst in, now there is no way to dispute it. But text analytics still finds part of its audience (and practitioners) left in the pre-text era, having only a rough idea how to address data stored in troves of text.

Therefore, this book comes as a kind of gift. If you admit to being one of those with only a general (read: limited) understanding of insight extraction from text and of how to set up text analytics in your team, if you have not been treating text as heavyweight a discipline as ML or Reinforcement Learning, this book helps you close that gap. It is well written and always illustrated with telling examples. If you missed buying a ticket for the departing text-analytics “train”, this is your fast track to get on it.

Link | https://www.amazon.de/-/en/Justin-Grimmer/dp/0691207550

 

Edosa Odaro: MAKING DATA WORK 

Topic | Transformation, Data Engineering

Some books are an easy read; you literally swing through them. Others are more sweat to finish. Edosa Odaro created a compelling data transformation back-bone in his Making Data Work. He really tries to explain how to approach moving from legacy data stacks to a more contemporary mode of data operations and analytics. But …

The language chosen is very eloquent, but hard to read. As if the book should rather impress the reader with the author’s level of language use than help you finish the book and take away the lessons learned. Each chapter is narrated like a story and should be fun to read. In theory. However, Edosa also starts the discussion with an obscure Lehman Brothers fall incident, which leaves you puzzled for the first 2-3 chapters about whether the title of the book has not (actually) been mistakenly swapped with another book’s. The build-up of the argument is slow in the first parts of the book; you might feel several times like dropping the read without making it to the last page.

And there is another aspect to overcome. The author spent significant time in Finance, and his views (on data innovation) match that environment well. However, they also indirectly reveal how much financial institutions missed the (data) train in the last decade. Many “issues” thoroughly debated over several pages are often smoothly resolved by other industries, and thus the drama in the stories might seem a bit of a “storm in a teacup”. I was thinking how hard (and awkward) the story might be for people not having the same financial background (which I fortunately did have), probably often confused about why this or that is such a big deal.

But Making Data Work has some true thought gems (like a well-reasoned proof that silo thinking is actually desirable in Data & Analytics). They are just well hidden in the haystack of the author’s memories and the noticeable language figure-skating. To finish this book is not an easy task. As much as I think it was worth the time, I also have to admit this is not a book for everyone. So read at your own risk, and be ready that for the first 30-40 pages you might not see the end of the tunnel.

Link | https://www.amazon.com/Making-Data-Work-Edosa-Odaro/dp/1032224436

 

Steve Krug: DON’T MAKE ME THINK (Revisited)

 

Topic |  Design, Product analytics, UX

Web and apps became our window to everyday activities, social interaction, shopping and most of our work (certainly so during COVID). In the 1990s and 2000s, institutions and businesses were trying to impress us with physical real estate. But how do digital institutions treat us now?

This book is for everyone who wants to grasp the basics (yes, it starts from the ground) of how to design a digital interface for web or app. Even though this might sound like a UX designer guideline (which I would have been a happy user of, if it were), it is really served in down-to-earth language and does not require any design domain knowledge from you (but it leaves you with some after you read through).

It is not a long read, and I strongly encourage anybody interacting with web and apps (or having a say in their design) to at least skim through it. A no-regret move!

Link | https://www.amazon.de/-/en/Steve-Krug/dp/0321965515

 

Kate Strachnyi: ColorWise

Topic | Visualization & Business Intelligence

 

As somebody shaping (literally) thousands of visualizations year after year, I welcome books describing the rules and good (and bad) practices for creating visualizations. I have a few in my library (and suggested them in my previous reading lists), but they often talk more about what kind of graph to choose and how to shape the composition. Many of them take the use of color for granted (or touch the issue only from the side).

ColorWise is a book giving “color choice” and “color coding” in graphs and visualizations the full spot-light. It explains the background of colors in a very non-academic way, surely taking you beyond your previous knowledge about color usage. It also gives clear guidance on how to create your graph color schemes if you are anchored to some of your brand’s (must-have) colors. What is more, it also goes deeper into the psychology of different color schemes and warns you about cultural or color-deficiency pitfalls of your graphs. If you are already a pro, you will often nod your head with “Exactly!” on your lips … and you will still learn a few new aspects to think about. If you are a “regular” color user, your color coding skills will take a significant boost. I strongly recommend it for anybody who needs to produce dashboards or presentations regularly in their work.

Link | https://www.amazon.com/ColorWise-Storytellers-Guide-Intentional-Color/dp/1492097845

 

Tony Fadell: BUILD

Topic | Career (in Tech), Product building, Management

The shortest version of a review for this book would be “Masterpiece!”. It is really transformational, like Gary Hamel’s or Jim Collins’ books. But if you need a bit more reason to buy/read this, here is why:

Many admire Tony Fadell for what he achieved. He built the iPod for Apple and basically saved Apple from falling. And then, humbly, he built the iPhone on top of that. And if that were not enough for you, he then built (from scratch) the brand new company Nest, which started the whole SmartHome category of technology, and sold it to Google for a few billion. So certainly an inspiring enough person. But if you are not a tech geek, you probably did not hear his name before or care too much. Nor did I. And I regret that.

His book BUILD is an interesting mixture of advice and guidance for people who want to have their life (and career) a bit more in their own hands. He narrates the story from adolescence through the earlier years on the job up to the CEO part of your life. And yes, maybe you will never (want to) be a CEO, but the story is still good guidance. He tells you how to think about your first job(s), what to look for, what to avoid. Are you transitioning from expert to your first managerial role? There is a great chapter (or two) for it. Do you dream of launching a start-up? There is a solid story on how to make it more likely to happen. Did you start a company? Well, here is the chapter on how to organizationally survive the growth from 5 to 1000 employees. Does your company have the first version of a product built, without knowing what versions 2 and 3 (and beyond) should be about? Fadell tells you clearly how to look for that pathway. It might sound fluffy, but whoever you are in business, I am quite sure you can take some benefit from some chapter of this book. Yes, occasionally you have to pardon Tony’s American optics, but the smell of it is more like a fragrance you know but would not wear yourself, not a sensory disgust.

I especially admire the chapter on how data plays a different role in the individual phases of building the product. It gives you clear guidance on where data is the horse and where it is (still needed but rather) the cart. Having gone through 3 layers of management (Team Lead to SVP) myself, I can confirm that his view of how to perceive your role(s) is very accurate, and I was amazed how he can compress the essence into (often just) a few pages of text. All in all, this book is a Masterpiece (uh, I told you that already, right? 🙄). And I strongly suggest you read it. The earlier the better. Because some of the lessons he gives I had to learn the hard way, and I only wish he had written this book earlier. Have a great read!

Link | https://www.amazon.de/Build-Unorthodox-Guide-Making-Things/dp/1787634108

 

Ludmila Filipova: THE PARCHMENT MAZE

https://www.amazon.de/-/en/Ludmila-Filipova/dp/1483969444

<review to come soon>

 

 

 

 

Juan Enriquez: RIGHT/WRONG | HOW TECHNOLOGY TRANSFORMS OUR ETHICS

https://www.amazon.de/-/en/Juan-Enriquez/dp/0262044420

<review to come soon>

 

 

 

 

Robert M. Sapolsky: BEHAVE | The bestselling exploration of why humans behave as they do

https://www.amazon.de/-/en/Robert-M-Sapolsky/dp/009957506X

<review to come soon>

 

 

 

 

Patrick Gilbert: JOIN OR DIE | Digital Advertising in the Age of Automation

https://www.amazon.de/-/en/dp/1632217686

<review to come soon>

 

 

 

 

Neil Hoyne: CONVERTED | The Data-Driven Way to Win Customers’ Hearts

Topic | Data-driven, Business, Marketing

Many progressive companies try to be (or declare themselves to be) data-driven. As much as that is music to the Data team’s ears, in all honesty, it is often more an aspirational “badge” than reality.

Because, as with many other phenomena, being data-driven is more about what you do than what you declare. Neil Hoyne’s book is, in this regard, a nice mirror to look into. He takes the process of running a company (from client acquisition to profit cash-in) and, topic by topic, challenges you on whether you really do it data-based. In full disclosure, sometimes I don’t agree with the arguments used to illustrate it, but his down-to-earth, no-bullshit zoom-out on business processes using data (or not) is admirable and appreciated.

This book is a great gift for mid- and top-managers whom you want to inspire to take their business steering to a (sustainably) higher level. It is a short read that any leader can squeeze in. It is also a great read for data professionals who want to (finally) win the trust of their business leaders for plugging data into crucial decision making. I would not be surprised if this became one of the “must reads” for managers in the years to come. Hence my suggestion for you to take the lead 😉

Link | https://www.amazon.de/-/en/dp/0593420659

 

 

Chuck Hemann, Ken Burbary: DIGITAL MARKETING ANALYTICS

https://www.amazon.de/gp/product/0789759608

<review to come soon>

 

 

 

 

 

Once I finish the books, I will write a short review of each, so that you can be doubly sure whether it is worth your reading time. For all those non-reviewed suggestions, don’t be shy to take a bit of the reader’s risk together with me 🙂

Disclaimer: Please note that the links to Amazon are without any referral ID, and I am not receiving any kind of commission or kick-back for whatever you choose to purchase. I am attaching the links here just to arm you with a place to research more about the book.

Did ChatGPT pass a Data Science technical interview?

On the last day of November 2022, a bit in the shadow of the Cyber Week craze, the OpenAI team released the new ChatGPT for free testing. It is aimed to be a chat-bot using the strong GPT 3.5 natural language model, capable not only of casual conversation but also able to answer real (even tough) expert questions, or to write texts, poetry or even whole stories creatively.

As the features (and performance) of the model are a pretty awesome step-up from what we have seen so far, its launch immediately rolled the snowball of testing it in a plethora of domains. The craze actually seems to be so intense that it is believed to be the first digital tool/service to reach 1 million new users within 5 days of its official release. (To be fair, I think it is only the first recorded one; I am quite sure that in countries like India or China it is not unheard of to gain 1 million users fast for something really catchy 😊)

But back to the core story. The ChatGPT use-case that was causing the most havoc on LinkedIn and many blogs and news portals is the fact that it can produce real snippets of code based on a very simple specification of what the code should do. You can literally go “Show me code to predict survival rate on Titanic”, and it returns in a snap the Python code to fetch the data, create a predictive model and run it, all in gleaming, well-commented Python. Or so it looks.
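For illustration, the snippet below shows the kind of code such a prompt yields. This is my own hedged reconstruction (using the seaborn-bundled Titanic dataset and scikit-learn), not ChatGPT’s verbatim output:

```python
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Fetch the data (seaborn downloads its sample Titanic dataset)
titanic = sns.load_dataset("titanic")

# Minimal feature prep: pick a few columns, encode sex, drop missing rows
df = titanic[["survived", "pclass", "sex", "age", "fare"]].dropna()
df["sex"] = (df["sex"] == "female").astype(int)

X, y = df.drop(columns="survived"), df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create the predictive model and run it
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```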

In an effort to form my own opinion, I tried (and collected others’) attempts at coding inquiries to investigate the real quality of the code. I made a short summary of this early investigation in this LinkedIn post. TL;DR = it was not flawless code; if you try to run it, you will still often stumble upon errors, BUT… for somebody not having a clue how to attack the problem, it might be more than an inspiration.

 

A few days later, my dear friend (and former colleague) Nikhil Kumar Jha came up with the idea to ask ChatGPT one of the technical interview questions he remembered from the time I was hiring him into my team. He passed me the question and answer in a message. And I have to say, the answer was pretty solid. That set my mind spinning. So, we quickly agreed to take the whole battery of tests that I use in the technical interview for Data Scientists and submit the ChatGPT “candidate” to the whole interview hassle. The rest of this blog tries to summarize how the robot did and what the implications are. But before we get there, what do you think: has ChatGPT passed the technical round to be hired?

Technical interview to pass

Before jumping into the (obviously most) juicy answer to the question at the end of the previous paragraph, let me give you a bit of context about my interview as such. The market for Data Scientists and Machine Learning engineers is full of “aspirational Data Scientists” (= a euphemism for pretenders). They rely on the fact that it is difficult to technically screen a candidate in detail. Also, the creativity of hiring managers in designing their very own interview questions is relatively low, so if you keep on going from interview to interview, over several tens of rounds you can be lucky enough to brute-force some of them (simply by piggybacking on the answers from failed past interviews).

To fight this, I have several sets of uniquely designed questions that I rotate through (and secret follow-up questions ready for those answering the basic questions surprisingly fast). In general, the technical round needs to separate for me the average from the great, and the great from the genius. Thus, it is pretty challenging in its entirety. A candidate can earn 0-100 points, and the highest score in my history was 96 points. (And that happened only once; a single-digit number of candidates got over 90 points out of more than 300 people subjected to it.) The average lady or gentleman ends up in the 40-50 points range; the weak ones don’t even make it past the 35-point mark. I don’t have a hard cut-off point, but as a rule of thumb, I don’t hire candidates below 70 points. (And I hope to get to the 85+ mark with candidates to be given an offer.) So now is the time for the big revelation…

Did ChatGPT get hired?

Let me unbox the most interesting piece here first and then support it with a bit of detail. So, dear real human candidates, ChatGPT did not get hired. BUT it scored 61 points. Therefore, if OpenAI keeps on improving it version by version, it might get over the minimal threshold (soon). Even in the tested November 2022 version, it would beat the majority of candidates applying for a Senior Data Scientist position. Yes, you read that right, it would beat them!

That is pretty eye-opening and just confirms what I have been trying to suggest for the last 2-3 years already: the junior coding (and Data Science) positions are really endangered. The level of coding skills needed for entry positions is, indeed, already within the realm of Generative AI (like ChatGPT). So, if you plan to enter a Data Science or Software Engineering career, you had better aim for higher sophistication. The lower-level chairs might not be for humans any more (in the years to come).

What did the robot get right, and what stood the test?

Besides the (somewhat shallow) concern about passing the interview as such, more interesting for me was: what kind of questions can and can’t it answer correctly? In general, the bot was doing fine on broader technical questions (e.g. asking about different methods, picking among alternative algorithms, or data transformation questions).

It was also doing more than fine on the actual coding questions, certainly to the point that I would be willing to close one eye on technical proficiency. Because in real-life interviews too, it is not about being nitty-gritty with syntax, as long as the candidate provides the right methods and sound coding patterns and gears them together. The bot was also good at answering straightforward expert questions on “how to” and “why so” for particular areas of Data Science or Engineering.

Where does the robot still fall short?

One of the surprising shortcomings appeared, for example, when it was prompted on how to solve the missing data problem in a data set. It provided the usual identification of it (like “n/a”, NULL, …), but it failed to answer what shall be done about it, i.e. how to replace the missing values. It also failed to answer some detailed questions (like the difference between a clustered and a non-clustered index in SQL); funnily enough, it returned the same definition for both, even though prompted explicitly for their difference.

The second interesting failure came when trying to swerve the discussion onto the most recent breakthroughs in Data Science areas. ChatGPT was just beating around the bush, not really revealing anything sensible (or citing trends from a decade ago). I later realized that these GPT models still take months to train and validate, so the training data of GPT is seemingly limited to the 2021 state of affairs. (You can try to ask it why Her Majesty the Queen died this year or what the 2022 Nobel Prize in Physics was awarded for 😉)

To calm the enthusiasts, ChatGPT also (deservedly and soothingly) failed on more complicated questions that need abstract thinking. In one of my interview questions, you need to collect the hints given in the text to frame a certain understanding, and then use this to pivot into another level of aggregation within that domain. Hence, to succeed, you need to grasp the essence of the question and then re-use the answer for a second thought. Here the robot obviously got only to level 1 and failed to answer the second part of the question. But to be honest, that is exactly what most of the weak human candidates do when failing on this question. Thus, in a sense, it is indeed on par with less-skilled humans again.

How good was ChatGPT at coding, really?

I was specifically interested in the coding questions, which form the core of the technical screening for a Data Science role. The tasks that a candidate has to go through in our interview are a mix of “show me how you would do it” and “specific challenge/exercise to complete”. They also test both the usual numerical Data Science tasks as well as more NLP-ish exercises.

The bot was doing really great on the “show me how you would do it” questions. It produced code that (based on the descriptors) often scores close to the full point score. However, it struggled quite a bit on the specific tasks. In other words, it can do “theoretical principles”, but it fails to cater for specific cases. But again, where it was failing, the solutions ChatGPT produced were the usual wrong solutions that weak candidates come up with. Interestingly, it was never gibberish or pointless nonsense. It was code that really ran and did something (well commented, even), just failing to do the task. Why am I saying so? The scary part is that in all aspects, the answers ChatGPT was providing, even when they were wrong, looked like humanly wrong answers. If there were a Turing test for passing the interview, it would not have given me any suspicion that a non-human was going through it. Yes, maybe sometimes just a weaker candidate (as happens in real life so often as well), but a perfectly credible human interview effort.

Conclusions of this experiment

As already mentioned, the first concern is that ChatGPT can already do as well as an average candidate in an interview for a Senior Data Scientist (and thus would be able to fully pass many Junior Data Scientist interviews). Thus, if you are in the industry of data analysis (or even plan to enter it), this experiment suggests that you had better climb to the upper rungs of sophistication, as the low-level coding will be flooded by GPT-like tools soon. You can choose to ignore this omen at your own peril.

For me personally, there is also a second conclusion from this experiment, namely pointing out which areas of our interview set need to be rebuilt. Because the performance of ChatGPT in the coding exercises (in the version from November 2022) was well correlated with the performance of human (even if less-skilled) candidates, the areas in which the robot could ace the interview question cleanly signal that they are probably well described somewhere “out in the wild internet” (as it had to be trained on something similar). I am not worried that a candidate would be able to GPT it (yes, we might replace “google it” with “GPT it” soon) live in an interview. But the mere fact that GPT had enough training material to learn the answer flawlessly signals that one can study that type of question well in advance. And that is enough of a concern to revisit those tasks.

Hence, I went back to redrafting the interview test battery. And, of course, I will use the “ChatGPT candidate” as a guinea pig for the new version when it is completed. So that our interview test can stand its ground even in the era of Generative AI getting mighty. Stay tuned, I might share more on the development here.

 

Older articles on the AI topic:

AI tries to capture YET another human sense

Want to learn AI? Break shopping-window in Finland

REMOTE LEARNING now on AI steroids

5+1 interesting AI videos

 

 

A/A test? No typo, it really exists!

Everybody knows the A/B test, as it became an essential tool for exploring new user preferences or patterns, and also a way to systematically innovate through a chain of (managed) experiments. But an A/A test, seriously? Yes, this is not a typo (after all, on a QWERTY keyboard, A and B are quite far away from each other 😉), and there are good reasons for this test to exist (and be used). But let’s get to it step by step …

 

Oops, it happened again …

Imagine you had a glitch in the system that led to a sudden (maybe even undesired) A/B test of the customer experience. Some people received the service as expected, while others were cut off from this element of the experience. You do a post-mortem, and you see that this positively influenced their shopping behavior. However, as this was a complicated glitch, you can’t easily replicate the test to find out if switching off the experience element completely would actually be a better long-term strategy. I mean, it was still a valid A/B test, but as the assignment to groups was not controlled, you are not sure if the up-tick in the B-group’s shopping behavior can be attributed to the change in experience or just to a skewed (not random) assignment of the user sample by the glitch itself. How would you tell?

 

 

Wow, really?

Your existing A/B testing platform has fallen behind the curve, and thus you decided to shop for an alternative solution. After implementing the new tool, all of a sudden some of your experiments start to show significantly larger gaps between the test and control groups. Some of the gains are almost too good to be true. But you have already cancelled your previous software subscription, so you cannot replicate the same experiment on the previous platform any more. How would you find out if your campaigns suddenly started to work better, or if the new tool is just “wired differently” in how it runs the tests?

 

There is a way

As you can read above, both of the depicted scenarios stem from the real life of online marketers. I bet you might have experienced some variation of them firsthand. But luckily, for both there is a solution; you don’t have to throw away the data and start all over again. And, yes, the solution is indeed our mysterious A/A test. So how does it really work?

 

A and A (again), seriously?

The original idea of the A/B test is quite simple: if mutually comparable groups of users are subjected to different treatments (and this treatment is the only substantial difference between them), and if any of the groups behaves significantly differently as a result of the test, then there is a high probability (yes, don’t forget it is still a probabilistic finding) that the change in experience and the change in behavior are linked to each other. The major vulnerability of this experiment, unfortunately, is the assumption that user samples A and B are really comparable. So what happens if we already have the result of an A/B test, but we have no proof/info of how correctly the selection into groups happened?

Well, luckily, this logic of experimenting works the other way around, too. If we have two groups undergoing identical treatment and resulting in the same behavior, there is a high probability (yes, probabilistic here again) that the groups have had a similar user distribution as well.

Therefore, if we have two groups and want to find out if they are somewhat similar in user mix, we can subject them to the same treatment and watch whether they produce significantly similar results at a high confidence level. And that is exactly the essence of our A/A test. Here the A+A signifies not that the same group was used twice, but that different groups (the A- and B-groups from the original experiment) are subjected to the same treatment in the second experiment. This way we can try to “learn something about the similarity” of the groups ex post, after the A/B test has already been completed.
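In practice, the check can be as simple as running your usual significance test on the two identically treated groups and hoping NOT to reject. Here is a minimal sketch, assuming a conversion-rate metric and made-up numbers; statsmodels’ two-proportion z-test is just one of several suitable tests:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [412, 398]        # converted users in ex-A and ex-B groups
group_sizes = [10_000, 10_000]  # both groups got the SAME treatment this time

stat, p_value = proportions_ztest(conversions, group_sizes)

if p_value > 0.05:
    print(f"p = {p_value:.3f}: no significant difference -> samples likely comparable")
else:
    print(f"p = {p_value:.3f}: different behavior under identical treatment ->")
    print("the original assignment into A/B groups was probably skewed")
```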

 

Using an A/A test is thus an easy way to double-check (or compensate for the lack of control over) the initial user assignment. Please note that while the A/B test uses sample similarity to point out a behavior difference, its A/A cousin uses the (absence of a) behavior difference to point out sample similarity. That also means that an A/A test does NOT:

  • say anything (additional) about the strength of the behavioral difference from the original A/B test, nor does it serve as any proof of it (if the difference in final behavior between the A/B groups was statistically insignificant, it remains so even after a successful A/A test);
  • prove the general similarity of the original A and B groups. It signals similarity only for the behavior(s) relevant to the original A/B experiment;
  • generate any new insights about the users (which is why opponents of A/A often contest it as a waste of testing capacity).

 

The main value-add of A/A testing is that it is possible to run (almost) no matter what the original experiment was, and it is easy to set up. After all, you just need to wire both groups to the same branch of the process. Therefore, the A/A test is a quick remedy for “unusual” set-ups/hiccups in proper A/B testing.

Its simplicity, of course, comes with some controversy. Some practitioners argue against A/A tests as not being the most robust way to prove A/B group similarity (heavy multinomial distribution comparisons are), as not being fully possible to rerun (e.g., if the original B-group condition altered the long-term perception of the service), as well as carrying the opportunity (or real) cost of not running other experiments instead.

I am far from promoting A/A tests as a silver bullet; evade them if any of the (above-mentioned) counter-arguments ring very true for your own situation. However, the mere existence of the A/A test and its proper set-up should be part of your toolbox; the situation may turn it into the cheapest (and quickest) way to heal an improper experiment set-up. Especially so, if you are to assess the results of A/B tests conducted by somebody else before.

SURPRISING CLUBHOUSE AUDIENCE INSIGHTS (you have not seen yet)

Do you love Clubhouse? But would you also appreciate a bit more data on how this social network ticks? Well, then here you are. Read on for some insights you most likely have not seen so far.

The project MEASURING CLUBHOUSE is a get-together of Data Analysts and Scientists, grouped under THE MIGHTY DATA CLUB, who happen to enjoy Clubhouse but also see a lack of serious data about this social network. Hence, we decided to run several deep-dive studies on demographics, user behavior, Clubhouse rooms and their dynamics. All in all, the team has already generated a deck with more than 120 slides. Though the outputs serve primarily to help the local League of Club Owners, there are quite a few slides that might be eye-opening for the general public. Let us walk you through a selection of them.

[ If you are interested in learning more about the project or you would like to use/cite some of the findings, don’t hesitate to contact us on info@mocnedata.sk or directly on Clubhouse:  @FilipVitek ]

HOW DID CLUBHOUSE GET VIRAL (beyond the USA)

 

EARLY ADOPTERS CAME FROM 4 MAJOR AREAS

 

DEMOGRAPHICS – HOW TO GET TO THE HOLY GRAIL OF MARKETING

 

DEEPER LOOK REVEALS CLUBHOUSE’S CONCENTRATION IN METROPOLITAN AREAS

 

THERE IS ACTUALLY QUITE A LOT TO KNOW, IF YOU TRY (HARDER) …

 

 ROOM DYNAMICS STUDY: CLEAR PATTERN OBSERVED …

 

DOES MODERATOR ACTIVITY REALLY PAY OFF?

 

SPEAKING OF CH ROOMS: HOW FAR IN ADVANCE DO WE PLAN THEM?

 

… AND HOW DO WE NAME/DESCRIBE THEM?

CLUBHOUSE DATA – FIRST INSIGHTS

This is a showcase of the MEASURING CLUBHOUSE IN CEE project, yielding the first real data about user preferences and behavior on Clubhouse. If you are interested in learning more about the project, do not hesitate to contact the authors at info@mocnedata.sk or directly on Clubhouse:

Author’s profile on Clubhouse = https://www.joinclubhouse.com/@filipvitek  | TheMightyData Club profile = https://www.joinclubhouse.com/the-mighty-data-club

Let’s get to some sample data graphs extracted about the Clubhouse userbase:

HOW CLUBHOUSE GOT VIRAL 

DISTRIBUTION OF USER INTEREST (as declared in BIO)

 

(SLOVAK) CLUBHOUSE USERBASE DEMOGRAPHICS

 

CLUBHOUSE IS CLOSEST TO LINKED-IN

 

IN-ROOM DYNAMICS: MEASURING THE TOTAL UNIQUE VISITORS

MEASURING CLUB REACH

CLUB REACH vs FOUNDER’S REACH dynamics

 

IF YOU WANT TO BE GLOBAL PLAYER …

 

UNCLEAR ABOUT CLUB AUDIENCE STRATEGY