
In this episode Glenn Hopper talks to the researcher behind the groundbreaking study which found that AI is better at conducting financial analysis than humans. Alex Kim, of the University of Chicago Booth School of Business, provides a full overview of the findings, methodology, and impact on FP&A, CFOs, and finance of the attention-grabbing study “Financial Statement Analysis with Large Language Models”. The analysis, which made headlines across the world, found AI achieves roughly 60% accuracy in predicting the direction of future financial performance, while human experts’ accuracy tends to fall between 53% and 57%.
In this episode Alex Kim reveals the implications for finance professionals:
- Alex’s finance background – a dual Bachelor’s degree in Economics and Business Administration and a Master’s degree with a concentration in Accounting – through to his PhD career
- How he taught himself coding and AI
- How finance pros can take the insights from this paper and use them in their day-to-day work
- Why the model didn’t do so well with loss-making or startup companies
- Improving on the model’s performance using startup company data
- How to combine AI and human intelligence
- What humans can do better than AI in financial forecasting
- Future research projects into information processing for investors
- How to keep up to date on the latest ground breaking research in AI and Finance
- Alex’s military experience stationed with US soldiers in South Korea
- Alex’s favorite Excel feature (and why one thing about Excel still cannot be rivaled)
Read the full paper here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4835311
Check out the analyzer for yourself here: https://chatgpt.com/g/g-9P3sIn487-financial-statement-analyzer
Follow Alex Kim, PhD student at the University of Chicago, on LinkedIn: https://www.linkedin.com/in/alexgunwookim
Full Transcript
Glenn Hopper:
Welcome to FP&A Today, I’m your host, Glenn Hopper. Today we have the pleasure of speaking with Alex Kim, a PhD student at the University of Chicago and co-author of the super interesting new study from the University of Chicago, “Financial Statement Analysis with Large Language Models”, which showed some results that might surprise a lot of our listeners. We’ll be sure to provide a link to the full paper in the show notes. Alex, welcome to the show.
Alex Kim:
Thank you, Glenn. Uh, thank you so much for having me here.
Glenn Hopper:
So, super excited about this topic today because this is a subject that’s near and dear to my heart. I actually teach courses on how to use generative AI to do financial analysis. So when I saw this paper, I immediately knew I’ve gotta get you on the show, and I really appreciate you coming on. Before we dive into the study, tell us a little about yourself. I know you’re a PhD student, and the funny thing is, before we talked, I had this expectation that you were gonna be a PhD student in machine learning, but I was so excited to see what your actual field of study is. So tell me a little bit about your background, your educational journey, and what you’re currently researching.
Alex Kim:
Yeah, sure. Once again, thank you so much for having me here. My name’s Alex, and I’m currently a second-year PhD student at the University of Chicago. My main research area is accounting. I do research at the intersection of accounting and finance, uh, especially related to investors’ information processing. I was born and raised in South Korea. I did my bachelor’s degree in economics and business administration in my home country, and also did my master’s degree, with a concentration in accounting, at the same institution. So, as you can see, I actually do not have any formal education related to computer science, but what fascinated me the most was this information processing part of accounting. Um, as you know, accounting is a language that companies use to communicate with stakeholders, investors, uh, with many external parties. So what is really interesting here is that accounting information is processed by so many people out there.
And as technologies advance, companies are communicating with so many other methods, such as images, videos, texts, audio. And I became fascinated by how people process this information, how this information can be quantified, how it affects capital markets, how it affects people’s economic and financial decisions, and so on and so forth. It has been my research interest for more than six or seven years. And I realized, like five or six years ago, when AI was not even popular or a fad back then: oh, to understand how people process this kind of new data, I have to learn how to code and how to process this new type of information. And that was the beginning of how I started to learn about machine learning, AI, image processing, voice processing, video processing, stuff like that.
Um, so pretty much everything was self-taught. I was paying attention to, um, you know, computer science lectures online. I also published several papers in one of the top computer science conferences with my friends. Um, so I’ve been learning by doing. And this process has helped me a lot, um, in my research in accounting and finance. So as I said, my primary research area is the intersection between accounting and finance and how people process information, and understanding how people actually process information inside their brains is critical to my research. And that’s where AI, machine learning, and large language models are helping me out. And then, after finishing my master’s degree, I started my doctoral career, uh, in the United States. It’s been two years since I came to the United States. Um, yeah, I’ve been doing my coursework, and I recently did my comprehensive exams, so I’m looking forward to the research phase of my PhD career. Yeah, that’s a brief story about myself.
Glenn Hopper:
That’s great. And I just think about how much more information is available now. Everything from the Twitter fire hose to earnings calls to, you know, all the information that’s out there, the internal company data plus the external data and how it weighs in. And I feel like we’re kindred spirits in a lot of ways, because I’m always looking to find these correlations, to take these different pieces of information. I’ve been playing around a little bit with using RAG with graph databases to try to find, you know, not just the direct correlations, but the nodes and all that. And you’re looking for that correlation that gives you sort of the leading indicator. If you can find something <laugh> that’s connected before the rest of the market realizes it, I mean, obviously in investing and in capital deployment, that’s a great thing to be able to identify. So the work you’re doing really speaks to me.
Alex Kim:
Yeah. And you talked about the practical standpoint, but I also think it has a lot of comparative advantage from a theoretical standpoint as well. I’m a social science researcher, and I am very interested in practical applications of my research. But at the same time, I do care about my theoretical and academic contributions in my field. I mean, there have been so many theoretical concepts that could not be measured by researchers just because of a lack of technology. I mean, as you said, text information: it is super, super value relevant. It has been out there for more than 40, 50 years. It’s been comprising a major portion of market reactions and how people process information. But only recently has it become possible to analyze such information systematically. That’s actually a breakthrough from a theoretical standpoint as well.
Glenn Hopper:
I’m so excited, and I really want to dive straight into this paper, because I’m sure you’re getting a lot of questions. This has gotten a lot of media attention, and I’m sure that podcast hosts and everybody else are reaching out to you about the paper. I was gonna kind of tee up the paper, but I don’t wanna steal your thunder. So tell me about this paper and what you guys found in your research.
Alex Kim:
So it started from a very straightforward motivation. So in the first round of research related to large language models, um, in the field of finance and accounting, like last year, um, many papers, including my papers, have been talking about what these language models can do. And I think the general consensus of those papers was, uh, language models are great. They can perform many tasks related to textual information. And as a whole, they are a nice textual supporting tool. They can summarize things, they can extract information, and they can actually provide reports for managers. But we wanted to test the boundaries of these language models even further. One of the main weak points, uh, caveats of these language models was that they were very weak in terms of quantitative analysis. However, uh, recent advances in large language models have enabled them to perform some basic quantitative tasks.
They can now do multiplication, addition, subtraction, and stuff like that. And then we wanted to test how far these language models can go in terms of financial and economic decision making, and wanted to understand whether they are more than merely a supporting tool, whether they can play a more central role in financial or economic decision making. So we tested this notion on one of the most foundational tasks in accounting and finance, uh, which is financial statement analysis. Financial statement analysis is a unique hybrid task that combines quantitative and qualitative analysis. The quantitative part is that analysts, or information processors, have access to financial statement information, which is largely numerical. They perform some quantitative analysis such as trend analysis, ratio analysis, uh, identify some notable changes and stuff like that. And then they provide some economic intuitions or insights based on the numbers that they analyze.
And then after getting some economic intuitions and insights, they synthesize and combine all these pieces of information. And at the end of the day, they provide one piece of prediction or economic decision: buy or sell recommendations, earnings predictions or sales predictions, and so on and so forth. So, as I said, that latter part of forming expectations, forming guidance, forming their economic insights, that latter part is actually very qualitative and requires multi-layer economic reasoning. So we wanted to test whether LLMs can successfully perform this highly complex task, which combines quantitative and qualitative, uh, aspects. And then, uh, what we do in the paper is pretty much very straightforward. Uh, we pass standardized and anonymized financial statements, balance sheet and income statement, to the model without any specific context or narrative information, and try to mimic how human analysts process information within their brain. So we referred to a paper that was published more than 40 years ago about how human analysts process numerical information.
And then, um, according to that paper, human analysts process information by doing some ratio analysis such as profitability, liquidity, efficiency, et cetera. And they also do some trend analysis, or identify some notable changes based on financial statements. And we do exactly the same. We adopt a technique called chain-of-thought prompting, which is actually a very famous technique in natural language processing and computer science studies, um, to our study. Simply put, it is a very simple and straightforward concept: it is providing the model some step-by-step guidelines. And those guidelines are very similar to how humans process information. So in this context of financial statement analysis, we take the process steps introduced in that paper, convert them into prompts, and then ask the model to process numerical data as human analysts process them.
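The chain-of-thought setup Alex describes can be sketched roughly as follows. This is an illustrative reconstruction, not the paper’s actual prompt; the step wording and the `build_cot_prompt` helper are assumptions:

```python
# Illustrative sketch of chain-of-thought prompting for financial statement
# analysis. The step wording below paraphrases the process Alex describes
# (ratios, trends, synthesis, prediction); it is not the study's prompt text.

STEPS = [
    "Identify notable changes in the balance sheet and income statement.",
    "Compute key ratios (profitability, liquidity, efficiency, leverage).",
    "Analyze the trend of each ratio across the periods provided.",
    "Form economic intuitions and insights from the numbers above.",
    "Predict whether next-period earnings will INCREASE or DECREASE, with a rationale.",
]

def build_cot_prompt(anonymized_statements: str) -> str:
    """Assemble a step-by-step prompt from standardized, anonymized financials."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(STEPS, 1))
    return (
        "You are a financial analyst. Work through the following steps in order, "
        "showing your reasoning at each step.\n\n"
        f"{numbered}\n\n"
        f"Financial statements:\n{anonymized_statements}"
    )

print(build_cot_prompt("Sales: 100 -> 120; Net income: 10 -> 8")[:60])
```

The assembled string would then be sent to the model through whatever chat API is in use; the five steps mirror the ratio/trend/synthesis sequence described above.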
The outcome variable we use to evaluate the, uh, quality of the financial statement analysis is the direction of the change in earnings. That benchmark is widely used in the accounting and finance literature. Um, it is a binary prediction: whether earnings are likely to go up in the next period or likely to go down in the next period, and we try to see whether the model’s prediction is reasonable or accurate. And what we find in the paper, pretty much summarized, is that GPT is on average better than human analysts at predicting the direction of future earnings changes. And the second is that GPT is on par with highly specific or narrowly trained machine learning models in terms of predicting the changes in future earnings. And one important intuition is that GPT’s predictions and human analysts’ predictions are not mutually exclusive; they are complementary with each other at this moment. And that is actually one economic intuition or main takeaway of the paper. Yeah, that’s a high-level introduction to the paper.
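Since the benchmark is a binary up/down call, evaluating it reduces to directional accuracy. A minimal sketch with invented numbers:

```python
# Minimal sketch of the evaluation metric: the share of firm-year observations
# where the predicted direction of the earnings change (+1 up, -1 down)
# matches the realized direction. All numbers below are made up.

def directional_accuracy(predicted, actual):
    """Fraction of observations where the predicted sign matches the realized sign."""
    assert len(predicted) == len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

actual   = [+1, -1, +1, +1, -1, +1, -1, -1, +1, +1]   # realized changes
gpt_pred = [+1, -1, +1, -1, -1, +1, -1, +1, +1, +1]   # model's calls
print(directional_accuracy(gpt_pred, actual))  # 0.8
```

On this scale, the roughly 60% figure for GPT versus the 53–57% analyst range discussed in the episode is the difference between about 6 and about 5.5 correct calls out of every 10.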
Glenn Hopper:
So I think the part that probably got everyone reaching out to you about this paper is it’s an easy clickbait headline, because everybody’s so scared about AI taking their jobs <laugh>. So if you say LLMs outperform human analysts at predicting directions of earnings per share, people freak out, but then they click the link and they wanna learn more about it. Mm-Hmm. But I really like what you said there, and this has been part of my message as well: it’s not either/or, it’s working together. Could you elaborate on that a little bit?
Alex Kim:
Actually, I think that is one of the main selling points of the paper. Um, I’d like to say, quote unquote, “on average”; on average is everything in our paper. So especially when we are comparing GPT’s predictions with human analyst predictions, we are saying that GPT’s predictions are on average better than human analyst predictions. And this finding itself is not very surprising, because starting from early 2010 or like 2015, there have been machine learning papers arguing that machine learning predictions are actually better than human analyst predictions. They say that it’s simply because, uh, machine learning models do not have human biases, and I personally agree with those studies. And the findings are replicable. So the finding that GPT on average is better than human analysts is not very striking. What is actually striking is that they are not mutually exclusive. I mean, in our paper, we perform several analyses to show the complementarity between GPT’s predictions and human analysts’ predictions.
Lemme walk through the findings a little bit. Um, so in the first set of tests, we try to understand why there are incorrect predictions for both, for analysts and for GPT. And we find that, uh, GPT and analysts both struggle when firms report losses, when firms are small, and when firms have highly volatile earnings, which is a finding that is pretty much consistent with prior studies. When the information environment is opaque, um, analysts tend to struggle in terms of predicting earnings, which makes total sense. But one interesting, though preliminary, finding in that specific regression was that analysts were relatively better than GPT in predicting earnings of those informationally opaque firms. I mean, they were both struggling, but what I want to emphasize is that analysts were doing relatively better than GPT, especially when they were predicting earnings of firms that are small, that report losses, and that have highly volatile earnings.
That finding was pretty interesting to us, and we just wanted to understand why that’s happening. And according to our interpretation, the reason is that for predicting firms that are small and opaque, you know, prior studies show that soft information or private communications between analysts and managers matter more. So basically, numerical analysis or GPT’s analysis are based on general knowledge and, you know, data that is publicly available.
But what prior studies suggest is that for those specific firms, you know, small firms or loss-reporting firms, private communications matter a lot. And that’s why analysts might have a comparative advantage over GPT in terms of predicting earnings. And then, after understanding a little bit about why analysts have a comparative advantage, especially in that specific sector, informationally opaque firms, we try to horse race the measures.
Um, we try to horse race GPT’s predictions with analyst predictions and try to see whether GPT’s predictions subsume analyst predictions. And what we find is the opposite. They are actually not mutually exclusive; GPT’s predictions are not subsuming analyst predictions. Both remain statistically significant, which implies that, um, they convey some sort of orthogonal information. They contain some information that is independent from each other, which implies that, you know, GPT’s predictions and analyst predictions are actually complementary rather than substituting for each other at this moment. Of course, in the future we really don’t know what’s gonna happen. This technology is changing too much, too fast. You know, we never know what’s gonna happen after five years, but at least for the time being, what we can say for sure is that analysts’ predictions and GPT’s predictions are not mutually exclusive. They carry some independent information, and there are some areas where human analysts are doing relatively better than the AI model. So what I’m suggesting here, and it’s our view, is that, you know, human analysts haven’t lost their comparative advantage. What has become even more important at this moment is actually identifying areas where humans can maintain their comparative advantage and trying to reallocate, uh, human resources to areas where they can excel.
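The "horse race" Alex describes is a regression that includes both predictions at once; if each keeps a meaningful coefficient with the other present, neither subsumes the other. A hedged sketch on simulated data (the paper’s actual specification will differ):

```python
import numpy as np

# Simulate two independent signals so each carries information the other
# lacks, then regress the outcome on both at once. Both slopes staying near
# their true value of 0.5 is the "complementary, not mutually exclusive"
# pattern: neither predictor subsumes the other.

rng = np.random.default_rng(0)
n = 5000
gpt_signal = rng.standard_normal(n)
analyst_signal = rng.standard_normal(n)          # independent of gpt_signal
outcome = 0.5 * gpt_signal + 0.5 * analyst_signal + rng.standard_normal(n)

X = np.column_stack([np.ones(n), gpt_signal, analyst_signal])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(beta[1], beta[2])  # both slope estimates stay near 0.5
```

If one signal merely repackaged the other, its coefficient would collapse toward zero once the second entered the regression; here both survive, which is the pattern the paper reports for GPT and analyst forecasts.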
Glenn Hopper:
Just fascinating. And I know another part of your paper compared the results of the large language models to the fine-tuned machine learning algorithms that would also be used here. Can you talk a little bit about that part of the paper?
Alex Kim:
Yeah, sure. Um, actually, that part of the paper is where we were surprised. In the first part of the paper, I would say that we were not very surprised; we were sort of expecting it a little bit. But in the second part of the paper we were very surprised, because it has to do with how the models are trained. So, for those who are not very familiar with large language models, I’m gonna talk very briefly about how the models are trained, especially general-purpose large language models. General-purpose large language models are trained on a large corpus of textual data. The primary training purpose for large language models is actually to produce sentences that sound natural. So for example, say that there’s a sentence, “I am a boy”, and we just make “boy” a blank. The model processes “I am a” and tries to identify what’s gonna come next after “I am a”. It has a large corpus of vocabulary inside its memory and tries to assess the probability of what’s gonna come next.
Um, “boy” will be assigned a high probability for sure, but, for example, a word such as “notepad” is less likely to come right after “I am a”, ’cause “I am a notepad” sounds a little bit weird. So basically, based on the corpus of textual data it is trained on, a large language model is trained to produce the most natural-sounding sentences. In other words, it is not trained on a very specific task. That is something that surprised us the most, because machine learning models or deep learning models are trained for a very specific purpose or task. Here, in this context, we train the model to specifically predict the changes in earnings, especially the direction of the changes in earnings, based on 59 variables used in a seminal paper by Ou and Penman published in 1989.
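The next-word intuition Alex walks through can be made concrete with a toy softmax over a four-word vocabulary. The scores here are invented for illustration; a real model derives them from billions of parameters:

```python
import numpy as np

# Toy next-token prediction: the model assigns each vocabulary word a score
# for the blank in "I am a ___", and softmax turns the scores into
# probabilities. The scores below are made up.

vocab  = ["boy", "student", "notepad", "the"]
scores = np.array([4.0, 3.0, -2.0, -5.0])

probs = np.exp(scores - scores.max())   # subtract max for numerical stability
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word:8s} {p:.3f}")
# "boy" gets the highest probability; "notepad" is near zero.
```

This is the whole training objective of a general-purpose LLM, which is why it is striking that such a model keeps up on a task it was never explicitly trained for.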
So these variables are actually calculated from financial statements. Um, they are very comprehensive and are used by many other papers. And then we design and fine-tune a large artificial neural network model, uh, which enables non-linear interactions of all the variables, all the predictors. And we train the model to specifically predict the changes in earnings. So I would say that it is not a fair comparison between machine learning models and large language models, ’cause large language models, although they do have general knowledge about finance and accounting, you know, they are not trained on earnings prediction tasks. However, machine learning models are trained on specific tasks. And then we are horse racing these two models. And we were actually expecting machine learning models to be slightly better than large language models. But what we find is, not the opposite exactly, but not what we had expected.
So what we find in the paper is that, uh, large language models and machine learning models are performing in a pretty similar manner. Um, their performance in terms of accuracy or F1 score was very similar. And, um, you know, that’s something that we were very surprised about. And then we repeat the analysis once more. We try to see whether the predictions from ChatGPT and machine learning models are mutually exclusive or whether they’re complementary, and we find the same results. Um, they are complementary with each other. They are both statistically significant in the horse racing, and what this result suggests is that, you know, machine learning models and GPT might convey some orthogonal information. And yeah, that’s pretty much what we find in the second part of the paper.
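The fine-tuned benchmark Alex contrasts with GPT can be sketched in miniature: a small neural net mapping financial-statement predictors to an up/down earnings call. This is simulated data and a toy one-hidden-layer network, not the paper’s 59-variable architecture:

```python
import numpy as np

# Toy stand-in for the ANN benchmark: one hidden layer, trained by full-batch
# gradient descent to predict the direction of next-period earnings changes
# from simulated "ratio" features with a deliberately non-linear true rule.

rng = np.random.default_rng(1)
n, d, h = 2000, 8, 16
X = rng.standard_normal((n, d))                      # firm-year predictors
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(float)  # 1 = earnings up

W1 = rng.standard_normal((d, h)) * 0.5
b1 = np.zeros(h)
W2 = rng.standard_normal(h) * 0.5
b2 = 0.0
lr = 0.2

for _ in range(4000):
    H = np.tanh(X @ W1 + b1)                 # hidden activations
    p = 1 / (1 + np.exp(-(H @ W2 + b2)))     # predicted P(earnings up)
    g = (p - y) / n                          # dLoss/dlogits (cross-entropy)
    W2 -= lr * (H.T @ g); b2 -= lr * g.sum()
    gH = np.outer(g, W2) * (1 - H**2)        # backprop through tanh
    W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(axis=0)

acc = ((p > 0.5) == y).mean()
print(f"in-sample directional accuracy: {acc:.2f}")
```

A real benchmark would hold out firm-years for testing and tune the depth and width; the point here is only the shape of the task: numeric predictors in, a binary direction out.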
Glenn Hopper:
FP&A Today is brought to you by Datarails, the world’s number one FP&A solution. Datarails is the artificial-intelligence-powered financial planning and analysis platform built for Excel users. That’s right, you can stay in Excel, but instead of facing hell for every budget, month-end close, or forecast, you can enjoy a paradise of data consolidation, advanced visualization, reporting, and AI capabilities, plus game-changing insights, giving you instant answers and your story created in seconds. Find out why more than a thousand finance teams use Datarails to uncover their company’s real story. Don’t replace Excel, embrace Excel. Learn more at datarails.com.
How do you picture this outside of academia? If someone is trying to apply this research today, and they’re trying to figure out how to work in, whether it’s the machine learning models, and I know a lot of financial analysts have been using machine learning models for years, but if they’re really trying to leverage the power of these LLMs, how do you see someone being able to take the insights from your paper and leverage ’em in a practical application?
Alex Kim:
Yeah, sure. That’s a great question, by the way. Um, so we’ve been thinking about this issue for quite a long time as well. As you said, academic papers are different from practical applications. First of all, there are two things we think of as caveats. The first one is that we are not using any textual information as our input. This is purely because of, um, you know, the look-ahead bias of the model. What we mean by look-ahead bias is: as I explained before, the model is trained on a large corpus of textual data, so say that we provide some information to the model, especially textual information, and the model knows what the company is. And then if we ask the model, “Hey, ChatGPT, based on this information, predict the earnings of 2021,” the model knows, for example, hey, this information is actually from Apple in 2021.
“I know that they released the iPad, iMac, or something like that, and they experienced a nice fiscal year.” And then it’s gonna say, based on my prediction, they’re gonna experience a nice year in 2021, um, because they released some nice products. This is not reasoning; it’s an answer based on the model’s knowledge. It is not based on reasoning, it is just based on memory. And actually, if we include some textual information in our input data, it becomes super, super difficult to control for the look-ahead bias. That’s basically why we didn’t include, uh, any textual information in our input. But in reality, um, analysts and information processors do refer to many sources of textual information. As you said, 10-Ks, MD&As, press releases, conference calls. There are so many sources of textual information out there that are value relevant.
And they contain so much information about future earnings. So in reality, I know that many practitioners are trying hard to incorporate some of the textual information into the model to provide a holistic view of future earnings. And that’s actually pretty nice, in the sense that the products that financial experts and professionals are developing in practice nowadays are products to predict a future that nobody has ever seen. I mean, if they’re trying to predict a future that nobody has ever seen, they’re entirely free from the look-ahead bias. And they can actually use whatever data they want to use, like textual data, images, voice, any data, and try to augment the model and improve the performance of the model. And I think that is actually harnessing the full power of the large language models in terms of predicting earnings or performing financial statement analysis, which has not been done in our paper.
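The anonymization that controls for look-ahead bias can be sketched as a simple preprocessing step. The regex, labels, and `anonymize` helper are illustrative assumptions, not the paper’s actual procedure:

```python
import re

# Hedged sketch: strip the company name and convert calendar years to
# relative labels (latest year -> "t", prior year -> "t-1", ...) so the
# model cannot recall what it already knows about the firm.

def anonymize(statement: str, company: str) -> str:
    out = statement.replace(company, "Company A")
    years = sorted({int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", out)})
    for i, year in enumerate(years):
        label = "t" if i == len(years) - 1 else f"t-{len(years) - 1 - i}"
        out = out.replace(str(year), label)
    return out

raw = "Apple Inc. reported net sales of $365B in 2021, up from $275B in 2020."
print(anonymize(raw, "Apple Inc."))
# -> Company A reported net sales of $365B in t, up from $275B in t-1.
```

Practitioners forecasting a genuinely unseen future can skip this step entirely, which is exactly the freedom Alex points out above.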
That’s one thing that could be directly applied by financial professionals right now in the field. And the second thing that we would like to mention as a caveat is that the ANN model that we present in our paper, you know, is one of the standard models in academic research, but one can do better in the real world. There could be many more hyperparameter settings; there could be many more specifications with deeper layers, where the number of layers can be changed; or there could be other breakthroughs in machine learning methodologies. Recently there has been the KAN model, which is a more developed version of the deep learning model that we currently use in our paper. I mean, there are so many other things that people can try. And at the same time, there are many other things that people can try with large language models: they can probably add one more fine-tuning layer, or they can construct a dictionary of embeddings of, uh, you know, other companies’ conference call transcripts or other things.
And then the model can actually search the database and give you more insights. I mean, there could be so many other things that people can do, but everything is limited in academic papers just because of the, uh, look-ahead bias.
Glenn Hopper:
I completely forgot about that, and that’s an important point, uh, that because you had to prove your hypothesis, you couldn’t just have it make predictions based on the most current information. You had to go give it historical information, completely blind that out, and then see how it actually did, and then see how the analysts performed. So it’s funny that you talk about that, because, for everyone else who’s listening, there’s a companion app that comes with the paper. It’s a custom GPT that follows this chain-of-thought prompting, where you can upload statements. So I’ve been testing this a lot, and after reading about how the model didn’t do as well with loss-making companies and with startup companies and smaller companies, I used Rivian, the electric vehicle manufacturer, and I used everything all the way through their latest quarterly filing.
And then I had it make a prediction. And it’s funny, the model really wants to caveat everything. So I was trying to really drive it to predicting the earnings per share, and it said, well, if they do X, Y, Z, then this will happen, hedging because of whatever, you know, information it has about Rivian. But then I took it a step further, and I’m working on a workflow, because 10-K analysis is, uh, an area where I think we could really leverage these LLMs. And I’m using, uh, Flowise or Make, you know, just different tools where you can sort of string these things together. So, um, you upload the 10-K, you upload the financials, and I’ll usually go three years back on the financial statements. Then I’ll add in maybe a part of the workflow where Perplexity is going out and looking for new relevant information and just adding that into its thoughts and everything.
But then you go through, and I’ve got different assistants: this one does the financial ratios, this one creates charts, this one does the qualitative analysis. And then they sort of aggregate into a report where the predictions come out. And I think, to the practical application point, because people don’t have these brakes on that are slowing ’em down, they might be able to actually improve performance on those models. And that’s a very important point, and I do see that being a way that people are practically applying this. Yeah, I
Alex Kim:
Totally agree with you. That was actually a practical consideration, ’cause we wanted to make sure that the model follows all the steps that we wanted the model to follow. And we probably experimented so many times. If we just ask the model to follow all the steps all at once, you know, for some reason it sometimes didn’t follow our instructions, probably because the prompt was excessively long. I mean, the prompt that we use in the companion app, um, is different from what we use in our actual paper, uh, primarily because it includes textual information and there’s way more to consider. And as you know, um, when the context window gets longer, the higher the probability is for the model to ignore your instructions. And we experienced that issue quite often. So that was our last resort, um, to make sure that everything flows in the way that we had expected. So, um, the step-by-step thing, I know that it’s kind of annoying; I find it annoying myself as well. But, uh, that way we can ensure that the model’s following all the steps that we intended. Yeah. But, as you said, it’s something that people can develop and improve further, for sure.
Glenn Hopper:
And I think what we have to learn is — and we gain so much from the types of research that you’re doing — how to apply it. I really think it’s so important right now for people to understand. I don’t know who ultimately gets credit for it, but you hear the phrase over and over that humans aren’t going to be replaced by AI, but humans who use AI are going to replace those who don’t. And I think we’re realizing that right now in companies. We’re seeing that if a company doesn’t have a clear policy on use of generative AI, or if it’s not enforced, or they’re trying to lock people down — if the company’s not figuring out how to use it — then you’ve got employees going out on their own, these mavericks who are becoming basically cyborgs, using AI on their own, and it’s not a company-wide thing. And I think part of the balance right now is that we see the advantages of it, but we don’t know how to use it, and we don’t know where we can trust it. So I’m wondering, how do you see that intersection, where finance teams are able to combine AI and human experience to enhance their overall decision making? How do you see that being applied?
Alex Kim:
It’s difficult to answer. First of all, think about two years ago, before OpenAI had released GPT-3 — could you even imagine this thing coming up within two years? The last two years have been crazy, so many new things coming up every day. Major NLP conferences are receiving 8,000-plus submissions each; the number of submissions has tripled or quadrupled over the past two years. So this area is developing, and will develop, very fast in the coming years. And I’m not an expert in computer science — I do research in computer science, but I do not consider myself an expert in this area — so I cannot answer for sure what’s going to happen in the next five years. Nobody can actually answer that.
But my view on the application of LLMs in the field of finance, especially in practice, is a little more positive than other people’s. First of all, I think there are two main options for how firms can adopt this new technology. The first one is using LLMs as a supporting tool — a textual supporting tool. As I said, LLMs can summarize information, answer questions from customers, and extract information from complex textual sources. Overall, firms can use LLMs as an assistant for routine tasks that involve textual information processing. And I think even now, LLMs are doing pretty well in these tasks. I have several other papers on how LLMs can summarize information, and those summaries are actually very informative.
They better explain contemporaneous market reactions. And I also have other papers extracting risk-related information from conference calls, and extracting tax-audit-related information, which is pretty much hidden in 10-K filings. Even now, LLMs are doing pretty well at extracting information. This is going to happen, I’m pretty sure about that, because it doesn’t require that much verification, knowledge, or ability to maneuver this new technology — it’s just happening. It’s an assistant tool that’s going to improve the productivity of workers, and it’s going to happen gradually. Some firms are already adopting this technology, and employees are actively using these new tools in their everyday workflow. But what remains uncertain is whether LLMs can play a more central role in economic or financial decision making. I think our paper is one of the first to answer this question.
There are several other papers talking about this, and our paper and those papers are a little different from prior work in the sense that they’re actually asking: can LLMs do more? Instead of just doing research-assistant work, the model is actually making some economic decisions and providing human-readable outputs. And now it becomes the responsibility of humans to verify those human-readable outputs and try to understand what is going on inside these models. What fascinates me about large language models compared to other machine learning models is that large language models produce something that is interpretable. Machine learning models, although they are really good at predicting or classifying things, remain a black box in how they do their tasks. It is complex, things are messed up inside the model, and it is practically very difficult to go inside the model and identify what’s going on.
For large language models, it’s different. We can see the outputs, and the outputs are human-readable. So financial experts should be able to interpret and read through the outputs that the models produce. And most importantly, I think the next direction is going to be understanding where humans do better. As I said before, there are areas where machines cannot do well and where humans ought to do well, or at least do relatively better than the machine. Identifying those areas and critiquing what the language models are producing — that’s going to be the future of the financial industry, I think. This is just my personal opinion, but that’s where we’re heading, at least for now.
Glenn Hopper:
I agree completely. And as you were talking through that — so in this study you used GPT-4 Turbo. Obviously since you’ve done the study there’s GPT-4o, which has slight improvements. And my default — I’m not trying to make a commercial for OpenAI — but my default model is ChatGPT because of the data analyst tool. There are advantages to Claude; I’ve used Llama and Gemini too. But because of the built-in data analyst capabilities, I’m going to default to ChatGPT, because I haven’t seen the equivalent. And I think it’s really important, when we talk about what these LLMs do, to remember that math is not a skill that’s inherent to them.
I mean, there’s this weird emergent capability where sometimes LLMs can do some math, but it’s not what they’re designed to do — it’s like asking a cobbler to tune your piano. But with ChatGPT and the data analyst tool, you can see it’s writing the Python code under the hood. So if you ask Gemini or Claude to build you an amortization table, or something as simple as that, chances are it’s going to get it wrong, because it’s looking in its knowledge base and trying to find it. An important thing for us to keep in mind is that we don’t want the language model itself trying to do our financial analysis for us, because it’s not inherently good at math. But the LLMs can write code, so we could have them create Python applications for us that we run in another environment. I guess the question in all that is: have you done any research around, or any work with, these other models that don’t have the equivalent of the data analyst tool, where it’s actually writing the Python under the hood to do the math? Have you seen any interesting results from using those?
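An amortization table is exactly the kind of deterministic calculation that is safer to delegate to generated code than to the model’s own arithmetic. A minimal sketch of the Python such a tool might write under the hood — the loan figures are just an example:

```python
def amortization_schedule(principal, annual_rate, months):
    """Standard fixed-payment amortization. Returns one row per month:
    (month, payment, interest, principal_paid, remaining_balance)."""
    r = annual_rate / 12  # monthly rate
    payment = principal * r / (1 - (1 + r) ** -months)
    rows, balance = [], principal
    for m in range(1, months + 1):
        interest = balance * r
        principal_paid = payment - interest
        balance -= principal_paid
        rows.append((m, round(payment, 2), round(interest, 2),
                     round(principal_paid, 2), round(balance, 2)))
    return rows

# Example: $100,000 at 6% APR over 30 years
schedule = amortization_schedule(100_000, 0.06, 360)
print(schedule[0])   # first month's row
print(schedule[-1])  # last month: remaining balance should be ~0
```

Because the arithmetic lives in code rather than in the model’s token predictions, the result is exact and reproducible — which is Glenn’s point about the data analyst tool.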
Alex Kim:
I see — that’s a great question, by the way. In our paper, as an additional analysis — I really don’t remember the figure number — we tested GPT-3.5, which didn’t have any access to quantitative analysis tools, and we also tried Gemini Pro 1.5. Actually, one thing I would like to note about the paper is that we don’t include any tables in our analysis directly. We did a trick: we converted the tables into CSV format, and then we reconverted the CSV tables into TXT format. So basically what the model sees is tables with commas and line separators, and we instruct the model directly and say, hey, this is a comma-delimited file, so what you see as a comma is a different column, and what you see as a line separator is a different row.
So they’re not interpreting the tables per se; what remains is whether the model can do the math or not. We tried Gemini Pro, and we tried GPT-3.5. GPT-3.5 failed — I mean, it failed, but it was on par with analysts, just slightly below analyst performance. Gemini Pro was slightly worse than GPT-4 Turbo; I believe that is because its quantitative capability is not as good as GPT-4 Turbo’s. But you mentioned Claude — the most recent version, Claude 3.5, is also fascinating. It can do a lot of math; its quantitative reasoning is actually one of the best in the industry. So if we try again with Claude 3.5, I believe we might get a similar result, or even slightly better, because its quantitative skills are actually better than GPT-4 Turbo’s. But it’s slight — it’s not a breakthrough, it’s a slight difference. So it’s an empirical question whether it’s going to be better or worse. But as you said, if in a couple of months OpenAI releases GPT-5, and GPT-5 is a genius in math, that’s going to be another breakthrough.
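The table-flattening trick Alex describes can be sketched in a few lines: serialize the statement to comma-delimited text, then prepend an instruction telling the model how to read it. The sample rows and instruction wording here are invented for illustration, not taken from the paper’s prompt:

```python
import csv
import io

def table_to_prompt(rows):
    """Flatten a financial-statement table into comma-delimited text,
    preceded by an instruction telling the model how to read it."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    instruction = ("The following is a comma-delimited file: each comma "
                   "separates a column and each new line separates a row.\n\n")
    return instruction + buf.getvalue()

balance_sheet = [
    ["Item", "FY2021", "FY2022"],
    ["Revenue", "1000", "1200"],
    ["Net income", "100", "150"],
]
print(table_to_prompt(balance_sheet))
```

The resulting string can be dropped straight into a prompt, which sidesteps the question of whether the model can parse table layouts and isolates the question of whether it can do the math.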
Glenn Hopper:
Yeah. And those of us — and I’m sure you’re among us — who are watching the news on this every day and watching the benchmark boards see that the new Claude 3.5 has jumped to the top in certain areas. And everybody is kind of waiting for GPT-5, which I’ve heard rumors about now — I saw something the other day, and I couldn’t independently verify this, but it may be late ’25 or early ’26 before we see GPT-5. So it’s fascinating to watch, and by the time this podcast airs, there may be a new leader out there. It’s an exciting time to be watching all this. So I’m going to have to try Claude 3.5 on some of this type of analysis, because I’ve just been defaulting to GPT-4o, since I know the data analyst tool — I can get my charts, and now they’ve got those interactive charts and all that. But we do need to keep up with the other ones as well. So that paper’s out, and you’re fielding media calls <laugh> on it and everything. Tell me what’s next for you. What are you working on right now? Are there any new projects or areas of research that you’re excited about?
Alex Kim:
Yeah, of course <laugh>. Actually, that’s a pretty nice question. There are several projects that we are currently working on, but it’s a trade secret — we cannot disclose everything. Please stay tuned; we’ll share everything once it’s ready. Let me talk a little bit about my research interests, and then I’ll flow naturally into some of the ongoing projects I’m currently working on. My interest is still related to the information processing of investors — it’s the core question of accounting, as I said. If you dive deeper into information processing research, I think there are two lines of research. They’re interrelated, but a little different. The first line is about the processing itself: how people process information, how we can quantify the processing of unstructured data — textual information, voice information — how we can measure characteristics of textual data, things like that.
That is the first research interest I’m currently working on. And then there is another line of research, interrelated with the first, that talks about the benefits and costs of the technology. There are some big questions that are yet to be answered: which group is likely to benefit more from the new technology? Which group is likely to lose, or get replaced by it? What are people actually doing with this new technology inside the firm? Our FSA paper is somewhat related to this second line of research — it’s actually a combination of both. It asks whether the model can do financial statement analysis, and then whether the model benefits analysts, or whether it indicates something analysts should focus more on, things like that.
On this second part of information processing research, I personally feel that we need a lot more work, because it’s new. Many papers have been saying LLMs can do this and that, and we’ve been talking about it for more than two years, but we still don’t understand what people are doing with this new technology — whether the technology itself is efficient, whether it’s giving you right answers or hallucinations, whether people who are using it are benefiting or losing something in the market or in terms of productivity. There are some big questions that accounting and finance researchers should answer. My current research interests lie there: whether investors are using this technology, whether, if they have access to it, their information processing or financial decision making is going to improve or not, and what kind of prompts or output data they should use.
Things like that. These are big questions that should be answered — they provide more causal evidence on the benefits and costs of large language models, or in broader terms AI, to society and its welfare. My ongoing research projects are broadly investigating these issues from a broader perspective. This might include surveys, experiments, or archival research methods; I’m very open to many research methods to answer these causal and core questions about what new technologies are going to bring to the financial market and market participants.
Glenn Hopper:
At the intersection where you’re working, you have to keep up not just with the latest in tech, but with the latest in what’s going on in finance and accounting, and how people are using this information. So I’m wondering — obviously you read an enormous number of papers and are very tuned in — but with everything moving so fast, as we were just talking about, how do you stay updated with the latest trends, both in AI and in finance, and how do you incorporate them into your work?
Alex Kim:
I try my best, but I cannot say I’m the best person at it. As you said, there are two things I have to keep up with: one is finance and accounting research, and the second is computer science and technology. For finance and accounting, it’s easier, in the sense that the number of new papers — especially in accounting and finance related to information processing, AI, large language models, and asset pricing — is relatively manageable, maybe 20 or 30 papers every day that I have to keep up with. So what I do is keep track of SSRN, which is the social sciences’ equivalent of arXiv in computer science: researchers post their preprints there for review and comments. It’s my daily routine.
Every single morning, I go to SSRN and check whether there are any papers related to my area. I read the abstracts, and if I’m interested, I try to read the full papers. I think I spend at least one hour every morning looking for new papers. If there are interesting ones, I share them with my co-authors and my friends, and try to think about extensions or potential research ideas stemming from them. One thing that I really don’t like about <laugh> social science research is that it takes too long to publish a paper — normally about one to two years in a top journal. So even though I keep track of all the papers published in the top three journals, those papers were written two or three years ago.
So they are pretty much outdated. That’s why I keep track of new papers posted on SSRN for finance and accounting. But for computer science, it’s impossible — as I said, people submit something like 8,000 papers to each conference, and I’m not a computer science researcher, so I cannot do things like that. My view on computer science research is that I have to get my hands dirty to learn something, especially related to coding and technology. Reading things helps, but it doesn’t help ultimately. So I try to get involved in at least one computer science project at any given time. This year I published one paper with my friends — they’re not researchers, just my friends — at one of the top AI and natural language processing conferences, ACL. It’s about multi-agent large language models debating with each other, on the evaluation of texts.
So it’s a pure natural language processing paper, but it actually stemmed from my interest in conference calls. I was interested in how conference calls are administered, and in how managers and analysts debate with each other on conference calls and reach conclusions — or sometimes don’t. That grabbed my attention, and I was thinking about whether I could model that situation with multiple language-model agents, and then I developed the idea into the paper. Once you have to write a paper with a new idea, you have to do the literature search, an extensive literature review, do the coding, and talk with your co-authors. Even though it takes a lot of time, I learn a lot. And if you write a paper and it gets published at one of the top conferences, it has to go through the rebuttal process — you learn a lot by doing. So that’s how I keep track of the most recent technologies in computer science.
Glenn Hopper:
I love to hear you say that, because, like you, I’m trying to keep up with as many papers as I can. And I’m going to give a plug — do you know who Ethan Mollick is? He’s a professor at Wharton; I think his PhD was in entrepreneurship, but <laugh> he’s got a GPT that he built — I think it’s called “Why is this important?” — and if there’s a paper I’m only tangentially interested in, I’m not going to read the whole thing; I just dump it into his GPT and it spits out the key factors. It’s the best summarization tool I’ve used. But to your point, if you’re operating in a research-only realm, it’s easy to get removed from the practical applications. So to hear you talk about going out there and doing and building — that resonates with me, because I also teach a lot of courses on this, and if I stop and just go down that teaching path, and I’m not also making and building things along the way, then I feel there’s a remove between what’s actually happening and what we’re talking about. So I love to hear you say that. That’s a great answer.
This has been a fascinating episode, and I’m so appreciative of you coming on. But before we let our guests off the hook — since we’ve spent some time together, at the end of the show I like to find out a little more about you. So the question we always ask is: we obviously know the nature of your research and the kind of work you’re doing, but what’s something, maybe on the personal side, that not many people know about you?
Alex Kim:
<laugh> Yeah, actually, that’s a good question. So my life as a PhD student cannot be more transparent. I wake up every day, go to my computer, start my research, eat lunch, do research, eat dinner, sometimes have some meetings, do research, and go to bed. But one thing that people don’t know a lot about is my military experience. As a South Korean citizen, I had to serve in the military for 19 months, but my military experience was pretty special, because I got a chance to serve in the US military stationed in South Korea. They chose soldiers who were reasonably good at English conversation and sent them to the US military bases stationed in South Korea. I was chosen as one of the Korean Augmentation to the United States Army (KATUSA) soldiers, was sent there, and spent 19 months with US soldiers. That experience was super unique. I learned the culture, the language — mostly slang, with a lot of f-words.
And what was actually very nice for me was that I did a lot of physical exercise, and at the same time I had a lot of free time — from about five or 6:00 PM I was free. I used that time to take some lectures from Stanford on NLP, and that’s basically when I learned how to code and how to think about these computer science projects. That was in the middle of my master’s program, so it was like a fresh restart for me — I was thinking about something completely different: computer science, natural language processing. I was doing the training during the day, and learning English culture and language. It was a pretty unique experience, and I made lifelong friends that I still talk to now. Many people know that South Koreans have to serve in the military, but whenever I say I served in the US military, they’re kind of surprised to know I was working with actual US soldiers. Yeah.
Glenn Hopper:
That’s incredible. That’s a great story, and what a great use of your time, too. I’m trying to picture how busy you were — obviously your life stays busy now as a PhD student, but serving in the military, working on your master’s, and doing those other courses at the same time <laugh> — that’s a lot. And the physical activity probably helped to balance things out and clear your head along the way, too.
Alex Kim:
So <laugh>
Glenn Hopper:
Mm-hmm. Alright, so I’m going to throw you a little bit of a curveball here, but we ask all of our guests this question, and we get a variety of different answers. I know that, given the nature of your research, you probably don’t spend as much time in Excel as many of our listeners do, but I’m wondering if you have a favorite Excel function, and if so, why?
Alex Kim:
Okay, that’s another very interesting question. As you said, to be a hundred percent honest, I do not use Excel much, especially for data manipulation or data processing. This is purely because of the reproducibility of academic research. As a social science researcher, I take the full reproducibility of accounting and finance research very seriously. I personally believe that, based on the descriptions provided in a paper, everybody should be able to reproduce the main results without much difficulty. But if we do things in Excel, there is no way we can provide the code to other researchers — say we create variables, say we run regressions; even though we get the results, there is no way to communicate them to other people. So that’s primarily the reason we don’t use Excel that much.
But there’s one thing: I’m a big fan of Excel for creating figures. I know that Python, R, and other statistical software provide a lot of great figure functions, but one thing I really don’t like about those functions is that they lack flexibility. They do provide code to change colors and everything, but they’re not user-friendly — you have to memorize the names of the codes for the shapes of the dots, the lines, everything. But in Excel, you can produce publication-ready figures a few clicks away. It provides a lot of flexibility: changing colors and formats can be done within five minutes, and the resulting figures look pretty nice. That’s the primary reason I love Excel. So whenever I get the final data sets out of any statistical software, I convert them into an Excel file and try to get some nice-looking figures out of Excel. So I don’t have a specific function in mind, but figures in general are great in Excel.
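The last step Alex describes — moving final results from statistical software into Excel for charting — can be as simple as writing a CSV that Excel opens directly. A small sketch, with made-up result values standing in for real model output:

```python
import csv

# Hypothetical final results, as they might come out of a statistical package
results = [
    ("quarter", "model_accuracy", "analyst_accuracy"),
    ("2023Q1", 0.60, 0.55),
    ("2023Q2", 0.61, 0.56),
]

# Write a CSV that opens directly in Excel, ready for chart-building there
with open("results_for_excel.csv", "w", newline="") as f:
    csv.writer(f).writerows(results)
```

The analysis stays reproducible in code, while only the final presentation layer lives in Excel.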
Glenn Hopper:
Yeah — as someone who’s tried to battle through matplotlib and Seaborn and all that, I fully appreciate that. And yes, Excel does have wonderful charts and graphs. I will say, though — again, I’m advertising for OpenAI for some reason — the new interactive charts in GPT-4o, I do love those. They look good, and you can drill in and interact with them. But it’s still not going to replace Excel. So, great response on that.
Alex Kim:
Mm-Hmm. <affirmative>. So,
Glenn Hopper:
Well, Alex, we’re coming to the end of our time here. What a fascinating conversation — I love the work that you’re doing, and I’m looking forward to continuing to follow it. For our listeners, if they wanted to follow you and keep up with your work, what’s the best way for them to find you?
Alex Kim:
I have a website with my email address on it, or you can hit me up on LinkedIn. I cannot guarantee that I can answer all the emails I get, but I’ll try my best to respond as soon as I can. So if you have any questions, especially related to my research, please don’t hesitate to hit me up on LinkedIn or via email.
Glenn Hopper:
Alex, thank you so much for coming on.
Alex Kim:
Yeah, thank you so much for having me. It was my pleasure. Thank you so much.