Alternative data has quickly become a crowded tech niche for financial services firms and large institutional investors seeking nontraditional information sources to gain a competitive edge.
Fueled by multimodal AI technology, alternative data vendors collect information such as social media posts, satellite imagery, customer audio logs and structured text from documents, and stream it to clients by the terabyte.
Using alternative data has many benefits, but it also poses significant problems. These include ensuring data quality and securing usage rights for video and photographic images that include people, trademarked products or other proprietary information.
Even so, the market for alternative data is expected to grow to $3.2 billion this year and reach $13.9 billion by 2026 at a compound annual growth rate of 44%, according to Research and Markets.
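As a quick consistency check on the Research and Markets figures, the compound growth relation end = start × (1 + r)^n can be solved for the number of years n. This is a minimal sketch that assumes the 44% rate applies uniformly every year; it is arithmetic on the quoted numbers, not part of the report itself.

```python
import math

def implied_years(start: float, end: float, cagr: float) -> float:
    """Years needed to grow `start` into `end` at a constant annual rate `cagr`."""
    # end = start * (1 + cagr) ** n  =>  n = ln(end / start) / ln(1 + cagr)
    return math.log(end / start) / math.log(1 + cagr)

def implied_cagr(start: float, end: float, years: float) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Market size in billions of dollars, as quoted above.
years = implied_years(3.2, 13.9, 0.44)
rate = implied_cagr(3.2, 13.9, 4)
print(f"implied years: {years:.2f}, implied CAGR over 4 years: {rate:.1%}")
```

With the quoted figures, the implied horizon comes out at about four years and the four-year rate at roughly 44%, so the numbers hang together if "this year" is four years before 2026.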
Among the top alternative data vendors are 1010data, Advan Research, Eagle Alpha, Preqin, RavenPack, Earnest Research, Thinknum, UBS Evidence Lab, YipitData, Dataminr, M Science, 7Park Data, Convergence, Geotab, JWN Energy and TalkingData, Research and Markets reported.
In this Q&A, Julia Valentine, a fintech expert and managing partner of AlphaMille, a New York City-based strategy and advisory firm that specializes in alternative data, multimodal AI and conversational AI, discusses how "alt data" and multimodal AI work, and why they're so popular.
Who uses alternative data?
Julia Valentine: That's probably the easiest question to answer, because most companies do -- certainly financial services firms and investment management firms. If you are investing and you have an investment thesis, this data will, first of all, help you formulate that thesis. And secondly, once you have an investment, it will give you a preview, so you can use it almost as a leading indicator.
In other words, before a company reports financial results, you can [use alternative data to] get a very good idea of its sales -- how much it's selling -- or anything else about the company. You can learn what's going on with the company and what its clients think of it.
How do you determine the trustworthiness of alternative data?
Valentine: You determine it through analysis. Once you start using this data, you need to build your own analytics and run the data through some kind of analytic model. If the data is predictive, if it gives you actual insight, you can see a link between, say, what you're observing through the satellites and the price of whatever the company is selling. If you see value in it, you do statistical analysis, and if it generates valid predictions, you keep using it.
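To make the "link" Valentine describes more concrete, here is a minimal sketch that correlates a hypothetical alternative data signal (weekly parking-lot car counts derived from satellite imagery) with a target series such as sales, at several lags. A strong correlation at a positive lag is what you would hope to see from a leading indicator. The data is synthetic, generated so that sales follow car counts one week later, and the names are illustrative; Pearson correlation is only a first screen, and out-of-sample testing would still be needed before relying on the signal.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag]; lag > 0 means the signal leads."""
    if lag == 0:
        return pearson(signal, target)
    return pearson(signal[:-lag], target[lag:])

# Synthetic, hypothetical data: 52 weeks of car counts (thousands) and sales
# that respond to the previous week's car count, plus noise.
rng = random.Random(42)
car_counts = [rng.uniform(0, 10) for _ in range(52)]
sales = [rng.uniform(0, 20)] + [2 * c + rng.gauss(0, 0.5) for c in car_counts[:-1]]

correlations = {lag: lagged_correlation(car_counts, sales, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
for lag, r in correlations.items():
    print(f"lag {lag} week(s): r = {r:+.2f}")
print("strongest lead:", best_lag, "week(s)")
```

Scanning several lags, rather than only lag 0, is what distinguishes a leading indicator from a series that merely moves at the same time as the target.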
Also, there are modeling risks -- models need to be thoroughly trained and tested -- and regulatory issues. Users of alternative data need to make sure that models are free of bias and not discriminatory.
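One common first-pass check for the bias concern is the disparate impact ratio: the positive-decision rate of the least-favored group divided by that of the most-favored group, often compared against the 0.8 threshold of the "four-fifths rule" from U.S. employment practice. This is a minimal sketch with made-up decisions and group labels; a real fairness review would use several metrics and proper statistical testing.

```python
from collections import defaultdict

def selection_rates(decisions, groups):
    """Share of positive decisions (e.g. approvals) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions, groups):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody selected in any group: no disparity to measure
    return min(rates.values()) / highest

# Hypothetical model outputs: 1 = approved, 0 = declined.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

ratio = disparate_impact_ratio(decisions, groups)
flagged = ratio < 0.8  # four-fifths rule of thumb
print(f"ratio={ratio:.2f}, flagged={flagged}")
```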
If you're a data scientist or citizen data scientist at an enterprise, can you create your own alternative data stream?
Valentine: You can. And the tool that allows you to create it is multimodal AI. With multimodal AI, you're getting into something very sophisticated because now not only are you supplementing the financial data that everyone has with alternative data that everyone maybe doesn't have but could potentially buy, but you're also investing in and creating your own. You're collecting your own data streams. It's very powerful. It is used by sophisticated financial institutions, as well as by mainstream companies that want to understand their clients better and want to do it in real time.
Multimodal AI is indifferent to whether it's processing text, audio, photos or video, which it handles through computer vision. The world is not just text.
For example, you can transcribe audio when you're examining a company's complex logistics chains, where parts of the chain are in different geographic locations, with different languages and companies involved in production. Sometimes your information concerns a chain that spans several continents, and the companies handling parts of the value chain are foreign and may report in Chinese, Spanish or other languages. Multimodal AI has built-in multilanguage recognition.
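The modality-agnostic idea can be sketched as a dispatch table: each input type gets its own extractor, but everything comes out as the same language-tagged text record, so downstream analytics never needs to know whether a supplier update arrived as a call recording, a scanned document or an email. Every extractor and the language detector below are hypothetical placeholders (they just decode UTF-8 bytes or apply a crude character check) standing in for real speech-to-text, OCR or vision, and language-identification models.

```python
from dataclasses import dataclass

@dataclass
class RawItem:
    source: str     # hypothetical source identifier
    modality: str   # "text", "audio", "image" or "video"
    payload: bytes

@dataclass
class Document:
    source: str
    modality: str
    language: str
    text: str

# Placeholder extractors: a real pipeline would call speech-to-text for audio
# and OCR or vision models for images and video. These stubs only decode UTF-8
# so the sketch runs end to end.
def read_text(item: RawItem) -> str:
    return item.payload.decode("utf-8")

def transcribe_audio(item: RawItem) -> str:  # stand-in for speech-to-text
    return item.payload.decode("utf-8")

def read_image(item: RawItem) -> str:  # stand-in for OCR / image captioning
    return item.payload.decode("utf-8")

def read_video(item: RawItem) -> str:  # stand-in for frame analysis + transcription
    return item.payload.decode("utf-8")

EXTRACTORS = {"text": read_text, "audio": transcribe_audio,
              "image": read_image, "video": read_video}

SPANISH_CHARS = set("ñáéíóú¿¡")

def detect_language(text: str) -> str:
    """Crude heuristic stand-in for a real language-identification model."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "zh"
    if any(ch in SPANISH_CHARS for ch in text.lower()):
        return "es"
    return "en"

def normalize(item: RawItem) -> Document:
    """Route an item to its extractor and return a uniform, language-tagged record."""
    extractor = EXTRACTORS.get(item.modality)
    if extractor is None:
        raise ValueError(f"unsupported modality: {item.modality}")
    text = extractor(item)
    return Document(item.source, item.modality, detect_language(text), text)

items = [
    RawItem("supplier_call_1", "audio", "发货延迟两周".encode("utf-8")),
    RawItem("invoice_scan_7", "image", "Envío retrasado".encode("utf-8")),
    RawItem("email_3", "text", b"Parts arrive Monday"),
]
for doc in map(normalize, items):
    print(doc.source, doc.modality, doc.language, doc.text)
```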
How long has multimodal AI been around and to what extent is it a part of alternative data?
Valentine: It has probably really hit the market in the last five years. Multimodal AI exists independently of alt data. In the data market, you buy alt data from an alt data provider, whereas using multimodal AI essentially means that you are the creator of that data.
From the standpoint of the end user, if you buy alternative data, most likely your alt data provider used multimodal AI somewhere in the process. If the alt data provider sells you credit card data, they probably just bought it from credit card companies. But if they wanted to go out on social media and somehow supplement that data, they could use a number of tools to do that. And one of those tools is multimodal AI.
How are AI and machine learning actually used to provide alternative data?
Valentine: Anytime you're working with data, you can use machine learning. One thing you can do with machine learning and data is create an ontology. If there's a ton of data and you don't know what all the possible categories for it could be, you can impose your own: instead of just saying, 'Here's credit card data,' you use demographics and sort everybody by gender or age group. That's an ontology -- a way to group things. Or you can feed the data into a machine learning model and let it suggest its own ontology.
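The two approaches can be put side by side: a hand-defined ontology that buckets customers by age group and gender, and a model-suggested one, here a small k-means clustering on spending behavior. K-means is just one possible technique, chosen for illustration, and the customer records, bucket boundaries and features are hypothetical.

```python
from collections import Counter

# Hypothetical customer records.
customers = [
    {"age": 23, "gender": "F", "monthly_spend": 20.0, "visits": 12},
    {"age": 27, "gender": "F", "monthly_spend": 25.0, "visits": 10},
    {"age": 41, "gender": "F", "monthly_spend": 22.0, "visits": 11},
    {"age": 45, "gender": "M", "monthly_spend": 300.0, "visits": 2},
    {"age": 58, "gender": "F", "monthly_spend": 320.0, "visits": 1},
    {"age": 63, "gender": "M", "monthly_spend": 310.0, "visits": 3},
]

def age_bucket(age):
    if age < 30:
        return "18-29"
    if age < 50:
        return "30-49"
    return "50+"

def manual_ontology(rows):
    """Predefined categories: count customers per (age bucket, gender)."""
    return Counter((age_bucket(r["age"]), r["gender"]) for r in rows)

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

print(manual_ontology(customers))
points = [(r["monthly_spend"], r["visits"]) for r in customers]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The learned grouping (frequent low spenders versus infrequent high spenders) cuts across the age and gender buckets, which is the kind of category nobody defined up front. In practice the features would be standardized first, because here spend dominates the distance.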
We can use the model to look at all of the patterns that we cannot see by ourselves because it's overwhelming. It's millions of records. All of a sudden, [the ML model] can come up with something really insightful and cool because it will see the patterns and create the ontology that maybe we didn't even think of.
Here's an example. Banks have call centers. Customers call the call centers constantly. Some of them have a quick issue to resolve and that's it. Others call with all sorts of complaints. Everything is on a recorded line because the banks want to learn from it.
For our bank clients, we use multimodal AI to listen to all of these anonymized conversations. Then we build an ontology for all of these phone calls. If you listen to enough of them, at some point you end up with something really cool. First, you get a list of the biggest issues that annoy the heck out of the bank's clients. Second, you get a list of the most pressing product improvements. This gives you the ability to reduce client attrition.
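The call-center workflow can be approximated in a few lines: redact identifying numbers and email addresses, tag each transcript against an ontology of issue categories, and count the tags to rank the biggest complaints. As a simplifying assumption, the ontology here is a hand-written keyword map with hypothetical categories, rather than one learned by a model, and the transcripts are assumed to have already been produced by speech-to-text.

```python
import re
from collections import Counter

# Redaction patterns: long digit runs (account or card numbers) and emails.
ACCOUNT_RE = re.compile(r"\b\d{6,}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def anonymize(transcript: str) -> str:
    """Replace emails and long numbers with placeholders."""
    transcript = EMAIL_RE.sub("[EMAIL]", transcript)
    return ACCOUNT_RE.sub("[NUMBER]", transcript)

# Hypothetical issue ontology: category -> trigger words.
TOPICS = {
    "fees": {"fee", "fees", "charge", "charged", "overdraft"},
    "mobile_app": {"app", "login", "password"},
    "card": {"card", "declined", "pin"},
    "wait_time": {"hold", "wait", "transferred"},
}

def tag(transcript: str) -> set:
    """Return every ontology category whose trigger words appear in the call."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    return {topic for topic, keywords in TOPICS.items() if words & keywords}

def rank_issues(transcripts):
    """Count calls per category, most frequent first."""
    counts = Counter()
    for t in transcripts:
        counts.update(tag(anonymize(t)))
    return counts.most_common()

calls = [
    "I was charged an overdraft fee again on account 12345678.",
    "The app keeps rejecting my password and I waited on hold for 40 minutes.",
    "My card was declined at the store.",
    "Why are there so many fees? I got transferred three times.",
    "Please update my email to jane.doe@example.com.",
]
print(rank_issues(calls))
```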
Editor's note: This interview has been edited for clarity and conciseness.