🚀
Identifying emerging opportunities with alternative data
💡
Key Idea: Can you use data to identify valuable opportunities before others realise they are valuable?
I'll ignore the Red Queen effect. If a lot of people have access to the same data, the opportunity may already be fairly priced. For now, most investors don't seem to be using alternative data. Will we know if this data becomes mainstream, though? 🤷🏽‍♂️

A. What is alternative data, and what makes it useful?

Alternative data is data that investors have not traditionally used when making capital allocation decisions. It generally includes non-GAAP metrics and metrics that are not industry-accepted.
Useful alternative data is:
  1. Unfalsifiable – hard to fake, so it cannot be gamed by people with an incentive to game the system
  2. Indicative of future performance, based on models
  3. Vetted as "sensible" by high-believability people (a bulwark against lucky correlations)
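
Why the third point matters: if you test enough unrelated signals against a short history, some will correlate with it purely by chance. Here is a minimal sketch of that effect, assuming purely random signals and a 12-point history. The function names and parameters are illustrative, not part of any real pipeline.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_chance_correlation(n_signals, n_points, seed=0):
    """Largest |correlation| between a random target and n_signals random signals.

    Every series is independent noise, so any correlation found is luck.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_signals):
        signal = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(signal, target)))
    return best

if __name__ == "__main__":
    # The more candidate signals you try, the stronger the best lucky match looks.
    for n_signals in (1, 10, 100, 1000):
        print(n_signals, round(best_chance_correlation(n_signals, n_points=12), 2))
```

The best match can only get stronger as more signals are tried, which is why a human check on whether a signal is "sensible" is a useful filter on top of the statistics.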

B. What are the kinds of alternative data?

Geography related

  • Satellite images showing an area with increasing construction activity
  • Hyperlocal vehicle (car and motorcycle) sales data
  • Change in job listings from a given area on online job sites

Market related

  • How are search trends related to a market changing?
  • How are forum and blog descriptions of a market changing?
  • How are podcast discussions about a market changing?
  • How are discussions from key influencers about a market changing?
    • Did a key person suddenly signal interest publicly in a market?

Company related

  • How are search trends related to a company changing?
  • How are forum and blog descriptions of a company changing?
    • Particularly relevant for D2C companies (consumer complaints) and SaaS companies (mentions on Hacker News and key blogs)
  • How are discussions from users about the company changing?
    • Did a really believable person tweet about the company? What did they say?
  • What kind of people talk about this company on social media?
  • How is this product performing on marketplaces?

Regulation related

  • Are key political leaders or influencers talking about a particular topic in their speeches, social media posts, or blogs?
  • Have influential think tanks or economists just published papers about a topic of interest?

C. Meta approach to acquiring and analyzing data

  1. Be a data hoarder and gather more than you need. Storage is cheap (only compute is expensive). You never know what will come in handy
  2. Trends over time are super useful. Save this data in databases that make querying easy
      • PostgreSQL for numerical data
      • Elasticsearch for text data
  3. Visualize this data in easy-to-access dashboards. It is critically important to do this instead of relying purely on models (more on that in Section D): dashboards create sanity checks and help identify flaws in the data
  4. Always assume that bad actors are trying to game the system. Never look only at the number of comments or the sentiment associated with a topic. Also add a believability rating to the accounts making those comments (on Hacker News, Amazon, Twitter, Product Hunt etc), and incorporate this believability in your metrics
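
One way to fold believability into a metric is to weight each comment's sentiment by the rating of the account that posted it. Below is a minimal sketch of that idea. The `Comment` fields, the 0 to 1 believability scale, and the sample accounts are all assumptions made for illustration, and how you rate an account (age, history, verified purchases) is left open.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    account: str
    sentiment: float      # -1.0 (negative) to 1.0 (positive), from any sentiment model
    believability: float  # 0.0 (likely fake) to 1.0 (trusted); hypothetical rating

def raw_sentiment(comments):
    """Plain average: every account counts equally, so it is easy to game."""
    return sum(c.sentiment for c in comments) / len(comments)

def weighted_sentiment(comments):
    """Average sentiment weighted by the believability of each account."""
    total_weight = sum(c.believability for c in comments)
    if total_weight == 0:
        return 0.0  # no believable accounts: treat as no signal
    return sum(c.sentiment * c.believability for c in comments) / total_weight

# Hypothetical sample: two established accounts complain,
# three brand-new accounts post glowing praise.
comments = [
    Comment("long_time_user", -0.6, 0.9),
    Comment("domain_expert", -0.4, 0.8),
    Comment("new_account_1", 1.0, 0.05),
    Comment("new_account_2", 1.0, 0.05),
    Comment("new_account_3", 1.0, 0.05),
]

print(round(raw_sentiment(comments), 2))       # positive: the praise dominates by count
print(round(weighted_sentiment(comments), 2))  # negative: the believable accounts dominate
```

In this sample the unweighted average looks positive while the weighted one is negative, which is the kind of disagreement worth surfacing on a dashboard.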

D. Creating models with this data

  1. Quantify what is useful, not just what is readily available. If something useful is not easily quantifiable, use machine learning to attempt to quantify it
  2. Prioritize investment models whose predictions you can understand over black-box models – even when the black-box models perform better on test data. Inputs to the investment model can still come from black boxes, though (for example: using image or text predictions from deep learning models as inputs to your investment model)
  3. Regardless of how well your models perform on training data, do not start making decisions based on them straight away. Instead, test them in the real world for a while to see if they result in good decisions. If the first 2 or 3 early decisions are bad (or good) – wait for more data! Randomness exists, so a handful of outcomes tells you very little
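
To put a number on the last point, here is a small worked example, assuming each decision is an independent coin flip (a model with no skill at all, right 50% of the time). The function name and the 50% baseline are illustrative assumptions.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k good outcomes in n independent decisions,
    each of which turns out well with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# A no-skill model gets its first 3 decisions all right 1 time in 8.
print(prob_at_least(3, 3))               # 0.125

# Getting 15 or more right out of 20 by luck alone is much rarer (about 2%).
print(round(prob_at_least(15, 20), 4))   # 0.0207
```

Three good early decisions are therefore weak evidence, while a longer track record starts to separate skill from luck.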