📖
What is Full Stack Data
💡
One-liner: Tools that supercharge data-driven decisions in organisations.

Vision

We bring transparency and efficiency to Information Supply Chains by creating open-source analytics and machine learning software.

Philosophy

We always put our customers first. We do this by:
  • Reducing vendor lock-in and encouraging forkability: We want to enable our partners to look at the internals of how a piece of software works, and fork it for their own use as they see fit
  • Reducing costs: We want to help companies save money by allowing other companies to build on top of our code, and to bring the cost of enterprise data projects down from 7-figure to 5-figure annual dollar amounts
  • Reducing continuity risk: We want to ensure the sustainability of projects that we embark on with partners, regardless of our future choices and financial outcomes

Financially Sustainable Open-Source

We want to add value and create savings for our customers, while being a financially sustainable company that also creates good jobs. To do this, we:
  1. Charge commercial license fees to organisations that want to make changes to the code base without making these changes open-source (our open-source code will still be available under a GPLv3 license)
  1. Charge consulting and training fees to organisations that want to engage us to customise the code base for them, as well as subscription fees for our cloud products
  1. Charge recurring fees for data feeds and cloud products, if our clients choose to use them

Specifics

We provide tools that work across the information supply chain.

1. Data Acquisition

We have 4 ways of acquiring data:
  1. Scraping publicly available information from the internet, and then observing changes over time. This includes listings on job sites, property and car sites, government portals, flights and rail, electricity production, and more. This also includes scraping text, transcripts, and metadata from social media feeds, media reports, photos, videos, and audio recordings/podcasts
  1. Aggregating reports put out by research agencies
  1. Using paid feeds for hard-to-acquire data, like data about financial markets and paywalled data about the real economy
  1. Using high-resolution and high-frequency satellite imagery to detect changes over time
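
Observing changes over time amounts to comparing successive scrapes of the same source. The sketch below shows one minimal way to do that, assuming each scraped listing carries a stable ID. The names `SnapshotDiff` and `diff_snapshots` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    """Differences between two scrapes of the same source."""
    added: dict = field(default_factory=dict)
    removed: dict = field(default_factory=dict)
    changed: dict = field(default_factory=dict)


def diff_snapshots(previous: dict, current: dict) -> SnapshotDiff:
    """Compare two scrapes keyed by a stable listing ID (an assumption)."""
    diff = SnapshotDiff()
    for key, record in current.items():
        if key not in previous:
            diff.added[key] = record
        elif previous[key] != record:
            # Keep both versions so downstream code can see what moved.
            diff.changed[key] = (previous[key], record)
    for key, record in previous.items():
        if key not in current:
            diff.removed[key] = record
    return diff
```

Storing only the diff per scrape, rather than every full snapshot, is one possible design choice; it keeps the change history small but relies on the IDs staying stable between scrapes.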

2. Data Cleansing, Augmentation, and Standardization

This includes 3 things:
  1. Cleaning HTML data to extract relevant numerical and text insights, and converting both these kinds of data into a standardized format for downstream storage and processing
  1. Tagging images, videos, and audio with specific metadata using machine learning (including transcription)
  1. Geo-spatial analysis on satellite imagery using machine learning
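
The first step above, cleaning HTML into a standardized record, can be sketched with Python's standard library alone. This is an illustration under assumptions, not our actual pipeline: the record fields (`source`, `text`, `numbers`) and the names `TextExtractor` and `standardize` are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


# Matches unsigned numbers such as 42, 3.5 or 1,250,000.50 (a simplification).
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def standardize(html: str, source: str) -> dict:
    """Turn raw HTML into one record with text and numeric fields."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    numbers = [float(m.replace(",", "")) for m in NUMBER.findall(text)]
    return {"source": source, "text": text, "numbers": numbers}
```

A real cleaner would need per-site rules to decide which numbers are relevant; this sketch simply collects every number it finds.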

3. Data Storage

This is a simple storage layer, and consists of 3 kinds of stores:
  1. A PostgreSQL database for all numerical data
  1. An Elasticsearch cluster for all text data
  1. Static storage buckets for static data (photos, videos, audio)
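
The split above implies a routing step that decides where each cleaned record goes. A minimal sketch follows, assuming every record carries a `kind` field. That field, the `Store` enum, and `choose_store` are hypothetical names used for illustration, and no database client is involved.

```python
from enum import Enum


class Store(Enum):
    POSTGRESQL = "postgresql"
    ELASTICSEARCH = "elasticsearch"
    BUCKET = "bucket"


# Static media that belongs in storage buckets.
MEDIA_KINDS = {"photo", "video", "audio"}


def choose_store(record: dict) -> Store:
    """Pick the storage backend for a record based on its (assumed) kind."""
    kind = record.get("kind")
    if kind == "numeric":
        return Store.POSTGRESQL
    if kind == "text":
        return Store.ELASTICSEARCH
    if kind in MEDIA_KINDS:
        return Store.BUCKET
    raise ValueError(f"unknown record kind: {kind!r}")
```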

4. Algorithmic Insights

This involves querying the databases, and aggregating data from them to create useful insights for the end user. Examples of this could include a regional or sectoral economic index for a country, or a "state of discourse".
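
As an illustration of the aggregation step, the sketch below builds a composite index by rebasing each indicator to a base period and taking a weighted average. The method, the indicator names, and the function `composite_index` are assumptions chosen for the example, not a description of our actual algorithms.

```python
def composite_index(series: dict, weights: dict, base_period: str) -> dict:
    """Weighted composite index with the base period set to 100.

    series maps indicator name -> {period: value};
    weights maps indicator name -> relative weight.
    """
    total_weight = sum(weights[name] for name in series)
    # Only periods present in every indicator can be compared.
    periods = sorted(set.intersection(*(set(v) for v in series.values())))
    index = {}
    for period in periods:
        weighted = 0.0
        for name, values in series.items():
            relative = values[period] / values[base_period]
            weighted += weights[name] * relative
        index[period] = round(100.0 * weighted / total_weight, 2)
    return index


# Made-up sample data for two hypothetical indicators.
sample_series = {
    "job_postings": {"2024-01": 200, "2024-02": 220},
    "power_output": {"2024-01": 50, "2024-02": 45},
}
sample_weights = {"job_postings": 0.75, "power_output": 0.25}
sample_index = composite_index(sample_series, sample_weights, "2024-01")
```

Rebasing to a common period lets indicators with different units be combined; the choice of weights is where most of the judgement lies.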

5. Data Visualization

This involves dashboards where users can drill down to whatever granularity they want to, and where important emerging trends are automatically highlighted. These dashboards also allow users to download data or visualizations in any form they want – as CSVs, JSON, images, or SVGs
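
The download option for tabular data can be sketched as a single export function. This is a minimal example assuming the dashboard hands over rows as a list of dictionaries with identical keys; `export_rows` is a hypothetical name, and image or SVG export is out of scope here.

```python
import csv
import io
import json


def export_rows(rows: list, fmt: str) -> str:
    """Serialise dashboard rows as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        if not rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```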

6. Automated Reporting

This involves automated creation and delivery of reports (web pages, PDFs, slide decks, and emailers) that are created either at a regular pre-defined frequency, or when interesting trends emerge
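
The two triggers, a fixed schedule and an emerging trend, can be combined in one check. The sketch below treats a value as "interesting" when it sits more than a chosen number of standard deviations from its recent history. That rule, the default threshold of 2.0, and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_unusual(history: list, latest: float, threshold: float = 2.0) -> bool:
    """Flag a value far from the mean of recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


def should_send_report(
    last_sent: datetime,
    now: datetime,
    interval: timedelta,
    history: list,
    latest: float,
) -> bool:
    """Send when the schedule is due or when a trend stands out."""
    schedule_due = now - last_sent >= interval
    return schedule_due or is_unusual(history, latest)
```

A simple deviation test like this is prone to false alarms on noisy or seasonal data, so a production trigger would likely need something more robust.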

Development Plan

Phase 1: India

Create basic monitoring and reporting algorithms. Scrape data for India.

Phase 2: Greater South Asia

(Ideally) use the algorithms and dashboards created in Phase 1 to do analysis for Greater South Asia — Pakistan, Sri Lanka, Bangladesh, Nepal, and Bhutan

Phase 3: ASEAN

(Ideally) use the algorithms and dashboards created in Phase 1 to do analysis for ASEAN countries — Singapore, Malaysia, Indonesia, Thailand, Vietnam, Laos, Cambodia, Brunei, Myanmar