One-liner: Tools that supercharge data-driven decisions in organisations.
Vision
We bring transparency and efficiency to Information Supply Chains by creating open-source analytics and machine learning software.
Philosophy
We always put our customers first. We do this by:
- Reducing vendor lock-in and encouraging forkability: We want to enable our partners to look at the internals of how a piece of software works, and fork it for their own use as they see fit
- Reducing costs: We want to help companies save cash by allowing other companies to build on top of our code, and to bring the cost of enterprise data projects down from 7-figure to 5-figure annual dollar amounts
- Reducing continuity risk: We want to ensure the sustainability of projects that we embark on with partners, regardless of our future choices and financial outcomes
Financially Sustainable Open-Source
We want to add value and create savings for our customers, while being a financially sustainable company that also creates good jobs. To do this, we:
- Charge commercial license fees to organisations that want to make changes to the code base without making these changes open-source (our open-source code will still be available under a GPLv3 license)
- Charge consulting and training fees to organisations that want to engage us to customise the code base for them, as well as subscription fees for our cloud products
- Charge recurring fees for data feeds and cloud products, if our clients choose to use them
Specifics
We provide tools that work across the information supply chain.
1. Data Acquisition
We have 4 ways of acquiring data:
- Scraping publicly available information from the internet, and then observing changes over time. This includes listings on job sites, property and car sites, government portals, flights and rails, electricity production, and more. This also includes scraping text, transcripts, and metadata from social media feeds, media reports, photos, videos, and audio recordings/podcasts
- Aggregating reports put out by research agencies
- Using paid feeds for hard-to-acquire data, like data about financial markets and paywalled data about the real economy
- Using high-resolution, high-frequency satellite imagery to detect changes over time
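As a minimal sketch of what "observing changes over time" could look like for scraped listings, the snippet below compares two snapshots of the same source and reports which listings were added, removed, or changed. The snapshot layout (a dictionary keyed by a listing id), the function names, and the sample data are all assumptions made for illustration; they are not taken from an existing code base.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable hash of a record's content (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by listing id (a hypothetical key scheme)."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current
        if k in previous and fingerprint(current[k]) != fingerprint(previous[k])
    )
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only: two scrapes of a hypothetical job site.
monday = {
    "job-1": {"title": "Data Analyst", "salary": 900000},
    "job-2": {"title": "Driver", "salary": 300000},
}
tuesday = {
    "job-1": {"title": "Data Analyst", "salary": 950000},
    "job-3": {"title": "Nurse", "salary": 450000},
}

changes = diff_snapshots(monday, tuesday)
print(changes)
```

Hashing a canonical JSON form keeps the comparison cheap when records are large, at the cost of not saying which field changed; a field-level diff would be a natural extension.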
2. Data Cleansing, Augmentation, and Standardization
This includes 3 things:
- Cleaning HTML data to extract relevant numerical and text insights, and converting both these kinds of data into a standardized format for downstream storage and processing
- Tagging images, videos, and audio with specific metadata using machine learning (including transcription)
- Geo-spatial analysis on satellite imagery using machine learning
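The HTML cleaning step could be sketched roughly as follows, using only the Python standard library. The output record shape (`source`, `text`, `numbers`) is an assumed standardized format chosen for this example, and the sample page is made up; a production pipeline would likely need site-specific selectors and unit handling.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


# Matches integers and decimals, allowing thousands separators such as 25,000.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def standardize(html: str, source: str) -> dict:
    """Convert raw HTML into a flat record (hypothetical standard format)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    numbers = [float(m.replace(",", "")) for m in NUMBER.findall(text)]
    return {"source": source, "text": text, "numbers": numbers}


# Illustrative page only.
page = (
    "<div><h1>2 BHK flat</h1><script>var x = 1;</script>"
    "<p>Rent: 25,000 per month, 850.5 sq ft</p></div>"
)
record = standardize(page, source="example-property-site")
print(record)
```

Keeping the text and the extracted numbers side by side in one record means the same object can later be split between a numerical store and a text store.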
3. Data Storage
This is a simple database storage layer, and consists of 3 kinds of databases:
- A PostgreSQL database for all numerical data
- An ElasticSearch cluster for all text data
- Static storage buckets for static data (photos, videos, audio)
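One way to picture the split between the three stores is a small routing function that decides where a record belongs based on its payload type. This is only a sketch: the `Record` class, the `choose_store` function, and the store labels are hypothetical names, and no database connection is made here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Record:
    key: str
    payload: Union[int, float, str, bytes]


def choose_store(record: Record) -> str:
    """Return the label of the store a record belongs to, by payload type."""
    if isinstance(record.payload, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("boolean payloads are not handled in this sketch")
    if isinstance(record.payload, (int, float)):
        return "postgresql"      # numerical data
    if isinstance(record.payload, str):
        return "elasticsearch"   # text data
    if isinstance(record.payload, (bytes, bytearray)):
        return "bucket"          # photos, videos, audio
    raise TypeError(f"unsupported payload type: {type(record.payload).__name__}")


# Illustrative records only.
records = [
    Record("rent_inr", 25000),
    Record("listing_text", "2 BHK flat near the station"),
    Record("listing_photo", b"\x89PNG..."),
]
routes = {r.key: choose_store(r) for r in records}
print(routes)
```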
4. Algorithmic Insights
This involves querying the databases and aggregating data from them to create useful insights for the end user. Examples of this could include a regional or sectoral economic index for a country, a "state of discourse"
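To make the idea of an economic index concrete, here is a minimal sketch of one common construction: rebase each indicator to 100 in a base period, then take a weighted average. The choice of indicators, the weights, and the numbers are assumptions for illustration only, and this is not presented as the method the index will actually use.

```python
def rebase(series, base_index=0):
    """Scale a series so that its value in the base period equals 100."""
    base = series[base_index]
    if base == 0:
        raise ValueError("base period value must be non-zero")
    return [100.0 * value / base for value in series]


def composite_index(indicators, weights):
    """Weighted average of rebased indicators, one value per period."""
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must have the same keys")
    lengths = {len(series) for series in indicators.values()}
    if len(lengths) != 1:
        raise ValueError("all indicator series must have the same length")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    rebased = {name: rebase(series) for name, series in indicators.items()}
    periods = lengths.pop()
    return [
        sum(weights[name] * rebased[name][t] for name in rebased) / total_weight
        for t in range(periods)
    ]


# Illustrative numbers only: three periods of two hypothetical indicators.
indicators = {
    "job_postings": [200, 220, 210],
    "electricity_gwh": [50, 55, 60],
}
weights = {"job_postings": 1.0, "electricity_gwh": 1.0}

index = composite_index(indicators, weights)
print(index)
```

Rebasing first puts indicators with very different units on a comparable scale, so no single series dominates the average simply because its raw values are larger.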
5. Data Visualization
This involves dashboards where users can drill down to whatever granularity they want to, and where important emerging trends are automatically highlighted. These dashboards also allow users to download data or visualizations in any form they want – as CSVs, JSON, images, or SVGs
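The download options for tabular data could be sketched as below. Only the CSV and JSON cases are covered; image and SVG export depend on the charting library and are left out. The function names and sample rows are hypothetical.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize a list of flat dictionaries to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Illustrative rows only.
rows = [
    {"period": "P1", "index": 100.0},
    {"period": "P2", "index": 110.0},
]
csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```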
6. Automated Reporting
This involves automated creation and delivery of reports (web pages, PDFs, slide decks, and emailers) that are created either at a regular pre-defined frequency, or when interesting trends emerge
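The two triggers described above, a fixed schedule and an emerging trend, could be combined along these lines. What counts as an "interesting" trend is not defined here, so the sketch assumes a simple rule: the latest value lies at least `threshold` sample standard deviations away from the recent mean. The function names, the threshold of 2.0, and the sample data are all assumptions.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def is_interesting(history, latest, threshold=2.0):
    """Flag the latest value if it deviates strongly from recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread >= threshold


def should_send_report(today, last_sent, every_days, history, latest):
    """Send when the schedule is due or when a trend is flagged."""
    due = (today - last_sent) >= timedelta(days=every_days)
    return due or is_interesting(history, latest)


# Illustrative values only.
history = [100, 102, 98, 101, 99]

# Not due yet (3 days into a 7-day cycle), but the latest value is a large jump.
trend_triggered = should_send_report(
    date(2024, 1, 4), date(2024, 1, 1), 7, history, latest=110
)
# Not due and nothing unusual.
quiet = should_send_report(
    date(2024, 1, 4), date(2024, 1, 1), 7, history, latest=101
)
# Due on schedule regardless of the data.
scheduled = should_send_report(
    date(2024, 1, 8), date(2024, 1, 1), 7, history, latest=101
)
print(trend_triggered, quiet, scheduled)
```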
Development Plan
Phase 1: India
Create basic monitoring and reporting algorithms. Scrape data sources for India.
Phase 2: Greater South Asia
(Ideally) use the algorithms and dashboards created in Phase 1 to do analysis for Greater South Asia — Pakistan, Sri Lanka, Bangladesh, Nepal, and Bhutan
Phase 3: ASEAN
(Ideally) use the algorithms and dashboards created in Phase 1 to do analysis for ASEAN countries — Singapore, Malaysia, Indonesia, Thailand, Vietnam, Laos, Cambodia, Brunei, Myanmar