My Personal ML / AI / Data Stack
The excellent 2023 MAD (ML / AI / Data) Landscape, put together every year by Matt Turck of FirstMark Capital, presents an almost overwhelming number of options for a developer to choose from, both in the number of categories and in the number of solutions within each category.
While reading through these categories, I realized that making a list would help me remember which tools I like the most and which ones I want to go back and research further, so I've made one.
I don't claim to use anywhere near all of these all the time but thought it would be useful (at least for myself) to note which tools / categories I find myself reaching for the most, and why.
This list will definitely change over time, which is another reason to document what I'm doing today so I can easily go back and compare down the road.
Primary categories:
Storage
- Backblaze
- Cloudflare
Data Warehouses
- Google BigQuery
Analytics
- Looker
Visualization
- Plotly
- Streamlit
Data Science Notebooks
- Jupyter
- Google Colab
Data Science Platforms
- Anaconda
NoSQL Databases
- Redis
Automation & Operations
- Zapier
Data Analyst Platforms
- Airtable
Applications - Horizontal
- GitHub Copilot
NLP
- Hugging Face
- Google Cloud Natural Language API
- Lots of open-source and homegrown
Product Analytics
- Google Analytics
- Plus lots of others not listed here
ELT / ETL / Data Transformation
- Airbyte
- Airflow
Vector Databases
- ChromaDB
- Pinecone (if absolutely necessary)
Horizontal AI / AGI
- OpenAI
Closed Source Models
- OpenAI GPT-3.5, GPT-4
- Google Bard
Databases
- Postgres
- Redis
OLAP
- DuckDB
Streaming / Messaging
- RabbitMQ (I'm a total noob but we use it a lot at DemandSphere)
Stat Tools & Languages
- Python
- Pandas
- NumPy
- Dask (not listed)
AI Frameworks & Libraries
- TensorFlow
- PyTorch
- Keras
This is obviously a lot and, as mentioned, I don't use all of these all the time.
These are just the tools I have some level of familiarity with and find myself turning to over the others listed in the landscape.
I generally favor tools that are at least based on open source or are mainstream enough (such as Google's tools) that I'm likely to be able to integrate with them easily on client engagements.