Databases#
This chapter gives a brief introduction to databases, at a level sufficient for use in the course. Because many databases use SQL (Structured Query Language, often pronounced “sequel”) or a variant of it, we also show SQL examples.
Categories#
Relational databases/SQL, e.g., MySQL, Microsoft SQL Server, Oracle Database
NoSQL, e.g., Cassandra, MongoDB
Some keep traditional tables, but without relations between tables
Vector databases
Graph databases
This list is non-exhaustive both with regard to categories and examples.
Relational databases#
Tables with fixed attributes
Each column has a name and a storage type.
One (or more) column(s) defined as a key, only accepting unique values.
Tables are connected (relations) in one of three ways:
One-to-many, one-to-one, and many-to-many
We will be using MySQL as an example of this category.
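The points above can be sketched in a few lines of SQL. The example below is a minimal illustration, not the course setup: it uses Python's built-in `sqlite3` module as a stand-in for a MySQL server so that it runs without installation, and the table and column names (`students`, `grades`, and so on) are made up for the example. The SQL syntax shown is close to, but not identical with, what MySQL accepts.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Each column has a name and a storage type.
# The PRIMARY KEY column only accepts unique values.
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    )
""")

# One-to-many relation: one student can have many grades.
conn.execute("""
    CREATE TABLE grades (
        grade_id   INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course     TEXT NOT NULL,
        grade      TEXT NOT NULL
    )
""")

conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
conn.executemany(
    "INSERT INTO grades (student_id, course, grade) VALUES (?, ?, ?)",
    [(1, "Databases", "A"), (1, "Statistics", "B"), (2, "Databases", "C")],
)

# Follow the relation between the two tables with a JOIN.
rows = conn.execute("""
    SELECT s.name, g.course, g.grade
    FROM students AS s
    JOIN grades AS g ON g.student_id = s.student_id
    ORDER BY s.name, g.course
""").fetchall()
for row in rows:
    print(row)

# A second row with an existing key value is rejected.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("Duplicate key rejected:", duplicate_rejected)
```

A many-to-many relation would typically be modelled with a third table holding pairs of keys, one from each of the two tables it connects.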
NoSQL#
Cacheable, parallelizable, scalable to exabytes of data.
E.g., XML databases for querying XML documents based on attributes.
May not use tables at all, only key-value pairs and corresponding unstructured data/objects.
Some support a subset of SQL or an SQL-like query language (NoSQL is usually read as “non-SQL”, “non-relational”, or “not only SQL”).
Gives up some of the strictness of relational querying in exchange for speed and scale.
We will be using Cassandra as an example of this category.
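To make the key-value idea concrete, here is a minimal sketch in which a plain Python dictionary stands in for the database. This is an assumption made purely for illustration: a real NoSQL store would persist the data and spread the keys over many machines, and the helper names `put` and `get` are hypothetical rather than taken from any particular product.

```python
import json

# Assumption: a plain dict stands in for a distributed key-value store.
store = {}

def put(key, value):
    """Serialize a JSON-compatible object and store it under a unique key."""
    store[key] = json.dumps(value)

def get(key):
    """Look up a value by its key; returns None when the key is unknown."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Values need not share a schema: each record carries its own fields.
put("user:1", {"name": "Ada", "languages": ["Python", "SQL"]})
put("user:2", {"name": "Linus", "email": "linus@example.org"})

print(get("user:1")["languages"])
print(get("user:3"))
```

Lookups by key are cheap and easy to distribute, but a question such as "which users know SQL?" has no index to rely on here and would require scanning every value, which illustrates the trade-off against relational querying.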
Vector databases#
Like NoSQL, these are made to overcome limitations in relational databases.
Data stored as high-dimensional vectors representing features or attributes.
E.g., transformations or embeddings.
Applied in natural language processing (NLP), computer vision (CV), recommendation systems (RS), etc., when searching for similar objects.
E.g., find the image most similar to my search image.
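The similarity search described above can be sketched as a brute-force comparison using cosine similarity. The three-dimensional vectors and file names below are invented for the example; real embeddings usually have hundreds or thousands of dimensions, and vector databases typically rely on approximate nearest-neighbour indexes rather than comparing the query against every stored vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: each image is represented by a feature vector.
images = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}

def most_similar(query, items):
    """Return the name of the stored vector most similar to the query."""
    return max(items, key=lambda name: cosine_similarity(query, items[name]))

# Embedding of the search image (also made up for the example).
query = [0.85, 0.15, 0.05]
best = most_similar(query, images)
print(best)
```

Cosine similarity is only one possible choice; Euclidean distance or the dot product are common alternatives, depending on how the embeddings were produced.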