Database Guide

Data is one of the most important parts of geospatial or I would say almost every industry. Each Industry is now taking data management seriously, thus it becomes an important part of our career as IT people to understand it. This guide is a one-stop article walking you from the ground (What is Data) to the sky ( Understanding and using geospatial queries).

⭐ Basics of Database

✅ What is Data?
Data can be any information that you store to access it in the future. It can be numeric, text, media, location, date, etc. This can be written on paper, can be saved on a computer hard-drive, or even stored in the cloud.
✅ What is Database?
The collection of multiple data entries together creates a database. Databases are usually created so that users can access a large chunk of data and perform certain operations on it altogether.
The database can be anything —
Your grandma’s notebook with all her delicious recipes, your parents’ account books stating all transactions, your Facebook page listing all your friends, etc. All these examples show that all the data in a database are more or less of the same type.
✅ The need for a Database?
Creating a Database makes it easy for different users to access data sets, above examples showed that we can store a similar type of information for each data entry in a database, which is just partially true, with the rise of No-SQL databases, this definition is now changing (More about it can be read below in the article ).
As websites are becoming more user interactive and larger in size, data of the users, customers, orders, etc. are important assets to the companies, thus need to have a scalable and reliable database and engineer who can understand it is the need of an hour.
✅ Database Management System ( DBMS )
So far, we know that data and databases are important, but how do we work with databases in computer systems? that’s where DBMS comes into the picture. DBMS are the software programs that provide us a way to interact with databases on a computer to perform various operations such as Creating data, editing data, inserting data, etc. DBMS lets us do this by providing us APIs that can be used to do so. Rarely, any software won’t use DBMS to play with the data stored on disk.
DBMS software apart from data operations, also handles backups, securities checks, database health checks, etc. hence it is advisable to always use a DBMS whenever working with databases.
✅ Spatial Data and Database
Since we are focusing more on the spatial side rather than the generic ones, I would discuss such data type here. Spatial Data types are a little different than other data types. We need to save co-ordinates in a specific format ( generally given in the documentation of the database website ) so that the databases can read it and make sense out of it. While Generally the data can be queried like Get all results where age > 15, for the spatial query we usually look for something like Get all results within 10 KM radius of a specific point. Hence make sure you are saving the Spatial data in the correct format.

⭐ Types of Databases

Databases are generally divided into two types Relational Databases and Non-Relational Databases. Both of these have their own pros and cons. It would be foolish to say one is better than others, but it will depend upon the use case. Personally for Spatial Data, I, 99% of the time use Relational Databases, and you’ll see why soon.
✅ Relational Databases and RDBMS
Consider that your boss asks you to create a sheet of vital information such as the name, address, email, phone number, the job title of all employees. You will immediately pull up an excel sheet or google sheet, write all these column titles and start feeling information

Sample Information Sheet

The pattern here is that each entry has limited and fixed columns that we need to feel up. Thus we have created a table of all information, each entry will have a unique Primary Key making it unique and accessible for all operations. In Relational Databases, each type of table contains fix numbers of columns, we can also establish relations between multiple columns

Relation Between 2 columns

We’ll have a deep dive into relationships in Rational Database further in the article.
The downside to using a Relational Database is it is a little bit slow to fetch result when data increases massively in comparison with No-SQL Database, another downside is that in Relational Database, each entry must follow the rules of adding data ( Column types, column numbers, etc.), we can’t just add a separate column for just one entry.
Relational Databases uses SQL (Structures Query Language) to let users communicate and interact with data stored in tables. SQL has proven to be one of the most widely used languages for this purpose, We’ll study more about SQL in the later part.
Examples of some famous and most widely used Relational Databases are PostgreSQLMySQLMSSQL, etc. Each major Relational Database companies have their own version of SQL which for most parts looks same, but sometimes small changes are needed to run the same query to get same results in different Database ( e.g. from PostgreSQL to MySQL)
✅ Non-Relational Databases (No-SQL)
Any databases which are not relational databases fall in the category of non-relational databases. Generally, the data is stored in any format other than a table such as
1️⃣ Key-Value Pair
2️⃣ JSON, XML format
3️⃣ Graph Format
The major benefit of a No-SQL database is each row is independent, each row can have different columns. As shown in the image below, both users belong to the same table Core_user but both user entries hold different information

No-SQL Realtime Database in Google Firebase

In No-SQL Database, users sometimes have to write their own logic to add a unique key to each entry to make it accessible, most of the Industry standard No-SQL Databases such as FirebaseMongoDB uses JSON format to store the data. Because of this, it is very easy and convenient to perform operations on data from web applications using Javascript, Python, Ruby, etc.
✅ Database type recommendation for Spatial Data
It is obvious that we want to save the point, line, polygon, rasters, etc. in the way that it makes sense, rather than saving it as just coordinates, apart from saving data, we also need a DBMS who understands and helps us to query the data in Spatial Way (Buffer, intersection, distance calculation, etc. ). So far, Relational Databases have proven more powerful for such conditions because of functions available in SQL language for such operations. Using Extensions like PostGIS on PostgreSQL opens the window of opportunities for developers to write complex spatial queries. On the Other hand, No-SQL is still working on the spatial domain, Companies like MongoDB provides some functionalities to perform a geospatial operation, but Relational Databases still owns the market by a large margin.

⭐ Working with RDBMS

We’ll be focusing more on RDMBS in this article since that is what we’ll be using most of the time to store and work with spatial data. For the sake of choice, we’ll be using PostgreSQL as it is The World’s Most Advanced Open Source Relational Database, and its extension PostGIS gives us the ability to work on spatial data as well. You can install PostgreSQL by following the steps in the documentation. Along with PostgreSQL, It is also advisable to download and install the pgAdmin, pgAdmin provides a web interface to communicate with a database, which otherwise can be done either by downloading some installable software or command line.

pgAdmin 4 on mac

Users can change a lot of settings for databases such as Port, Username, Password, accessibility with the outside world, memory allocation, etc. but that is the topic for another day. In this article, we’ll focus on the data side of the database.
✅ Creating Database — We need to create the database (ideally for each project, there should be one database)

Creating a new database for the project

The Query Tool way to create a Database is as follows :

CREATE DATABASE <database_name>

✅ Creating Tables — Creating a table needs little more thoughts since this is where we need to define all columns and their data types. You can find all data types that can be used in PostgreSQL here.
pgAdmin allows us to select various constraints and keys such as Not Null, Primary Key, etc. from the table. We’ll discover more about it later.

Creating users table

Notice that we didn’t add a primary ID column to the column list, because PostgreSQL does that automatically for us. We can create as many tables as we want in one database. Once the tables are created, we can then establish relations between different tables based on specific columns (usually id ).
The Query Tool way to create a Database is as follows :

CREATE TABLE <table_name> (
<column_1> <datatype>,
<column_2> <datatype>,
..
.
..
<column_n> <datatype>
PRIMARY KEY (<column>)
);

✅ CRUD Operations on data in tables
CRUD ( Create, Retrieve, Update, Delete ) are the ‘hello world’ of DBMS, and since these are the most commonly used operations, all RDBMS has the same verbs to perform them. We’ll be writing and executing queries in the Query tool in pgAdmin, which can be used as follows.

Query tool in pgAdmin

1️⃣ Creating New Entry
Adding a new entry to the table follows the following command

INSERT INTO <tablename> (column1, column2, column3,...) VALUES (value1, value2, value3,...);

INSERT, INTO, VALUE are the verbs used by SQL, thus they should not be used as variables or values, etc. To add a new entry to our user table, we’ll write the following query to the Query tool.

INSERT INTO users(name, employed, address) VALUES ('Sheldon Cooper', true, 'Pasadena');
Insert Query

Please note that string should always be written in ‘ ’ (single quotes) not in “ ” (double quotes)

2️⃣ Retrieving Entries ( All or few )
Data stored in the database can be retrieved and displayed. We can fetch entire data or limited entries as well. Code to Fetch the data is

select <column1, column2 ,...> from <tablename> 

This will fetch the entire dataset, if you want to fetch only 20 entries, then you can write

select <column1, column2 ,...> from <tablename> limit 20

If you want to fetch all columns, then instead of writing all column names, you can write

select * from <tablename>

If you want to fetch the result, with some specific condition, you can use the Where keyword, as follows

select * from <tablename> where <key> = <value>

You can even create complex queries, which we’ll talk about later.
In our example, we can retrieve the data we want

--Retrieving Specific columns for all users
select name,employed from users--Retrieving all columns for all users
select * from users--Retrieving all columns for first 3 users
select * from users limit 3--Retrieving all columns for all users where employed = true
select * from users where employed = true
Retrieving Data from table

3️⃣ Updating Entries ( All or few )
RDMBS allows us to update all or limited data entries to set new values to columns.

UPDATE <tablename> 
SET <column1> = <value1>, <column2> = <value2>

If you want to update specific rows, then you can add conditions by using Where

UPDATE <tablename> 
SET <column1> = <value1>, <column2> = <value2>
WHERE <column> = <value>

In our case, we’ll update tables with the following queries

-- Make all rows as  employed = true
update users set employed = true-- change employed = false for entries with address = 'nebraska'
update users set employed = false where address = 'nebraska'
Update entries

4️⃣ Deleting Entries ( All or few )
Deleting entries is easy in SQL, user can either delete all rows or delete specific rows by adding the Where clause

-- Deleting all entries 
Delete from <tablename> -- Deleting entries based on conditions
Delete from <tablename> where <column> = <value>

For our example, queries will be as following

-- Deleting all entries 
Delete from users-- Deleting entries based on conditions
Delete from users where employed = false
Delete Entries from table

CRUD Operations are used a lot since it provides basic functionalities to Database.

It is a good time to take a break and play with CRUD before moving into advanced SQL and Spatial Queries.

Part 2 of this series focuses on using Keys & Constraints to make complex relations between tables in RDMBS