Extracting Data From Google Sheet With Matillion

Matillion Series — Part 1

Matillion is an ETL Tool with an ELT Architecture. There are many Data Warehouses that can be used with Matillion. Here, we are using Snowflake as a Data Warehouse.

ELT — Extract, Load and Transformation

So, before starting or building an ELT process, you should know that in Matillion, ELT is divided into two main parts:

  1. Orchestration Job

Orchestration Job:

The Orchestration Job is an area in which you can create the workflow for the Job from source to target. It is an arrangement of different components according to the job need with a specific starting point. The whole workflow can be designed as an Orchestration Job. It is majorly used for creating, dropping and also extracting data from various sources.

Transformation Job:

The Transformation Job is an area in which you can transform your data as per the requirements, it does not have any specific starting point as an Orchestration Job. In Transformation Job we can change the data which are already present within the tables.

Project:

In this project we are extracting data from Google Sheet. Then we are appending the new data to the Table in Snowflake and making sure the old data is still present.

Let’s start….

By making a Google Sheet as shown below. I have given the name of the spreadsheet as ‘spreadsheet’ and tab name as ‘Sheet1’.

Fig-1
Fig-2

Then, in Matillion start by creating a folder. Just do a right click on default version and then you will see first option as shown in the image create a folder. I have named the folder as Matillion New Job here.

Fig-3

As, over here I have given the name of the folder as a Matillion New Job. Then just do a right click on the folder name and click on the option Add Orchestration Job. Then it will ask for the name of the Orchestration Job. After, you have given the name then it will pop up ‘Job Successfully Created’ and will say ‘Switch to new job now?’ click on ‘Yes’.

Fig-4

Go to the components section which would be right below the Development Section then Search for for ‘Google Sheets Query’ Component. Drag that thing to job near to the start component it will automatically gets connected to the start component.

Fig-5
Fig-6

Then, as soon as you click on the Google Sheets Query Component then in the ‘Properties’ Tab down below there will be multiple options we have to fill by clicking the icon show in Fig-5.

Let’s start

  1. Name — ‘Component Name’
Fig-7

9. Data Source Filter- It is a filtering options on which you can ingest the data which follows certain conditions.

10. Combine Filters- keep it as ‘And’

11. Limit- No need to change

12. Type- Standard

13. Primary Keys- No need to Change

14. Select the options : Warehouse, Database, Schema, Target Table options as per your requirement.

15. S3 Staging area- whatever bucket you want to choose

16. Encryption- None

17. Load Options- Keep Default , it will recreate the table everytime when it runs.

18. Auto Debug- Off

Now, run

Fig-8

Now, create a Transformation Job like we created Orchestration Job. Right click on folder name and select ‘Add Transformation Job’.

Hint: Give the Job name as 01 Orchestration Job 
02 Transformation Job
So on to keep in order.
Fig-9
Fig-10

As, shown in the Fig-10., in the Section of Components search for Table Input then drag it to the transformation job. Keep the Database, Schema, Target Table same as given to the ‘Google Sheet Query’ Component as shown in Fig-9 and in Column Names select all columns same as shown in Fig-7.

Fig-11

Drag Table Update Component and Join it to the Table Input component.

Then, In snowflake create a table by running the query given below:

--Extracted table from Google Sheet Query Component 
--You can also use Rewrite Component of Matillion
--Same as shown below in Fig-12
CREATE OR REPLACE TABLE "DATABASE_NAME"."SCHEMA_NAME"."TABLE_NAME" AS
SELECT * FROM "DATABASE_NAME"."SCHEMA_NAME"."GOOGLE_SHEET_TABLE_NAME"

Run the above query in Snowflake to create target or final table.

Fig-12
Fig-13

Select the Database, Schema and Target Table as according to the Table created in Fig-12. Then, change the Join Expression option as shown in Fig-13.

Fig-14

Then, change the ‘When Matched’ options as shown above by clicking on the “+” and adding the case and operations column as shown above in Fig-14. So, when running this transformation job will delete all the rows from the target our main table which are having matching email.

Then, creating a transformation Job as shown in Fig-8.

Then, you can copy paste the Table Input Component from previous Transformation Job or else you can follow Fig-10 Steps.

Then drag the Table Output Component and attach it to the Table input component. Follow, the following steps given in Fig-15.

Fig-15

This final transformation Job will append all the data which are present in source table to target table.

Fig-16

Now, drag the components as shown in Fig-16. First with the transformation job with Table update Component. So, it will match according to email and delete the row which matches. Then, attach that with other transformation job that will append from that same table. So, what happens is that whenever the data is erased from google sheet and new applicants data is added, in this case it will retain the old data with different email address and it will update with the one having the same email address.

Let’s change the spreadsheet and run the job. So as to get better insights.

Fig-17

If you will compare the Fig-17 with Fig-1 you will see there is a change in the first row.

Fig-18

Now, do a right click on the Orchestration Job and run the job as shown above in Fig-18.

After the job gets successfully completed, go in Snowflake and check. You will see as shown below.

Fig-19

You, can see in the Fig-19 that the Jack data still exists in our database and in row number 2 you can see the new data appended.

Congrats, you have created a job in Matillion using Google Query Component.

You can get connected with me on Linkedin with more questions.

Data Engineer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store