I spent the last week going through the experience of data prep using the recently released beta Project Maestro tool from Tableau. Here are some few lessons I’ve learned while conducting a set of datasets into a Maestro workflow:
1 | Set my final goal at the beginning:
For this Maestro practice I had one main table of daily transactions recorded in a Baby tracking app. This main table was then complemented by others within the relational SQL database of the app. Here is a snippet of the main source table and its headers:
From this table my main goals were:
- Replace all Unix dates with regular datetime dates (YYYY-MM-DD HH:MM:SS)
- Replace all data IDs with their respective human-readable labels from the auxiliary tables
- Some extra IDs were also present on the Value column (e.g boolean 1 or 0), so clean up those as well
- Export a tde and use it on Tableau
2 | Clean, Extract and Transform the dates
After literally breaking Maestro for the third time I realized that it is way better to spend some time cleaning up the tables before joining them. Once I put that on my mind Maestro worked like a charm. The first field I had to remove was the Blob column which only contained binary text for pictures that were added to the app.
Next, in order to transform the dates I’ve added a “New Step” to my Datasource which opened a “Clean” node in the workflow. In this single step I was able to create a couple of Calculated Fields to Extract and Transform the Unix dates from the Value column:
IF CONTAINS([AttributeName],’time’) THEN [Value]
Translation for humans: Maestro, whenever you see “time” in the AttributeName column, please bring the data from Value column.
Next I had to use another famous Tableau Calculation to convert the times into the format I wanter:
Translation for humans: Maestro we are going to create a field with a Date and Time format. To build this field we are going to make the field “Time extract” and convert it to a number, then we will add one second for each digit of the “Time extract”, this all starting on January 1st 1970
This calculation allowed me then to remove the [Time extract] field from the Maestro workflow. And this is really helpful tip in order to keep you sane using the tool. As soon as you don’t need a field, remove it!
At last I built another Calculation to replace the original Unix dates from the Value column, filling it with the new formatted dates:
IF CONTAINS([AttributeName],’time’) THEN STR([Transaction Date])
Translation for humans: Maestro, whenever you see “time” in the Attribute Name field, pick the Transaction Data value (as a TEXT); for all other fields just pick the data from [Value] column.
I repeated these steps to also transform the “ModificationDateTime” and built the [Recorded Date] field:
3 | Extract IDs from Value column
While exploring the [Value] column I found a type of attribute named “listvalue” that contained values like this:
This ended up being lists that we could build as we were using the app (e.g. list of foods our baby was eating). I then realized that prefix (the string the .) was always the same. I also found a table that was suspiciously named “FieldListItem”, in which I was able to see the labels that we added for each list entry. So, I’ve decided to trim this prefix out so I could deal only with the known code. For this I’ve used the good old Tableau’s Custom Split option:
TRIM( SPLIT( [ListValue extract], “.”, -1 ) )
This created the column that I needed with only the IDs that I would later replace with their labels. As a final manipulation on this stage I just needed to replace the original “listname”codes with the extracted ones:
IF [AttributeName] = “listvalue” THEN [CategoryFieldListItemId]
ELSE [Value fixed]
Translation to humans: Maestro you will pick the data from [CategoryFieldListItemId] column and place it on a new [Value end] field whenever you see the Attribute “listvalue”. For all the users just pick the [Value fixed]
Note: before I forget again, I’ve deleted the [Value fixed] field at the end of this Clean section.
4 | Joining the files
The first join that I made to my master file was the simplest one that I had. I’ve added the CategoryField table so I could bring the Names of the categories into the workflow. The table has these columns:
As soon as I loaded this to Maestro I’ve added the Clean step and removed the [DataType], [OrderValue] and [ViewInSummary] fields. This kept only the fields I needed for my join to be perfect match:
Note how the list of options from the two tables are identical. This started to look promising.
After this Join I’ve added a new Clean step where I was able to remove both fields that I used for the Join (seems weird right?) and kept only the [Name] column (which is what I just wanted to have on my final file).
My second Join was for the [Category] file and it followed a pretty similar logic. After adding datasource to workflow, cleaned all unnecessary fields, joined it with my last Clean and kept the column that I needed only (Category Name).
I followed the same rule with all other files that I joined. As you can see on my workflow below, after new files there is always a Clean followed by a Join step:
And here is my final piece of Maestro art:
As someone who started my professional career doing data migrations, this is simply an amazing powerful tool. And I can’t wait for my next Maestro trial project which is work related.
I hope you enjoy the reading and go ahead, give Maestro a try today!