Tableau on Netezza is a common use case – and generally makes both look pretty good. Tableau is a great tool for visualizing data and allowing new analytic insight to be gained. Netezza as a backend allows for quick and interactive interrogation of the data. Where there can be a little challenge is in having one tool do the work or another; for example, an aggregation is going to be faster in Netezza – but Tableau may ask for raw data and want to do the aggregation itself locally. This is going to incur a network penalty to move the data, and the aggregation is likely occurring on a much smaller application server than the computing power of Netezza represents.
Tableau data source design comes in to play with how and where this aggregate data gets executed. Tableau “helping” with some fields can cause negative behavior in Netezza; for example, logic around Tableau dates may cause poor restrictions to be sent to the database, particularly in the case where dates are on a fact.
The difference between a CPU restrict and zone map FPGA restrict against a fact is an order of magnitude difference in speed – 100 second queries can go to 0.1 second queries. Date restriction is in almost every query, so this is an important component to get right when running Tableau on Netezza.
Referential integrity in a database should always be ensured by the data model and ETL process; one option to check when using Tableau on Netezza is the Assume Referential Integrity mark under the Data menu. This has the effect of the generated query only accessing the tables it needs for the query. Without this option checked, the generated query will connect all tables in a model data source. One fact table with five dimensions tables, but a graph only interested in facts and a single dimension, will go from a query hitting 6 tables to one hitting 2 tables when this option is checked. This will be particularly beneficial against the large datasets we typically see held in Netezza.
When running queries it may be beneficial to specify some session level parameters at run time, to alter query plan behavior. Although not commonly needed, this can be done through the Run Initial SQL option available in Tableau.
These are just some of the things to consider when working with these technologies together to optimize performance between both platforms.