Peak offers its customers the flexibility to choose between using their own Snowflake data warehouse or a Peak-managed Snowflake cluster. This document covers setting up the Data Bridge, which gives customers access to their data on the Peak platform without the need to move or duplicate it from their existing infrastructure.
Snowflake is a cloud-based data warehouse platform and holds one of the largest market shares in the data warehousing category.
The Data Bridge feature, when used with customer-managed Snowflake, empowers customers to leverage Snowflake's capabilities and extends the reach of the Peak platform to any customer that stores their data on Snowflake.
This article describes how to connect your Snowflake data warehouse to Peak.
Process overview
To connect Peak to your Snowflake data warehouse, you will need to complete the following steps in both your Snowflake cluster account and in your Peak organization.
In your Snowflake cluster account:
The following needs to be set up in your Snowflake data warehouse:
Schema creation: Peak requires a schema in the customer data warehouse so that:
Peak applications can read and process the input data.
The Peak platform can write the AI-driven data outputs back into the Snowflake data warehouse.
User and role creation: You need to create a user role in the Snowflake data warehouse with permissions to execute the operations needed for the smooth functioning of the deployed AI application.
Storage integration: You also need to set up a storage integration between your Snowflake data warehouse and the S3 tenant data lake to enable faster and more efficient loading and unloading of data.
In Peak:
There are five steps to complete during this process.
Details
This step lets you name your data warehouse and specify the data warehouse type that you want to use (in this case, Snowflake).
Configuration
This step lets you specify the Snowflake cluster details for your organization. Use the details that you have extracted from your Snowflake account.
Database
Enter the details extracted from your Snowflake account to set up the connection with the database.
Data Lake
This step lets you link your data warehouse to a data lake.
Review
Once you have entered all your configuration details, this step lets you review everything before saving.
Pre-requisites
Get details of your Snowflake cluster
Go to your Snowflake account and get the following details for your Snowflake cluster:
account ID
password
user
role: the role that will be used to establish the connection
region
database
schema: the schema in which Peak will have both read and write access
warehouse
storage_integration
Please note that the Snowflake account region must be the same region as your data lake for the Data Bridge to work. The list above assumes that the user, role, and schema have already been created. If you need to create new ones for Peak, follow the steps in the section below.
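As an illustration, the collected details can be sanity-checked before you start the configuration. The sketch below is hypothetical (the field names mirror the list above, and the region check follows the note about matching regions); it is not part of any Peak or Snowflake API:

```python
# Minimal sanity check for the Snowflake connection details listed above.
# Field names mirror this article; not an official Peak or Snowflake API.
REQUIRED_FIELDS = [
    "account_id", "password", "user", "role", "region",
    "database", "schema", "warehouse", "storage_integration",
]

def check_details(details: dict, data_lake_region: str) -> list:
    """Return a list of problems; an empty list means the details look complete."""
    problems = [f"missing: {f}" for f in REQUIRED_FIELDS if not details.get(f)]
    # The Snowflake account region must match the data lake region.
    if details.get("region") and details["region"] != data_lake_region:
        problems.append("region mismatch: Snowflake region must match the data lake region")
    return problems

details = {
    "account_id": "ab12345", "password": "secret", "user": "ACME",
    "role": "ACME_GP_ROLE", "region": "eu-west-1", "database": "ACME",
    "schema": "stage", "warehouse": "ACME_GP_WH",
    "storage_integration": "ACME_STORAGE_INTEGRATION",
}
print(check_details(details, "eu-west-1"))  # []
```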
Create user, role, and schema(s)
Peak requires a user, a role, and at least one schema (default schema) to connect your Snowflake account with Peak. The user and role need write access to the default schema for the setup process. Additionally, you can grant read access for any additional schemas to the user and share the schemas with the Peak team when required.
If you prefer to use an existing user, role, and default (write) schema, you can skip this step.
Log in to your Snowflake account using the admin role and execute the following script. The script performs the operations listed below; you can comment out parts of the script as per your requirements.
Create resource monitors to track usage and suspend the warehouse if needed (recommended).
Create a warehouse and enable the resource monitor for it. You can use an existing warehouse if needed (recommended).
Create a database or use an existing database.
Create a role.
Create a user with a password. MUST_CHANGE_PASSWORD is set to False so the user can access Snowflake directly without logging into the web interface. If you set it to True, you are responsible for changing the password as per Snowflake requirements and using it wherever required. The default role and warehouse are set to the role and warehouse created above.
The schemas named by DEFAULT_DBSCHEMA and READ_DBSCHEMA are created.
DEFAULT_DBSCHEMA: The role has write access to this schema.
The role is granted all privileges on future tables in the schema, since we assume there are no existing tables and the role will be used to create, update, and delete tables as required.
The role is granted the create stage privilege to create stages on the schema for loading/unloading data to/from tables in the schema. This is also granted for future stages so that new stages remain accessible to the role.
READ_DBSCHEMA: The role has read access to this schema.
The role is granted select on current and future tables.
The role is granted the create stage privilege to create stages on the schema for unloading data from tables in the schema. This is also granted for future stages so that new stages remain accessible to the role.
Please save the username/password for the created user in a secure location for later use.
-- REPLACE THE VARIABLES BEFORE RUNNING THE SCRIPT
-- Note: If Account Edition = 'STANDARD' (Mostly used in non-prod environment) Please comment line SCALING_POLICY = STANDARD
-- and set DEFAULT_DATA_RETENTION_TIME_IN_DAYS = 1
SET TENANT = upper('<tenant_name>'); -- tenant name
SET DEFAULT_WAREHOUSE_SIZE = 'XSMALL' ; -- Machine type used for warehouse e.g. XSMALL | SMALL | MEDIUM | LARGE | XLARGE | XXLARGE | XXXLARGE | X4LARGE | X5LARGE | X6LARGE
SET DEFAULT_DBSCHEMA = 'default_schema'; -- Name of the input data schema e.g. stage
SET READ_DBSCHEMA = 'read_schema' ; -- Name of the read-only data schema e.g. read
SET DEFAULT_DB = $TENANT; -- DB that will be created, to use an existing one, replace the variable name
SET DEFAULT_ROLE = $TENANT||'_GP_ROLE'; -- Role that will be created, to use a different one, replace the variable name
SET DEFAULT_USER = $TENANT;
SET DEFAULT_WAREHOUSE = $TENANT||'_GP_WH'; -- Warehouse that will be created, to use an existing one, replace the variable name
SET DEFAULT_RESOURCE_MONITOR = $TENANT||'_RM'; -- Resource monitor that will be created, to use an existing one, replace the variable name
SET DEFAULT_DATA_RETENTION_TIME_IN_DAYS = 90; -- For the Standard account, this value needs to be set as 1
-- Create a resource monitoring policy
USE ROLE accountadmin;
-- Comment out the statement below if you are using an existing resource monitor, or change the configuration as required
CREATE or REPLACE RESOURCE MONITOR identifier($DEFAULT_RESOURCE_MONITOR)
with credit_quota = 40 -- Change the value as per requirement
FREQUENCY = DAILY
START_TIMESTAMP = IMMEDIATELY
triggers
on 50 percent do notify
on 75 percent do suspend
on 100 percent do suspend_immediate;
-- it takes some time for the resource monitor to become available
call system$wait(20);
-- Comment out the statement below if you are using an existing warehouse, or change the configuration as required
CREATE OR REPLACE warehouse identifier($DEFAULT_WAREHOUSE)
WAREHOUSE_SIZE = $DEFAULT_WAREHOUSE_SIZE
MAX_CLUSTER_COUNT = 2
MIN_CLUSTER_COUNT = 1
SCALING_POLICY = STANDARD
AUTO_SUSPEND = 100
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
RESOURCE_MONITOR = $DEFAULT_RESOURCE_MONITOR;
-- use the warehouse for the rest of the script
USE warehouse identifier($DEFAULT_WAREHOUSE);
-- Generate random password
SET (DEFAULT_PASSWORD) = ((select randstr(10, random())|| ABS(MOD(RANDOM(2),100))));
-- Create the database if not exists
USE ROLE sysadmin;
CREATE DATABASE IF NOT EXISTS identifier($DEFAULT_DB) DATA_RETENTION_TIME_IN_DAYS = $DEFAULT_DATA_RETENTION_TIME_IN_DAYS;
--- Create the user and role
USE ROLE useradmin;
CREATE OR REPLACE role identifier($DEFAULT_ROLE);
CREATE OR REPLACE USER identifier($DEFAULT_USER)
password = $DEFAULT_PASSWORD
MUST_CHANGE_PASSWORD = false;
-- Use sysadmin role to grant permissions
USE ROLE sysadmin;
-- grant usage on the database
GRANT usage
ON database identifier($DEFAULT_DB)
TO role identifier($DEFAULT_ROLE);
-- Use the database
USE DATABASE identifier($DEFAULT_DB);
-- Create the schemas
CREATE SCHEMA IF NOT EXISTS identifier($DEFAULT_DBSCHEMA);
CREATE SCHEMA IF NOT EXISTS identifier($READ_DBSCHEMA);
-- grant usage on schemas in the database
GRANT all
ON SCHEMA identifier($DEFAULT_DBSCHEMA)
TO ROLE identifier($DEFAULT_ROLE);
GRANT usage
ON SCHEMA identifier($READ_DBSCHEMA)
TO ROLE identifier($DEFAULT_ROLE);
GRANT select
ON ALL TABLES IN SCHEMA identifier($READ_DBSCHEMA)
TO ROLE identifier($DEFAULT_ROLE);
-- grant create stage on schemas
GRANT create stage
ON SCHEMA identifier($DEFAULT_DBSCHEMA)
TO role identifier($DEFAULT_ROLE);
GRANT create stage
ON SCHEMA identifier($READ_DBSCHEMA)
TO role identifier($DEFAULT_ROLE);
-- Use securityadmin role to grant permissions
USE ROLE securityadmin;
SET FULLY_QUALIFIED_DEFAULT = $DEFAULT_DB||'.'||$DEFAULT_DBSCHEMA;
SET FULLY_QUALIFIED_READ = $DEFAULT_DB||'.'||$READ_DBSCHEMA;
-- grant all on future tables
GRANT all
ON FUTURE TABLES
IN SCHEMA identifier($FULLY_QUALIFIED_DEFAULT)
TO role identifier($DEFAULT_ROLE);
GRANT select
ON FUTURE TABLES
IN SCHEMA identifier($FULLY_QUALIFIED_READ)
TO ROLE identifier($DEFAULT_ROLE);
-- Use role useradmin to grant permissions
USE ROLE useradmin;
-- Assign the role to the user and set default role and warehouse
GRANT
ROLE identifier($DEFAULT_ROLE)
TO user identifier($DEFAULT_USER);
ALTER user identifier($DEFAULT_USER)
set
DEFAULT_WAREHOUSE = $DEFAULT_WAREHOUSE
DEFAULT_ROLE = $DEFAULT_ROLE;
-- Use the role of account admin to grant permissions
USE ROLE accountadmin;
-- grant operate on the warehouse. Comment out the statement below if you are using an existing warehouse
GRANT operate
ON warehouse identifier($DEFAULT_WAREHOUSE)
TO role identifier($DEFAULT_ROLE);
-- grant usage on the warehouse and future stages
GRANT usage
ON warehouse identifier($DEFAULT_WAREHOUSE)
TO role identifier($DEFAULT_ROLE);
GRANT usage
ON future STAGES
IN SCHEMA identifier($FULLY_QUALIFIED_READ)
TO role identifier($DEFAULT_ROLE);
GRANT usage
ON future STAGES
IN SCHEMA identifier($FULLY_QUALIFIED_DEFAULT)
TO role identifier($DEFAULT_ROLE);
-- Revoke the modify permission on the read schema
REVOKE MODIFY ON SCHEMA identifier($READ_DBSCHEMA) FROM ROLE identifier($DEFAULT_ROLE);
select
'"tenant": "'||getvariable('TENANT')||'",' as "tenant",
'"database": "'||getvariable('DEFAULT_DB')||'",' as "database",
'"role": "'||getvariable('DEFAULT_ROLE')||'",' as "role",
'"user": "'||getvariable('DEFAULT_USER')||'",' as "user",
'"password": "'||getvariable('DEFAULT_PASSWORD')||'",' as "password",
'"warehouse": "'||getvariable('DEFAULT_WAREHOUSE')||'",' as "warehouse",
'"resource_monitor": "'||getvariable('DEFAULT_RESOURCE_MONITOR')||'",' as "resource_monitor",
'"warehouse_size": "'||getvariable('DEFAULT_WAREHOUSE_SIZE')||'",' as "warehouse_size",
'"default_schema": "'||getvariable('DEFAULT_DBSCHEMA')||'"' as "default_schema",
'"read_schema": "'||getvariable('READ_DBSCHEMA')||'"' as "read_schema";
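For reference, the naming convention that the script derives from the tenant name can be sketched as follows. This is a hypothetical helper for illustration only (the suffixes mirror the SET statements at the top of the script), not part of the Peak platform:

```python
# Mirrors the naming convention in the SET statements of the script above.
# Illustrative only; not an official Peak or Snowflake API.
def derive_names(tenant_name: str) -> dict:
    tenant = tenant_name.upper()
    return {
        "tenant": tenant,
        "database": tenant,                  # DEFAULT_DB
        "role": tenant + "_GP_ROLE",         # DEFAULT_ROLE
        "user": tenant,                      # DEFAULT_USER
        "warehouse": tenant + "_GP_WH",      # DEFAULT_WAREHOUSE
        "resource_monitor": tenant + "_RM",  # DEFAULT_RESOURCE_MONITOR
    }

print(derive_names("acme"))
```

The final SELECT in the script prints these same values (plus the generated password) as JSON-style fragments so you can copy them into the Peak configuration screens.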
Set up storage integration
Before you configure your Snowflake data warehouse to work with Peak, you must set up storage integration between your Snowflake cluster account and the Amazon S3 data lake.
For more information on this process, see Snowflake docs: Create storage integration.
A storage integration in Snowflake stores a generated identity and access management (IAM) entity for external storage such as Amazon S3 or Google Cloud Storage. Once set up, the same storage integration can be used with multiple stages to load/unload data to/from Snowflake. Peak requires a storage integration so you can load/unload data between your warehouse and the data lake as and when required.
In Peak’s context, Amazon S3 is the data lake. The data lake setup has already been done as part of the prerequisites.
You will need the following data lake details:
STORAGE_DATA_LAKE : Name of the S3 bucket
STORAGE_DATA_LAKE_ROOT_PATH: Prefix for the folder/directory for which you want to allow access.
STORAGE_AWS_ROLE_ARN: AWS IAM role ARN with permissions to access the bucket. The value of the role ARN will be arn:aws:iam::794236216820:role/prod-<tenant_name>-System, where the tenant name is the organization name.
You can get the above details from the Data Lake panel on the Data Bridge setup page (Data Bridge > Data Lake).
To set up storage integration, go to your Snowflake cluster account and run the following script:
-- REPLACE THE VARIABLES BEFORE RUNNING THE SCRIPT
SET TENANT = upper(''); -- tenant name
SET STORAGE_DATA_LAKE = ''; -- S3 bucket name
SET STORAGE_DATA_LAKE_ROOT_PATH = ''; -- S3 bucket prefix (with a trailing slash) that access will be limited to; set to '*' to allow bucket-level access
SET STORAGE_AWS_ROLE_ARN = ''; -- IAM role arn which will have access to the S3 bucket
SET DEFAULT_ROLE = $TENANT||'_GP_ROLE'; -- Replace the variable name with the role name created in the previous steps
SET STORAGE_INTEGRATION = $TENANT ||'_STORAGE_INTEGRATION';
SET STORAGE_ALLOWED_LOCATION = 's3://'|| $STORAGE_DATA_LAKE || '/' || $STORAGE_DATA_LAKE_ROOT_PATH;
USE ROLE accountadmin;
-- Create storage integration
CREATE STORAGE INTEGRATION identifier($STORAGE_INTEGRATION)
type = external_stage
storage_provider = s3
enabled = true
storage_aws_role_arn = $STORAGE_AWS_ROLE_ARN
storage_allowed_locations = ($STORAGE_ALLOWED_LOCATION)
storage_aws_object_acl = 'bucket-owner-full-control';
USE ROLE accountadmin;
-- grant usage on the integration to the role
GRANT usage
ON integration identifier($STORAGE_INTEGRATION)
TO role identifier($DEFAULT_ROLE);
select
'"integration": "'||getvariable('STORAGE_INTEGRATION')||'"' as "integration";
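The STORAGE_ALLOWED_LOCATION value that the script assembles can be sketched as follows. This is a hypothetical helper mirroring the string concatenation in the SET statement above, for illustration only:

```python
# Mirrors: SET STORAGE_ALLOWED_LOCATION = 's3://'|| $STORAGE_DATA_LAKE || '/' || $STORAGE_DATA_LAKE_ROOT_PATH;
# Illustrative only; run the SQL script itself to create the integration.
def allowed_location(bucket: str, root_path: str) -> str:
    # root_path should carry a trailing slash, or be '*' for bucket-level access.
    return "s3://" + bucket + "/" + root_path

print(allowed_location("acme-data-lake", "datalake/"))  # s3://acme-data-lake/datalake/
print(allowed_location("acme-data-lake", "*"))          # s3://acme-data-lake/*
```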
Script details
Tenant name: This must be entered in lower case with no special characters or spaces.
Bucket name: This is the name of the S3 bucket that was entered when your organization’s data lake was configured. To find the name, go to Dock > Data Bridge > Data Lake and click the expand icon.
Root path: This is the root path of the S3 bucket that was entered when your organization’s data lake was configured. To find it, go to Dock > Data Bridge > Data Lake and click the expand icon.
Account admin role: Use the accountadmin role to create the storage integration.
Storage integration name: The name of STORAGE_INTEGRATION can be changed. If you do this, make sure you also change the name:
- in the script at GRANT usage
- when setting up the data warehouse connection under default_storage_integration.
Getting to the screens
To connect to a new data warehouse:
Go to the Data Bridge section.
Click ADD DATA WAREHOUSE.
Entering the data warehouse details
Name your data warehouse connection: This is a unique identifier for your connection on the Peak platform. The name must be unique to your tenant and cannot be changed after the connection has been set up.
Use only alphanumeric characters; no spaces or special characters except underscore "_"
The minimum length is 3 characters and the maximum is 40
The name must not start with a number and must end with an alphanumeric character
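These rules can be expressed as a simple check. The regex below is our reading of the rules above, not Peak's actual validation routine:

```python
import re

# Our reading of the naming rules above; not Peak's actual validator.
# 3-40 chars, alphanumerics and underscores only, no leading digit,
# and the name must end with an alphanumeric character.
NAME_PATTERN = re.compile(r"^(?=.{3,40}$)[A-Za-z_][A-Za-z0-9_]*[A-Za-z0-9]$")

def is_valid_connection_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_connection_name("acme_snowflake"))  # True
print(is_valid_connection_name("1warehouse"))      # False (starts with a number)
print(is_valid_connection_name("my dw"))           # False (contains a space)
print(is_valid_connection_name("dw"))              # False (too short)
```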
Choose Snowflake then click NEXT to move to the Configuration stage.
Snowflake configuration
Select platform:
Select the cloud platform where your Snowflake account is hosted. For Snowflake, the Peak platform supports AWS, Azure, and GCP.
Select region:
Select the region where your data warehouse exists. Make sure that the region complies with your local data storage regulations. Based on the cloud platform chosen above, the system shows the list of regions supported by the Peak platform.
Select Authentication type:
Peak supports both Basic and OAuth authentication. If you choose OAuth, you need to set up a security integration in your Snowflake cluster; Snowflake requires a security integration to authenticate using OAuth. The Peak platform provides all the details required to create a security integration in your Snowflake account.
Step 1: Register the client:
Copy the SQL command provided under the "Create Security Integration" expansion panel on the Peak platform UI.
Log in to your Snowflake account using the "ACCOUNTADMIN" role.
Execute the command in your Snowflake account to create the security integration.
Step 2: Gather details for client configuration
After creating the security integration,
Copy and execute the SQL command under "Get Client Details" to retrieve the Client ID, Authorization URL, and Token URL.
Copy and execute the SQL command under "Get Client Secret" in your Snowflake account to retrieve the Client Secret.
Step 3: Save all details for the data warehouse configuration and confirm the setup is done
Save all the details retrieved in step 2 above to use them in the Database step.
Check the confirmation checkbox to confirm the security integration has been set up.
Database
Snowflake data warehouse credentials
To facilitate connectivity between the Peak platform and your Snowflake data warehouse, we require specific connection details.
Details required: For Basic Authentication type
Account ID: Your Snowflake account ID
Password: Your Snowflake account password.
Default User: This is the user name created in your database account which Peak will use while accessing your database.
Default Role: User role that you have assigned to give access to Peak.
Database: Name of the database on which you want to give access to Peak.
Schema: Schema that will be used for reading/writing data in the database.
Warehouse: Warehouse that will be used for computations on Snowflake.
Storage Integration: Storage integration with access to data lake.
The database user should have write access to existing and future objects in the provided schemas.
Details required: For OAuth Authentication type
Here you will use the security integration details that you generated and saved in the Configuration step.
Account ID: Your Snowflake account ID
Client ID: As generated in the security integration setup
Client Secret: As generated in the security integration setup
Authorization URL: Used to get the authorization code.
Token URL: Used to exchange the authorization code for a token.
Default Role: User role that you have assigned to give access to Peak.
Database: Name of the database on which you want to give access to Peak.
Schema: Schema that will be used for reading/writing data in the database.
Warehouse: Warehouse that will be used for computations on Snowflake.
Storage Integration: Storage integration with access to data lake.
When the authentication type is OAuth for a Snowflake data warehouse, the system will first redirect you to a Snowflake page where you enter your Snowflake account credentials to authorize Peak to make the connection. On the Snowflake authorization page, use the account details of a user who has access to the role specified in the “default role” field during the database setup above.
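To illustrate how the Authorization URL and Client ID fit into a standard OAuth 2.0 authorization-code flow, the sketch below builds an authorization request URL. All URLs and IDs are placeholders, and the role-scoped `scope` value is an assumption based on Snowflake's OAuth behavior; your real values come from the "Get Client Details" command in Step 2 above:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Builds a standard OAuth 2.0 authorization request URL.
# Placeholder values; not Peak's actual redirect or client configuration.
def build_authorize_url(auth_url: str, client_id: str, redirect_uri: str, role: str) -> str:
    params = {
        "response_type": "code",          # authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": f"session:role:{role}",  # assumption: Snowflake scopes OAuth sessions to a role
    }
    return auth_url + "?" + urlencode(params)

url = build_authorize_url(
    "https://example.snowflakecomputing.com/oauth/authorize",  # placeholder Authorization URL
    "CLIENT_ID_PLACEHOLDER",
    "https://example.peak.ai/callback",                        # placeholder redirect URI
    "ACME_GP_ROLE",
)
print(url)
```

The Token URL is then used to exchange the code returned to the redirect URI for an access token.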
Test connection
After you enter all the mandatory fields, the system enables the 'Test' button, which checks whether Peak can connect to your database using the provided credentials. Once all information is entered correctly, click 'Test'.
Datalake
Link to a data lake
Here you will connect your data warehouse with your data lake for faster speeds, flexibility, and reduced latency. The system shows the list of data lakes that have been set up in your Peak tenant account for the data warehouse region. Choose the data lake connection that you want to link with the Snowflake data warehouse.
Reviewing your connection
Before you complete the configuration process, you can review the details you have given at each stage of the process.
To make changes, click Edit next to the option you want to change.
Once the details are verified, click FINISH.
You will be taken back to the Data Bridge listing screen and your newly configured data warehouse will be shown as ‘Active’.
