This article introduces you to the Data Bridge configuration options.
For a guide to the onboarding process, see Data Bridge onboarding overview.
Contents
- How does Peak store data?
- Why use Data Bridge?
- Data Bridge configuration options
- Supported configurations
- Peak managed Amazon S3 and Peak managed Amazon Redshift
- Customer managed Amazon S3 and Peak managed Amazon Redshift
- Peak managed Amazon S3 and Customer managed Snowflake
- Peak managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
- Customer managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
Peak data storage
Peak uses two types of data store for data used in the platform: a data lake and a data warehouse.
Data lake
This is where structured and unstructured data is stored in its raw form. Peak uses a data lake to store data from multiple sources and in multiple formats.
Data lakes make storing data easy as data does not need to be formatted or structured in a particular way. They are also highly scalable and can accommodate growing volumes of data. However, data must still follow certain standards, such as having basic metadata and date-tagging.
Peak supports Amazon S3 as a data lake and it must be configured before you can use the platform.
Data warehouse
This is where structured and processed data is stored. Data warehouses are designed with a relatively strict structure which aids robust querying and data analysis. They can help with data governance by ensuring quality and security and can streamline data by removing redundancies. Peak uses data warehouses to store tabular data and to access it for analysis and decision making.
Peak requires a data warehouse to be configured so that core features such as Data Sources and SQL Explorer can be used.
Peak supports Snowflake and Amazon Redshift data warehouses.
Why use Data Bridge?
Quicker onboarding
You will not need to configure multiple data interfaces. This means that fewer data feeds, or even no feeds, will have to be scheduled.
Once the required policies and permissions have been created, data is available for use on Peak as soon as you have stored it.
Security
You have full control over how your data is accessed and who can access it.
Data is not exposed to the public Internet. For example, with a customer managed Amazon S3 data lake, data is securely transferred between your AWS account and Peak’s AWS account via AWS PrivateLink.
No data duplication
Data is not replicated across multiple locations, making your data easier to maintain and helping to ensure data integrity.
Flexibility
You can store your data in any format and use it in any way you see fit.
Uphold your data localization laws
Data is stored on your infrastructure, which helps to ensure that you meet your specific data localization laws.
Secure connection to your Amazon S3 data lake
This illustration shows how Peak securely connects to data within your S3 data lake.
Key:
Region
This is the AWS region where your account is located.
Bucket Policy that gives limited access to Peak
Peak assumes the IAM role that you provide. This policy is set in your S3 bucket and provides Peak with limited access to specific storage paths.
IAM Policy giving access to specific resources
This is your IAM policy that specifies what Peak can access within your AWS account.
The policy contains the minimum set of permissions that Peak requires to work with your account. If preferred, you can also define extra permissions that will give Peak access to additional resources in your account.
Amazon S3 data lake
This is where your data is stored if using AWS.
Cross account IAM role
This enables you to grant Peak secure access to AWS resources in your account.
AWS Glue Data Catalog (optional)
This is part of the AWS Glue ETL service and provides an index to the location, schema and runtime metrics of your data. It can assist data scientists with their queries, but it is not essential.
STS Assume Role
This returns a set of temporary security credentials that are used for cross-account access.
AWS PrivateLink for Amazon S3 Endpoint
A technology that provides private connectivity between Virtual Private Clouds (VPCs) and services.
Amazon Redshift Spectrum
A feature of Amazon Redshift, Amazon's data warehouse service, that can query data stored in Amazon S3.
Containers that read and process data
These are the container instances that run Peak.
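As an illustration of the bucket policy entry in the key above, the following minimal sketch builds a policy document that limits one IAM role to specific storage paths. The bucket name, role ARN, prefixes and the choice of `s3:GetObject` and `s3:ListBucket` are all assumptions made for the example; the permissions that Peak actually requires are provided during onboarding and are not reproduced here.

```python
import json


def build_bucket_policy(bucket: str, role_arn: str, prefixes: list[str]) -> dict:
    """Return an S3 bucket policy limiting one IAM role to the given prefixes.

    The actions below are illustrative assumptions, not Peak's documented list.
    """
    clean = [p.strip("/") for p in prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects only under the allowed storage paths.
                "Sid": "ReadObjectsUnderAllowedPrefixes",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{p}/*" for p in clean],
            },
            {
                # Listing is a bucket-level action, narrowed by a prefix condition.
                "Sid": "ListAllowedPrefixes",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{p}/*" for p in clean]}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = build_bucket_policy(
        "example-bucket",
        "arn:aws:iam::111122223333:role/example-cross-account-role",
        ["raw/", "curated/"],
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON could be reviewed and attached to the bucket by whatever process you normally use to manage bucket policies.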
Data Bridge configuration options
When onboarding to Peak, Data Bridge enables you to choose how your data is stored and accessed by the platform:
Peak managed
The data lake or data warehouse that is used by your Peak organization is owned and managed by Peak and sits within the Peak data infrastructure.
Customer managed
The data lake or data warehouse that is used by your Peak organization is owned and managed by you and sits within your own data infrastructure.
You can choose between a fully Peak managed configuration or different combinations of Peak managed and customer managed.
Supported configurations
Currently, Peak supports the following data lakes and data warehouses:
| | Data lake | Data warehouse |
|---|---|---|
| Peak managed | Amazon S3 | Amazon Redshift, Snowflake |
| Customer managed | Amazon S3 | Snowflake |
The following configuration options are available:
- Peak managed Amazon S3 and Peak managed Amazon Redshift
- Customer managed Amazon S3 and Peak managed Amazon Redshift
- Peak managed Amazon S3 and Customer managed Snowflake
- Peak managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
- Customer managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
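The five options listed above can be encoded as data so that a proposed setup can be checked before onboarding starts. This is only a sketch: the `Owner` and `Config` types and the `is_supported` helper are hypothetical names introduced for illustration and are not part of any Peak API.

```python
from enum import Enum
from typing import NamedTuple


class Owner(Enum):
    PEAK = "Peak managed"
    CUSTOMER = "Customer managed"


class Config(NamedTuple):
    lake_owner: Owner
    lake: str
    warehouse_owner: Owner
    warehouse: str
    snowflake_share: bool = False  # True when a read-only Snowflake share is used


# The five combinations listed in this article.
SUPPORTED = frozenset({
    Config(Owner.PEAK, "Amazon S3", Owner.PEAK, "Amazon Redshift"),
    Config(Owner.CUSTOMER, "Amazon S3", Owner.PEAK, "Amazon Redshift"),
    Config(Owner.PEAK, "Amazon S3", Owner.CUSTOMER, "Snowflake"),
    Config(Owner.PEAK, "Amazon S3", Owner.PEAK, "Snowflake", snowflake_share=True),
    Config(Owner.CUSTOMER, "Amazon S3", Owner.PEAK, "Snowflake", snowflake_share=True),
})


def is_supported(config: Config) -> bool:
    """Return True if the combination appears in the list of supported options."""
    return config in SUPPORTED
```

For example, `is_supported(Config(Owner.CUSTOMER, "Amazon S3", Owner.CUSTOMER, "Snowflake"))` is false, because a customer managed data lake paired with a customer managed data warehouse is not among the listed options.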
Peak managed Amazon S3 and Peak managed Amazon Redshift
Peak owns and manages both the data lake and data warehouse. The Peak platform connects to your infrastructure and ingests data into both.
Customer managed Amazon S3 and Peak managed Amazon Redshift
This configuration is suitable if you have your own Amazon S3 data lake that you want to use with Peak.
You own and manage the data lake and Peak owns and manages the data warehouse within the Peak environment.
Peak managed Amazon S3 and Customer managed Snowflake
This configuration is suitable if you have a Snowflake data warehouse that you want to use with Peak.
You own and manage the Snowflake data warehouse and Peak owns and manages the data lake within the Peak environment.
After onboarding with this configuration, Peak will have read-only access to the schema containing your raw data and read-write access to a separate schema that Peak can then write data back to.
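The access split described above (read-only on the raw schema, read-write on a separate schema) could be expressed as Snowflake `GRANT` statements. The sketch below only generates the SQL text. The database, schema and role names are hypothetical, and the exact privileges that Peak's onboarding requires are an assumption here.

```python
def snowflake_grants(database: str, raw_schema: str, writeback_schema: str, role: str) -> list[str]:
    """Build GRANT statements: read-only on raw data, read-write on a separate schema."""
    raw = f"{database}.{raw_schema}"
    out = f"{database}.{writeback_schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        # Read-only access to the schema holding the raw data.
        f"GRANT USAGE ON SCHEMA {raw} TO ROLE {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {raw} TO ROLE {role};",
        # Read-write access to the schema that results are written back to.
        f"GRANT USAGE ON SCHEMA {out} TO ROLE {role};",
        f"GRANT CREATE TABLE ON SCHEMA {out} TO ROLE {role};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {out} TO ROLE {role};",
    ]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for statement in snowflake_grants("ANALYTICS", "RAW", "PEAK_OUTPUT", "PEAK_ROLE"):
        print(statement)
```

Keeping the two schemas separate means that write privileges never need to be granted on the schema that holds the raw data.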
Peak managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
This configuration is suitable if you have a Snowflake data warehouse but do not want to share any of your details with Peak.
Peak owns and manages both the data lake and data warehouse within the Peak environment and you create a "share" between your Snowflake data warehouse account and Peak.
Any data objects that you share with Peak are read-only. This means that they cannot be deleted or modified, and table data cannot be added or changed.
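As a rough illustration of creating such a share, the sketch below generates the Snowflake statements a data provider would typically run. The share, database, schema, table and consumer account names are hypothetical placeholders; the real account identifier to share with would come from Peak during onboarding.

```python
def snowflake_share_statements(
    share: str, database: str, schema: str, tables: list[str], consumer_account: str
) -> list[str]:
    """Build the SQL to create a share and expose selected tables to one account."""
    statements = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    # Only SELECT is granted, so the consumer sees the tables as read-only.
    statements += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};"
        for table in tables
    ]
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account};")
    return statements


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    for statement in snowflake_share_statements(
        "PEAK_SHARE", "SALES", "PUBLIC", ["ORDERS", "CUSTOMERS"], "EXAMPLE_ACCOUNT"
    ):
        print(statement)
```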
Customer managed Amazon S3 and Peak managed Snowflake (with a read-only Snowflake share)
Peak owns and manages the data warehouse within the Peak environment. You give Peak read-only access to your Amazon S3 data lake and create a "share" between your Snowflake data warehouse and Peak.
Any data objects that you share with Peak are read-only. This means that they cannot be deleted or modified, and table data cannot be added or changed.
