Once a file has been uploaded, you must create a feed to ingest its data into the tenant’s data lake.
Contents
- Before you start
- Getting to the screens
- Process Overview
- Import Configuration
- Destination
- Setting a trigger
Before you start
- Make sure your file has been uploaded to Peak.
To learn how to do this, see Ad Hoc Uploads.
- Check that your files are named with a timestamp.
If data from the file is to be updated and fetched regularly as part of a feed, the files must be named with a timestamp so that Peak loads them as part of the same feed. One possible naming convention is sketched below.
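As an illustration only (the exact timestamp format is not specified here, so the pattern below is an assumption), the key idea is to keep the base name constant and vary only the timestamp, so that successive files group into the same feed:

```python
from datetime import datetime, timezone

# Illustration only: the timestamp format is an assumption, not a Peak
# requirement. Keep the base name ("transactions") fixed and vary only
# the timestamp across uploads for the same feed.
stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
print(f"transactions_{stamp}.csv")  # e.g. transactions_20240101_090000.csv
```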
Getting to the screens
To create a feed for a file:
- Go to Dock > Data Sources.
The Feeds screen appears.
- Click ADD.
The Choose Connector screen appears.
- Go to the File Storage section and click File Upload.
The Create Feed screen appears.
Process Overview
There are three stages that need to be completed when creating a data feed for a file:
- Import Configuration
- Destination
- Trigger
To find out how to create new and edit existing data feeds, see Managing your data feeds.
Import Configuration
If you have already uploaded a file, either via the Peak interface or a signed URL, go to the File drop-down and select your file from the list.
If you haven’t already uploaded a file, click UPLOAD NEW.
For details of this process, see Ad Hoc Uploads.
After choosing your file, click NEXT and complete the fields.
Once complete, click NEXT to move to the Destination stage.
Import Configuration Fields
File type
Choose the type of file: CSV, JSON or XML
CSV
Value separators can be:
- Comma
- Tab
- Pipe
XML
- Enter a value for the root tag
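If you are unsure what the root tag of your file is, it is the outermost element of the XML document. A minimal sketch, using a made-up file, of finding it in Python:

```python
import xml.etree.ElementTree as ET

# A made-up XML upload; <orders> is the outermost element, i.e. the root tag.
sample = """<orders>
  <order><id>1</id><total>9.99</total></order>
  <order><id>2</id><total>4.50</total></order>
</orders>"""

root = ET.fromstring(sample)
print(root.tag)  # -> "orders": the value to enter as the root tag
```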
Feed load type
Choose how the feed loads data into the destination; one of the available types is upsert, which updates existing rows and inserts new ones.
Primary key (optional)
The primary key is only mandatory for an upsert feed, because it is used to match incoming rows against existing ones.
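For intuition, here is a minimal sketch of upsert semantics. This is an illustration, not Peak’s implementation: without a primary key there is no way to tell an update from an insert.

```python
# Existing rows keyed by the primary key; incoming rows either update or insert.
existing = {101: {"id": 101, "status": "open"}}
incoming = [
    {"id": 101, "status": "closed"},  # key already present: row is updated
    {"id": 202, "status": "open"},    # new key: row is inserted
]

for row in incoming:
    existing[row["id"]] = row  # the primary key ("id") is what matches rows

print(existing)  # 101 is now "closed"; 202 has been added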
Feed name
Enter a suitable name for the feed:
- The name should be meaningful.
- Only alphanumeric characters and underscores are allowed.
- It must start with a letter.
- It must not end with an underscore.
- Up to 50 characters are allowed.
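Taken together, these rules can be sketched as a regular expression. This is an illustration of the constraints above, not an official validator:

```python
import re

# Starts with a letter, alphanumeric/underscore only, no trailing underscore,
# at most 50 characters in total.
FEED_NAME = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_]{0,48}[A-Za-z0-9])?$")

for name in ("daily_sales_feed", "2021_sales", "sales_feed_", "x" * 51):
    print(name, "->", bool(FEED_NAME.match(name)))
```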
Destination
The Destination stage enables you to choose where your data will be stored.
Choose a destination
The destination is where the customer data is stored by Peak.
It can be S3 (Spark processing), Redshift, or both.
S3 (Spark processing)
This is Amazon S3 data storage.
Apache Spark is used by Peak to process large, unstructured (CSV) datasets on Amazon S3.
Redshift
This is Amazon Redshift data storage.
Data stored using Redshift can be queried using SQL. This makes it possible to run frequent aggregations on really large datasets.
Redshift is a relational database, so any data that is fed into it has to map exactly, column by column.
Any failed rows are flagged and written to a separate table.
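Because Redshift speaks the PostgreSQL wire protocol, an aggregation of the kind described above can be run with any PostgreSQL client. A sketch with hypothetical connection details and a made-up table name:

```python
import psycopg2  # any PostgreSQL driver can talk to Redshift

# Hypothetical cluster, credentials, and table, for illustration only.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="analyst", password="...",
)
with conn.cursor() as cur:
    # A frequent aggregation over a large table, as the text describes.
    cur.execute("SELECT order_date, SUM(total) FROM orders GROUP BY order_date")
    for order_date, total in cur.fetchall():
        print(order_date, total)
conn.close()
```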
Failed row threshold
This is the number of failed rows that are acceptable before the feed is stopped.
The threshold should reflect the total number of rows being written to the table and the proportion of failures you can accept before the quality of the data is compromised. For example, if a feed writes 1,000,000 rows and a 0.1% failure rate is acceptable, a threshold of 1,000 would be reasonable.
Changing the data type of a schema
When specifying the destination for a data connector, you can change the data type of your schema.
This function is available for all connectors apart from Webhooks and the old agent-based feeds.
Choose the required column name or field name and click the dropdown icon next to the Suggested Data Type. The following data types are available:
- STRING
- INTEGER
- NUMERIC
- TIMESTAMP
- DATE
- BOOLEAN
- JSON
Note:
In the current release, TIMESTAMPTZ is not supported.
Any data in this format will be ingested as a string by default.
Setting a trigger
From the Trigger stage, you can define triggers and watchers:
- Triggers enable you to define when a data feed is run.
- Watchers can be added to feeds to provide notifications of feed events to Peak users or other systems.
Triggers
Triggers enable you to define when a data feed is run. There are three types of trigger:
- Schedule trigger:
Schedule when the feed runs. Both a basic and an advanced (Cron) scheduler are available.
- Webhook trigger:
Trigger a feed to run via a webhook from another system.
- Run Once trigger:
Trigger the feed to run once, at either a set time or manually from the feed list.
Basic Schedule Trigger
- Basic schedules use days and time.
- The feed will run on the selected days (blue).
- Enter a suitable time or frequency for the tenant’s environment.
Advanced Schedule Trigger
- Advanced schedules use Cron.
- Enter the time / frequency as a Cron string.
Cron formatting
A cron expression is a string comprising six or seven fields separated by spaces.
Field | Mandatory | Allowed Values | Allowed Special Characters |
---|---|---|---|
Seconds | Yes | 0-59 | , - * / |
Minutes | Yes | 0-59 | , - * / |
Hours | Yes | 0-23 | , - * / |
Days of month | Yes | 1-31 | , - * ? / L W |
Month | Yes | 1-12 or JAN-DEC | , - * / |
Day of week | Yes | 1-7 or SUN-SAT | , - * ? / L # |
Year | No | empty, 1970-2099 | , - * / |
Cron expression examples
Expression | Meaning |
---|---|
0 0 12 * * ? | Trigger at 12pm (noon) every day |
0 15 10 * * ? 2021 | Trigger at 10:15am every day during the year 2021 |
0 15 10 ? * 6L | Trigger at 10:15am on the last Friday of every month |
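To make the field order concrete, here is a small sketch that splits an expression from the table above into its named fields:

```python
# Field order for the cron format described above (seconds first, year optional).
FIELDS = ["seconds", "minutes", "hours", "day of month",
          "month", "day of week", "year"]

def describe(expr: str) -> dict:
    parts = expr.split()  # 6 mandatory fields, plus an optional 7th (year)
    return dict(zip(FIELDS, parts))

print(describe("0 15 10 ? * 6L"))
# {'seconds': '0', 'minutes': '15', 'hours': '10', 'day of month': '?',
#  'month': '*', 'day of week': '6L'}
```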
Webhook triggers
Webhook triggers are used to trigger a data feed when data on a separate system has been updated.
Webhooks work in a similar way to regular APIs, but rather than making constant requests to other systems to check for updates, a webhook only sends data when a particular event takes place: in this case, when new data is available for the data feed.
Using the webhook URL
The webhook URL is generated by Peak and is unique to the data feed that you are creating or editing. The data source system needs the URL so that it knows where to send the notification.
- From the Trigger stage, click Webhook and copy the URL.
If required, you can generate a new URL by clicking the curved arrow.
- Use the URL in the webhook section of the application that you want to receive data from.
If the system is external to Peak, you will also need to provide it with an API Key for your tenant so that the webhook can be authenticated.
For more information about generating API Keys, see API Keys.
- Once you have generated and copied your webhook URL, click SAVE.
Run Once Triggers
Run Once triggers are used to run the feed once at either a set time or manually from the feeds list.
From the Run Type drop-down menu, choose either:
- Manual:
This enables you to trigger the feed manually from the feeds list.
To do this, go to Dock > Data Sources, hover over the feed and click ‘Run now’.
For more information, see Managing your data feeds.
- Date and Time:
The feed will run once at the scheduled date and time.
The time you enter must be at least 30 minutes from the current time.
Watchers
Watchers can be added to feeds to provide notifications of feed events to Peak users or other systems.
There are two types of watcher:
- User watcher:
These are users of your tenant who will receive a notification within the platform if a feed event occurs.
- Webhook watcher:
These are used to trigger or send notifications to other systems or applications when a feed is updated.
They could include external applications such as Slack or internal Peak functions such as Workflows.
To add a watcher:
- From the Trigger step screen, click ADD WATCHER.
- Choose either User or Webhook.
User Watchers
These are users of your tenant who will receive a notification within the platform if a feed event occurs.
- To choose a tenant user to add as a watcher, click the Search User drop-down.
- Choose the data feed events that you want the user to be notified of.
You can choose to watch all or a custom selection.
Once added, users can view notifications by clicking the bell icon at the top of the screen.
Data feed events
Users can be notified of the following data feed events:
- Create:
The feed has been created.
- Edit / delete:
The feed has been edited or deleted.
- Execution status:
  - Run fail:
  The feed run has failed.
  - Run success:
  The feed has run successfully.
  - No new data:
  There is no new data available on the feed.
Webhook Watchers
These are used to trigger or send notifications to other systems or applications when a feed is updated.
They could include external applications such as Slack or internal Peak functions such as Workflows.
The Webhook URL is taken from the application that you want to trigger if an event occurs.
If this is a Peak Workflow, the URL can be taken from the workflow’s trigger step.
The JSON payload is optional. It can be used to pass variables to provide additional information about the feed. Parameters can include:
- {tenantname}
- {jobtype}
- {jobname}
- {trigger}
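As a sketch, a payload using these parameters might look like the following. The key names are made up for illustration; the {placeholders} are the parameters listed above, which Peak substitutes when the event fires:

```python
import json

# Hypothetical key names, for illustration only.
payload = {
    "tenant": "{tenantname}",
    "job_type": "{jobtype}",
    "job_name": "{jobname}",
    "trigger": "{trigger}",
}
print(json.dumps(payload, indent=2))
```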
Data feed events
Webhooks can be configured for the following data feed events:
- Run fail:
The feed run has failed.
- Run success:
The feed has run successfully.
- Running for more than x minutes:
The feed has been running for longer than the specified number of minutes.
- No new data:
There is no new data available on the feed.