Introduction
Peak supports direct file transfer for ingesting data into the platform. There are three ways to transfer files directly:
- File feeds
- Ad-hoc uploads
- FTP/SFTP feeds
In this article, you will learn how to set up an FTP/SFTP feed on the Peak platform.
Process Overview
Before we start setting up a feed, let's understand which kinds of FTP/SFTP servers the platform supports. If your server differs from the supported types, you can raise a support ticket through our support portal.
Peak supports the following types of servers:
- SFTP: SFTP stands for Secure File Transfer Protocol or SSH File Transfer Protocol. It is a secure network protocol used for transferring files and managing file systems over a secure and encrypted channel. SFTP operates on top of the SSH (Secure Shell) protocol, providing a secure way to transfer data between systems, ensuring data confidentiality and integrity.
- FTP: FTP, which stands for File Transfer Protocol, is a standard network protocol used for transferring files from one computer to another over a TCP/IP-based network like the Internet. FTP is not inherently secure as it sends data, including usernames and passwords, in plaintext, making it susceptible to eavesdropping. It is mainly used for non-sensitive data transfers.
- FTPS: FTPS, or FTP Secure, is an extension of FTP that adds a layer of security to the protocol. It uses encryption (typically SSL/TLS) to secure data transmission between the client and the server, making it suitable for secure file transfers. FTPS can operate in two modes: Implicit FTPS, where the TLS session is established as soon as the client connects, and Explicit FTPS, where the client connects over plain FTP and then requests an upgrade to TLS.
Prerequisites
Before creating an FTP/SFTP feed, please make sure that the following prerequisites are fulfilled:
- The user should be assigned a role that has write access for Dock > Data Sources.
- The tenant must have a Data Warehouse configured on it as feeds require a data warehouse for data ingestion.
- You have the connection details of the FTP/SFTP server, such as the host address, port and login credentials.
- Peak's public IPs are whitelisted in the server's firewall/network policies. You can find a list of IPs here.
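To make the whitelisting prerequisite concrete, here is a minimal Python sketch of how a server administrator might check whether an incoming source address is covered by an allowlist. The addresses in ALLOWED_SOURCES are placeholders taken from the reserved documentation ranges, not Peak's real IPs, and the function name is_allowed is hypothetical.

```python
import ipaddress
from typing import List

# Placeholder entries from the reserved documentation ranges; replace them
# with the actual list of Peak public IPs referenced above.
ALLOWED_SOURCES = ["203.0.113.10", "203.0.113.11", "198.51.100.0/28"]


def is_allowed(source_ip: str, allowed: List[str]) -> bool:
    """Return True if source_ip matches any single address or CIDR range."""
    ip = ipaddress.ip_address(source_ip)
    # ip_network accepts both a single address and a CIDR range.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in allowed)
```

The real check is performed by your firewall or network policy; the sketch only shows the matching logic those rules need to express.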
You can create an FTP/SFTP feed by navigating to Dock > Data Sources > Add Feed, going to the File Storage section and choosing the FTP/SFTP connector. Once you choose the connector, you will see the following four steps that need to be completed to set up the feed:
- Connection
- Import configuration
- Destination
- Trigger
Connection Setup
When setting up a feed, a connection is required to connect to your FTP/SFTP server and this connection will be used for the lifetime of your feed. You can either use an existing connection or create a new one.
Using existing connection
- At the Connection stage, choose the required connection from the Select Connection dropdown.
The dropdown will be empty if no connections have been configured previously.
- Once a connection is selected, the platform will verify that the selected connection is still valid. If the connection is verified successfully, the Next button will be enabled. You can move to the next steps by clicking on the Next button.
- If connection verification fails, you will see errors on the platform. Based on the error, you can change the connection parameters (listed below) by clicking on the Edit icon on the right side of the Select Connection dropdown. Only certain parameters, such as passwords and private keys, can be edited.
- Once you have updated the connection parameters, Save the connection and re-select the connection you edited from the dropdown. This will re-verify the connection and you can go to the next steps after the connection is verified successfully.
Creating a new connection
- At the Connection stage, click NEW CONNECTION.
- Enter the required connection parameters as listed below.
- Once all the required details are filled, click on TEST to test the connection.
- After the connection test is successful, click on SAVE to save the connection.
- Once the connection is saved, it will be visible in the Select Connection dropdown.
- Select the newly created connection and move to the next steps.
Connection Parameters
The connection setup needs the following fields to be filled.
- Connection Name: Enter a suitable name for the connection using alphanumeric characters.
- Protocol: Choose a suitable protocol based on the type of server:
- File transfer protocol (FTP)
- Secure File transfer protocol (SFTP)
- Encryption: This is required if the selected Protocol is FTP. Choose the encryption type supported by the server. For more details, refer to the FTP docs.
- Use plain FTP
- Use explicit FTP over TLS
- Host Address: The IP/host address for the server. Example: 10.20.20.30 or ftp.example.com
- Port: The port number for the server.
- Username: The username or login ID for the server.
- Password: The password used to connect to the server. This is optional for the SFTP protocol if Use Private Key is selected.
- Use Private Key: This is available if the selected Protocol is SFTP. Once checked, you can use a private key to connect to the SFTP server.
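The dependencies between these fields can be summarised in a short Python sketch. It is illustrative only: the ConnectionParams class, the validate function and the "plain" / "explicit_tls" values are hypothetical names, not part of the Peak platform, and the name check assumes that "alphanumeric" means letters and digits only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectionParams:
    name: str
    protocol: str                      # "FTP" or "SFTP"
    host: str
    port: int
    username: str
    password: Optional[str] = None
    encryption: Optional[str] = None   # "plain" or "explicit_tls" (FTP only)
    use_private_key: bool = False      # SFTP only


def validate(params: ConnectionParams) -> List[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    problems = []
    if not params.name.isalnum():
        problems.append("Connection Name must be alphanumeric")
    if params.protocol not in ("FTP", "SFTP"):
        problems.append("Protocol must be FTP or SFTP")
    if not 1 <= params.port <= 65535:
        problems.append("Port must be between 1 and 65535")
    if params.protocol == "FTP":
        # Encryption and Password are needed for FTP; private keys are SFTP only.
        if params.encryption not in ("plain", "explicit_tls"):
            problems.append("Encryption is required for FTP")
        if params.use_private_key:
            problems.append("Use Private Key is only available for SFTP")
        if not params.password:
            problems.append("Password is required for FTP")
    if params.protocol == "SFTP" and not (params.password or params.use_private_key):
        problems.append("SFTP needs a Password or Use Private Key")
    return problems
```

For example, an SFTP connection with Use Private Key checked and no password passes these checks, while an FTP connection without an Encryption choice does not.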
Creating SFTP connection
To create a new SFTP connection, follow the steps below:
- Enter the Connection Name as mentioned above.
- Choose the Secure File transfer protocol from the Protocol dropdown.
- Enter the following parameters:
- Host Address: The IP/host address for the server. Example: 10.20.20.30 or sftp.example.com.
- Port: The port number for the server. If the server does not use a custom port, enter the default SFTP port, 22.
- Username: The username or login ID for the server.
- For authentication, you can either enter a Password or Use Private Key.
- If the server requires a password, then enter the password in the Password field.
- If the server requires key-based authentication, then check the Use Private Key checkbox and upload the private key file by clicking on the upload icon. Only key files with the ppk, pem or pub extension are supported.
- Once all the fields are completed, click on TEST to test the connection.
- After the connection test is successful, click on SAVE to save the connection.
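If the connection test fails, it can help to confirm separately that the server accepts TCP connections on the configured host and port. The following is a minimal Python sketch, with can_connect as a hypothetical helper name. It runs from your own machine rather than from Peak's IPs, so a successful result does not prove that Peak's addresses are whitelisted.

```python
import socket


def can_connect(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

A call such as can_connect("sftp.example.com", 22) only checks reachability; it does not attempt to log in.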
Creating FTP/FTPS connection
To create a new FTP/FTPS connection, follow the steps below:
- Enter the Connection Name as mentioned above.
- Choose the File transfer protocol from the Protocol dropdown.
- Choose the Encryption type supported by the server.
- Use plain FTP: Select this option if the server uses unencrypted FTP and does not require TLS.
- Use explicit FTP over TLS: Select this option if the server requires the connection to be upgraded to TLS (explicit FTPS).
- Enter the following parameters:
- Host Address: The IP/host address for the server. Example: 10.20.20.30 or ftp.example.com.
- Port: The port number for the server. If the server does not use a custom port, enter the default FTP port, 21.
- Username: The username or login ID for the server.
- Password: The password used to connect to the server.
- Once all the fields are completed, click on TEST to test the connection.
- After the connection test is successful, click on SAVE to save the connection.
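To see what the two Encryption options mean at the protocol level, here is a minimal sketch using Python's standard ftplib module. It is not how the Peak connector is implemented; the function names and the "plain" / "explicit_tls" values are assumptions made for illustration.

```python
import ftplib
from typing import List


def make_ftp_client(encryption: str) -> ftplib.FTP:
    """Return an unconnected client matching the chosen Encryption option."""
    if encryption == "plain":
        return ftplib.FTP()        # credentials and data travel unencrypted
    if encryption == "explicit_tls":
        return ftplib.FTP_TLS()    # upgrades the session to TLS at login
    raise ValueError(f"unsupported encryption: {encryption!r}")


def list_directory(host: str, username: str, password: str,
                   encryption: str = "plain", port: int = 21,
                   path: str = ".") -> List[str]:
    """Log in and return the file names found under path."""
    client = make_ftp_client(encryption)
    client.connect(host, port, timeout=10)
    try:
        # For FTP_TLS, login() secures the control channel before
        # the credentials are sent.
        client.login(username, password)
        if isinstance(client, ftplib.FTP_TLS):
            client.prot_p()        # encrypt the data channel as well
        return client.nlst(path)
    finally:
        client.close()
```

Running list_directory against your own server with the same host, port and credentials is one way to confirm which option the server expects before entering it on the platform.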
Troubleshooting connection setup
While setting up the connection, you might get failures while testing the connection. Some common error messages are listed below. If you encounter any other error, please check the server's configuration or contact Peak support.
- Host not found. Please check that the configured host is correct: This means the entered Host Address is incorrect. Please check that the entered host address/URL or IP is correct.
- Connection refused by server. Please check that the configured host and port is correct: This means the entered Host Address or Port is incorrect. Please check the entered details and update if required.
- Authentication failure. Please check that the login credentials are correct: This means the entered Username, Password or Private Key is incorrect. Please check the entered details and update if required.
- Connection request timed out. Please check your network connection and try again: This means either the server is not running or the Peak platform is not able to connect to the server. Please check that the server is running and Peak's public IPs are whitelisted properly.
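When reproducing a failure from your own machine, the same four categories can usually be told apart by the exception a client raises. The sketch below maps standard Python exceptions to the messages listed above. The mapping is an assumption about likely causes, not a description of how the platform produces its errors, and describe_failure is a hypothetical name.

```python
import ftplib
import socket


def describe_failure(exc: Exception) -> str:
    """Map a client-side exception to the closest error category above."""
    if isinstance(exc, socket.gaierror):
        return "Host not found"                  # host name did not resolve
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused by server"    # wrong host or port
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Connection request timed out"    # server down or traffic blocked
    # FTP servers answer a failed login with a 530 reply.
    if isinstance(exc, ftplib.error_perm) and str(exc).startswith("530"):
        return "Authentication failure"
    return "Unknown error"
```

A caller would wrap its own connection attempt in try/except and pass the caught exception to describe_failure.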
Import Configuration Setup
The Import Configuration setup enables you to specify the details of the files you are importing, and how the data will be formatted and loaded. File format and naming convention requirements can be found here. Follow the steps below to set the import configuration:
- Enter the path for the file you want to import in the Enter File Path field. You can get the file path by connecting to the SFTP/FTP server using an SFTP/FTP client such as FileZilla or Cyberduck. The accepted file types are CSV, TXT, JSON, XML and GZIP. The given path will be used to decide the directory and file pattern for fetching files from the server.
- After the path is entered, click on VALIDATE to validate the path and generate a preview of the file.
- If the path is correct then a preview will be generated and you can enter other details.
- If the path is incorrect then please correct the entered path and click on VALIDATE again.
- Once the preview is generated, fill in the following fields:
- File Type: Choose the type of file either based on preview or based on file extension.
- CSV: For CSV, TSV, PSV and CSV.GZ files.
- JSON: For JSON and JSON.GZ files.
- XML: For XML and XML.GZ files.
- Separator: This field will be available if the File Type is CSV. Choose the suitable column separator present in selected files. You can choose one of Comma, Pipe, or Tab.
- Root Tag: This field will be available if the File Type is XML. Choose the root tag for the file. Data will be fetched from within this tag.
- Feed Load Type: Choose the suitable load type based on data requirements. Please check here for more details.
- Incremental
- Truncate and Insert
- Upsert (Update and Insert)
- Primary Key: This field is only required if the selected Feed Load Type is Upsert. Enter the column name(s) that will be used as the primary key while ingesting the data. This primary key will be used to update the rows while ingesting the data.
- Feed Name: Enter a suitable name for the feed. The feed name must satisfy the following constraints:
- Only alphanumeric characters and underscores are allowed.
- It must start with a letter.
- It must not end with an underscore.
- Up to 50 characters are allowed.
Once all the fields are entered, click on Next to go to the Destination step.
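The Feed Name constraints and the Primary Key rule can be checked before you fill in the form. Below is a minimal Python sketch; the check_feed function and the load type identifiers are hypothetical, and the pattern assumes that only ASCII letters and digits count as alphanumeric.

```python
import re
from typing import List, Optional

# A letter first, then letters, digits or underscores, not ending with an
# underscore, and at most 50 characters in total.
FEED_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9_]{0,48}[A-Za-z0-9])?")
LOAD_TYPES = ("incremental", "truncate_insert", "upsert")


def check_feed(feed_name: str, load_type: str,
               primary_key: Optional[List[str]] = None) -> List[str]:
    """Return a list of problems with the import configuration fields."""
    problems = []
    if not FEED_NAME.fullmatch(feed_name):
        problems.append("invalid feed name")
    if load_type not in LOAD_TYPES:
        problems.append("unknown load type")
    # A primary key is only required for the Upsert load type.
    if load_type == "upsert" and not primary_key:
        problems.append("upsert requires a primary key")
    return problems
```

For instance, check_feed("daily_orders", "upsert", ["order_id"]) returns an empty list, whereas a name such as "orders_" or "1orders" is reported as invalid.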
Destination Setup
The Destination setup enables you to choose where your data will be stored. Peak supports different types of destinations based on the configured data warehouse for the tenant.
Redshift Data Warehouse
When Redshift is configured as a data warehouse for the tenant, you can choose multiple destinations for your data. Below are the supported destinations:
- S3 (Spark): Choose this destination if you want to ingest data into an external table.
- Redshift: Choose this destination if you want to ingest data into a table.
The S3 (Spark) destination will only be available if the tenant has an active Glue Catalog configured. If the destination is disabled, please configure the Glue Catalog following these steps.
Snowflake Data Warehouse
When Snowflake is configured as a data warehouse for the tenant, you can choose only a single destination for your data. You can find more details about this difference here. Below are the supported destinations:
- S3 (Spark): Choose this destination if you want to ingest data into an external table.
- Snowflake: Choose this destination if you want to ingest data into a normal table.
For more information, see Choosing a destination for a data connector.
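The destination rules for the two warehouses can be summarised as a small lookup. This Python sketch is illustrative only; the DESTINATIONS table and check_destinations function are hypothetical names, and the Glue Catalog check is applied to Redshift only because that is where this article states it.

```python
from typing import List

DESTINATIONS = {
    "redshift": {"options": {"S3 (Spark)", "Redshift"}, "multiple": True},
    "snowflake": {"options": {"S3 (Spark)", "Snowflake"}, "multiple": False},
}


def check_destinations(warehouse: str, selected: List[str],
                       glue_catalog_active: bool = True) -> List[str]:
    """Return a list of problems with the selected destinations."""
    rules = DESTINATIONS[warehouse]
    problems = []
    if not selected:
        problems.append("Select at least one destination")
    for name in selected:
        if name not in rules["options"]:
            problems.append(f"{name} is not available for {warehouse}")
    if len(selected) > 1 and not rules["multiple"]:
        problems.append(f"{warehouse} allows only a single destination")
    # The article states the Glue Catalog requirement under Redshift only.
    if warehouse == "redshift" and "S3 (Spark)" in selected and not glue_catalog_active:
        problems.append("S3 (Spark) needs an active Glue Catalog")
    return problems
```

With these rules, selecting both S3 (Spark) and Redshift is accepted for a Redshift tenant, while selecting two destinations for a Snowflake tenant is rejected.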
Trigger Setup
For a guide to setting triggers for your data feeds, see How to create a Trigger for a data feed.
Considerations
- The platform doesn't restrict the number of feeds per connection, so you can create any number of feeds using the same connection. However, consider the following limitations when multiple feeds are set up using the same connection:
- If multiple feeds that use a single connection are scheduled to run at the same time, the server must be able to handle that number of concurrent connections. This ensures that feeds do not fail because of resource constraints on the server.
- The SFTP connector fetches the files from the server in a batch, so the server should allow parallel operations on a single connection. If it cannot support parallel operations, either update the server configuration or inform Peak support by raising a support ticket.
- The SFTP connector requires an SFTP/FTP user who has read+write permissions on the directories/files that you want to ingest. The write permission is required because of the internal implementation of the connector.
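To estimate how many concurrent connections the server needs to accept, you can count how many feeds on the same connection start at the same time. The Python sketch below assumes a hypothetical schedules mapping of feed name to start times and only counts identical start times; it does not account for runs that overlap because an earlier feed is still in progress.

```python
from collections import Counter
from typing import Dict, List


def peak_simultaneous_runs(schedules: Dict[str, List[str]]) -> int:
    """Return the largest number of feeds that start at the same time."""
    # Count each start time once per feed, then take the busiest time.
    starts = Counter(t for times in schedules.values() for t in set(times))
    return max(starts.values(), default=0)


# Hypothetical feeds sharing one connection, with "HH:MM" start times.
example = {
    "orders": ["02:00", "14:00"],
    "customers": ["02:00"],
    "stock": ["02:00", "06:00"],
}
print(peak_simultaneous_runs(example))  # three feeds start at 02:00
```

If the result is higher than the server's connection limit, staggering the feed triggers is one way to stay within it.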