Data connections
In Workbench, you can easily configure and reuse secure connections to predefined data sources. Not only does this allow you to interactively browse, preview, and profile your data, but for supported connections, it also gives you access to DataRobot's integrated data preparation capabilities.
See the associated considerations for important additional information.
Source IP addresses for allowlisting
Before setting up a data connection, make sure the source IP addresses have been added to your data source's allowlist.
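Allowlisting is typically enforced on the data source side, for example by a firewall rule or a network policy. As a minimal sketch, the check below uses Python's standard `ipaddress` module to test whether an address falls inside a set of allowed CIDR ranges. The ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks, not DataRobot's actual source IPs.

```python
import ipaddress

# Hypothetical allowlist: placeholder ranges from the RFC 5737 documentation
# blocks. Replace these with the actual source IPs for your environment.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(address: str) -> bool:
    """Return True if `address` falls inside any allowlisted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_RANGES)


print(is_allowed("203.0.113.42"))  # True: inside 203.0.113.0/24
print(is_allowed("192.0.2.1"))     # False: not in any allowed range
```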
Connection capabilities
The following table lists the connections currently available in Workbench as well as supported capabilities for each one:
Connection | Availability | Dynamic datasets | Live preview | Wrangling | In-source materialization |
---|---|---|---|---|---|
Snowflake | GA | ✔ | ✔ | ✔ | ✔ |
Google BigQuery | GA | ✔ | ✔ | ✔ | ✔ |
Databricks | GA | ✔ | ✔ | ✔ | ✔ |
AWS S3 | GA | | | | |
ADLS Gen2 | GA | | | | |
SAP Datasphere | Premium | | | | |
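The capability table above can be read as a simple lookup structure. The following sketch transcribes it into a Python dictionary so you can ask which connections support a given capability. The names `Capability`, `CAPABILITIES`, and `connections_supporting` are illustrative only, and SAP Datasphere is omitted because the table does not mark any capabilities for it.

```python
from enum import Enum, auto


class Capability(Enum):
    DYNAMIC_DATASETS = auto()
    LIVE_PREVIEW = auto()
    WRANGLING = auto()
    IN_SOURCE_MATERIALIZATION = auto()


ALL = frozenset(Capability)

# Capabilities per connection, transcribed from the table.
# An empty set means no capability column is checked for that connection.
CAPABILITIES: dict[str, frozenset[Capability]] = {
    "Snowflake": ALL,
    "Google BigQuery": ALL,
    "Databricks": ALL,
    "AWS S3": frozenset(),
    "ADLS Gen2": frozenset(),
}


def connections_supporting(capability: Capability) -> list[str]:
    """Return, sorted by name, the connections whose row checks `capability`."""
    return sorted(name for name, caps in CAPABILITIES.items() if capability in caps)


print(connections_supporting(Capability.WRANGLING))
# ['Databricks', 'Google BigQuery', 'Snowflake']
```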
Connect to a data source
Creating a data connection lets you explore external source data—from both connectors and JDBC drivers—and then add it to your Use Case.
JDBC driver capabilities
You can only add snapshot datasets from a JDBC driver connection. See the full list of supported connectors and drivers in DataRobot.
To create a data connection:
- From the Data tab, click Add data in the upper-right corner to open the Browse data modal.
- Click + Add connection.
- Select a data store. We recommend selecting a data store listed on the Active tab.
Now, you can configure the data connection.
Configure the connection
Note
The available configuration types, authentication options, and required parameters depend on the selected data source. The example below shows how to configure Snowflake with OAuth using new credentials.
To configure the data connection:
- On the Configuration page, select a configuration method: either Parameters or JDBC URL.
- Enter the required parameters for the selected configuration method.
- Click New Credentials and select an authentication method (for Snowflake, either Basic or OAuth).
  Saved credentials
  If you previously saved credentials for the selected data source, click Saved credentials and select the appropriate credentials from the dropdown.
- Click Save in the upper-right corner.
If you selected OAuth as your authentication method, you will be prompted to sign in before you can select a dataset. See the DataRobot Classic documentation for more information about supported authentication methods and required parameters.
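The Parameters and JDBC URL configuration methods are two ways of describing the same connection. As a sketch, assuming Snowflake's documented JDBC URL format (`jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>`), the hypothetical helper below assembles a URL from individual parameters such as `warehouse`, `db`, `schema`, and `role`. The account identifier and object names are placeholders, and credentials are deliberately kept out of the URL because they are supplied separately in the credentials step.

```python
from urllib.parse import urlencode


def snowflake_jdbc_url(account: str, **params: str) -> str:
    """Build a Snowflake JDBC URL from an account identifier and connection
    parameters (hypothetical helper for illustration).

    Credentials are intentionally not accepted here; supply them through the
    credentials step instead of embedding them in the URL.
    """
    base = f"jdbc:snowflake://{account}.snowflakecomputing.com/"
    # Drop empty parameters so the query string contains only what was given.
    query = urlencode({k: v for k, v in params.items() if v})
    return f"{base}?{query}" if query else base


# Placeholder values for illustration only.
url = snowflake_jdbc_url(
    "myorg-myaccount",
    warehouse="COMPUTE_WH",
    db="SALES",
    schema="PUBLIC",
    role="ANALYST",
)
print(url)
# jdbc:snowflake://myorg-myaccount.snowflakecomputing.com/?warehouse=COMPUTE_WH&db=SALES&schema=PUBLIC&role=ANALYST
```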
Select a dataset
Once you've set up a data connection, you can add datasets by browsing the database schemas and tables you have access to.
To select a dataset:
- Select the schema associated with the table you want to add.
- Select the checkbox to the left of that table.
With a dataset selected, you can take the following actions:
- Click Wrangle to prepare the dataset before adding it to your Use Case.
- Click Preview to open a snapshot preview, which helps you determine whether the dataset is relevant to your Use Case and whether it needs to be wrangled.
- Click Add to Use Case to add the dataset to your Use Case, making it available to you and other team members on the Data tab.

Large datasets
If you want to decrease the size of the dataset before adding it to your Use Case, click Wrangle. When you publish a recipe, you can configure automatic downsampling to control the number of rows when Snowflake materializes the output dataset.
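Downsampling itself is configured in the UI when you publish a recipe. To illustrate the underlying idea only, the sketch below builds a Snowflake query that caps the output at a fixed number of randomly sampled rows using Snowflake's `SAMPLE (<n> ROWS)` clause. The helper `downsample_query` is hypothetical, and this is not necessarily how DataRobot implements downsampling.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore, followed by letters,
# digits, underscores, or dollar signs. Up to db.schema.table is allowed.
_IDENT = r"[A-Za-z_][A-Za-z0-9_$]*"
_QUALIFIED = re.compile(rf"{_IDENT}(\.{_IDENT}){{0,2}}")


def downsample_query(table: str, max_rows: int) -> str:
    """Return a Snowflake query selecting at most `max_rows` randomly sampled
    rows from `table` (hypothetical helper for illustration)."""
    if not _QUALIFIED.fullmatch(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # Fixed-size row sampling: if the table has fewer rows, all are returned.
    return f"SELECT * FROM {table} SAMPLE ({max_rows} ROWS)"


print(downsample_query("SALES.PUBLIC.ORDERS", 100000))
# SELECT * FROM SALES.PUBLIC.ORDERS SAMPLE (100000 ROWS)
```

Validating the table name before interpolating it keeps the sketch from passing arbitrary text into the SQL string.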
Next steps
From here, you can:
- Perform data wrangling before adding the dataset to your Use Case.
- Add more data.
- View exploratory data insights for the dataset.
- Use the dataset to set up an experiment and start modeling.
Read more
To learn more about the topics discussed on this page, see: