Configure Data Source Connections


When you stage model versions in the AI Hub or configure operator parameters in AI Pipelines, you may need to read data from, or write data to, various data sources. Before using these products, configure connections to the following data sources as your business requires:

  • Git
  • Hive
  • MySQL
  • S3
  • Blob
  • HDFS
  • API
  • APIM

Prerequisites


  • You have obtained access permissions for the corresponding data sources (data source URL, user name, password, etc.).
  • For Git data source connection, ensure that the AI Studio Administrator has configured the Git host whitelist through the Resource Configuration > Connection Configuration > Git White List page.
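The whitelist requirement means a Git URL is usable only if its host has been approved. The following Python sketch shows one way such a host check could work. The whitelist contents and the function name `is_host_whitelisted` are hypothetical, and the sketch assumes entries are matched on host name alone, ignoring the port. In AI Studio the actual list is maintained by the administrator on the Git White List page, not in code.

```python
from typing import AbstractSet
from urllib.parse import urlparse

# Hypothetical example whitelist; the real list is managed on the
# Resource Configuration > Connection Configuration > Git White List page.
GIT_HOST_WHITELIST = frozenset({"github.com", "gitlab.example.com"})


def is_host_whitelisted(git_url: str, whitelist: AbstractSet[str]) -> bool:
    """Return True if the host of git_url is in the whitelist.

    Assumption: entries are bare host names, so the port is ignored.
    """
    host = urlparse(git_url).hostname  # lower-cased, port removed; None if absent
    return host is not None and host in whitelist


print(is_host_whitelisted("http://gitlab.example.com:8080/ml-team", GIT_HOST_WHITELIST))  # True
```

If a Git connection test fails even though the URL and credentials are correct, a missing whitelist entry for the host is one likely cause.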

Create a Data Source Connection

Create an Internal Data Source Connection


If the current OU has requested the “File Storage HDFS” and “Data Warehouse Storage (Hive)” resources, create an internal Hive or HDFS data source connection as follows:

  1. Select Resource Configuration > Connection Configuration from the left navigation pane.

  2. Select New, and then select Hive or HDFS in the right panel.

  3. Enter the following information in the pop-up window:

    • For Internal Hive Source


      Field                          Description
      Data Source Name               Enter the name of the Hive data source.
      Internal/External Data Source  Select Internal Data Source.
      Use HDFS Connections           Enable this option for faster data access.
      Queue                          Select the OU's queue resource from the dropdown list.
      Description                    Enter a description of the Hive data source.


    • For Internal HDFS Source


      Field                          Description
      Data Source Name               Enter the name of the HDFS data source.
      Internal/External Data Source  Select Internal Data Source.
      Description                    Enter a description of the HDFS data source.


  4. Select Confirm.
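The fields entered in step 3 can be summarized as a small data model. The following Python sketch is illustrative only: the class `InternalSourceConfig`, its attribute names, and the rule that a Hive source must have a queue are assumptions, not the product's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InternalSourceConfig:
    """Hypothetical model of the pop-up fields for an internal Hive or HDFS source.

    Internal/External Data Source is always Internal here, so it is not a field.
    """
    source_type: str                    # "Hive" or "HDFS"
    name: str                           # Data Source Name
    description: str = ""               # Description
    use_hdfs_connections: bool = False  # Hive only: Use HDFS Connections
    queue: Optional[str] = None         # Hive only: OU queue resource

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config looks complete."""
        errors: List[str] = []
        if self.source_type not in ("Hive", "HDFS"):
            errors.append("source_type must be 'Hive' or 'HDFS'")
        if not self.name.strip():
            errors.append("Data Source Name is required")
        # Assumption: a Hive source must be bound to one of the OU's queues.
        if self.source_type == "Hive" and not self.queue:
            errors.append("Queue is required for Hive sources")
        # Queue and Use HDFS Connections appear only in the Hive form.
        if self.source_type == "HDFS" and (self.queue or self.use_hdfs_connections):
            errors.append("Queue and Use HDFS Connections apply only to Hive")
        return errors


hive = InternalSourceConfig("Hive", "sales_hive", use_hdfs_connections=True, queue="ou_queue_1")
print(hive.validate())  # []
```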


You can request or view the “File Storage HDFS” and “Data Warehouse Storage” resources on the Data Management tab of Resource Management > Resource List. If the “File Storage HDFS” and “Data Warehouse Storage” resources are not available for the current OU, contact the system administrator.

Create an External Data Source Connection


The following steps use a Git data source connection as an example of how to create an external data source connection:

  1. Select Resource Configuration > Connection Configuration from the left navigation pane.
  2. Select New, and select Git in the right panel.
  3. In the New Data Source window, configure the data source connection:
    • Data Source Name: enter the name of the data source.
    • Git Type: select github or gitlab (the configuration differs between the two).
    • Authentication: select the user authentication method (github supports Git Token only; gitlab supports Git Token or username & password).
    • Git URL: enter the address of the Git data source, in the format http://hostname:port/namespace.
    • Description: enter a brief description of the data source.
    • Test Connection: after completing the above configuration, select Test to check whether the data source connection is configured correctly.
  4. After the connection test succeeds, select Confirm. The new data source connection is displayed in the list.
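The rules in step 3 (Git Type determines the allowed authentication methods, and the Git URL follows http://hostname:port/namespace) can be expressed as a validation sketch in Python. The class `GitSourceConfig`, the `auth` values `"token"` and `"password"`, and the acceptance of https URLs and nested namespaces are hypothetical assumptions; the product performs its own validation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Documented format: http://hostname:port/namespace.
# Assumption: https is also accepted, and the namespace may contain subgroups.
GIT_URL_PATTERN = re.compile(
    r"^https?://[A-Za-z0-9.-]+:\d{1,5}/[\w.-]+(?:/[\w.-]+)*/?$"
)

# github: Git Token only; gitlab: Git Token or username & password.
ALLOWED_AUTH = {"github": {"token"}, "gitlab": {"token", "password"}}


@dataclass
class GitSourceConfig:
    """Hypothetical model of the New Data Source fields for a Git connection."""
    name: str
    git_type: str                   # "github" or "gitlab"
    auth: str                       # "token" or "password" (username & password)
    git_url: str
    token: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    description: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form looks complete."""
        errors: List[str] = []
        if not self.name.strip():
            errors.append("Data Source Name is required")
        if self.git_type not in ALLOWED_AUTH:
            errors.append("Git Type must be 'github' or 'gitlab'")
        elif self.auth not in ALLOWED_AUTH[self.git_type]:
            errors.append(f"{self.auth!r} authentication is not supported for {self.git_type}")
        elif self.auth == "token" and not self.token:
            errors.append("Git Token is required")
        elif self.auth == "password" and not (self.username and self.password):
            errors.append("username and password are required")
        if not GIT_URL_PATTERN.match(self.git_url):
            errors.append("Git URL must follow http://hostname:port/namespace")
        return errors


cfg = GitSourceConfig("model_repo", "gitlab", "password",
                      "http://git.example.com:8080/ml-team",
                      username="alice", password="secret")
print(cfg.validate())  # []
```

A check like this only catches form mistakes; it cannot tell whether the host is reachable or the credentials are valid, which is what selecting Test verifies.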

Use Data Source Connections


After completing the data source connection configuration, you can access the data in the data source through the established connection. For example, you can access the model source file saved in the Git data source when staging the model version in the AI Hub:


(Figure: accessing a data source when staging a model version)


Manage Data Source Connections


You can edit or delete existing data source connections as business needs change.

  1. To edit a data source connection, select its Edit icon in the list of data source connections and modify its configuration.
  2. To delete a data source connection that is no longer needed, select its Delete icon in the list of data source connections.

Note

Only the creator of a data source connection has the permission to edit or delete the data source connection, or test its connectivity.
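The permission rule in the note can be sketched in Python as follows. The `Connection` type and `can_perform` function are hypothetical names used only to illustrate the rule; the sketch models just the three creator-only actions and rejects any other action name rather than guessing a permission the note does not state.

```python
from dataclasses import dataclass

# Actions that, per the note, only the creator may perform.
CREATOR_ONLY_ACTIONS = frozenset({"edit", "delete", "test"})


@dataclass(frozen=True)
class Connection:
    """Hypothetical record of a data source connection and who created it."""
    name: str
    creator: str


def can_perform(user: str, action: str, conn: Connection) -> bool:
    """Return True if user may perform action on conn.

    Only the three creator-only actions are modeled; anything else raises
    ValueError instead of guessing a permission.
    """
    if action not in CREATOR_ONLY_ACTIONS:
        raise ValueError(f"unmodeled action: {action}")
    return user == conn.creator


conn = Connection("model_repo", creator="alice")
print(can_perform("alice", "edit", conn), can_perform("bob", "test", conn))  # True False
```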