File Source Operators¶
AI Pipelines provides the following file source operators, based on Git and HDFS, for getting files or directories:
- Git Directory Operator 
- Git File Operator 
- HDFS Directory Operator 
- HDFS File Operator 
- HDFS Uploader Operator 
Git Directory Operator¶
The Git Directory operator gets all the files under the specified paths of a Git project. It is often used as a pre-operator for Shell, Python, Notebook, and other operators to provide the code files they require.
Input Parameters Description¶
| Name | Required/Optional | Type | Description | 
|---|---|---|---|
| data_source_name | Required | String | Data source name from the data source connection configuration. | 
| project | Required | String | Git project name. | 
| branch | Required | String | Git branch name. | 
| paths | Required | List | List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"]. | 
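The four required inputs above can be sanity-checked before a pipeline is configured. The following is a minimal sketch only: `GitDirectoryParams` and its `validate` method are hypothetical helpers, not part of AI Pipelines, and the data source, project, and branch values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GitDirectoryParams:
    """Hypothetical container mirroring the four required inputs of the operator."""
    data_source_name: str
    project: str
    branch: str
    paths: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Each string input is required and must be non-empty.
        for name in ("data_source_name", "project", "branch"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} is required and must be a non-empty string")
        # paths must be a non-empty list of file or directory paths.
        if not isinstance(self.paths, list) or not self.paths:
            raise ValueError("paths is required and must be a non-empty list")
        if not all(isinstance(p, str) and p for p in self.paths):
            raise ValueError("each entry in paths must be a non-empty string")


params = GitDirectoryParams(
    data_source_name="my_git_source",  # placeholder data source name
    project="my_project",              # placeholder Git project name
    branch="master",                   # placeholder branch name
    paths=["modelhosting_prj/model6/test1.py"],
)
params.validate()  # raises ValueError if a required input is missing
```

Checking the inputs up front makes a missing `branch` or an empty `paths` list fail early with a clear message instead of at pipeline run time.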
Output Parameters Description¶
| Name | Type | Description | 
|---|---|---|
| workspace | Directory | Directory (stored in MinIO) that contains the files and directories listed in paths, output as a workspace. | 
| paths | List | List of file paths, which subsequent operators can iterate over to process the files one by one. | 
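As an illustration of how a downstream Python operator might iterate over these two outputs, here is a minimal sketch. It assumes that `workspace` is available as a local directory and that the entries in `paths` are relative to it; neither is confirmed by this page, and `list_workspace_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def list_workspace_files(workspace: str, paths: Iterable[str]) -> List[Path]:
    """Expand each entry in paths (file or directory) into concrete files."""
    root = Path(workspace)
    files: List[Path] = []
    for rel in paths:
        target = root / rel
        if target.is_dir():
            # A directory entry contributes every file beneath it.
            files.extend(sorted(p for p in target.rglob("*") if p.is_file()))
        elif target.is_file():
            files.append(target)
        else:
            raise FileNotFoundError(f"{rel} not found under {workspace}")
    return files


# Demo with a temporary directory standing in for the workspace output.
with tempfile.TemporaryDirectory() as workspace:
    script = Path(workspace) / "modelhosting_prj" / "model6" / "test1.py"
    script.parent.mkdir(parents=True)
    script.write_text("print('hello')\n")
    found = list_workspace_files(workspace, ["modelhosting_prj/model6/test1.py"])
    found_names = [p.name for p in found]
    print(found_names)
```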
Git File Operator¶
The Git File operator gets a single specified file from a Git repository so that it can be used as input to other operators.
Input Parameters Description¶
| Name | Required/Optional | Type | Description | 
|---|---|---|---|
| data_source_name | Required | String | Data source name from the data source connection configuration. | 
| project | Required | String | Git project name. | 
| branch | Required | String | Git branch name. | 
| file_path | Required | String | File path. | 
Output Parameters Description¶
| Name | Type | Description | 
|---|---|---|
| file | File | Output a single file pulled from Git. | 
HDFS Directory Operator¶
The HDFS Directory operator is used to get one or more files in a specified directory from HDFS.
Input Parameters Description¶
| Name | Required/Optional | Type | Description | 
|---|---|---|---|
| data_source_name | Required | String | Data source name from the data source connection configuration. | 
| file_paths | Required | List | HDFS file path list. | 
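Because `file_paths` is a plain list of strings, it can help to tidy the list before passing it to the operator. The sketch below is an optional pre-processing step, not something the operator is documented to do; `normalize_file_paths` is a hypothetical helper and the paths shown are placeholders.

```python
import posixpath
from typing import List


def normalize_file_paths(file_paths: List[str]) -> List[str]:
    """Normalize HDFS-style paths and drop duplicates, preserving order."""
    seen = set()
    result: List[str] = []
    for raw in file_paths:
        if not raw.strip():
            raise ValueError("file_paths must not contain empty entries")
        # normpath collapses repeated slashes and removes trailing slashes.
        path = posixpath.normpath(raw.strip())
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


file_paths = normalize_file_paths(
    ["/data/models/", "/data/models", "/data//train.csv"]  # placeholder paths
)
print(file_paths)
```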
Output Parameters Description¶
| Name | Type | Description | 
|---|---|---|
| workspace | Directory | File directory. | 
| paths | List | List of file paths, which subsequent operators can iterate over to process the files one by one. | 
HDFS File Operator¶
The HDFS File operator is used to get a single file in a specified directory from HDFS.
Input Parameters Description¶
| Name | Required/Optional | Type | Description | 
|---|---|---|---|
| data_source_name | Required | String | Data source name from the data source connection configuration. | 
| file_path | Required | String | HDFS file path. | 
Output Parameters Description¶
| Name | Type | Description | 
|---|---|---|
| file | File | The single file retrieved from HDFS. | 
HDFS Uploader Operator¶
The HDFS Uploader operator uploads a specified file to a specified HDFS directory. It has no output parameters.
Input Parameters Description¶
| Name | Required/Optional | Type | Description | 
|---|---|---|---|
| data_source_name | Required | String | Data source name from the data source connection configuration. | 
| file | Optional | File | The file to upload, which can be obtained from other file source operators such as the Git or HDFS operators. | 
| filename | Optional | file | The new file name after the file is uploaded. | 
| directory | Optional | Directory | Current path of the file. | 
| dest | Optional | String | Destination path of the file. | 
| overwrite | Optional | Boolean | |
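To make the relationship between `dest` and `filename` concrete, the sketch below shows one plausible way the uploaded file's final location could be derived. It rests on an assumption not confirmed by this page: that the file lands at `dest` joined with `filename`, falling back to the original file name when `filename` is not given. `target_path` is a hypothetical helper and the paths are placeholders.

```python
import posixpath
from typing import Optional


def target_path(dest: str, source_name: str, filename: Optional[str] = None) -> str:
    """Assumed rule: <dest>/<filename>, or <dest>/<original name> if no filename."""
    name = filename if filename else posixpath.basename(source_name)
    if not name:
        raise ValueError("could not determine a file name for the upload")
    return posixpath.join(dest, name)


# Upload renamed to model_v2.py versus keeping the original name.
renamed = target_path("/data/models", "workspace/model6/test1.py", "model_v2.py")
kept = target_path("/data/models", "workspace/model6/test1.py")
print(renamed)
print(kept)
```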