File Source Operators
AI Pipelines provides the following Git- and HDFS-based file source operators, which can be used to get files or directories or, in the case of the HDFS Uploader, to upload a file:
- Git Directory Operator
- Git File Operator
- HDFS Directory Operator
- HDFS File Operator
- HDFS Uploader Operator
Git Directory Operator
The Git Directory operator gets all the files in the specified directories of a Git repository. It is often used as an upstream operator for Shell, Python, Notebook, and other operators to provide the code files they require.
Input Parameters Description

Name | Required/Optional | Type | Description |
---|---|---|---|
data_source_name | Required | String | Data source name from the data source connection configuration. |
project | Required | String | Git project name. |
branch | Required | String | Git branch name. |
paths | Required | List | List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"]. |

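As an illustration, the input parameters above could be written down and sanity-checked as follows. This is a minimal sketch rather than the product's API: the dictionary layout, the `check_git_directory_params` helper, and the values `my_git_source`, `demo_project`, and `master` are assumptions made for the example. Only the parameter names, their types, and the sample path come from the table.

```python
from typing import Any, Dict, List

# Hypothetical values; only the keys and their types follow the table above.
git_directory_params = {
    "data_source_name": "my_git_source",  # assumed name of a configured Git data source
    "project": "demo_project",            # assumed Git project name
    "branch": "master",                   # assumed branch name
    "paths": ["modelhosting_prj/model6/test1.py"],  # sample path from the table
}

# Required input parameters and the Python type each one is expected to have.
GIT_DIRECTORY_SCHEMA = {
    "data_source_name": str,
    "project": str,
    "branch": str,
    "paths": list,
}


def check_git_directory_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look well-formed."""
    problems = []
    for name, expected in GIT_DIRECTORY_SCHEMA.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    # Every element of `paths` is expected to be a path string.
    if isinstance(params.get("paths"), list):
        for item in params["paths"]:
            if not isinstance(item, str):
                problems.append(f"paths element is not a string: {item!r}")
    return problems
```

Calling `check_git_directory_params(git_directory_params)` returns an empty list for the sample values. Dropping a required key or passing `paths` as a plain string produces one message per problem.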
Output Parameters Description

Name | Type | Description |
---|---|---|
workspace | Directory | Directory (stored in MinIO) that contains the directories and files listed in paths, output as a workspace. |
paths | List | List of file paths, which subsequent operators can iterate over to process each file in turn. |

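A downstream Python operator might walk the two outputs roughly as sketched below. The sketch assumes that `workspace` is available to the downstream code as a local directory path and that the entries of `paths` are relative to it. Neither assumption is confirmed by this page, and `iter_workspace_files` is a hypothetical helper name.

```python
import os
from typing import Iterator, List


def iter_workspace_files(workspace: str, paths: List[str]) -> Iterator[str]:
    """Yield every regular file reachable from `paths`, resolved against `workspace`."""
    for rel in paths:
        full = os.path.join(workspace, rel)
        if os.path.isdir(full):
            # Directory entry: walk it and yield the files underneath in sorted order.
            for root, dirs, files in os.walk(full):
                dirs.sort()  # makes the traversal order deterministic
                for name in sorted(files):
                    yield os.path.join(root, name)
        elif os.path.isfile(full):
            # File entry: yield it directly.
            yield full
        # Entries that do not exist in the workspace are skipped silently.
```

Because the function is a generator, a caller can process files one at a time, for example `for path in iter_workspace_files(workspace, paths): ...`, without building the full list first.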
Git File Operator
The Git File operator gets a single specified file from a Git repository for use as input to other operators.
Input Parameters Description

Name | Required/Optional | Type | Description |
---|---|---|---|
data_source_name | Required | String | Data source name from the data source connection configuration. |
project | Required | String | Git project name. |
branch | Required | String | Git branch name. |
file_path | Required | String | File path. |

Output Parameters Description

Name | Type | Description |
---|---|---|
file | File | Output a single file pulled from Git. |

HDFS Directory Operator
The HDFS Directory operator gets one or more files from a specified directory in HDFS.
Input Parameters Description

Name | Required/Optional | Type | Description |
---|---|---|---|
data_source_name | Required | String | Data source name from the data source connection configuration. |
file_paths | Required | List | HDFS file path list. |

Output Parameters Description

Name | Type | Description |
---|---|---|
workspace | Directory | File directory. |
paths | List | List of file paths, which subsequent operators can iterate over to process each file in turn. |

HDFS File Operator
The HDFS File operator gets a single file from a specified directory in HDFS.
Input Parameters Description

Name | Required/Optional | Type | Description |
---|---|---|---|
data_source_name | Required | String | Data source name from the data source connection configuration. |
file_path | Required | String | HDFS file path. |

Output Parameters Description

Name | Type | Description |
---|---|---|
file | File | Output a single file retrieved from HDFS. |

HDFS Uploader Operator
The HDFS Uploader operator uploads a specified file to a specified HDFS directory. It has no output parameters.
Input Parameters Description

Name | Required/Optional | Type | Description |
---|---|---|---|
data_source_name | Required | String | Data source name from the data source connection configuration. |
file | Optional | File | The file to upload, which can be obtained from other file source operators such as the Git or HDFS operators. |
filename | Optional | File | New name for the file after it is uploaded. |
directory | Optional | Directory | Current path of the file. |
dest | Optional | String | Destination path of the file. |
overwrite | Optional | Boolean | |

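One plausible reading of how `dest` and `filename` combine is sketched below. This is an assumption for illustration only and not a description of the operator's actual behavior. It supposes that the uploaded file keeps its own base name unless `filename` is given, that `filename` is a plain name string, and that `dest` is an HDFS directory path. The helper `resolve_upload_target` and the sample paths are hypothetical.

```python
import posixpath
from typing import Optional


def resolve_upload_target(local_file: str, dest: str, filename: Optional[str] = None) -> str:
    """Return the HDFS path an upload would be written to under the assumptions above."""
    # Fall back to the source file's base name when no new name is supplied (assumed).
    name = filename or posixpath.basename(local_file)
    # HDFS paths use forward slashes, so posixpath is used regardless of the local OS.
    return posixpath.join(dest, name)
```

For instance, `resolve_upload_target("/tmp/model.pkl", "/user/demo/models")` gives `/user/demo/models/model.pkl`, while passing `filename="model_v2.pkl"` gives `/user/demo/models/model_v2.pkl`.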