File Operators
MI Pipelines provides the following Git- and HDFS-based file operators for retrieving files or directories:
- Git Directory Operator
- Git File Operator
- HDFS Directory Operator
- HDFS File Operator
Git Directory Operator
The Git Directory operator retrieves all files in a specified directory of a Git repository. It is often used as an upstream operator for Shell, Python, Notebook, and other operators, supplying the code files they need.
Output Parameters Description
Name | Type | Description |
workspace | Directory | Directory (stored in MinIO) containing the retrieved files; it exposes the directories and files listed in paths as a workspace. |
paths | List | List of file paths, which downstream operators can iterate over to process each file in turn. |
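As a rough illustration of how a downstream operator might consume these two outputs, the Python sketch below walks the paths list relative to the workspace directory and counts the lines in each file. The function name `process_workspace`, the line-counting task, and the assumption that paths entries are relative to workspace are all hypothetical; this is not the MI Pipelines interface for passing outputs between operators.

```python
from pathlib import Path


def process_workspace(workspace: str, paths: list[str]) -> dict[str, int]:
    """Hypothetical downstream step: count lines in each file listed in `paths`.

    Assumes every entry in `paths` is relative to the `workspace` directory.
    """
    results: dict[str, int] = {}
    root = Path(workspace)
    for rel_path in paths:
        file_path = root / rel_path
        # Skip entries that are not regular files (e.g. subdirectories).
        if not file_path.is_file():
            continue
        with file_path.open(encoding="utf-8") as f:
            results[rel_path] = sum(1 for _ in f)
    return results
```

Iterating over paths rather than listing the directory again keeps the downstream step limited to exactly the files the upstream operator reported.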
Git File Operator
The Git File operator retrieves a single specified file from a Git repository to serve as input to other operators.
Input Parameters Description
Name | Required/optional | Type | Description |
data_source_name | Required | String | Name of the data source, as defined in the data source connection configuration. |
project | Required | String | Git project name. |
branch | Required | String | Git branch name. |
file_path | Required | String | Path of the file within the repository. |
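The sketch below models the four required inputs as a Python dataclass and builds a `git show <branch>:<file_path>` command, which prints a single file as it exists on a given branch. It only approximates what the operator does: the `GitFileParams` class is hypothetical, the command assumes it runs inside a local clone of project in which the branch is available, and data_source_name is not resolved because that mapping lives in the data source connection configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GitFileParams:
    """Hypothetical container for the Git File operator's inputs (all required)."""

    data_source_name: str
    project: str
    branch: str
    file_path: str

    def __post_init__(self) -> None:
        # Every input is marked Required, so reject empty values early.
        for name in ("data_source_name", "project", "branch", "file_path"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def git_show_command(self) -> list[str]:
        """Build an equivalent `git show <branch>:<path>` command.

        Assumes a local clone of `project` where `branch` exists; the path
        is made relative to the repository root by stripping a leading '/'.
        """
        return ["git", "show", f"{self.branch}:{self.file_path.lstrip('/')}"]
```

The command list could be passed to `subprocess.run(..., capture_output=True, check=True)` inside the clone to obtain the file contents.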
Output Parameters Description
Name | Type | Description |
file | File | The single file retrieved from Git. |
HDFS Directory Operator
The HDFS Directory operator retrieves one or more files from a specified directory in HDFS.
Input Parameters Description
Name | Required/optional | Type | Description |
data_source_name | Required | String | Data source name from the data source connection configuration. |
file_paths | Required | List | List of HDFS file paths. |
Output Parameters Description
Name | Type | Description |
workspace | Directory | Directory containing the retrieved files. |
paths | List | List of file paths, which downstream operators can iterate over to process each file in turn. |
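To illustrate how file_paths could map onto the workspace and paths outputs, the following sketch plans one `hdfs dfs -get` copy per HDFS path into a local workspace directory and returns the matching list of relative paths. The function `hdfs_directory_plan`, the flat base-name layout, and the use of the `hdfs` command-line client are assumptions for illustration, not the operator's documented behavior.

```python
import posixpath


def hdfs_directory_plan(
    file_paths: list[str], workspace: str
) -> tuple[list[list[str]], list[str]]:
    """Hypothetical: map HDFS paths to `hdfs dfs -get` commands plus a paths list.

    Each file is copied into `workspace` under its base name; the returned
    paths are relative to `workspace`, mirroring the operator's outputs.
    """
    if not file_paths:
        raise ValueError("file_paths is required and must not be empty")
    commands: list[list[str]] = []
    paths: list[str] = []
    for hdfs_path in file_paths:
        name = posixpath.basename(hdfs_path.rstrip("/"))
        # A flat layout means two sources with the same base name would collide.
        if name in paths:
            raise ValueError(f"duplicate file name {name!r} in file_paths")
        # One -get per source keeps each copy independent and easy to retry.
        commands.append(["hdfs", "dfs", "-get", hdfs_path, posixpath.join(workspace, name)])
        paths.append(name)
    return commands, paths
```

On a host with a configured Hadoop client, each planned command could be executed with `subprocess.run(cmd, check=True)`.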
HDFS File Operator
The HDFS File operator retrieves a single file from a specified path in HDFS.
Input Parameters Description
Name | Required/optional | Type | Description |
data_source_name | Required | String | Data source name from the data source connection configuration. |
file_path | Required | String | HDFS file path. |
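For the single-file case, a comparable sketch first checks that file_path refers to a regular file using `hdfs dfs -test -f` (which returns exit status 0 only when the path is a file) and then copies it with `hdfs dfs -get`. The helper `hdfs_file_commands` and the local destination layout are hypothetical and only approximate what the operator might do.

```python
import posixpath


def hdfs_file_commands(file_path: str, local_dir: str) -> list[list[str]]:
    """Hypothetical shell steps equivalent to fetching one file from HDFS.

    The -test -f step fails for directories and missing paths, so nothing
    is copied unless file_path names a regular file.
    """
    name = posixpath.basename(file_path)
    if not name:
        raise ValueError("file_path must name a file (non-empty, no trailing '/')")
    local_target = posixpath.join(local_dir, name)
    return [
        ["hdfs", "dfs", "-test", "-f", file_path],
        ["hdfs", "dfs", "-get", file_path, local_target],
    ]
```

Running the steps in order with `subprocess.run(step, check=True)` stops at the check if the path is not a file.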
Output Parameters Description
Name | Type | Description |
file | File | The single file retrieved from HDFS. |