File Operators


MI Pipelines provides the following file operators, based on Git and HDFS, for fetching files or directories:

  • Git Directory Operator
  • Git File Operator
  • HDFS Directory Operator
  • HDFS File Operator

Git Directory Operator

The Git Directory operator gets all the files under the specified paths of a Git project. It is often placed upstream of Shell, Python, Notebook, and other operators to supply the code files they require. For example:

[Figure: ../_images/git_dir_calculator.png]

Input Parameters Description

Name Required/optional Type Description
data_source_name Required String Data source name from the data source connection configuration.
project Required String Git project name.
branch Required String Git branch name.
paths Required List List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"].
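
The input table above can be mirrored in a few lines of Python to catch a missing or mistyped parameter before a pipeline is configured. This is only a sketch: `GIT_DIR_INPUTS` and `check_inputs` are hypothetical names, not part of MI Pipelines, and every value except the `paths` entry is a placeholder.

```python
# Hypothetical helper, not an MI Pipelines API. It only mirrors the
# input table above: four required parameters and their types.
GIT_DIR_INPUTS = {
    "data_source_name": str,
    "project": str,
    "branch": str,
    "paths": list,
}


def check_inputs(params, spec):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    for name, expected_type in spec.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            problems.append(f"{name} should be of type {expected_type.__name__}")
    return problems


# Placeholder values; only the paths entry is taken from the table above.
params = {
    "data_source_name": "my_git_source",
    "project": "my_project",
    "branch": "master",
    "paths": ["modelhosting_prj/model6/test1.py"],
}
print(check_inputs(params, GIT_DIR_INPUTS))  # prints []
```

The same function could be reused for the other operators by passing a different specification dictionary.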

Output Parameters Description

Name Type Description
workspace Directory Directory (stored in MinIO) that contains the fetched files. It exposes the directories and files listed in paths as a workspace.
paths List List of file paths, which subsequent operators can iterate over to process each file in turn.
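
A downstream Python operator might walk these two outputs roughly as follows. This is a minimal sketch under two assumptions that the text above does not confirm: that `workspace` is available to the downstream code as a local directory path, and that the entries of `paths` are relative to it. `resolve_files` is a hypothetical helper name.

```python
import os


def resolve_files(workspace, paths):
    """Expand each entry of paths (a file or a directory) into file paths.

    Assumes workspace is a local directory and entries are relative to it.
    Entries that do not exist under workspace are skipped.
    """
    files = []
    for entry in paths:
        full = os.path.join(workspace, entry)
        if os.path.isdir(full):
            # Directory entry: collect every file below it, in sorted order.
            for root, dirs, names in os.walk(full):
                dirs.sort()
                for name in sorted(names):
                    files.append(os.path.join(root, name))
        elif os.path.isfile(full):
            # File entry: keep it as is.
            files.append(full)
    return files
```

The returned list can then be processed one file at a time, which matches the traversal use described for the `paths` output.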

Git File Operator

The Git File operator gets a single specified file from a Git repository for use as input to other operators.

Input Parameters Description

Name Required/optional Type Description
data_source_name Required String Data source name from the data source connection configuration.
project Required String Git project name.
branch Required String Git branch name.
file_path Required String File path.

Output Parameters Description

Name Type Description
file File Output a single file pulled from Git.

HDFS Directory Operator

The HDFS Directory operator gets one or more files from a specified directory in HDFS.

Input Parameters Description

Name Required/optional Type Description
data_source_name Required String Data source name from the data source connection configuration.
file_paths Required List HDFS file path list.

Output Parameters Description

Name Type Description
workspace Directory Directory that contains the fetched files.
paths List List of file paths, which subsequent operators can iterate over to process each file in turn.

HDFS File Operator

The HDFS File operator is used to get a single file in a specified directory from HDFS.

Input Parameters Description

Name Required/optional Type Description
data_source_name Required String Data source name from the data source connection configuration.
file_path Required String HDFS file path.

Output Parameters Description

Name Type Description
file File Output a single file fetched from HDFS.
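
The two HDFS operators differ mainly in their path input: `file_paths` (List) for the HDFS Directory operator and `file_path` (String) for the HDFS File operator. The sketch below builds the matching input dictionary from a list of paths. The function name `build_hdfs_inputs` and the rule "one path means the File operator" are assumptions made for illustration, not MI Pipelines behavior.

```python
def build_hdfs_inputs(data_source_name, paths):
    """Pick an operator name and build its inputs from a list of HDFS paths.

    Illustrative convention only: a single path maps to the HDFS File
    operator, several paths map to the HDFS Directory operator.
    """
    if not paths:
        raise ValueError("at least one HDFS path is required")
    if len(paths) == 1:
        inputs = {"data_source_name": data_source_name, "file_path": paths[0]}
        return "HDFS File", inputs
    inputs = {"data_source_name": data_source_name, "file_paths": list(paths)}
    return "HDFS Directory", inputs
```

Keeping the choice in one place makes it harder to pass a list where a single string is expected, or the reverse.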