File Source Operators


AI Pipelines provides the following file source operators, based on Git and HDFS, for getting files or directories:

  • Git Directory Operator
  • Git File Operator
  • HDFS Directory Operator
  • HDFS File Operator
  • HDFS Uploader Operator

Git Directory Operator


The Git Directory operator gets all the files in the specified directories of a Git project. It is often used as a pre-operator for Shell, Python, Notebook, and other operators to provide the code files they require. For example:


[Figure: git_dir_calculator.png]

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| paths | Required | List | List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"]. |
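
As an illustration of the inputs listed above, the following minimal Python sketch writes the four parameters down as a dictionary and checks them before they are entered in the operator configuration. Only the parameter names and types come from the documentation; the dictionary layout, the `validate_git_directory_inputs` helper, and the sample values `my_git_source`, `my_project`, and `master` are hypothetical and are not part of the AI Pipelines API.

```python
# Hypothetical sketch: parameter names and types follow the documentation;
# the helper and the sample values are illustrative assumptions.
REQUIRED_INPUTS = {
    "data_source_name": str,
    "project": str,
    "branch": str,
    "paths": list,
}


def validate_git_directory_inputs(inputs):
    """Return a list of problems found in the inputs (empty if none)."""
    problems = []
    for name, expected_type in REQUIRED_INPUTS.items():
        if name not in inputs:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(inputs[name], expected_type):
            problems.append(f"{name} must be of type {expected_type.__name__}")
    # Each element of paths should be a non-empty string (a file or a directory).
    paths = inputs.get("paths")
    if isinstance(paths, list):
        for path in paths:
            if not isinstance(path, str) or not path:
                problems.append(f"invalid entry in paths: {path!r}")
    return problems


example_inputs = {
    "data_source_name": "my_git_source",  # assumed data source name
    "project": "my_project",              # assumed Git project name
    "branch": "master",                   # assumed branch name
    "paths": ["modelhosting_prj/model6/test1.py"],
}

print(validate_git_directory_inputs(example_inputs))  # prints []
```

A check of this kind catches the most likely mistake, passing `paths` as a single string instead of a list, before the pipeline runs.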

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Directory (in MinIO) that holds the directories and files listed in paths, output as a workspace. |
| paths | List | List of file paths, which subsequent operators can traverse to process each file in turn. |
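
To illustrate how a subsequent operator might traverse these two outputs, here is a minimal Python sketch. It assumes that the `workspace` directory is available to the downstream code as a local path and that each entry in `paths` is relative to it; neither assumption is confirmed by this document. The `resolve_workspace_paths` helper and the sample `/tmp/workspace` location are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_workspace_paths(workspace, paths):
    """Join each entry of paths onto the workspace directory."""
    root = PurePosixPath(workspace)
    return [root / p for p in paths]


# Hypothetical values standing in for the operator's outputs.
workspace = "/tmp/workspace"
paths = ["modelhosting_prj/model6/test1.py"]

for full_path in resolve_workspace_paths(workspace, paths):
    # A downstream step would process each file here.
    print(full_path)
```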

Git File Operator


The Git File operator gets a single specified file from a Git repository, for use as input to other operators.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| file_path | Required | String | File path. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file pulled from Git. |

HDFS Directory Operator


The HDFS Directory operator is used to get one or more files in a specified directory from HDFS.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_paths | Required | List | HDFS file path list. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Directory containing the retrieved files. |
| paths | List | List of file paths, which subsequent operators can traverse to process each file in turn. |

HDFS File Operator


The HDFS File operator is used to get a single file in a specified directory from HDFS.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_path | Required | String | HDFS file path. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file retrieved from HDFS. |

HDFS Uploader Operator


The HDFS Uploader operator uploads a specified file to a specified HDFS directory. It has no output parameters.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file | Optional | File | The file to upload, which can be obtained from other file source operators such as the Git or HDFS operators. |
| filename | Optional | File | The new name of the file after it is uploaded. |
| directory | Optional | Directory | Current path of the file. |
| dest | Optional | String | Destination path of the file. |
| overwrite | Optional | Boolean | Whether to overwrite a file with the same name in the destination folder: true to overwrite, false to prevent overwriting. |
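
The interaction between `dest`, `filename`, and `overwrite` can be sketched in Python as follows. This is not the operator's implementation: how the final HDFS path is built, and what happens when `overwrite` is false and the target already exists, are assumptions made for illustration. The `plan_upload` helper, its `source_name` and `existing_files` arguments, and the sample paths are hypothetical.

```python
import posixpath


def plan_upload(dest, source_name, existing_files, filename=None, overwrite=False):
    """Return (target_path, action), where action is 'upload' or 'skip'."""
    # Assumption: filename, when given, renames the uploaded file.
    target = posixpath.join(dest, filename or source_name)
    if target in existing_files and not overwrite:
        # Assumption: with overwrite=False an existing file is left untouched.
        return target, "skip"
    return target, "upload"


existing = {"/data/models/model.pkl"}  # hypothetical HDFS contents

print(plan_upload("/data/models", "model.pkl", existing))
print(plan_upload("/data/models", "model.pkl", existing, overwrite=True))
print(plan_upload("/data/models", "model.pkl", existing, filename="model_v2.pkl"))
```

In this sketch, supplying a new `filename` avoids the name clash altogether, so the upload proceeds even when `overwrite` is false.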