File Source Operators


AI Pipelines provides the following Git- and HDFS-based file source operators, which can be used to get files or directories, or to upload a file to HDFS:

  • Git Directory Operator

  • Git File Operator

  • HDFS Directory Operator

  • HDFS File Operator

  • HDFS Uploader Operator

Git Directory Operator


The Git Directory operator gets all the files under the specified paths from a Git repository. It is often used as a predecessor of Shell, Python, Notebook, and other operators to provide the code files they require. For example:


[Figure: example pipeline using the Git Directory operator (git_dir_calculator.png)]

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| paths | Required | List | List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"]. |
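The input parameters described above can be sketched as a small data structure. This is only an illustration of the parameter shapes, not the AI Pipelines API: the class name `GitDirectoryInputs`, its `validate` method, and every value except the example path are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GitDirectoryInputs:
    """Hypothetical container mirroring the documented input parameters."""
    data_source_name: str
    project: str
    branch: str
    paths: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # All four parameters are documented as required.
        for name in ("data_source_name", "project", "branch"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} is required and must be a non-empty string")
        # paths must be a list, even when it holds a single entry.
        if not isinstance(self.paths, list) or not self.paths:
            raise ValueError("paths is required and must be a non-empty list")
        if not all(isinstance(p, str) and p for p in self.paths):
            raise ValueError("every element of paths must be a non-empty string")


inputs = GitDirectoryInputs(
    data_source_name="my_git_source",            # placeholder value
    project="my_project",                        # placeholder value
    branch="master",                             # placeholder value
    paths=["modelhosting_prj/model6/test1.py"],  # example path from the documentation
)
inputs.validate()
```

Note that `paths` stays a list even for a single file; passing a bare string would fail validation in this sketch.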

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Workspace directory (in MinIO) that holds the files and directories specified in paths. |
| paths | List | List of file paths, which subsequent operators can traverse to process the files one by one. |
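A downstream step might traverse the two outputs roughly as follows. This is a minimal sketch, assuming that the entries in `paths` are relative to `workspace` and that the workspace is available as a local directory; the function name `iter_workspace_files` is hypothetical and not part of AI Pipelines.

```python
from pathlib import Path
from typing import Iterator, List


def iter_workspace_files(workspace: str, paths: List[str]) -> Iterator[Path]:
    """Yield every regular file reachable from the entries in `paths`."""
    root = Path(workspace)
    for entry in paths:
        target = root / entry
        if target.is_dir():
            # A directory entry expands to all files beneath it.
            yield from sorted(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            yield target
        else:
            raise FileNotFoundError(f"{entry} not found under {workspace}")
```

Expanding directory entries here keeps the calling code the same whether an element of `paths` is a file or a directory.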

Git File Operator


The Git File operator gets a single specified file from a Git repository for use as input to other operators.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| file_path | Required | String | File path. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file pulled from Git. |

HDFS Directory Operator


The HDFS Directory operator is used to get one or more files in a specified directory from HDFS.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_paths | Required | List | HDFS file path list. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | File directory. |
| paths | List | List of file paths, which subsequent operators can traverse to process the files one by one. |

HDFS File Operator


The HDFS File operator is used to get a single file in a specified directory from HDFS.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_path | Required | String | HDFS file path. |

Output Parameters Description


| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file retrieved from HDFS. |

HDFS Uploader Operator


The HDFS Uploader operator uploads a specified file to a specified HDFS directory. It has no output parameters.

Input Parameters Description


| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file | Optional | File | The file to upload, which can be obtained from another file source operator such as a Git operator or an HDFS operator. |
| filename | Optional | File | The new file name after the file is uploaded. |
| directory | Optional | Directory | Current path of the file. |
| dest | Optional | String | Destination path of the file. |
| overwrite | Optional | Boolean | Whether to overwrite a file with the same name in the destination folder: true to overwrite, false to prevent overwriting. |
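
The way `dest`, `filename`, and `overwrite` interact can be sketched as follows. This is a hedged illustration, not the operator's implementation: it assumes that `filename`, when given, replaces the original file name under `dest`, and that an existing file is left untouched when `overwrite` is false (the real operator might report an error instead). Both function names are hypothetical.

```python
import posixpath
from typing import Optional, Set


def resolve_destination(dest: str, source_name: str, filename: Optional[str] = None) -> str:
    """Join the destination directory with the new name, or with the original name if none is given."""
    return posixpath.join(dest, filename or posixpath.basename(source_name))


def plan_upload(existing: Set[str], target: str, overwrite: bool) -> str:
    """Return the action this hypothetical uploader would take for `target`."""
    if target not in existing:
        return "create"
    # Assumption: with overwrite disabled, the existing file is kept as-is.
    return "overwrite" if overwrite else "keep_existing"


target = resolve_destination("/data/models", "src/test1.py", filename="renamed.py")
action = plan_upload({"/data/models/renamed.py"}, target, overwrite=False)
```

HDFS paths use forward slashes regardless of the client platform, which is why the sketch uses `posixpath` rather than `os.path`.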