AI Studio FAQs


This section provides answers to some frequently asked questions on AI Studio. For troubleshooting topics, see AI Studio Troubleshooting.

FAQs for AI Lab

Q: What preparation is required for accessing data in HDFS/HIVE through Notebook?

A: Complete the following steps before accessing data in HDFS/HIVE through Notebook:

  1. When requesting container resources through Resource Management, select Enable read access for HDFS and Data Warehouse.
  2. When adding a PVC through Data Analytics > Resource Config > Storage Config, use the requested resource with read access for HDFS and Data Warehouse (and used by AI Lab).
  3. When creating the Notebook instance, select the spark or pyspark image, and also select the Mount Hadoop PVC option.
  4. Open the Jupyter Notebook and enter kinit -kt /etc/security/keytab/data_xxxxx data_xxxxx@ENIOT.IO (xxxxx is the OU ID) in the Terminal to renew the Kerberos ticket.

Q: How can I use Python to retrieve large amounts of data from HIVE for model training?

A: For limited data volumes, use pyhive. For big data, compress the data into an ORC file with HIVE SQL, download the file from HDFS to local storage, and then process the ORC file with the pyarrow package.
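The two paths above can be sketched as follows. The host, port, query, and file path are placeholders, and pyhive, pandas, and pyarrow are assumed to be installed; imports are kept inside the functions so that each path only needs its own dependencies.

```python
# Sketch of the two retrieval paths; host, port, and paths are placeholders.

def read_small(query, host="hive-server", port=10000):
    """Limited data volume: query HIVE directly with pyhive."""
    from pyhive import hive   # assumed installed
    import pandas as pd       # assumed installed
    conn = hive.Connection(host=host, port=port)
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()

def read_large(orc_path):
    """Big data: after exporting to ORC with HIVE SQL and downloading
    the file from HDFS, read it locally with pyarrow."""
    from pyarrow import orc   # assumed installed
    return orc.read_table(orc_path).to_pandas()
```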

Q: How should I collaborate with others within Notebook?

A: Mount the same PVC storage in different Notebook instances to enable collaboration and sharing.

Q: The Kernel fails to start (Error Starting Kernel) after switching the environment or performing other operations in the Notebook. How can I resolve this problem?

A: Try running the python3 -m ipykernel install --user command in the Terminal.

Q: Can I add a new Kernel in the Notebook?

A: Yes. You can use the following commands:

conda create -n py36-test python=3.6
source activate py36-test
conda install ipykernel
python -m ipykernel install --name py36-test
conda deactivate

Q: Why does my Notebook instance become slow after I use the Notebook for some time?

A: When you open a Notebook instance, Kernel sessions and Terminal sessions are created. When you close the Notebook instance, these sessions are kept alive so that they are available again the next time you open the Notebook. Close these sessions manually if you no longer need them.

Q: After installing some packages in the Notebook, package dependency issues occur. How can I restore the Notebook instance to its initial status?

A: In the Notebook menu, select File > Shut Down to close the Notebook instance. Open the Notebook again to restore the Notebook status.

Q: Can I change the Python version in Notebook?

A: AI Studio Notebook integrates Python 3.7. You can change the Python version by adding Kernels. For compatibility reasons, we recommend that you use the built-in Python 3.7 to avoid issues when staging a model version.

FAQs for AI Hub

Q: When calling model service APIs, a timeout error is reported if the request body is too big or the processing time exceeds the limit. How can I solve this problem?

A: When deploying a model version, you can set the Timeout value for the model service API (the maximum value is 600,000 ms).
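On the client side, it also helps to set an explicit timeout that stays within the service-side limit. The sketch below uses the Python standard library; the URL and payload shape are placeholders, not actual AI Studio endpoints.

```python
import json
import urllib.request

# Server-side maximum configurable when deploying a model version (per the FAQ)
MAX_TIMEOUT_MS = 600_000

def call_model(url, payload, timeout_ms=30_000):
    """Call a model service API with a client-side timeout (sketch)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # urllib expects seconds; stay within the service's configured limit
    timeout_s = min(timeout_ms, MAX_TIMEOUT_MS) / 1000
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```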

Q: When using MLflow version 1.10.0, a compatibility issue may occur that prevents the model version from being published successfully. How can I solve this problem?

A: MLflow version 1.8 is integrated in AI Hub and AI Lab by default. If you upgrade MLflow to version 1.10.0 or use a newer MLflow version in model development, you must still use artifact files of the MLflow 1.8 version to publish model versions.

Q: Model services deployed by AI Hub can be called within the cluster only. How can the services be called across clusters?

A: To expose model services, the services must be published through EnOS API Management, which provides authentication and traffic control service. For more information about EnOS APIM, see API Management.

Q: When calling model service APIs, what is the scope of authentication?

A: When calling AI Studio model service APIs through any method other than the Seldon SDK, use the authentication function. For internal calls made directly with REST or gRPC, authentication is not required.

Q: The response time when calling an AI Studio model service API is not stable. How can I improve its stability?

A: You can try increasing the memory request when deploying the model, and test the model service by calling it through Postman.

FAQs for AI Pipelines

Q: Python packages cannot be downloaded due to an unstable network connection, which results in a pipeline running timeout. How can I fix this problem?

A: You can install Python packages into the built-in pip repository through the GUI or with commands:

  • Through the GUI: upload packages in AI Lab > Python Package.
  • With commands: run a command such as bash batch_upload.sh -r /tmp/requirements.txt. You can find batch_upload.sh in manual/started/private-pip-repository. Specify the package versions in requirements.txt, for example xyz==1.34; otherwise, the latest version will be checked and downloaded when you run pip commands. After Python packages are installed into the built-in pip repository, instances use packages from the built-in pip repository first.
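Because unpinned entries fall back to the latest version, it can be worth checking the requirements file before uploading. The helper below is an illustrative sketch, not part of batch_upload.sh:

```python
# Illustrative helper: find requirements entries without an exact version pin.
def unpinned(lines):
    reqs = [l.strip() for l in lines if l.strip() and not l.strip().startswith("#")]
    return [r for r in reqs if "==" not in r]

print(unpinned(["pandas==1.0.5", "numpy"]))  # → ['numpy']
```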

Q: The workflows in both AI Studio and EnOS Data Management support scheduling. What are the differences?

A: The batch processing workflows of EnOS Data Management are for data synchronization and data processing, and they support synchronizing and processing structured data and file streams based on Data IDE, Shell, and Python. These workflows are used by data engineers.

The intelligent workflows of AI Studio are for the lifecycle management of machine learning models, including data preparation, model training, model deployment, and model prediction service. The workflows are used by data scientists.

Q: When workflows are running, how can I control concurrency in high-concurrency scenarios?

A: Use the following methods to control workflow concurrency by levels:

  • Control the maximum number of simultaneous runs of each workflow by setting the maximum concurrency number at runtime.
  • Control the maximum number of concurrent pods of a single workflow by setting the advanced parameter maximum pod number.
  • Control the item concurrency of the ParallelFor operator by setting the concurrency parameter of the operator.
  • Control the concurrency of operators by setting the maximum pod number parameter of the ParallelFor operator.

By setting the four parameters above, you can control the concurrency of a workflow from the run level down to the pod level.
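The workflow engine enforces these caps itself; purely as an illustration of what a concurrency cap (such as the ParallelFor concurrency parameter) does, the sketch below bounds simultaneous work with a semaphore:

```python
import asyncio

# Illustrative only: model a concurrency cap with an asyncio.Semaphore.
async def run_items(items, concurrency=2):
    sem = asyncio.Semaphore(concurrency)
    active = 0   # items currently "running"
    peak = 0     # highest observed concurrency

    async def run_one(item):
        nonlocal active, peak
        async with sem:                 # at most `concurrency` items at once
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)   # stand-in for the operator's work
            active -= 1

    await asyncio.gather(*(run_one(i) for i in items))
    return peak

peak = asyncio.run(run_items(range(8), concurrency=2))
print(peak)  # at most 2, regardless of how many items there are
```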

Q: Can data be transferred between operators of a workflow?

A: Yes, you can use the File operator or the Directory operator to transfer data.

Q: When an operator in a workflow fails, can the workflow rerun from where the error occurred?

A: Yes. You can select Retry on the Running Instance Detail page after a running error. Note that rerunning the workflow is intended for occasional errors only. If the operator parameter configuration is modified after an error occurs, rerunning will not take effect. Also, if the running time exceeds the timeout setting of the workflow, rerunning will not take effect either.

Q: How to monitor the resource usage of a running workflow?

A: You can find the pod name on the Running Instance Detail page. Then, monitor the pod's resource usage in Grafana by searching for the pod name.

FAQs for Resource Management

Q: How do the resources requested through Resource Management correspond to AI Studio? How should I set the Request and Limit values?

A: The requested resources correspond to the resource quota of the resource pool. The total AI Studio resource consumption in this resource pool, including Notebook instances, model services, operators, and so on, cannot exceed the quota. Requests define the minimum amount of resources the containers need, and Pod scheduling is based on requests: a Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod requests. Limits define the maximum amount of resources the containers can consume, and setting limits prevents a Pod from using all available resources. For more information, see the Kubernetes Documentation.
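For reference, requests and limits are set per container in a standard Kubernetes Pod spec. The values below are illustrative examples, not AI Studio defaults:

```yaml
# Illustrative container resource settings (example values only)
resources:
  requests:
    cpu: "500m"      # minimum guaranteed: half a CPU core; used for scheduling
    memory: "1Gi"    # minimum guaranteed memory
  limits:
    cpu: "1"         # the container is throttled above one core
    memory: "2Gi"    # the container is terminated if it exceeds this
```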

Q: When the PVC mounted to a Notebook runs out of space, how can I expand the PVC?

A: You can expand a PVC with the following steps:

  1. Go to AI Lab and click the View icon of the target Notebook instance.
  2. On the Instance Details page, find the PVC of the instance in the Storage section.
  3. Go to Resource Configuration > Storage Configuration and click the Expand icon of the target PVC.
  4. On the pop-up window, configure the Capacity section and click OK to expand the PVC.
  5. Restart the Notebook instance to apply the changes.