google cloud platform - fedml-gcp: Incompatible packages, unable to train fedml-gcp sample notebook on Vertex AI prebuilt container
I am trying to use Vertex AI to train a linear regression model on data stored in SAP DWC with the fedml_gcp package. I am following the sample notebook and script provided here.
I ran the following code to train the model:
from fedml_gcp import dwcgcp
import os
PROJECT_ID = 'project-name'
REGION = 'asia-southeast1'
BUCKET_NAME = 'sap_dwc_fed'
BUCKET_URI = "gs://"+BUCKET_NAME
BUCKET_FOLDER = 'linear_test'
MODEL_OUTPUT_DIR = BUCKET_URI+"/"+BUCKET_FOLDER
SCRIPT_PATH = 'LinearRegressionScript.py'
JOB_NAME = "linear-regression-training"
MODEL_DISPLAY_NAME = "linear-regression-model"
DEPLOYED_MODEL_DISPLAY_NAME = 'linear-regression-deployed-model'
params = {'project':PROJECT_ID,
          'location':REGION,
          'staging_bucket':BUCKET_URI}
dwc = dwcgcp.DwcGCP(params)
TRAIN_VERSION = "scikit-learn-cpu.0-23"
DEPLOY_VERSION = "sklearn-cpu.1-0"
TRAIN_IMAGE = "us-docker.pkg.dev/vertex-ai/training/{}:latest".format(TRAIN_VERSION)
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(DEPLOY_VERSION)
table_name="sample_dataset"
table_size = 1
job_dir="gs://"+BUCKET_NAME
cmd_args = [
    "--table_name=" + str(table_name),
    "--table_size=" + str(table_size),
    "--job-dir=" + str(job_dir),
    "--bucket_name=" + str(BUCKET_NAME),
    "--bucket_folder=" + str(BUCKET_FOLDER)
]
required_packages = [
    'fedml_gcp',
    'matplotlib>=2.2.3',
    'seaborn>=0.9.0',
    'scikit-learn>=0.20.2',
    'pandas>=1.1.4',
    'numpy',
    'hdbcli',
    'pandas-gbq'
]
inputs2 = {
    'display_name':JOB_NAME,
    'script_path':SCRIPT_PATH,
    'container_uri':TRAIN_IMAGE,
    'model_serving_container_image_uri':DEPLOY_IMAGE,
    'requirements':required_packages
}
run_job_params2 = {'model_display_name':MODEL_DISPLAY_NAME,
                   'args':cmd_args,
                   'replica_count':1,
                   'base_output_dir':MODEL_OUTPUT_DIR,
                   'sync':True}
model = dwc.train_model(training_inputs=inputs2,
                        training_type="custom",
                        params=run_job_params2)
This resulted in package dependency issues. Based on the error messages, I changed the required packages to:
required_packages = [
    'fedml_gcp',
    'matplotlib>=2.2.3',
    'seaborn>=0.9.0',
    'scikit-learn>=0.20.2',
    'pandas>=1.1.4',
    'numpy',
    'hdbcli',
    'pandas-gbq',
    'google-auth<2,>=1.25.0',
    'google-auth-oauthlib<0.5,>=0.4.1',
    'google-api-core[grpc]<2.0.0dev,>=1.34.0',
    'google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<2dev,>=1.31.5',
    'google-cloud-core<2.0dev,>=1.1.0',
    'googleapis-common-protos[grpc]<2.0.0dev,>=1.56.0',
    'grpcio<2.0dev,>=1.47.0',
    'packaging<22.0.0dev,>=14.3',
    'protobuf!=3.20.0,!=3.20.1,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5'
]
This resulted in more package dependency issues. The errors in the second log require package versions that contradict the versions required by the errors in the first log. Here are the contradicting package requirements:
- First log: “google-cloud-logging 1.15.0 has requirement google-cloud-core<2.0dev,>=1.1.0”, and the second log: “google-cloud-storage 2.7.0 has requirement google-cloud-core<3.0dev,>=2.3.0”
- First log: “google-api-python-client 1.9.3 has requirement google-api-core<2dev,>=1.18.0”, and the second log: “pandas-gbq 0.19.1 has requirement google-api-core<3.0.0dev,>=2.10.2”
- First log: “tensorboard 2.2.2 has requirement google-auth<2,>=1.6.3” and the second log: “pandas-gbq 0.19.1 has requirement google-auth>=2.13.0”
- First log: “tensorboard 2.2.2 has requirement google-auth-oauthlib<0.5,>=0.4.1” and the second log: “pandas-gbq 0.19.1 has requirement google-auth-oauthlib>=0.7.0”
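To make the contradictions above concrete, here is a minimal sketch that checks whether each pair of ranges quoted from the two logs has any version in common. It is a simplification and an assumption on my part, not how pip resolves dependencies: it only handles plain `>=lower,<upper` ranges, ignores `dev` suffixes and `!=` exclusions, and the helper names `parse` and `ranges_overlap` are made up for illustration.

```python
def parse(version):
    """Turn '2.10.2' into (2, 10, 2), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def ranges_overlap(first, second):
    """Each range is (inclusive_lower, exclusive_upper); upper may be None."""
    low = max(parse(first[0]), parse(second[0]))
    uppers = [parse(r[1]) for r in (first, second) if r[1] is not None]
    if not uppers:
        return True  # neither range has an upper bound
    return low < min(uppers)


# Ranges copied from the two logs: (first log, second log).
requirements = {
    "google-cloud-core": (("1.1.0", "2.0"), ("2.3.0", "3.0")),
    "google-api-core": (("1.18.0", "2"), ("2.10.2", "3.0.0")),
    "google-auth": (("1.6.3", "2"), ("2.13.0", None)),
    "google-auth-oauthlib": (("0.4.1", "0.5"), ("0.7.0", None)),
}

conflicts = sorted(
    name for name, (a, b) in requirements.items() if not ranges_overlap(a, b)
)
for name in conflicts:
    print(name, "has no version that satisfies both logs")
```

Under these simplifications, all four pairs come out as non-overlapping, so no single pinned version of these packages can satisfy both the packages already in the container and pandas-gbq 0.19.1 / google-cloud-storage 2.7.0.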
I am using the Vertex AI prebuilt container that is used in the sample notebook provided by SAP as a part of the documentation for fedml-gcp (as shown in the code block above). But the packages pandas-gbq, hdbcli, and fedml_gcp are not getting installed on this container.
Please help!
