Building MLBench (https://mlbench.github.io) took a fair amount of trial and error, so I am recording what worked here.
Final procedure
# freetype
apt-get install libfreetype6-dev
# cython
pip install --upgrade pip
pip install wheel
pip install cython
# mlbench-core v3.0.0
pip install mlbench-core==3.0.0
# kubectl v1.19.1-00
apt-get install kubectl=1.19.1-00
# helm
apt-get install helm
# kind v0.9.0 (download release binary)
wget https://github.com/kubernetes-sigs/kind/releases/download/v0.9.0/kind-linux-amd64
chmod +x kind-linux-amd64  # a file fetched with wget is not executable by default
mv kind-linux-amd64 /some-dir-in-your-PATH/kind
# add user to the docker group
sudo gpasswd -a user_name docker
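As a quick sanity check after the steps above, the following sketch lists any of the required commands that are still not on PATH. The helper name `missing_tools` is hypothetical and introduced only for illustration; it checks presence only, not versions.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the tools installed in the procedure above.
# Empty output means everything was found.
# missing_tools pip kubectl helm kind docker mlbench
```

If the function prints nothing, all of the listed commands are reachable from the current shell.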
Command to run
# Specify Kubernetes version 1.19 (the newest of the available choices)
mlbench create-cluster kind 3 my-cluster -k 1.19
Errors encountered
error: blis/cy.c: No such file or directory
--> Fixed by installing Cython (pip install cython)
Compiler gcc
building 'blis.cy' extension
creating build/temp.linux-x86_64-3.10
creating build/temp.linux-x86_64-3.10/blis
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/tmp/pip-install-kef0zwl3/blis_a24213e9f70646c48d3a9bedba7c0064/include -I/tmp/pip-install-kef0zwl3/blis_a24213e9f70646c48d3a9bedba7c0064/blis/_src/include/linux-x86_64 -I/home/hoge/.pyenv/versions/3.10.4/envs/3.10.4_mlbench/include -I/home/hoge/.pyenv/versions/3.10.4/include/python3.10 -c blis/cy.c -o build/temp.linux-x86_64-3.10/blis/cy.o -std=c99
gcc: error: blis/cy.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> blis
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Failed to build matplotlib
--> Fixed by installing FreeType (apt-get install libfreetype6-dev)
3 | #error "FreeType version 2.3 or higher is required. \
| ^~~~~
src/checkdep_freetype2.c:10:10: error: #include expects "FILENAME" or <FILENAME>
10 | #include FT_FREETYPE_H
| ^~~~~~~~~~~~~
src/checkdep_freetype2.c:15:9: note: #pragma message: Compiling with FreeType version FREETYPE_MAJOR.FREETYPE_MINOR.FREETYPE_PATCH.
15 | #pragma message("Compiling with FreeType version " \
| ^~~~~~~
src/checkdep_freetype2.c:18:4: error: #error "FreeType version 2.3 or higher is required. You may set the MPLLOCALFREETYPE environment variable to 1 to let Matplotlib download it."
18 | #error "FreeType version 2.3 or higher is required. \
| ^~~~~
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for matplotlib
Running setup.py clean for matplotlib
Failed to build matplotlib
(omitted)
error: subprocess-exited-with-error
× Running setup.py install for matplotlib did not run successfully.
│ exit code: 1
╰─> [591 lines of output]
Edit setup.cfg to change the build options; suppress output with --quiet.
BUILDING MATPLOTLIB
matplotlib: yes [3.2.1]
python: yes [3.9.12 (main, May 31 2022, 20:47:13) [GCC 9.3.0]]
platform: yes [linux]
sample_data: yes [installing]
tests: no [skipping due to configuration]
agg: yes [installing]
tkagg: yes [installing; run-time loading from Python Tcl/Tk]
macosx: no [Mac OS-X only]
Starting control-plane Error
--> Fixed by downgrading kind and Kubernetes (kubectl v1.19.1-00, kind v0.9.0)
Environment when the error occurred
- kind: v0.14.0
- Kubernetes: v1.24.1-00
command: mlbench create-cluster kind 3 my-cluster
I0615 05:23:34.996526 275 round_trippers.go:438] GET https://my-cluster-3-control-plane:6443/healthz?timeout=32s in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
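Since this failure came down to a version mismatch, it can help to confirm which kind binary is actually on PATH before creating a cluster. The sketch below assumes that `kind version` prints a line of the form `kind v0.9.0 go1.15.2 linux/amd64`; that output format and both helper names are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `kind version` output on stdin and print
# the version token (the second field of the line starting with "kind").
kind_version_from_output() {
    awk '$1 == "kind" { print $2; exit }'
}

# Hypothetical helper: succeed only if the output on stdin reports
# exactly the version given as $1 (for example v0.9.0).
kind_version_is() {
    [ "$(kind_version_from_output)" = "$1" ]
}

# Example:
# kind version | kind_version_is v0.9.0 || echo "kind is not v0.9.0" >&2
```

The comparison is an exact string match, so `v0.9.0` and `0.9.0` are treated as different values.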
docker.errors.APIError
--> Fixed by looking up the CONTAINER ID of the registry container with docker, then stopping and removing it
$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS                                       NAMES
6a74047b55f1   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes                                               my-cluster-3-worker
708d610be1d1   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes   127.0.0.1:38503->6443/tcp                   my-cluster-3-control-plane
b99ded43a14b   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes                                               my-cluster-3-worker2
a54bd6a93976   registry:2             "/entrypoint.sh /etc…"   21 hours ago    Up 21 hours    0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   kind-registry
$ docker stop a54bd6a93976
a54bd6a93976
$ docker rm a54bd6a93976
a54bd6a93976
Traceback (most recent call last):
  File "/home/hoge/.pyenv/versions/3.9.12/envs/3.9.12_mlbench/lib/python3.9/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/home/hoge/.pyenv/versions/3.9.12/envs/3.9.12_mlbench/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http+docker://localhost/v1.35/networks/ac328edd34a7252e98b1afe0b8776feecf0ff70c2c323224a05c795e71ca3297/connect
(omitted)
docker.errors.APIError: 403 Client Error: Forbidden ("endpoint with name kind-registry already exists in network kind")
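The manual lookup above can be scripted. This is a minimal sketch, assuming the leftover container is named `kind-registry` as in the listing; the helper name `container_id_by_name` is hypothetical. It reads `ID NAME` pairs, which `docker ps --format '{{.ID}} {{.Names}}'` produces, and prints the matching ID.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "<ID> <NAME>" on stdin and
# print the ID of every line whose name equals $1 exactly.
container_id_by_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Example (requires a running Docker daemon):
# id=$(docker ps -a --format '{{.ID}} {{.Names}}' | container_id_by_name kind-registry)
# if [ -n "$id" ]; then
#     docker stop "$id"
#     docker rm "$id"
# fi
```

Matching on the exact name avoids touching the cluster node containers, whose names merely share the cluster prefix.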
Error: Failed to create cluster with the following error
--> Fixed by deleting the kind cluster
$ kind get clusters
my-cluster-3
$ kind delete cluster --name my-cluster-3
Deleting cluster "my-cluster-3" ...
Try 'mlbench create-cluster kind --help' for help.
Error: Failed to create cluster with the following error:
ERROR: failed to create cluster: node(s) already exist for a cluster with the name "my-cluster-3"
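To avoid hitting this error on a retry, the stale cluster can be detected and removed before running mlbench again. The following is a sketch only; the helper name `cluster_listed` is hypothetical, and the cluster name `my-cluster-3` is taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cluster name $1 appears as a
# whole line in the list read from stdin (fixed-string, exact match).
cluster_listed() {
    grep -Fxq -- "$1"
}

# Example (requires kind on PATH):
# if kind get clusters | cluster_listed my-cluster-3; then
#     kind delete cluster --name my-cluster-3
# fi
```

The whole-line match means a cluster such as `my-cluster-30` would not be mistaken for `my-cluster-3`.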