サブロウ丸

Mainly programming and mathematics

Building MLBench

Building MLBench (https://mlbench.github.io) took me a fair amount of trial and error, so I am recording what I ran into here.

Final procedure

# freetype
apt-get install libfreetype6-dev

# cython
pip install --upgrade pip
pip install wheel
pip install cython

# mlbench-core v3.0.0
pip install mlbench-core==3.0.0

# kubectl v1.19.1-00
apt-get install kubectl=1.19.1-00

# helm
apt-get install helm

# kind v0.9.0 (download release binary)
wget https://github.com/kubernetes-sigs/kind/releases/download/v0.9.0/kind-linux-amd64
# the downloaded file is not executable by default
chmod +x kind-linux-amd64
mv kind-linux-amd64 /some-dir-in-your-PATH/kind

# add the user to the docker group (takes effect from the next login session)
sudo gpasswd -a user_name docker
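
Before moving on, it can help to confirm that every tool installed above is actually reachable from PATH. The following is a minimal sketch, not part of MLBench: the helper name `missing_tools` is hypothetical, and the tool list in the example is simply taken from the steps above.

```shell
#!/bin/sh
# Print each of the given commands that cannot be found on PATH, one per line.
# Returns 0 only when none are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (tool list taken from the steps above):
#   missing_tools docker kubectl helm kind mlbench
```

If the example prints nothing and returns 0, all five commands were found.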

Command to run

# Specify 1.19 as the Kubernetes version (the newest one that can be used)
mlbench create-cluster kind 3 my-cluster -k 1.19

List of errors

error: blis/cy.c: No such file or directory

--> Solved by installing Cython (pip install cython)

            Compiler gcc
            building 'blis.cy' extension
            creating build/temp.linux-x86_64-3.10
            creating build/temp.linux-x86_64-3.10/blis
            gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/tmp/pip-install-kef0zwl3/blis_a24213e9f70646c48d3a9bedba7c0064/include -I/tmp/pip-install-kef0zwl3/blis_a24213e9f70646c48d3a9bedba7c0064/blis/_src/include/linux-x86_64 -I/home/hoge/.pyenv/versions/3.10.4/envs/3.10.4_mlbench/include -I/home/hoge/.pyenv/versions/3.10.4/include/python3.10 -c blis/cy.c -o build/temp.linux-x86_64-3.10/blis/cy.o -std=c99
            gcc: error: blis/cy.c: No such file or directory
            gcc: fatal error: no input files
            compilation terminated.
            error: command '/usr/bin/gcc' failed with exit code 1
            [end of output]
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: legacy-install-failure
      
      × Encountered error while trying to install package.
      ╰─> blis
      
      note: This is an issue with the package mentioned above, not pip.
      hint: See above for output from the failure.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Failed to build matplotlib

--> Solved by installing FreeType (apt-get install libfreetype6-dev)

          3 |     #error "FreeType version 2.3 or higher is required. \
            |      ^~~~~
      src/checkdep_freetype2.c:10:10: error: #include expects "FILENAME" or <FILENAME>
         10 | #include FT_FREETYPE_H
            |          ^~~~~~~~~~~~~
      src/checkdep_freetype2.c:15:9: note: #pragma message: Compiling with FreeType version FREETYPE_MAJOR.FREETYPE_MINOR.FREETYPE_PATCH.
         15 | #pragma message("Compiling with FreeType version " \
            |         ^~~~~~~
      src/checkdep_freetype2.c:18:4: error: #error "FreeType version 2.3 or higher is required. You may set the MPLLOCALFREETYPE environment variable to 1 to let Matplotlib download it."
         18 |   #error "FreeType version 2.3 or higher is required. \
            |    ^~~~~
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for matplotlib
  Running setup.py clean for matplotlib
Failed to build matplotlib

(omitted)

  error: subprocess-exited-with-error
  
  × Running setup.py install for matplotlib did not run successfully.
  │ exit code: 1
  ╰─> [591 lines of output]
      
      Edit setup.cfg to change the build options; suppress output with --quiet.
      
      BUILDING MATPLOTLIB
        matplotlib: yes [3.2.1]
            python: yes [3.9.12 (main, May 31 2022, 20:47:13)  [GCC 9.3.0]]
          platform: yes [linux]
       sample_data: yes [installing]
             tests: no  [skipping due to configuration]
               agg: yes [installing]
             tkagg: yes [installing; run-time loading from Python Tcl/Tk]
            macosx: no  [Mac OS-X only]
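
The failure above comes from the FreeType development headers being absent when matplotlib is compiled. As a rough pre-check before running pip, one could look for `ft2build.h`. This is only a sketch: the helper name `has_freetype_header` is hypothetical, and the directories in the example are assumptions about a typical Debian/Ubuntu layout.

```shell
#!/bin/sh
# Return 0 if ft2build.h exists directly under any of the given directories.
has_freetype_header() {
    for dir in "$@"; do
        if [ -f "$dir/ft2build.h" ]; then
            return 0
        fi
    done
    return 1
}

# Example; both directories are assumptions, adjust them for your system:
#   has_freetype_header /usr/include/freetype2 /usr/local/include/freetype2 \
#       || echo "FreeType headers not found; try: apt-get install libfreetype6-dev"
```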

Starting control-plane Error

--> Solved by downgrading kind and Kubernetes (kubectl v1.19.1-00, kind v0.9.0)

Environment when the error occurred

  • kind: v0.14.0
  • kubernetes: v1.24.1-00

command: mlbench create-cluster kind 3 my-cluster

I0615 05:23:34.996526     275 round_trippers.go:438] GET https://my-cluster-3-control-plane:6443/healthz?timeout=32s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
    - 'docker ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
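
Since this error went away only after pinning kind to v0.9.0, it may be worth checking which kind binary is actually being picked up before creating a cluster. The sketch below assumes that `kind version` prints a line of the form `kind v0.9.0 go1.15.2 linux/amd64`; that format and both helper names are assumptions, not something MLBench provides.

```shell
#!/bin/sh
# Extract the version token (e.g. "v0.9.0") from a line such as
# "kind v0.9.0 go1.15.2 linux/amd64". The line format is an assumption.
kind_version_of() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Return 0 when the reported version equals the expected one.
kind_version_is() {
    [ "$(kind_version_of "$1")" = "$2" ]
}

# Example:
#   kind_version_is "$(kind version)" v0.9.0 || echo "unexpected kind version"
```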

docker.errors.APIError

--> Look up the CONTAINER ID of the registry with docker, then stop and remove it

$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS                                       NAMES
6a74047b55f1   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes                                               my-cluster-3-worker
708d610be1d1   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes   127.0.0.1:38503->6443/tcp                   my-cluster-3-control-plane
b99ded43a14b   kindest/node:v1.19.1   "/usr/local/bin/entr…"   3 minutes ago   Up 3 minutes                                               my-cluster-3-worker2
a54bd6a93976   registry:2             "/entrypoint.sh /etc…"   21 hours ago    Up 21 hours    0.0.0.0:5000->5000/tcp, :::5000->5000/tcp   kind-registry

$ docker stop a54bd6a93976
a54bd6a93976
$ docker rm a54bd6a93976
a54bd6a93976

Error log:

Traceback (most recent call last):
  File "/home/hoge/.pyenv/versions/3.9.12/envs/3.9.12_mlbench/lib/python3.9/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/home/hoge/.pyenv/versions/3.9.12/envs/3.9.12_mlbench/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http+docker://localhost/v1.35/networks/ac328edd34a7252e98b1afe0b8776feecf0ff70c2c323224a05c795e71ca3297/connect

(omitted)

docker.errors.APIError: 403 Client Error: Forbidden ("endpoint with name kind-registry already exists in network kind")
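
Looking up the registry's container ID by eye works, but it can also be done by name. The following is a sketch under the assumption that the leftover container is named `kind-registry`, as in the `docker ps` listing above; the helper name `container_id_by_name` is hypothetical.

```shell
#!/bin/sh
# Read "ID NAME" pairs on stdin (one per line) and print the ID whose NAME
# equals the first argument. Prints nothing when there is no match.
container_id_by_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Example; --format is a standard `docker ps` option:
#   id=$(docker ps -a --format '{{.ID}} {{.Names}}' | container_id_by_name kind-registry)
#   [ -n "$id" ] && docker stop "$id" && docker rm "$id"
```

Matching on the exact name avoids stopping one of the `my-cluster-3-*` node containers by mistake.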

Error: Failed to create cluster with the following error

--> Solved by deleting the kind cluster

$ kind get clusters
my-cluster-3
$ kind delete cluster --name my-cluster-3
Deleting cluster "my-cluster-3" ...

Error log:

Try 'mlbench create-cluster kind --help' for help.

Error: Failed to create cluster with the following error:
 ERROR: failed to create cluster: node(s) already exist for a cluster with the name "my-cluster-3"
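
To avoid hitting this error on a retry, the stale cluster can be removed only when it is actually present. This is a minimal sketch: the helper name `cluster_exists` is hypothetical, and the cluster name `my-cluster-3` is the one from the log above.

```shell
#!/bin/sh
# Read cluster names on stdin (one per line, as printed by `kind get clusters`)
# and return 0 if the given name appears as an exact, whole-line match.
cluster_exists() {
    grep -Fxq -- "$1"
}

# Example:
#   if kind get clusters | cluster_exists my-cluster-3; then
#       kind delete cluster --name my-cluster-3
#   fi
```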