Creating a KFServing Cluster on EKS

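In this tutorial, we’ll create an EKS cluster, configure a subdomain and a certificate for it, install KFServing, and deploy models to it.

You’ll need:

  • Your own domain (it may be possible to do this without a custom domain, but I don’t know how. Let me know if you find a way)
  • The AWS CLI
  • kubectl and eksctl installed (see Creating an EKS Cluster for instructions)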

Clone the Repo

We’ll need some files from this repo:

git clone

Create an EKS cluster

We’ll use eksctl to create the cluster.

Create a config file (cluster.yaml):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ml-bookcamp-eks
  region: eu-west-1
  version: "1.18"

nodeGroups:
  - name: ng
    desiredCapacity: 2
    instanceType: m5.xlarge

Apply it:

eksctl create cluster -f cluster.yaml

It takes a while, up to 20 minutes. In the meantime, you can do the next steps (configuring a subdomain and the certificate).

It should create a cluster and write the config to ~/.kube/config. If there’s an error and it doesn’t create the config file, create it with the AWS CLI:

aws eks --region eu-west-1 update-kubeconfig --name ml-bookcamp-eks

Verify that you can connect to the cluster:

kubectl get service

It should return the list of services currently running in the cluster:

kubernetes   ClusterIP   <none>        443/TCP   6m17s

Configuring a subdomain

Go to “Route 53”, create a hosted zone. Put any subdomain there. For example, “”:

When it’s created, note the values of its NS record (the name servers assigned to the hosted zone).


Now configure them in your domain provider. Go to the DNS settings of your domain provider (e.g. GoDaddy) and create a new NS (name server) record for each of them.

Configuring Certificate Manager

To access the endpoint over HTTPS, you’ll need to create a certificate.

Go to Certificate Manager and select “Provision certificates”. Choose “Request a public certificate” and click “Request a certificate”. Choose the DNS validation method.

Click “request” and then, in the next step, click on the domain name, and then click “Create record in Route 53”.

Validating the certificate should be fast (1-2 minutes), but may take more time (up to 30 minutes).

Take note of the certificate’s ARN. We’ll need it for the next step.
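Since the ARN gets pasted into a config file later, it can help to check that you copied all of it. Here is a minimal sketch of such a check. It assumes the standard "aws" partition, and the ARN in it is a made-up placeholder, not a real certificate:

```shell
# Hypothetical ARN for illustration; replace it with the one from Certificate Manager.
CERT_ARN="arn:aws:acm:eu-west-1:123456789012:certificate/12345678-1234-1234-1234-123456789012"

# ACM certificate ARNs follow the pattern
# arn:aws:acm:<region>:<12-digit account id>:certificate/<certificate id>.
# Assumption: the standard "aws" partition (not aws-cn or aws-us-gov).
is_acm_arn() {
  printf '%s\n' "$1" | grep -Eq '^arn:aws:acm:[a-z0-9-]+:[0-9]{12}:certificate/[0-9a-f-]+$'
}

if is_acm_arn "${CERT_ARN}"; then
  echo "ARN looks well-formed"
else
  echo "ARN does not look like an ACM certificate ARN" >&2
fi
```

This only checks the shape of the string. It does not tell you whether the certificate exists or has been validated.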

Install KFServing

We’ll use the quick install script. It will install:

  • Istio
  • Knative Serving
  • Cert manager
  • KFServing

Go to the install directory. First, rename istio-operator-template.yaml to istio-operator.yaml.

Find the line with "" and use your certificate ARN from the previous step.

Run the installation script:


Next, we need it to use our domain. Create a config file for it (config-domain.yaml):

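apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # placeholder: replace the key with the subdomain from your Route 53 hosted zone
  <your-subdomain>: ""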

Apply it:

kubectl apply -f config-domain.yaml

Check that it installed correctly:

kubectl get pod -n kfserving-system

You should see:

NAME                             READY   STATUS    RESTARTS   AGE
kfserving-controller-manager-0   2/2     Running   0          107s
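If you’d rather check this from a script than by eye, here is a small sketch. The helper name all_pods_running is my own and not part of kubectl or KFServing. It looks only at the STATUS column of the `kubectl get pod` output:

```shell
# Reads `kubectl get pod` output on stdin and succeeds only if at least one pod
# is listed and every pod has STATUS "Running" (the third column).
all_pods_running() {
  awk 'NR > 1 { seen = 1; if ($3 != "Running") bad = 1 }
       END { exit ((seen && !bad) ? 0 : 1) }'
}

# Sample input copied from the output above. With a live cluster you would pipe
# `kubectl get pod -n kfserving-system` into the function instead.
SAMPLE='NAME                             READY   STATUS    RESTARTS   AGE
kfserving-controller-manager-0   2/2     Running   0          107s'

if printf '%s\n' "${SAMPLE}" | all_pods_running; then
  echo "all pods are running"
else
  echo "some pods are not running yet"
fi
```

It doesn’t look at the READY column, so treat it as a quick check, not a full health check.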

Configure Route 53

The previous step installed Istio, which created a load balancer. We now need to point the domain we created previously to that load balancer.

Go to EC2, select “Load Balancers”, and find the load balancer created by Istio. If you have multiple load balancers, look for the one with the tag “ : owned”.

Copy its DNS name.

Then go to Route 53, select the hosted zone we created previously and create a new CNAME record:

  • Record name: * (asterisk)
  • Record type: CNAME
  • Value: the DNS name of the load balancer

Now this domain can be used by KFServing.

Deploying a test model to KFServing

This step is optional. Feel free to jump to the next section.

Let’s deploy a simple model. Create a config (tf-flowers.yaml, copied from the KFServing repo) with this content:

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"

Apply it:

kubectl apply -f tf-flowers.yaml

Check that it works:

kubectl get inferenceservice

The output:

NAME             URL                                                                              READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   True    100                                45m
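The URL column holds the base address of the model. To send a prediction request we need the full prediction URL. Here is a sketch of how it is usually put together. It assumes Knative’s default host template ({name}.{namespace}.{domain}), the TensorFlow Serving style ":predict" path, and HTTPS through the certificate we created. DOMAIN is a hypothetical placeholder, so use your own subdomain:

```shell
# Hypothetical values for illustration; DOMAIN is a placeholder for your own subdomain.
MODEL_NAME="flowers-sample"
NAMESPACE="default"
DOMAIN="kfserving.example.com"

# Assumption: Knative's default host template is {name}.{namespace}.{domain},
# and the TensorFlow predictor serves predictions at /v1/models/<name>:predict.
PREDICT_URL="https://${MODEL_NAME}.${NAMESPACE}.${DOMAIN}/v1/models/${MODEL_NAME}:predict"

echo "${PREDICT_URL}"
```

If your setup uses a different domain template, take the host from the URL column instead and only append the path.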

Test it with curl:


curl -X POST \
    -H 'Content-Type: application/json' \
    -d @flowers-input.json \

The output:

    "predictions": [
            "prediction": 0,
            "key": "   1",
            "scores": [0.999114931, 9.20989623e-05, 0.000136786737, 0.000337258854, 0.000300533458, 1.84814289e-05]

It works! We can delete this model:

kubectl delete -f tf-flowers.yaml

Let’s now deploy our own model.

Configure KFServing to use S3

We will save our model to S3, so KFServing needs access to it to fetch the model files. We’ll do it by providing credentials. (I’m not sure whether it can use a role instead. If it can, please let me know and I’ll update the tutorial.)

The credentials need to be encoded with base64. Let’s do it:

echo -n "${AWS_ACCESS_KEY_ID}" | base64
echo -n "${AWS_SECRET_ACCESS_KEY}" | base64
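A stray newline is an easy mistake here: if it gets encoded together with the key, the credentials won’t work. Here is a small sketch that checks the encoded value by decoding it and counting the bytes. The key in it is a made-up placeholder, not a real credential:

```shell
# Hypothetical placeholder; in practice the variable already holds your real key.
AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"

# printf '%s' writes the value without a trailing newline.
ENCODED=$(printf '%s' "${AWS_ACCESS_KEY_ID}" | base64)

# Decode it and count the bytes. An encoded trailing newline would add one byte.
DECODED_LEN=$(printf '%s' "${ENCODED}" | base64 --decode | wc -c)

if [ "$((DECODED_LEN))" -eq "${#AWS_ACCESS_KEY_ID}" ]; then
  echo "encoded value is clean"
else
  echo "encoded value contains extra bytes" >&2
fi
```

The byte count is compared, not the decoded string, because command substitution silently drops trailing newlines and would hide the problem.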

Now create a config with secrets (kfserving-s3-secret.yaml). Change the region if you need to:

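apiVersion: v1
kind: Secret
metadata:
  name: mysecret
  annotations:
    serving.kubeflow.org/s3-usehttps: "1"
    serving.kubeflow.org/s3-verifyssl: "1"
    serving.kubeflow.org/s3-region: eu-west-1
type: Opaque
data:
  # put your encoded key and secret here
  awsAccessKeyID: QUtJQVhxxxVXVjQ=
  awsSecretAccessKey: QzR0UnxxxVNOd0NQQQ==
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
  # must match the name of the Secret above
  - name: mysecret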

Apply it:

kubectl apply -f kfserving-s3-secret.yaml

Now KFServing will be able to fetch models from S3.

By the way, this is what the config for the test flowers model looks like when the model is in S3:

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      serviceAccountName: sa
      tensorflow:
        storageUri: "s3://mlbookcamp-models/flowers"

We add serviceAccountName and change the storageUri to use s3.

That’s all! Now you can use a KFServing cluster on EKS for deploying models!

