
[ZEPPELIN-6144] Add Helm chart for deploying Zeppelin on Kubernetes #4896

Open · wants to merge 18 commits into master
Conversation

ChenYi015

What is this PR for?

A few sentences describing the overall goals of the pull request's commits.
First time? Check out the contributing guide - https://zeppelin.apache.org/contribution/contributions.html

What type of PR is it?

Feature

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-6144

How should this be tested?

  • Strongly recommended: add automated unit tests for any new or changed behavior
  • Outline any manual steps to test the PR here.

Screenshots (if appropriate)

Questions:

  • Do the license files need to be updated?
  • Are there breaking changes for older versions?
  • Does this need documentation?

@ChenYi015 ChenYi015 marked this pull request as ready for review November 13, 2024 13:32
@ChenYi015 ChenYi015 requested a review from Reamer November 13, 2024 13:32
envFrom:
- configMapRef:
    name: {{ include "zeppelin.envConfigMapName" . }}
volumeMounts:
Contributor:

Zeppelin also has a configuration directory where it stores system state. When you restart the pod, these files are reset to their original state. The chart needs to provide the ability to use a PVC to persist the configs.

Author:

@Armadik Yes, you can configure the helm chart to mount a PVC to the server pod in order to persist config files like shiro.ini and interpreter.json.

Contributor:

In my opinion, the shiro.ini should be mounted via ConfigMap, just like the logging configuration.
Only credentials.json, interpreter.json and notebook-authorization.json should be externalized. This can be done with ZEPPELIN_CONFIG_FS_DIR.

Other environment variables which should link to a pvc are ZEPPELIN_NOTEBOOK_DIR and ZEPPELIN_SEARCH_INDEX_PATH.

I currently use the following environment variables. The PVC is mounted under /data.

ZEPPELIN_CONFIG_FS_DIR=/data/zeppelin-config
ZEPPELIN_NOTEBOOK_DIR=/data/zeppelin-notebook
ZEPPELIN_SEARCH_INDEX_PATH=/data/lucene-index
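
Wiring these variables to a PVC in the chart could look roughly like the following values fragment. This is only a sketch: the exact key names under `server` (`env`, `volumes`, `volumeMounts`) and the PVC name `zeppelin-data` are assumptions for illustration, not necessarily the chart's API.

```yaml
# Hypothetical values.yaml fragment: persist Zeppelin state on a PVC mounted at /data.
server:
  env:
    - name: ZEPPELIN_CONFIG_FS_DIR
      value: /data/zeppelin-config
    - name: ZEPPELIN_NOTEBOOK_DIR
      value: /data/zeppelin-notebook
    - name: ZEPPELIN_SEARCH_INDEX_PATH
      value: /data/lucene-index
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: zeppelin-data   # pre-created PVC (assumed name)
  volumeMounts:
    - name: data
      mountPath: /data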

Signed-off-by: Yi Chen <[email protected]>
Author @ChenYi015:

@Reamer Please take another look when you have time, thank you! We have already used this Helm chart in a production environment for interactive analysis with Spark.

Contributor @Reamer left a comment:

I just had a quick look at the change and left you a few comments. I'm more of a fan of Kustomize, but I also know that Helm is more powerful.
The changes definitely look much better than before with the DNS and nginx.

export ZEPPELIN_K8S_SERVICE_NAME={{ include "zeppelin.server.serviceName" . }}
export ZEPPELIN_K8S_SPARK_CONTAINER_IMAGE={{ include "zeppelin.interpreter.spark.image" . }}

zeppelin-site.xml: |
Contributor:

Please configure this via environment variables instead. That configuration is easier to understand and does not require an additional XSL file.

Author:

I think it would be better to provide both methods: one can configure the Zeppelin conf with the server.conf value, or configure environment variables with the server.env / server.envFrom values, and let users choose whichever they prefer.
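
Side by side, the two styles might look like the following values fragment. The `server.conf`, `server.env`, and `server.envFrom` keys come from the comment above; the specific properties and the ConfigMap name are illustrative assumptions.

```yaml
# Hypothetical values.yaml fragment showing both configuration styles.
server:
  # Style 1: Zeppelin conf rendered into zeppelin-site.xml via server.conf
  conf:
    zeppelin.server.port: "8080"
  # Style 2: plain environment variables via server.env / server.envFrom
  env:
    - name: ZEPPELIN_PORT
      value: "8080"
  envFrom:
    - configMapRef:
        name: zeppelin-extra-env   # assumed ConfigMap name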

# limitations under the License.
#
#
# [name] [maven artifact] [description]
Contributor:

For which functionality is this file required?

limitations under the License.
*/ -}}

apiVersion: rbac.authorization.k8s.io/v1
Contributor:

Please note that the Role, ServiceAccount, and RoleBinding are only required for the Spark interpreter. How is this implemented?

Author @ChenYi015 (Dec 21, 2024):

This logic is implemented in the interpreter-spec.yaml file: we do not mount the service account token unless the interpreter group name is spark, see https://github.com/ChenYi015/zeppelin/blob/beea9837983b16e50e1be9ca56c49d828148ffeb/charts/zeppelin/templates/interpreter/configmap.yaml#L165-L170.
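
The gist of that conditional is roughly the following. This is a paraphrased sketch, not the literal chart source; Zeppelin substitutes placeholders such as the interpreter group name into interpreter-spec.yaml itself, and the exact templating syntax in the chart may differ.

```yaml
# Paraphrased sketch: mount the service account token only when the
# interpreter group is "spark"; all other interpreter groups run without it.
spec:
  {% if zeppelin.k8s.interpreter.group.name == "spark" %}
  automountServiceAccountToken: true
  {% else %}
  automountServiceAccountToken: false
  {% endif %}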

{{ $key }} {{ $value }}
{{- end }}

driver-pod-template.yaml: |
Contributor:

The driver pod is created directly by Zeppelin. I think this file has no effect on an already existing pod. Zeppelin starts the Driver Pod in client mode.

Author:

You are right, the driver pod template does not actually take effect.

{{- toYaml . | nindent 8 }}
{{- end }}

executor-pod-template.yaml: |
Contributor:

This configuration could be used in the Spark driver. I have not yet tried such a template. I currently configure the executors and drivers via the Spark-Config.

Author:

The executor pod template file exists because some fields of the executor pods (e.g. affinity, tolerations) cannot be configured through Spark configuration. Additionally, I believe it is more straightforward to configure the executor pods with a pod template than through Spark conf.
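
For example, an executor pod template along these lines can express tolerations and node affinity that plain `spark.executor.*` conf keys cannot (the toleration key, node label, and values below are illustrative assumptions):

```yaml
# Illustrative executor-pod-template.yaml; Spark picks it up when
# spark.kubernetes.executor.podTemplateFile points at this file.
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - key: dedicated            # assumed taint key
      operator: Equal
      value: spark
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-pool   # assumed node label
                operator: In
                values: ["spark-executors"]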

# -- Security context for Zeppelin interpreter containers.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
Contributor:

Please do not enter a fixed UID. Openshift, for example, uses a random UID by default.
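
An OpenShift-friendly default would simply omit the UID, for example (a sketch, not the chart's current default):

```yaml
# Sketch: keep runAsNonRoot, but do not pin a UID so that platforms like
# OpenShift can inject an arbitrary one at runtime.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  # runAsUser deliberately omitted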

Author:

I have not tried whether it works with a random UID. The UID 1000 comes from the Zeppelin server Dockerfile, ref:

USER 1000
EXPOSE 8080
ENTRYPOINT [ "/usr/bin/tini", "--" ]
WORKDIR ${ZEPPELIN_HOME}
CMD ["bin/zeppelin.sh"]
