-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] Fixes for podman v4 #1573
base: main
Are you sure you want to change the base?
Conversation
} else { | ||
uniqueName = `${name}-${mUsedNames[name]}`; | ||
mUsedNames[name] += 1; | ||
let uniqueName = `${name}-${generateNamespace()}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was causing naming collisions right? There is another issue with the cleanup but using some random here sounds good :)
} else { | ||
uniqueName = `${name}-${mUsedNames[name]}`; | ||
mUsedNames[name] += 1; | ||
let uniqueName = `${name}-${generateNamespace()}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
podman's --filter
option expects a regular expression rather than an exact matching string. Thus, doing podman pod ps -f name=temp
will return an array of statuses for every pod which contains temp
in its name, so you'll get temp
, temp-2
, and even tempo
, and just checking the first element of that array doesn't guarantee you see the info for that very pod you're looking for.
My naive attempt to wrap it like name='^temp$'
didn't succeed for some reason, so I ended up just assigning unique names to nodes. This should definitely be improved.
@@ -56,6 +56,7 @@ const TMP_DONE = "echo done > /tmp/zombie-tmp-done"; | |||
const TRANSFER_CONTAINER_WAIT_LOG = "waiting for tar to finish"; | |||
const NODE_CONTAINER_WAIT_LOG = "waiting for copy files to finish"; | |||
const WAIT_UNTIL_SCRIPT_SUFIX = `until [ -f ${FINISH_MAGIC_FILE} ]; do echo ${NODE_CONTAINER_WAIT_LOG}; sleep 1; done; echo copy files has finished`; | |||
const PODMAN_WAIT_UNTIL_SCRIPT_SUFIX = `sleep 5`; // FIXME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised to find out that putting the magic file is a no-op for podman provider. I'm not sure how it was supposed to even work, so just put a long enough delay for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think there was a change on how the pods
run in podman between v3/v4 that make no need to use the magic-file
. I can review on v3
.
@@ -432,7 +434,7 @@ export class PodmanClient extends Client { | |||
|
|||
await this.createResource(podDef, false, false); | |||
|
|||
await this.wait_pod_ready(name); | |||
await this.wait_pod(name, waitExit ? ["Exited"] : ["Running"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race conditions are a big problem for podman. If we're spinning up a node, we want to wait until the pod is running. But if we're generating a chainspec, we need to wait until the pod has finished its job. Every second time the chainspec was generated, I ended up with an empty chainspec in the network directory because we only waited for the pod to start, and we tried to copy the chainspec at once, but it was not ready yet. The pod started writing it but didn't finish, so the empty file is copied.
To address that, I added an explicit waiting condition, and now races are gone (mostly; see other comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can reuse the same approach from k8s
here. We wait until some log to detect that the chain-spec is ready.
@@ -449,6 +451,15 @@ export class PodmanClient extends Client { | |||
localFilePath: string, | |||
): Promise<void> { | |||
debug(`cp ${this.tmpDir}/${identifier}${podFilePath} ${localFilePath}`); | |||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To copy a file we need that file to be ready; it was not always the case. Sometimes, it's just not there yet, and sometimes, the pod starts writing something, but the file is still zero-length when we try to copy it. I don't remember the exact reason why in this very case I couldn't use wait_pod
for Exited
status as implemented above, but I had to implement this loop as my best effort to address this problem. Given that the pause between iterations is 1 sec, and PODMAN_WAIT_UNTIL_SCRIPT_SUFIX
is a 5 sec sleep, it always succeeds. Nevertheless, we definitely could use a better design here.
if (waitReady) await this.wait_pod_ready(name); | ||
if (waitReady) await this.wait_pod(name); | ||
|
||
if (typeof this.podSuffix === "undefined") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
podman v3 and podman v4 resource naming schemes are different. Here, I try to guess which one is used (because I believe the naming changed not exactly in v4.0.0, but in v4.something). Beware: this wasn't tested with podman v3. I only made sure it works for v4.
@@ -34,7 +34,7 @@ export interface ContainerPort { | |||
export interface Container { | |||
image: string; | |||
name: string; | |||
imagePullPolicy: "Always"; | |||
imagePullPolicy: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forcing this to be Always
triggers a Docker Hub rate limit rather soon. I'm manually patching this to be IfNotPresent
for individual resources when doing a lot of iterations; hence, some freedom is required on the interface level. I'd love to see a config file option to explicitly set the image pull policy for every image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be configurable through the network config and use IfNotPresent
as default. (both here and in k8s)
Hey @s0me0ne-unkn0wn thanks for your contribution 🙌 , looks great! I will check if something is break in podman v3. Thanks again for your contribution!! |
I've spent a weekend playing around with larger zombienets using podman 4.7.2 as a provider. This PR is a set of dirty fixes and ugly hacks I made to make things work. Neither of them is complete, well-tested, or carefully designed; I'm not ready to wear a Typescript developer's hat. I just hope someone will pick this PR up and make things pretty using the ideas I exposed here. With the patches in this PR, I'm currently running 20 validators and 6 parachains on the local machine, and the network starts up smoothly every time.
I'm going to leave some comments in a self-review to make it easier to understand the underlying ideas.