Failure to load jffi binary stub from noexec tmp dir #158

Open
shubhamb12 opened this issue Nov 25, 2024 · 3 comments

Comments

shubhamb12 commented Nov 25, 2024

Hi,

We are using "com.datadoghq" % "java-dogstatsd-client" % "4.2.0" as an SBT dependency in our Flink application. During HPA rescaling or any general redeployment, we have suddenly started seeing the following jnr-related error from StatsDClientBuilder:

Flink version: 1.18
DatadogClient: 3.33.0

java.lang.UnsatisfiedLinkError: could not load FFI provider jnr.ffi.provider.jffi.Provider
	at jnr.ffi.provider.InvalidRuntime.newLoadError(InvalidRuntime.java:101)
	at jnr.ffi.provider.InvalidRuntime.findType(InvalidRuntime.java:42)
	at jnr.ffi.Struct$NumberField.<init>(Struct.java:872)
	at jnr.ffi.Struct$Unsigned16.<init>(Struct.java:1240)
	at jnr.unixsocket.SockAddrUnix$DefaultSockAddrUnix.<init>(SockAddrUnix.java:209)
	at jnr.unixsocket.SockAddrUnix.create(SockAddrUnix.java:174)
	at jnr.unixsocket.UnixSocketAddress.<init>(UnixSocketAddress.java:53)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder$1.call(NonBlockingStatsDClientBuilder.java:261)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder$1.call(NonBlockingStatsDClientBuilder.java:259)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder.staticAddressResolution(NonBlockingStatsDClientBuilder.java:283)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder.staticStatsDAddressResolution(NonBlockingStatsDClientBuilder.java:299)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder.resolve(NonBlockingStatsDClientBuilder.java:217)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder.build(NonBlockingStatsDClientBuilder.java:193)
	at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:309)
	at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
	at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter$SourceOutputWrapper.collect(KafkaRecordEmitter.java:67)
	at org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:84)
	at org.apache.flink.connector.kafka.source.reader.deserializer.KafkaValueOnlyDeserializationSchemaWrapper.deserialize(KafkaValueOnlyDeserializationSchemaWrapper.java:51)

Caused by: java.lang.UnsatisfiedLinkError: could not get native definition for type `POINTER`, original error message follows: java.io.IOException: Unable to write jffi binary stub to `/tmp`. Set `TMPDIR` or Java property `java.io.tmpdir` to a read/write path that is not mounted "noexec".
	at com.kenai.jffi.internal.StubLoader.tempReadonlyError(StubLoader.java:414)
	at com.kenai.jffi.internal.StubLoader.loadFromJar(StubLoader.java:399)
	at com.kenai.jffi.internal.StubLoader.load(StubLoader.java:278)
	at com.kenai.jffi.internal.StubLoader.<clinit>(StubLoader.java:487)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Unknown Source)
	at com.kenai.jffi.Init.load(Init.java:68)
	at com.kenai.jffi.Foreign$InstanceHolder.getInstanceHolder(Foreign.java:49)
	at com.kenai.jffi.Foreign$InstanceHolder.<clinit>(Foreign.java:45)
	at com.kenai.jffi.Foreign.getInstance(Foreign.java:103)
	at com.kenai.jffi.Type$Builtin.lookupTypeInfo(Type.java:242)
	at com.kenai.jffi.Type$Builtin.getTypeInfo(Type.java:237)
	at com.kenai.jffi.Type.resolveSize(Type.java:155)
	at com.kenai.jffi.Type.size(Type.java:138)
	at jnr.ffi.provider.jffi.NativeRuntime$TypeDelegate.size(NativeRuntime.java:178)
	at jnr.ffi.provider.AbstractRuntime.<init>(AbstractRuntime.java:48)
	at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:57)
	at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:41)
	at jnr.ffi.provider.jffi.NativeRuntime$SingletonHolder.<clinit>(NativeRuntime.java:53)
	at jnr.ffi.provider.jffi.NativeRuntime.getInstance(NativeRuntime.java:49)
	at jnr.ffi.provider.jffi.Provider.<init>(Provider.java:29)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at java.base/java.lang.Class.newInstance(Unknown Source)
	at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.getInstance(FFIProvider.java:68)
	at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.<clinit>(FFIProvider.java:57)
	at jnr.ffi.provider.FFIProvider.getSystemProvider(FFIProvider.java:35)
	at jnr.ffi.Runtime$SingletonHolder.<clinit>(Runtime.java:82)
	at jnr.ffi.Runtime.getSystemRuntime(Runtime.java:67)
	at jnr.unixsocket.SockAddrUnix.<init>(SockAddrUnix.java:46)
	at jnr.unixsocket.SockAddrUnix$DefaultSockAddrUnix.<init>(SockAddrUnix.java:208)
	at jnr.unixsocket.SockAddrUnix.create(SockAddrUnix.java:174)
	at jnr.unixsocket.UnixSocketAddress.<init>(UnixSocketAddress.java:53)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder$1.call(NonBlockingStatsDClientBuilder.java:261)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder$1.call(NonBlockingStatsDClientBuilder.java:259)
	at com.timgroup.statsd.NonBlockingStatsDClientBuilder.staticAddressResolution(NonBlockingStatsDClientBuilder.java:283)
	at 
	at scala.Option.foreach(Option.scala:257)
	at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
	at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:309)
	at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
	at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter$SourceOutputWrapper.collect(KafkaRecordEmitter.java:67)
	at org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:84)
	at org.apache.flink.connector.kafka.source.reader.deserializer.KafkaValueOnlyDeserializationSchemaWrapper.deserialize(KafkaValueOnlyDeserializationSchemaWrapper.java:51)
	at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:53)
	at org.apache.flink.connector.kafka.source.reader.KafkaRecordEmitter.emitRecord(KafkaRecordEmitter.java:33)
	at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:160)
	at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:419)
	at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedByInterruptException
	at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.endBlocking(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.size(Unknown Source)
	at java.base/sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
	at com.kenai.jffi.internal.StubLoader.loadFromJar(StubLoader.java:392)
	... 74 more

	at com.kenai.jffi.Type$Builtin.lookupTypeInfo(Type.java:253)
	at com.kenai.jffi.Type$Builtin.getTypeInfo(Type.java:237)
	at com.kenai.jffi.Type.resolveSize(Type.java:155)
	at com.kenai.jffi.Type.size(Type.java:138)
	at jnr.ffi.provider.jffi.NativeRuntime$TypeDelegate.size(NativeRuntime.java:178)
	at jnr.ffi.provider.AbstractRuntime.<init>(AbstractRuntime.java:48)
	at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:57)
	at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:41)
	at jnr.ffi.provider.jffi.NativeRuntime$SingletonHolder.<clinit>(NativeRuntime.java:53)
	at jnr.ffi.provider.jffi.NativeRuntime.getInstance(NativeRuntime.java:49)
	at jnr.ffi.provider.jffi.Provider.<init>(Provider.java:29)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at java.base/java.lang.Class.newInstance(Unknown Source)
	at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.getInstance(FFIProvider.java:68)
	at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.<clinit>(FFIProvider.java:57)
	at jnr.ffi.provider.FFIProvider.getSystemProvider(FFIProvider.java:35)
	at jnr.ffi.Runtime$SingletonHolder.<clinit>(Runtime.java:82)
	at jnr.ffi.Runtime.getSystemRuntime(Runtime.java:67)
	at jnr.unixsocket.SockAddrUnix.<init>(SockAddrUnix.java:46)
	at jnr.unixsocket.SockAddrUnix$DefaultSockAddrUnix.<init>(SockAddrUnix.java:208)
	... 43 more

We have already tried upgrading to the latest version, 4.4.3, but with no luck.

Thanks for any help

Member

headius commented Nov 25, 2024

This is probably caused by the default temp directory being locked down and mounted noexec, which prevents loading executable code from it. The key error message here is this one:

java.io.IOException: Unable to write jffi binary stub to /tmp. Set TMPDIR or Java property java.io.tmpdir to a read/write path that is not mounted "noexec".

This means that the /tmp directory on your system does not allow loading executable code, and as a result we can't unpack and load the jffi dynamic library from there. You should be able to work around this by setting the suggested variables to a different temp location that does allow loading executable code.

Can you give that a try and let us know how it goes?
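
To check whether the suggestion above applies to a given pod, a small diagnostic along these lines may help. It is only a sketch under assumptions: the class name `TmpDirCheck` and the method `isNoexec` are made up for illustration, and reading `/proc/mounts` assumes a Linux container. It prints the temp directory the JVM resolved, whether it is writable, how much space is usable, and whether the containing mount carries the `noexec` option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jffi or jnr.
final class TmpDirCheck {
    private TmpDirCheck() {}

    /**
     * Returns true if the mount containing dir lists the "noexec" option.
     * mountLines follow the /proc/mounts layout: device mountpoint fstype options ...
     * Mount points with escaped characters (such as \040 for a space) are not decoded here.
     */
    static boolean isNoexec(List<String> mountLines, Path dir) {
        Path target = dir.toAbsolutePath().normalize();
        Path bestMount = null;
        boolean noexec = false;
        for (String line : mountLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // not a well-formed mount entry
            }
            Path mount = Paths.get(fields[1]);
            if (!target.startsWith(mount)) {
                continue; // this mount does not contain the directory
            }
            // Prefer the deepest mount point; on a tie, the later entry wins.
            if (bestMount == null || mount.getNameCount() >= bestMount.getNameCount()) {
                bestMount = mount;
                noexec = Arrays.asList(fields[3].split(",")).contains("noexec");
            }
        }
        return noexec;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp);
        System.out.println("writable       = " + Files.isWritable(tmp));
        System.out.println("usable bytes   = " + Files.getFileStore(tmp).getUsableSpace());
        Path mounts = Paths.get("/proc/mounts"); // Linux only
        if (Files.isReadable(mounts)) {
            System.out.println("noexec         = " + isNoexec(Files.readAllLines(mounts), tmp));
        }
    }
}
```

If this reports `noexec = true`, starting the JVM with `-Djava.io.tmpdir=` pointing at a directory on an executable mount is the workaround the error message names.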

headius changed the title from "Transient JFFI UnsatisfiedLinkError for Flink application" to "Failure to load jffi binary stub from noexec tmp dir" on Nov 25, 2024
Member

headius commented Nov 25, 2024

I've retitled this to better reflect this issue's cause for future users.

@shubhamb12
Author

Hey @headius,
/tmp already exists with full permissions, and the issue is transient: even for the same pipeline, it only occurs on some pods.
We also have the same setup for 400-500 other pipelines that never break. My guess, and it is only an assumption, is that loadFromJar does not gracefully handle some race condition or low disk space. I wonder whether we can retry this somehow or handle it gracefully.
Wrapping the statsd initialization in a try/catch with retry does handle the exception, but then statsd is never initialized at all, which is also unacceptable.
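
One detail in the trace above may be relevant to the race-condition guess: the innermost cause is a `java.nio.channels.ClosedByInterruptException` thrown from `FileChannelImpl.transferFrom` inside `StubLoader.loadFromJar`. That suggests the thread was interrupted while the stub was being copied, which could happen if a task thread is cancelled during rescaling; this is a reading of the trace, not a confirmed diagnosis. Under that assumption, one possible mitigation is to run the first client initialization on a helper thread that the task interrupt does not reach. The sketch below uses hypothetical names (`ShieldedInit`, `runShielded`) and plain `java.util.concurrent` classes only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical helper, not part of jffi, jnr, or the statsd client.
final class ShieldedInit {
    private ShieldedInit() {}

    /**
     * Runs init on a fresh daemon thread and waits for it to finish.
     * An interrupt aimed at the calling thread does not reach the worker,
     * so interruptible I/O inside init is not cut short by it.
     * The caller's interrupt flag is restored before returning.
     */
    static <T> T runShielded(Callable<T> init) {
        FutureTask<T> task = new FutureTask<>(init);
        Thread worker = new Thread(task, "shielded-init");
        worker.setDaemon(true);
        worker.start();

        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return task.get();
                } catch (InterruptedException e) {
                    interrupted = true; // remember it and keep waiting
                } catch (ExecutionException e) {
                    throw new IllegalStateException("initialization failed", e.getCause());
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The callable would wrap whatever creates the client, for example the `NonBlockingStatsDClientBuilder.build()` call seen in the trace. Since the first trace passes through `jnr.ffi.provider.InvalidRuntime`, a failed load may be cached for the lifetime of the class loader, which would explain why retrying afterwards does not help; that is unverified, and it is the reason this sketch tries to protect the first attempt rather than retry.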
