Does v1.6 support spark on yarn? #1901

Open
foxgarden opened this issue Jul 2, 2018 · 2 comments

foxgarden commented Jul 2, 2018

Hi,

In reference.conf I set the following (along with a few other minor options):
# The execution modes in Sparta are: local, mesos or marathon
sparta.config.executionMode = yarn

# Yarn cluster name
sparta.yarn.master = yarn

# Cluster or Client. If the user need more than one policy running is necessary use "cluster". Is the same as the variable spark.submit.deployMode
sparta.yarn.deployMode = cluster

I have a working workflow that runs correctly in local mode, but after switching to yarn mode I get the logs below. It looks like Sparta cannot connect to the Resource Manager. Could anybody help with this issue?

02 Jul 2018 15:29:31.053 INFO c.s.s.s.c.a.ClusterLauncherActor Sparta submit options initialized correctly
02 Jul 2018 15:29:31.062 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Failed ---> NotStarted
Status Information: The checker detects that the policy not start/stop correctly ---> Sparta submit options initialized correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:31.103 INFO c.s.s.s.c.a.ClusterLauncherActor Launching Sparta Job with options ...
Policy name: test1
Main Class: com.stratio.sparta.driver.SparkDriver
Driver file: http://0.0.0.0:9090/sparta/driver/driver-1.6.0-SNAPSHOT.jar
Master: yarn
Spark submit arguments: --deploy-mode -> cluster,--num-executors -> 1,--properties-file -> /etc/spark2/conf/spark-defaults.conf,--proxy-user -> hdfs
Spark configurations: spark.sql.parquet.binaryAsString -> true,spark.app.name -> test1-2018/07/02-03:29:30,spark.driver.memory -> 1G,spark.driver.cores -> 1,spark.mesos.driverEnv.SPARK_USER -> ,spark.executor.memory -> 1G,spark.executor.cores -> 1
Driver arguments: Map(plugins -> ICw=, clusterConfig -> eyJ5YXJuIjp7ImRlcGxveU1vZGUiOiJjbHVzdGVyIiwiZHJpdmVyQ29yZXMiOjEsImRyaXZlck1lbW9yeSI6IjFHIiwiZXhlY3V0b3JDb3JlcyI6MSwiZXhlY3V0b3JNZW1vcnkiOiIxRyIsImtpbGxVcmwiOiIvdjEvc3VibWlzc2lvbnMva2lsbCIsIm1hc3RlciI6Inlhcm4iLCJudW1FeGVjdXRvcnMiOjEsInByb3BlcnRpZXNGaWxlIjoiL2V0Yy9zcGFyazIvY29uZi9zcGFyay1kZWZhdWx0cy5jb25mIiwicHJveHktdXNlciI6ImhkZnMiLCJzcGFyayI6eyJzcWwiOnsicGFycXVldCI6eyJiaW5hcnlBc1N0cmluZyI6dHJ1ZX19fSwic3BhcmtIb21lIjoiL29wdC9jbG91ZGVyYS9wYXJjZWxzL1NQQVJLMi0yLjEuMC5jbG91ZGVyYTItMS5jZGg1LjcuMC5wMC4xNzE2NTgvbGliL3NwYXJrMiJ9fQ==, detailConfig -> eyJjb25maWciOnsiYWRkVGltZVRvQ2hlY2twb2ludFBhdGgiOmZhbHNlLCJhdXRvRGVsZXRlQ2hlY2twb2ludCI6dHJ1ZSwiYXdhaXRQb2xpY3lDaGFuZ2VTdGF0dXMiOiIxODBzIiwiYmFja3Vwc0xvY2F0aW9uIjoiL29wdC9zZHMvc3BhcnRhL2JhY2t1cHMiLCJjaGVja3BvaW50UGF0aCI6Ii90bXAvc3BhcnRhL2NoZWNrcG9pbnQiLCJkcml2ZXJQYWNrYWdlTG9jYXRpb24iOiIvb3B0L3Nkcy9zcGFydGEvZHJpdmVyIiwiZHJpdmVyVVJJIjoiaHR0cDovLzAuMC4wLjA6OTA5MC9zcGFydGEvZHJpdmVyL2RyaXZlci0xLjYuMC1TTkFQU0hPVC5qYXIiLCJleGVjdXRpb25Nb2RlIjoieWFybiIsImZyb250ZW5kIjp7InRpbWVvdXQiOjUwMDB9LCJwbHVnaW5QYWNrYWdlTG9jYXRpb24iOiIvb3B0L3Nkcy9zcGFydGEvcGx1Z2lucyIsInJlbWVtYmVyUGFydGl0aW9uZXIiOnRydWV9fQ==, storageConfig -> IA==, policyId -> d23359d0-de5b-4589-bb5a-236b1bde8eed, zookeeperConfig -> eyJ6b29rZWVwZXIiOnsiY29ubmVjdGlvblN0cmluZyI6IjEwLjAuMTEuMjI6MjE4MSwxMC4wLjExLjMwOjIxODEsMTAuMC4xMS4zMToyMTgxIiwiY29ubmVjdGlvblRpbWVvdXQiOjE1MDAwLCJyZXRyeUF0dGVtcHRzIjo1LCJyZXRyeUludGVydmFsIjoxMDAwMCwic2Vzc2lvblRpbWVvdXQiOjYwMDAwfX0=)
02 Jul 2018 15:29:31.128 INFO c.s.s.s.c.a.ClusterLauncherActor Sparta cluster job launched correctly
02 Jul 2018 15:29:31.131 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: NotStarted ---> Launched
Status Information: Sparta submit options initialized correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> UNKNOWN
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:31.205 INFO c.s.s.s.c.a.ClusterLauncherActor Cluster context listener added to test1 with id: d23359d0-de5b-4589-bb5a-236b1bde8eed
02 Jul 2018 15:29:31.218 INFO c.s.s.s.c.a.ClusterLauncherActor Starting scheduler task in awaitPolicyChangeStatus with time: 180s
02 Jul 2018 15:29:33.764 INFO c.s.s.s.c.a.ClusterLauncherActor Submission state changed to ... CONNECTED
02 Jul 2018 15:29:33.767 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Launched
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: UNKNOWN ---> CONNECTED
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:34.299 INFO c.s.s.s.c.a.ClusterLauncherActor Submission state changed to ... LOST
02 Jul 2018 15:29:34.301 INFO c.s.s.s.c.a.ClusterLauncherActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Launched
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: CONNECTED ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:51.657 INFO c.s.s.s.core.actor.StatusActor Updating context d23359d0-de5b-4589-bb5a-236b1bde8eed with name test1:
Status: Launched ---> Stopping
Status Information: Sparta cluster job launched correctly ---> Sparta cluster job launched correctly
Submission Id: undefined ---> undefined
Submission Status: LOST ---> LOST
Marathon Id: undefined ---> undefined
Last Error: undefined ---> undefined
Last Execution Mode: yarn-cluster ---> yarn-cluster
Resource Manager URL: undefined ---> undefined
02 Jul 2018 15:29:51.678 INFO c.s.s.s.c.a.ClusterLauncherActor Stopping message received from Zookeeper
02 Jul 2018 15:29:51.678 INFO c.s.s.s.c.a.ClusterLauncherActor The Sparta System don't have submission id associated to policy test1
02 Jul 2018 15:29:51.679 INFO c.s.s.s.c.a.ClusterLauncherActor Node cache to cluster context listener closed correctly
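
The values in the "Driver arguments" map in the log above appear to be base64-encoded, and the longer ones (clusterConfig, detailConfig, zookeeperConfig) look like base64-encoded JSON. A minimal sketch for decoding them, to check which settings were actually passed to the driver, could look like this. The function name `decode_driver_arg` is hypothetical and not part of Sparta:

```python
import base64
import json


def decode_driver_arg(value: str):
    """Decode one base64 value from the 'Driver arguments' map.

    Returns the parsed JSON when the payload is JSON,
    otherwise the decoded text unchanged.
    """
    text = base64.b64decode(value).decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


# Short values copied from the log above. The longer clusterConfig,
# detailConfig and zookeeperConfig strings can be pasted in the same way.
print(repr(decode_driver_arg("ICw=")))  # plugins
print(repr(decode_driver_arg("IA==")))  # storageConfig
```

Decoding clusterConfig this way should show the deploy mode, properties file, proxy user and Spark home that the launcher handed to the driver, which can then be compared against reference.conf.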

foxgarden commented Jul 2, 2018

@compae please help!
In the ResourceManager log I found:
2018-07-02 19:00:48,944 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 2014

This means the driver could connect to the ResourceManager, right? But the driver process became DEAD within 2-3 seconds, and then "Submission Status" changed to LOST.
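
To see the order of "Submission Status" changes at a glance, the transitions can be pulled out of the launcher log with a short script. This is only a sketch: it assumes the log keeps the `Submission Status: X ---> Y` line format shown in the first post, and the function name `submission_transitions` is made up for illustration:

```python
import re

# Matches lines such as "Submission Status: CONNECTED ---> LOST".
TRANSITION = re.compile(r"^Submission Status:\s*(\S+)\s*--->\s*(\S+)\s*$")


def submission_transitions(log_text: str):
    """Return (old, new) pairs for every Submission Status line, in order."""
    pairs = []
    for line in log_text.splitlines():
        match = TRANSITION.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# A few lines in the same format as the launcher log.
sample = """\
Submission Status: LOST ---> UNKNOWN
Submission Id: undefined ---> undefined
Submission Status: UNKNOWN ---> CONNECTED
Submission Status: CONNECTED ---> LOST
"""

for old, new in submission_transitions(sample):
    print(f"{old} -> {new}")
```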

@foxgarden

Solved
