Welcome to JiKe DevOps Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

scala - Upload zip file using --archives option of spark-submit on yarn

I have a directory with some model files and my application has to access these models files in local file system due to some reason.

Of course I know that --files option of spark-submit can upload file to the working directory of each executor and it does work.

However, I want keep the directory structure of my files so I come up with --archives option, which is said

YARN-only:
......
--archives ARCHIVES         Comma separated list of archives to be extracted into the working directory of each executor.
......

But when I actually use it to upload models.zip, I found yarn just put it there without extraction, like what it did with --files. Have I misunderstood to be extracted or misused this option?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

Please log in or register to answer this question.

1 Answer

0 votes
by (71.8m points)

Found the answer myself.

YARN does extract the archive but add an extra folder with the same name of the archive. To make it clear, If I put models/model1 and models/models2 in models.zip, then I have to access my models by models.zip/models/model1 and models.zip/models/model2.

Moreover, we can make this more beautiful using the # syntax.

The --files and --archives options support specifying file names with the # similar to Hadoop. For example you can specify: --files localtest.txt#appSees.txt and this will upload the file you have locally named localtest.txt into HDFS but this will be linked to by the name appSees.txt, and your application should use the name as appSees.txt to reference it when running on YARN.

Edit:

This answer was tested on spark 2.0.0 and I'm not sure the behavior in other versions.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to JiKe DevOps Community for programmer and developer-Open, Learning and Share
...