Welcome to JiKe DevOps Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Saving result of DataFrame show() to string in pyspark

I would like to capture the output of show() in PySpark as a string, similar to here and here. I could only find solutions for Scala, not for PySpark.

df.show()
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

The ultimate goal is to capture this output as a string and pass it to logger.info. I tried logger.info(df.show()), but show() prints the table to the console and returns None, so nothing useful reaches the logger.


1 Answer


You can build a helper function using the same approach as in the post you linked, Capturing the result of explain() in pyspark. If you examine the source code of show(), you will see that it calls self._jdf.showString() and prints the result.

The answer depends on which version of Spark you are using, because the arguments that show() accepts and passes on to showString() have changed over time.

Spark Version 2.3 and above

In version 2.3, the vertical argument was added.

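def getShowString(df, n=20, truncate=True, vertical=False):
    # Mirror show(): truncate=True means a 20-character limit per cell.
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    else:
        # False becomes 0 (no truncation); an int is used as the limit.
        return df._jdf.showString(n, int(truncate), vertical)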

Spark Versions 1.5 through 2.2

The truncate argument was added in version 1.5.

def getShowString(df, n=20, truncate=True):
    # Same as above, but without the vertical argument.
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    else:
        return df._jdf.showString(n, int(truncate))

Spark Versions 1.3 through 1.4

The show function was first introduced in version 1.3.

def getShowString(df, n=20):
    # Only the row count is accepted in these versions.
    return df._jdf.showString(n)
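
If the same code has to run against more than one Spark version, the three variants above could be folded into a single helper that picks the argument list from a version string. The following is only a sketch: show_string_args and get_show_string are hypothetical names, and the version boundaries are simply the ones listed in this answer. The version string is passed in explicitly (for example the value of spark.version or sc.version) so that the argument selection can be checked without a running Spark session.

```python
def show_string_args(spark_version, n=20, truncate=True, vertical=False):
    """Return the positional arguments for df._jdf.showString().

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Compare only major.minor, e.g. "2.4.8" -> (2, 4).
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    version = (major, minor)

    if version < (1, 5):
        # Spark 1.3 and 1.4: only the row count is accepted.
        return (n,)

    # truncate=True means a 20-character limit; False becomes 0 (no truncation).
    width = 20 if isinstance(truncate, bool) and truncate else int(truncate)

    if version < (2, 3):
        # Spark 1.5 through 2.2: no vertical argument yet.
        return (n, width)
    return (n, width, vertical)


def get_show_string(df, spark_version, n=20, truncate=True, vertical=False):
    # Relies on the private _jdf handle, like the helpers above.
    args = show_string_args(spark_version, n, truncate, vertical)
    return df._jdf.showString(*args)
```

Keeping the argument selection in a plain function separates the version logic from the call into the JVM, which makes it easy to unit test.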

Now use the helper function as follows:

x = getShowString(df)  # default arguments
print(x)
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+

Or in your case:

logger.info(getShowString(df))
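
An alternative that avoids the private _jdf attribute is to capture what show() prints. This is a minimal sketch, assuming show() writes through Python's sys.stdout (it prints the string returned by showString()); if your environment prints from the JVM side instead, this will capture nothing. capture_show is a hypothetical helper name, and FakeFrame is a stand-in object used only so the example runs without Spark.

```python
import contextlib
import io
import logging


def capture_show(df, *args, **kwargs):
    """Return whatever df.show(*args, **kwargs) would print, as a string."""
    buffer = io.StringIO()
    # Works only while show() writes through Python's sys.stdout.
    with contextlib.redirect_stdout(buffer):
        df.show(*args, **kwargs)
    return buffer.getvalue()


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; not part of PySpark."""

    def show(self, n=20):
        print("+---+\n|age|\n+---+\n| 30|\n+---+")


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

table = capture_show(FakeFrame())
# Start the table on its own line so the log prefix does not shift the header.
logger.info("DataFrame preview:\n%s", table)
```

With a real DataFrame the call would be capture_show(df), and any arguments are forwarded to show() unchanged, so the helper does not depend on the Spark version.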
