Welcome to JiKe DevOps Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Validate multiple date formats (Pandas DataFrame)

Is there a way to check whether the date values in a DataFrame column match one of N permitted date formats, and no others?

The only acceptable formats are:

  1. YYYY-MM-DD H:M:S (with or without decimal places in the seconds field)
  2. YYYY-MM-DD H:M
  3. YYYY-MM-DD H
  4. YYYY-MM-DD

In each of the 4 cases, I want to fill in the missing time fields with zeros so that every value is a complete timestamp. For example, "2020-11-28 12" becomes "2020-11-28 12:00:00", "2020-12-18 22:31" becomes "2020-12-18 22:31:00", and so on.
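The padding rule above can be sketched in plain Python. This is only an illustration under stated assumptions: `pad_timestamp` is a hypothetical helper, and the pattern assumes that H, M and S may be written with one or two digits, which the question does not spell out. It checks the shape of the string only, not whether the date itself is valid (a month of 13 would pass).

```python
import re

# Hypothetical pattern mirroring the four accepted shapes.
# Time fields can only be dropped from the right: second, then minute, then hour.
_PARTIAL = re.compile(
    r"(\d{4}-\d{2}-\d{2})"          # date, always required
    r"(?: (\d{1,2})"                # optional hour
    r"(?::(\d{1,2})"                # optional minute
    r"(?::(\d{1,2})(\.\d+)?)?)?)?"  # optional second, optional fraction
)


def pad_timestamp(value):
    """Pad an accepted partial timestamp to YYYY-MM-DD HH:MM:SS, else return None."""
    match = _PARTIAL.fullmatch(value)
    if match is None:
        return None  # e.g. "2020" or "2020-11" do not reach a full date
    date, hour, minute, second, fraction = match.groups()
    # Missing fields become "00"; an existing fraction is kept unchanged.
    return "{} {:0>2}:{:0>2}:{:0>2}{}".format(
        date, hour or "0", minute or "0", second or "0", fraction or ""
    )
```

With this sketch, `pad_timestamp("2020-11-28 12")` yields `"2020-11-28 12:00:00"`, while `pad_timestamp("2020")` yields `None`.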

I borrowed @cs95's answer from this post and it works like a charm for all 4 formats I mentioned above. This is what I wrote:

import pandas as pd

# Parse once; values that cannot be parsed become NaT.
parsed = pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
if parsed.notna().all():
    df['DATE'] = parsed
    # something

However, the problem is that it also allows for fields with just YYYY and YYYY-MM. For instance, it'll accept "2020" as a valid field and make it "2020-01-01 00:00:00".

I want to make sure the allowance for missing fields is strictly capped at YYYY-MM-DD (i.e. YYYY and YYYY-MM are rejected). Can this be done using the method I've used or do I have to try something else? Any help is appreciated here!
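One possible way to cap the allowance at YYYY-MM-DD is to check the shape of each string with a regular expression before parsing, so that the parser never sees "2020" or "2020-11". The sketch below is not taken from the referenced answer; `PATTERN` and `parse_strict` are hypothetical names, and the one-or-two-digit time fields are an assumption. Each matched value is first normalised to a single full format, so `pd.to_datetime` is always called with one explicit `format` rather than relying on lenient parsing.

```python
import pandas as pd

# Assumed shape rule: the date part is mandatory; hour, minute, second and a
# fractional part may each be dropped only from the right.
PATTERN = (
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?: (?P<hour>\d{1,2})"
    r"(?::(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2})(?:\.(?P<frac>\d+))?)?)?)?$"
)


def parse_strict(dates):
    """Return datetimes for values in an accepted shape, NaT for everything else."""
    parts = dates.astype(str).str.strip().str.extract(PATTERN)
    matched = parts["date"].notna()  # rows whose shape is one of the four formats
    # Absent time fields default to zero; unmatched rows are masked out below.
    filled = parts.fillna("0")
    text = (
        filled["date"] + " "
        + filled["hour"].str.zfill(2) + ":"
        + filled["minute"].str.zfill(2) + ":"
        + filled["second"].str.zfill(2) + "."
        + filled["frac"]
    )
    # Out-of-range values (for example month 13) are coerced to NaT as well.
    return pd.to_datetime(
        text.where(matched), format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
    )


example = pd.Series(["2020-11-28 12", "2020-12-18 22:31", "2020-01-05", "2020", "2020-11"])
result = parse_strict(example)  # the last two entries come back as NaT
```

The same all-or-nothing check as in the question still applies: assign `parse_strict(df['DATE'])` to a variable and overwrite the column only if `.notna().all()` holds.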



1 Answer

Waiting for an expert to answer.
