Asked  9 Months ago    Answers:  5   Viewed   198 times

I need to write a function that takes a single string and does the following:

  1. Check whether it is a timestamp in UTC format (e.g. of the form 2014-05-10T12:30:00).
  2. If it is in the format described above, replace 'T' with a space and return the string.
  3. If it is not a timestamp, simply return the string unchanged.

What is the best way to accomplish this in Python? I thought I could use the datetime module, but can this be done with the re module?

 Answers

2

You can match using a regex:

>>> import re
>>> s1 = "1) check if it is a timestamp in UTC format (e.g. if it is of the form '2014-05-10T12:30:00')."
>>> s2 = "3) If it is not of timestamp, simply return the string."
>>> re.compile(r'\d\d\d\d-\d\d-\d\d(T)\d\d:\d\d:\d\d')
<_sre.SRE_Pattern object at 0x7f9781558470>
>>> s = re.sub(r'(.*\d\d\d\d-\d\d-\d\d)T(\d\d:\d\d:\d\d.*)', r'\1 \2', s1)
>>> print(s)
1) check if it is a timestamp in UTC format (e.g. if it is of the form '2014-05-10 12:30:00').
>>> s = re.sub(r'(.*\d\d\d\d-\d\d-\d\d)T(\d\d:\d\d:\d\d.*)', r'\1 \2', s2)
>>> print(s)
3) If it is not of timestamp, simply return the string.
>>> 


The trick here is to capture the groups to the left and right of the T character and paste them back around a space. As a bonus, if there's no match, there's no substitution.
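
Wrapped up as the function the question asks for, the same idea might look like the sketch below. The name normalize_timestamp and the use of fullmatch (so that only a string that is entirely a timestamp gets rewritten) are assumptions, not part of the answer above.

```python
import re

# Matches exactly YYYY-MM-DDTHH:MM:SS. This checks the shape only, not
# whether the date is real (month 13 would still match).
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})")


def normalize_timestamp(s):
    """Return s with 'T' replaced by a space if s is a timestamp, else s unchanged."""
    match = TIMESTAMP_RE.fullmatch(s)
    if match is None:
        return s
    # Paste the date and time groups back together around a space.
    return match.group(1) + " " + match.group(2)


print(normalize_timestamp("2014-05-10T12:30:00"))  # 2014-05-10 12:30:00
print(normalize_timestamp("not a timestamp"))      # not a timestamp
```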

Wednesday, September 22, 2021
 
Ali
3

I prefer using the dateutil library for timezone handling and generally solid date parsing. If you were to get an ISO 8601 string like: 2010-05-08T23:41:54.000Z you'd have a fun time parsing that with strptime, especially if you didn't know up front whether or not the timezone was included. pyiso8601 has a couple of issues (check their tracker) that I ran into during my usage and it hasn't been updated in a few years. dateutil, by contrast, has been active and worked for me:

import dateutil.parser

datestring = "2010-05-08T23:41:54.000Z"  # the ISO 8601 example from above
yourdate = dateutil.parser.parse(datestring)
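
For the single fixed format in the original question, the standard library may already be enough. The sketch below validates the string with datetime.strptime; the function name normalize_timestamp is hypothetical, and it assumes there are no fractional seconds and no timezone suffix, which is exactly where a parser such as dateutil becomes more useful.

```python
from datetime import datetime


def normalize_timestamp(s):
    """Return the timestamp with a space instead of 'T', or s unchanged."""
    try:
        parsed = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        # Wrong shape, or not a real date (e.g. month 13): return as-is.
        return s
    # strptime can be lenient about zero padding, so the result is
    # re-formatted from the parsed value rather than edited in place.
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(normalize_timestamp("2014-05-10T12:30:00"))  # 2014-05-10 12:30:00
print(normalize_timestamp("2014-13-10T12:30:00"))  # 2014-13-10T12:30:00
```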
Tuesday, June 1, 2021
 
MKM
3

Part of the problem here is that the strings usually used to represent timezones are not actually unique. "EST" only means "America/New_York" to people in North America. This is a limitation in the C time API, and the Python solution is… to add full tz features in some future version any day now, if anyone is willing to write the PEP.

You can format and parse a timezone as an offset, but that loses daylight savings/summer time information (e.g., you can't distinguish "America/Phoenix" from "America/Los_Angeles" in the summer). You can format a timezone as a 3-letter abbreviation, but you can't parse it back from that.
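
To illustrate the offset round trip, here is a minimal sketch using only the standard library. The format string and the sample date are assumptions chosen for the example. The value that comes back carries a fixed offset of -07:00 and nothing else, so nothing in it says whether it came from "America/Phoenix" or "America/Los_Angeles".

```python
from datetime import datetime, timedelta, timezone

fmt = "%Y-%m-%d %H:%M:%S %z"

# -07:00 is the summer offset of both America/Phoenix and America/Los_Angeles.
summer = datetime(2021, 7, 14, 12, 0, 0, tzinfo=timezone(timedelta(hours=-7)))

text = summer.strftime(fmt)
parsed = datetime.strptime(text, fmt)

print(text)                # 2021-07-14 12:00:00 -0700
print(parsed == summer)    # True: the instant survives the round trip
print(parsed.utcoffset())  # only a fixed offset survives, not a tz name
```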

If you want something that's fuzzy and ambiguous but usually what you want, you need a third-party library like dateutil.

If you want something that's actually unambiguous, just append the actual tz name to the local datetime string yourself, and split it back off on the other end:

import datetime
import pytz

# Assumption: fmt is a strftime format without %Z, for example:
fmt = '%Y-%m-%d %H:%M:%S'

d = datetime.datetime.now(pytz.timezone("America/New_York"))
dtz_string = d.strftime(fmt) + ' ' + "America/New_York"

d_string, tz_string = dtz_string.rsplit(' ', 1)
d2 = datetime.datetime.strptime(d_string, fmt)
tz2 = pytz.timezone(tz_string)

print(dtz_string)
print(d2.strftime(fmt) + ' ' + tz_string)

Or… halfway between those two, you're already using the pytz library, which can parse (according to some arbitrary but well-defined disambiguation rules) formats like "EST". So, if you really want to, you can leave the %Z in on the formatting side, then pull it off and parse it with pytz.timezone() before passing the rest to strptime.

Wednesday, July 14, 2021
 
Nil
4

Try matches():

{#dataField.matches('[0-9]{3,8}')}

This matches a string that is from 3 to 8 digits long (inclusive). You can change those values to whatever works for you.

You can also use open-ended length ranges: [0-9]{3,} means "at least 3 digits".
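
For readers following the Python answers on this page, a comparable whole-string check could be sketched with re.fullmatch. The helper name is_digit_run and its default bounds are assumptions made for the example.

```python
import re


def is_digit_run(s, min_len=3, max_len=8):
    """True if s is made up only of ASCII digits, min_len to max_len long."""
    pattern = r"[0-9]{%d,%d}" % (min_len, max_len)
    # fullmatch requires the whole string to match, not just a prefix.
    return re.fullmatch(pattern, s) is not None


print(is_digit_run("12345"))      # True
print(is_digit_run("12"))         # False
print(is_digit_run("123456789"))  # False
```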

Thursday, August 19, 2021
 
5

There are many options, as shown here. A nice one would be list comprehensions.

Let's consider two strings, one that satisfies the criteria and one that doesn't:

>>> match = "123/456/"
>>> no_match = "123a456/"

We can check whether a single character matches by using isdigit() and a comparison:

>>> match[0].isdigit() or match[0] == '/'
True

But we want to know if all chars match. We can get a list of results by using list comprehensions:

>>> [c.isdigit() or c == '/' for c in match]
[True, True, True, True, True, True, True, True]
>>> [c.isdigit() or c == '/' for c in no_match]
[True, True, True, False, True, True, True, True]

Note that the list for the non-matching string has False at the same position as the 'a' character.

Since we want all chars to match, we can use the all() function. It accepts any iterable of values; if at least one of them is false, it returns False:

>>> all([c.isdigit() or c == '/' for c in match])
True
>>> all([c.isdigit() or c == '/' for c in no_match])
False

Bonus points

Put it in a function

It is better to put this in a function:

>>> def digit_or_slash(s):
...     return all([c.isdigit() or c == '/' for c in s])
... 
>>> digit_or_slash(match)
True
>>> digit_or_slash(no_match)
False

Generator expressions

Generator expressions tend to be more efficient:

>>> def digit_or_slash(s):
...     return all(c.isdigit() or c == '/' for c in s)
... 

But in your case it is probably negligible anyway.
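
One reason for the difference is that all() stops at the first false value, which a generator allows and a fully built list does not. The sketch below makes this visible by counting how many characters get checked; the helper make_counting_check is hypothetical and exists only for the demonstration.

```python
def make_counting_check():
    """Return a check function plus the list recording each character it sees."""
    seen = []

    def check(c):
        seen.append(c)
        return c.isdigit() or c == '/'

    return check, seen


no_match = "123a456/"

# List comprehension: every character is checked before all() even starts.
check_list, seen_by_list = make_counting_check()
list_result = all([check_list(c) for c in no_match])

# Generator expression: all() stops as soon as it reaches 'a'.
check_gen, seen_by_gen = make_counting_check()
gen_result = all(check_gen(c) for c in no_match)

print(len(seen_by_list), len(seen_by_gen))  # 8 4
```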

What about in?

I would prefer to use the in operator, as below:

>>> def digit_or_slash(s):
...     return all(c in "0123456789/" for c in s)

Note that this is only one of the options. Sadly, your problem fails this Zen of Python recommendation (>>> import this):

There should be one-- and preferably only one --obvious way to do it.

But that's ok, now you can choose whatever you prefer :)
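
One detail that may help when choosing: the variants above are not strictly equivalent. str.isdigit() also accepts non-ASCII digit characters, while the in version only accepts the ten ASCII digits. The sketch below compares them; the two function names and the sample strings are assumptions made for the comparison.

```python
def digit_or_slash_isdigit(s):
    return all(c.isdigit() or c == '/' for c in s)


def digit_or_slash_in(s):
    return all(c in "0123456789/" for c in s)


arabic_indic = "\u0661\u0662\u0663/"  # Arabic-Indic digits one, two, three

print(digit_or_slash_isdigit("123/456/"), digit_or_slash_in("123/456/"))      # True True
print(digit_or_slash_isdigit(arabic_indic), digit_or_slash_in(arabic_indic))  # True False

# Both accept the empty string, because all() of nothing is True.
print(digit_or_slash_isdigit(""), digit_or_slash_in(""))                      # True True
```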

Thursday, November 11, 2021
 
rasmusx
 