Here's a regex way. It will return the first set of 4 digits followed by an underscore followed by 2 digits followed by an underscore followed by 2 digits. It does not rely on character positioning, but on the fact that the date you want will be in this pattern and the first occurrence of that pattern in the string. I find when creating a regex you first need to be able to describe what you need to extract in plain language and make sure that description satisfies all possibilities you are dealing with. Including the unexpected! What will happen when the pattern is not found? Always expect the unexpected and code for it too. Since the regular expression language can get complicated make sure you understand it and document it in your code for future developers. Code for the person that will maintain this behind you. Don't make them spend 3 days figuring this out when you can include a comment to make it clear and have them coding in minutes. Have you been that person? Don't write expensive code (it cost an extra 3 days developer time!) and have the person behind you curse your name. lol Anyway I'm getting way off track.
{REGEX(Replace|acs_xyzl_abb_cndsD_11_AAP_pqrsty_2020_06_10_1_2019_06_18_1523.xlsx|.*?_(\d{4}_\d{2}_\d{2}).*|$1)}
The regex basically describes the string and defines a capture group around the part you want to keep. The captured group is returned by being referenced as $1.
Explained:
. - Match any character
* - Modifies the previous character. Zero or more of the previous character. So match zero or more of any characters.
? - Make that set of previous characters optional or non-greedy (stop after the first occurrence)
_ - when followed by an underscore
( - Start a group
\d{4}_\d{2}_\d{2} - match 4 digits, an underscore, 2 digits, an underscore and 2 digits)
) - close the group
.* - match the rest of the line. zero or more of any characters.
In the replace section:
$1 - Replace with the contents of the first captured group.
Expect the unexpected!
The entire string will be returned if the match is not found. Make sure that is handled. Also this pattern exists twice in the string. If the filename is malformed and only contains the last pattern, THAT will be returned. Is that ok? If not, the regex may need to be tweaked to ensure the occurrence of that pattern where it occurs at the end is never returned. Consider this:
{REGEX(Replace|acs_xyzl_abb_cndsD_11_AAP_pqrsty_2019_06_10_1_2019_06_18_1523.xlsx|.*?_(\d{4}_\d{2}_\d{2})_\d+_(\d{4}_\d{2}_\d{2}).*\.xlsx|$1)}
It will only return the first occurrence of the pattern, not the last one. (Note it's more complex then it has to be to meet this condition in order to work for the next examples). Remove a digit in the year of the first date. Put it back then change the $1 to $2. Make sure you know the specs of exactly what you need to return before building the regex as it can return the unexpected as well. Play around with the data a little and see what gets returned before using this in production.
Edited by user
2020-06-26T13:59:56Z
|
Reason: Not specified