Go regexp for printable characters

Issue

I have a Go server application that processes and saves a user-specified name. I really don’t care what the name is; if they want it to be in hieroglyphs or emojis that’s fine, as long as most clients can display it. Based on this question for C# I was hoping to use

^[^\p{Cc}\p{Cn}\p{Cs}]{1,50}$

That is, 1-50 characters that are not control characters, unassigned characters, or partial UTF-16 characters (surrogates). But Go does not support Cn, and I can’t find a reasonable regexp that will match any printable Unicode string but not "퟿͸", for example.

I want to use a regex because the clients are not written in Go and I want them to match the server validation precisely. It’s not clear how to replicate functions like isPrint in other languages.

Is there any way to do this other than hard-coding the unassigned Unicode ranges into my application and separately checking for those?

Solution

You probably want to use just these Unicode character classes:

  • L (Letter)
  • M (Mark)
  • N (Number)
  • P (Punctuation)
  • S (Symbol)

That would give you this [positive] regular expression:

^[\pL\pM\pN\pP\pS]+$
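As a minimal sketch of how the positive class might be wired up in Go (the helper name isPrintableName is made up for illustration and is not part of any library), something like the following could be used. It relies only on the standard regexp package, which accepts one-letter Unicode classes such as \pL inside a bracket expression.

```go
package main

import (
	"fmt"
	"regexp"
)

// printableName is the positive class from the answer: one or more
// letters, marks, numbers, punctuation characters, or symbols.
var printableName = regexp.MustCompile(`^[\pL\pM\pN\pP\pS]+$`)

// isPrintableName is a hypothetical helper used only for this sketch.
func isPrintableName(s string) bool {
	return printableName.MatchString(s)
}

func main() {
	samples := []string{"José", "日本語", "😀", "a\tb", "John Smith", "\ud7ff\u0378"}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, isPrintableName(s))
	}
}
```

Note that a space belongs to the Z (Separator) category, so a name such as "John Smith" is rejected by this class as written. If spaces should be allowed, a literal space or \p{Zs} would have to be added to the bracket expression.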

Alternatively, exclude the Unicode character classes that you don’t want:

  • Z (Separator)
  • C (Other)

That gives you this [negative] regular expression:

^[^\pZ\pC]+$
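The negated class can be sketched the same way. This version also swaps + for the {1,50} bound from the question, which is an assumption about what the server wants rather than part of the answer; isAllowedName and tooLong are hypothetical names. Whether \pC covers unassigned code points depends on the Unicode tables shipped with a given Go release, so that case is worth checking on the toolchain actually in use.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// allowedName rejects separators (Z) and "other" characters (C) and
// limits the name to 1-50 code points, as in the question.
var allowedName = regexp.MustCompile(`^[^\pZ\pC]{1,50}$`)

// tooLong is a 51-character sample used to exercise the upper bound.
var tooLong = strings.Repeat("x", 51)

// isAllowedName is a hypothetical helper used only for this sketch.
func isAllowedName(s string) bool {
	return allowedName.MatchString(s)
}

func main() {
	for _, s := range []string{"José", "a\tb", "John Smith", tooLong} {
		fmt.Printf("%q -> %v\n", s, isAllowedName(s))
	}
}
```

Go's regexp package counts the {1,50} repetition in code points for UTF-8 input. A client whose regex engine counts UTF-16 code units instead may disagree on names that contain characters outside the Basic Multilingual Plane.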

Answered By – Nicholas Carey

Answer Checked By – Willingham (GoLangFix Volunteer)
