How to Check If a Rune Is a Chinese Punctuation Character in Go

Issue

How can I detect Chinese punctuation characters, such as 。 or 〜, in Go?

I tried a range table from the unicode package, as in the code below, but unicode.Han doesn't include those punctuation characters.

Which range table should I use for this task? (Please refrain from suggesting regular expressions; they perform poorly.)

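// containsHan is an illustrative wrapper name for the attempted check.
// unicode.Han matches ideographs only, so punctuation is never reported.
func containsHan(strToDetect string) bool {
    for _, r := range strToDetect {
        if unicode.Is(unicode.Han, r) {
            return true
        }
    }
    return false
}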

Solution

Punctuation marks are scattered across several different Unicode blocks, so no single script table such as unicode.Han covers them.


The Unicode® Standard
Version 14.0 – Core Specification

Chapter 6
Writing Systems and Punctuation
https://www.unicode.org/versions/latest/ch06.pdf

Punctuation. The rest of this chapter deals with a special case: punctuation marks, which tend to be scattered about in different blocks and which may be used in common by many scripts. Punctuation characters occur in several widely separated places in the blocks, including Basic Latin, Latin-1 Supplement, General Punctuation, Supplemental Punctuation, and CJK Symbols and Punctuation. There are also occasional punctuation characters in blocks for specific scripts.


Here are two of your examples:

〜 Wave Dash U+301C

。 Ideographic Full Stop U+3002


package main

import (
    "fmt"
    "unicode"
)

func main() {
    // CJK Symbols and Punctuation Unicode block
    for r := rune('\u3000'); r <= '\u303F'; r++ {
        if unicode.IsPunct(r) {
            fmt.Printf("%[1]U\t%[1]c\n", r)
        }
    }
}

https://go.dev/play/p/WoJjM6JKTYR

U+3001  、
U+3002  。
U+3003  〃
U+3008  〈
U+3009  〉
U+300A  《
U+300B  》
U+300C  「
U+300D  」
U+300E  『
U+300F  』
U+3010  【
U+3011  】
U+3014  〔
U+3015  〕
U+3016  〖
U+3017  〗
U+3018  〘
U+3019  〙
U+301A  〚
U+301B  〛
U+301C  〜
U+301D  〝
U+301E  〞
U+301F  〟
U+3030  〰
U+303D  〽
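
The block scan above can be turned into a check on an input string. The following is only a sketch under stated assumptions: the names cjkPunctBlocks, isCJKPunct and containsCJKPunct are hypothetical (they are not part of package unicode), and the choice of blocks is an assumption. It covers CJK Symbols and Punctuation (U+3000 to U+303F) plus Halfwidth and Fullwidth Forms (U+FF00 to U+FFEF), where fullwidth marks such as ， and ！ live. It combines a custom unicode.RangeTable with unicode.IsPunct and uses no regular expressions.

```go
package main

import (
	"fmt"
	"unicode"
)

// cjkPunctBlocks is a hypothetical, hand-built range table (an assumption,
// not something provided by package unicode). It spans two Unicode blocks:
// CJK Symbols and Punctuation and Halfwidth and Fullwidth Forms.
// Ranges in R16 must be sorted and non-overlapping.
var cjkPunctBlocks = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x3000, Hi: 0x303F, Stride: 1},
		{Lo: 0xFF00, Hi: 0xFFEF, Stride: 1},
	},
}

// isCJKPunct reports whether r lies in one of the blocks above and is also
// in the Unicode punctuation category (P).
func isCJKPunct(r rune) bool {
	return unicode.Is(cjkPunctBlocks, r) && unicode.IsPunct(r)
}

// containsCJKPunct reports whether s contains at least one such rune.
func containsCJKPunct(s string) bool {
	for _, r := range s {
		if isCJKPunct(r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsCJKPunct("你好。"))          // true: U+3002
	fmt.Println(containsCJKPunct("你好"))           // false: ideographs only
	fmt.Println(containsCJKPunct("hello, world")) // false: ASCII comma is outside the blocks
}
```

Chinese text also uses marks from the General Punctuation block, such as curly quotation marks and the ellipsis, which this table deliberately leaves out; add further ranges to the table if you need them.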

Answered By – rocka2q

Answer Checked By – Mildred Charles (GoLangFix Admin)
