How to use unsafe to get a byte slice from a string without a memory copy

Issue

I have read https://github.com/golang/go/issues/25484, which is about no-copy conversion from []byte to string.

I am wondering whether there is a way to do the reverse: convert a string to a byte slice without a memory copy.

I am writing a program that processes terabytes of data. If every string is copied twice in memory, it will slow down the processing. I do not care about mutability or safety, since this is for internal use only; I just need it to be as fast as possible.

Example:

var s string
// some processing on s, for some reasons, I must use string here
// ...
// then output to a writer
gzipWriter.Write([]byte(s))  // !!! Here I want to avoid the memory copy, no WriteString

So the question is: is there a way to avoid the memory copy? I know I may need the unsafe package, but I do not know how to use it for this. I have searched for a while without finding an answer, and none of the related answers on SO work.

Solution

Getting the content of a string as a []byte without copying in general is only possible using unsafe, because strings in Go are immutable, and without a copy it would be possible to modify the contents of the string (by changing the elements of the byte slice).

So using unsafe, this is what it could look like (corrected, working solution):

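// unsafeGetBytes reinterprets the string's data pointer as a pointer to a
// large byte array, then slices it down to len(s) with a matching capacity.
// Requires the "reflect" and "unsafe" packages. Never modify the result.
func unsafeGetBytes(s string) []byte {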
    return (*[0x7fff0000]byte)(unsafe.Pointer(
        (*reflect.StringHeader)(unsafe.Pointer(&s)).Data),
    )[:len(s):len(s)]
}

This solution is from Ian Lance Taylor.
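On newer toolchains the same idea can be written more directly. The following is a minimal sketch, assuming Go 1.20 or later, where the unsafe package provides StringData and Slice; the function name unsafeBytes is only illustrative and is not part of the original answer. The empty string is handled separately because the documentation leaves the result of unsafe.StringData unspecified for it.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytes returns a []byte that shares memory with s (assumes Go 1.20+).
// The result must never be modified: s may live in read-only memory.
func unsafeBytes(s string) []byte {
	if s == "" {
		return nil // unsafe.StringData is unspecified for the empty string
	}
	// StringData yields a *byte to the string's bytes; Slice builds a slice
	// whose length and capacity are both len(s).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := unsafeBytes("gopher")
	fmt.Println(b, string(b), len(b), cap(b))
}
```

The same caveat applies as for the version above: writing to the returned slice breaks the immutability of the string and may crash the program.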

The original, wrong solution was:

func unsafeGetBytesWRONG(s string) []byte {
    return *(*[]byte)(unsafe.Pointer(&s)) // WRONG!!!!
}

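See Nuno Cruces’s answer for the reasoning. In short, a string header holds only a data pointer and a length, while a slice header also holds a capacity, so the reinterpreted slice reads its capacity from whatever memory happens to follow the string header.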
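The size mismatch behind the wrong version can be checked directly. This is a small sketch (the helper name headerWords is made up for illustration) that reports how many machine words each header occupies; the values in the comment assume the standard gc toolchain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words a string value and a []byte
// value occupy (the headers only, not the bytes they point to).
func headerWords() (str, slice uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	str, slice := headerWords()
	// With the standard gc toolchain: 2 words (pointer, len) versus
	// 3 words (pointer, len, cap).
	fmt.Println(str, slice)
}
```

Because the slice header is one word longer, casting &s to *[]byte reads a capacity that was never part of the string.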

Testing it:

s := "hi"
data := unsafeGetBytes(s)
fmt.Println(data, string(data))

data = unsafeGetBytes("gopher")
fmt.Println(data, string(data))

Output (try it on the Go Playground):

[104 105] hi
[103 111 112 104 101 114] gopher

BUT: You wrote you want this because you need performance. You also mentioned you want to compress the data. Please know that compressing data (using gzip) requires a lot more computation than just copying a few bytes! You will not see any noticeable performance gain by using this!

Instead, when you want to write strings to an io.Writer, it’s recommended to do so via the io.WriteString() function, which avoids copying the string when possible: it checks whether the writer has a WriteString() method and calls it if it exists, and such a method most likely handles the string more efficiently than converting it to a []byte first. For details, see What's the difference between ResponseWriter.Write and io.WriteString?
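As an illustration of the io.WriteString() route, here is a hedged sketch. It assumes that *gzip.Writer itself has no WriteString method (so io.WriteString on it alone would fall back to Write([]byte(s))), and therefore wraps it in a *bufio.Writer, which does implement WriteString and copies the string straight into its buffer without allocating a temporary slice. The helper names compressString and decompress are invented for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressString gzips s, writing it through io.WriteString.
func compressString(s string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	bw := bufio.NewWriter(zw) // *bufio.Writer implements WriteString
	if _, err := io.WriteString(bw, s); err != nil {
		return nil, err
	}
	if err := bw.Flush(); err != nil { // push buffered bytes into gzip
		return nil, err
	}
	if err := zw.Close(); err != nil { // finish the gzip stream
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compressString; used here only to check the round trip.
func decompress(b []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	z, err := compressString("some string that must stay a string")
	if err != nil {
		panic(err)
	}
	s, err := decompress(z)
	fmt.Println(s, err)
}
```

Whether this is measurably faster than gzipWriter.Write([]byte(s)) depends on the workload; as noted above, compression is likely to dominate the cost either way.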

There are also ways to access the contents of a string without converting it to []byte, such as indexing, or using a loop where the compiler optimizes away the copy:

s := "something"
for i, v := range []byte(s) { // Copying s is optimized away
    // ...
}
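Indexing works the same way: s[i] yields a byte directly, with no conversion to []byte at all. A minimal sketch, using a hypothetical helper countByte:

```go
package main

import "fmt"

// countByte counts occurrences of c in s by indexing the string directly;
// no []byte conversion (and therefore no copy) is involved.
func countByte(s string, c byte) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countByte("a\nb\nc", '\n')) // prints 2
}
```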

Also see related questions:

golang: []byte(string) vs []byte(*string)

What are the possible consequences of using unsafe conversion from []byte to string in go?

What is the difference between the string and []byte in Go?

Does conversion between alias types in Go create copies?

How does type conversion internally work? What is the memory utilization for the same?

Answered By – icza

Answer Checked By – Clifford M. (GoLangFix Volunteer)
