
CODERWALL . COM {
}
Title:
Dealing with Unicode in Go (Example)
Description:
A protip by hermanschaaf about unicode, utf8, golang, and go.
Website Age:
14 years and 2 months (reg. 2011-04-19).
Matching Content Categories {📚}
- Games
- Telecommunications
- Shopping
Content Management System {📝}
What CMS is coderwall.com built with?
Custom-built
No common CMS was detected on Coderwall.com; it appears to be custom-coded using Ruby on Rails (Ruby).
Traffic Estimate {📈}
What is the average monthly size of coderwall.com audience?
🚄 Respectable Traffic: 10k - 20k visitors per month
Based on our best estimate, this website receives around 16,152 visitors in the current month.
How Does Coderwall.com Make Money? {💸}
We can't see how the site brings in money.
Not every website is profit-driven; some are created to share information or to maintain an online presence, and Coderwall.com may be one of them. If it does generate revenue, we have not been able to detect how.
Keywords {🔍}
unicode, characters, string, utf, strings, ascii, package, python, javascript, hermanschaaf, length, range, year, ago, coderwall, golang, working, byte, doesnt, code, character, rune, ruby, jobs, encoding, chinese, hear, func, comparecharsword, comparecharshello, falsefalsetruefalse, tests, comparechars你好好好, unicodeutf, size, utfdecoderunes, nextr, related, cthom, youre, tip, post, job, privacy, terms, frontend, tools, ios, tips, sign,
Topics {✒️}
google privacy policy herman schaaf recommend thin abstraction layer span multiple runes coderwall community unicode/utf8 package front page underlying byte encoding normal ascii strings response cthom06 unicode/utf8 unicode package org/normalization putting unicode org/strings utf-8 package javascript 13 utf8 package unicode strings python 2 unexpectedly strolling extra 4 characters ascii characters juicy trap ascii values naive approach work perfectly found equal dig deep update notifications easy pitfall composed characters nice quote rob pike jobs post terms service apply byte array ascii strings chinese characters äç err range return comparing 好 write tests fresh tip awesome job func comparechars comparechars function []rune conversion code points
Questions {❓}
- Err, okay, how did 世界 become äç?
- Have a fresh tip?
- Now, what if we were saying hello in Chinese?
- So, what to do?
- Wait, what just happened?
- What happens if we get the length of this string?
Schema {🗺️}
Article:
context:http://schema.org
id:dealing-with-unicode-in-go
author:
type:Person
name:Herman Schaaf
dateModified:2020-09-25 14:49:27 UTC
datePublished:2020-09-25 14:49:27 UTC
headline:Dealing with Unicode in Go (Example)
url:/p/k7zvyg/dealing-with-unicode-in-go
commentCount:4
keywords:unicode, utf8, golang, go
articleBody:If Go is normally a walk in the park, working with Unicode in Go can be described as unexpectedly strolling through a minefield. Take, for example, this inconspicuous string from the Go front page: `"Hello, 世界"`. What happens if we get the length of this string?
```go
fmt.Println(len("Hello, 世界"))
// Output: 13
```
Wait, what just happened? Shouldn't the length be 9? Where did the extra 4 characters come from?
Under the hood, a Go string is just a read-only sequence of bytes, and a string literal like this one holds its UTF-8 encoding. Unlike Python 2.x, Go doesn't make you distinguish between normal ASCII strings and Unicode strings, but it also doesn't abstract away the underlying byte encoding of the characters: `len` returns the number of bytes, not characters. Since each of these Chinese characters takes up three bytes in UTF-8 while each ASCII character takes only one, Go tells you the length is `1*7+3*2=13`. This can be really confusing, and a huge, juicy trap for those who only test their code with ASCII values. Take, for example:
```go
hello := "Hello, 世界"
for i := range hello {
	fmt.Print(string(hello[i]))
}
// Output: Hello, äç
```
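Err, okay, how did 世界 become äç? The `range` loop does move from one character to the next (`i` is 7 for 世 and 10 for 界), but `hello[i]` reads a single byte: the first byte of each three-byte character, `0xE4` and `0xE7`. Converting a byte with `string()` treats its value as a code point, and U+00E4 and U+00E7 are `ä` and `ç`. I can already hear you shouting, "but you can just use the second range return value!" Indeed you can!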
```go
hello := "Hello, 世界"
for _, c := range hello {
	fmt.Print(string(c))
}
// Output: Hello, 世界
```
Much better! Ah, but we can't always do it that way, can we? As a simple example, suppose we just want to compare each character with the next character in the string. A naive approach might do the following:
```go
func CompareChars(word string) {
	for i, c := range word {
		if i < len(word)-1 {
			fmt.Print(string(word[i+1]) == string(c), ",")
		}
	}
}

...

CompareChars("hello")
// Output: false,false,true,false,
```
And if we only test with ASCII, it works perfectly. Now, what if we were saying hello in Chinese?
```go
CompareChars("你好好好")
// Output: false,false,false,false,
```
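Oops. Of course, the characters will never be found equal: `word[i+1]` is the byte right after the *start* of the current character, which for a three-byte character is the second byte of its own encoding. For `好` (encoded as `E5 A5 BD`), we end up comparing `好` with `\xA5`, which `string()` turns into `¥`.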
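To see exactly what the naive loop compares, the sketch below collects each pair instead of printing `true`/`false`. The helper name `naivePairs` is hypothetical; it only mirrors the comparison inside the naive `CompareChars` above.

```go
package main

import "fmt"

// naivePairs (hypothetical helper) mirrors the naive CompareChars: it pairs the
// rune that starts at byte offset i with the single byte at offset i+1.
func naivePairs(word string) [][2]string {
	var pairs [][2]string
	for i, c := range word {
		if i < len(word)-1 {
			// string(word[i+1]) converts one byte value into a code point,
			// so 0xA5 becomes U+00A5 ("¥") rather than part of 好.
			pairs = append(pairs, [2]string{string(c), string(word[i+1])})
		}
	}
	return pairs
}

func main() {
	fmt.Printf("% x\n", "你好好好") // raw UTF-8 bytes, three per character
	for _, p := range naivePairs("你好好好") {
		fmt.Printf("%q vs %q\n", p[0], p[1])
	}
}
```

For each `好` the program prints `"好" vs "¥"`: the right-hand side is never a whole character, so no comparison can succeed.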
So, <s>怎么办呢</s> what to do? Luckily, if you dig deep enough, you will find that Go ships with the [`unicode/utf8`](http://code.google.com/p/go/source/browse/src/pkg/unicode/utf8) package. It doesn't offer much, but let's use it to go back to our first problem: finding the length of the "Hello, 世界" string:
```go
import (
	"fmt"
	"unicode/utf8"
)

...

fmt.Println(utf8.RuneCountInString("Hello, 世界"))
// Output: 9
```
Great, that's the count we expected at first! Now, how about updating our `CompareChars` function so it works with Unicode?
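```go
func CompareChars(word string) {
	s := []byte(word)
	// Loop while at least two runes remain, i.e. there is a pair to compare.
	for utf8.RuneCount(s) > 1 {
		r, size := utf8.DecodeRune(s)  // first rune and its width in bytes
		s = s[size:]                   // drop it from the front of the slice
		nextR, _ := utf8.DecodeRune(s) // peek at the following rune
		fmt.Print(r == nextR, ",")
	}
}

...

CompareChars("hello")
// Output: false,false,true,false,
CompareChars("你好好好")
// Output: false,true,true,
```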
It worked! Hooray!
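One of the comments on this post suggests converting the string to `[]rune` instead, since most of what `unicode/utf8` offers is covered by that conversion. A minimal sketch of that approach, under the hypothetical name `CompareRunes`, which returns the results rather than printing them:

```go
package main

import "fmt"

// CompareRunes is a hypothetical variant of CompareChars that converts the
// string to []rune once, so indices count code points rather than bytes.
// It returns one bool per adjacent pair instead of printing.
func CompareRunes(word string) []bool {
	runes := []rune(word) // decodes the UTF-8 bytes into code points
	var out []bool
	for i := 0; i+1 < len(runes); i++ {
		out = append(out, runes[i] == runes[i+1])
	}
	return out
}

func main() {
	fmt.Println(CompareRunes("hello"))    // [false false true false]
	fmt.Println(CompareRunes("你好好好")) // [false true true]
}
```

The design trade-off: the `DecodeRune` loop above calls `utf8.RuneCount` on every iteration, which rescans the remaining bytes each time, while the `[]rune` conversion pays for one allocation up front and then gives constant-time access to each code point.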
**The moral of the story**:
Be very careful when working with Unicode in Go, *especially* when looping through or indexing into strings. Most importantly, **always** write tests that contain both Unicode and ASCII strings, and use the standard library's `unicode/utf8` package where appropriate.
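A minimal sketch of such a mixed test table, written as a plain program so it runs on its own (in a real project it would live in a `_test.go` file using the `testing` package). The function `firstChar` and its cases are hypothetical illustrations, not part of the original post.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstChar is a hypothetical function under test: it returns the first
// rune of s as a string. A naive s[:1] would pass the ASCII case below but
// return only the first byte of 世.
func firstChar(s string) string {
	_, size := utf8.DecodeRuneInString(s) // size is 0 for an empty string
	return s[:size]
}

// cases mixes ASCII and Unicode input so that byte/rune confusion shows up.
var cases = []struct{ in, want string }{
	{"hello", "h"},
	{"世界", "世"},
	{"", ""},
}

func main() {
	for _, c := range cases {
		if got := firstChar(c.in); got != c.want {
			fmt.Printf("FAIL firstChar(%q) = %q, want %q\n", c.in, got, c.want)
		} else {
			fmt.Printf("ok   firstChar(%q)\n", c.in)
		}
	}
}
```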
description:A protip by hermanschaaf about unicode, utf8, golang, and go.
publisher:
type:Organization
name:Coderwall
logo:
type:ImageObject
url:/logo.png
url:https://coderwall.com/
image:/logo.png
id:dealing-with-unicode-in-go
Person:
name:Herman Schaaf
name:Corey Thomasson
name:Egon Elbre
Organization:
name:Coderwall
logo:
type:ImageObject
url:/logo.png
url:https://coderwall.com/
ImageObject:
url:/logo.png
Comment:
context:http://schema.org
datePublished:2013-05-23 15:36:31 UTC
text:When working with unicode you should be converting your strings to []rune. That's why the utf8 package is so sparse, most things are covered by []rune conversion and the unicode package.
upvoteCount:3
author:
type:Person
name:Corey Thomasson
about:
type:Article
id:dealing-with-unicode-in-go
context:http://schema.org
datePublished:2013-05-23 15:53:06 UTC
text:@cthom06 Yeah, you're right, even the utf8 package itself is like a very thin abstraction layer for using strings as []rune. Putting Unicode in strings and assuming the best is an easy pitfall though, one that took me, being new to Go, slightly by surprise!
upvoteCount:0
author:
type:Person
name:Herman Schaaf
about:
type:Article
id:dealing-with-unicode-in-go
context:http://schema.org
datePublished:2013-05-24 05:44:51 UTC
text:@unnali I wanted to leave the impression that you can never be too sure what you're going to get in a string, so I'm glad to hear it was unsettling :)
upvoteCount:0
author:
type:Person
name:Herman Schaaf
about:
type:Article
id:dealing-with-unicode-in-go
context:http://schema.org
datePublished:2015-02-13 09:22:19 UTC
text:Also, the code is still wrong. Characters can span multiple runes (code points), because there are also composed characters (http://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters).
(also see http://blog.golang.org/normalization, http://blog.golang.org/strings)
And a nice quote by Rob Pike -
"In fact, the definition of "character" is ambiguous and it would be a mistake to try to resolve the ambiguity by defining that strings are made of characters."
upvoteCount:0
author:
type:Person
name:Egon Elbre
about:
type:Article
id:dealing-with-unicode-in-go
Social Networks {👍}(2)
External Links {🔗}(7)
- http://code.google.com/p/go/source/browse/src/pkg/unicode/utf8
- http://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters
- http://blog.golang.org/normalization
- http://blog.golang.org/strings
- https://github.com/coderwall/coderwall-next
- https://policies.google.com/privacy
- https://policies.google.com/terms
Analytics and Tracking {📊}
- Google Analytics
- Google Tag Manager
- Google Universal Analytics
Emails and Hosting {✉️}
Mail Servers:
- ASPMX.L.GOOGLE.com
- ALT1.ASPMX.L.GOOGLE.com
- ALT2.ASPMX.L.GOOGLE.com
- ALT3.ASPMX.L.GOOGLE.com
- ALT4.ASPMX.L.GOOGLE.com
Name Servers:
- ns1.dnsimple.com
- ns2.dnsimple.com
- ns3.dnsimple.com
- ns4.dnsimple.com
CDN Services {📦}
- Jsdelivr