CODERWALL.COM

We are analyzing https://coderwall.com/p/k7zvyg/dealing-with-unicode-in-go.

Title:
Dealing with Unicode in Go (Example)
Description:
A protip by hermanschaaf about unicode, utf8, golang, and go.
Website Age:
14 years and 2 months (reg. 2011-04-19).

Matching Content Categories {📚}

Games
Telecommunications
Shopping

Content Management System {📝}

What CMS is coderwall.com built with?

Custom-built

No common CMS was detected on Coderwall.com; the site appears to be custom-built with Ruby on Rails (Ruby).

Traffic Estimate {📈}

What is the average monthly size of coderwall.com audience?

🚄 Respectable Traffic: 10k - 20k visitors per month

Our best estimate is that this website will receive around 16,152 visitors in the current month.

How Does Coderwall.com Make Money? {💸}

We can't see how the site brings in money.

Not every website is profit-driven; some exist to share information or to serve as an online presence. Coderwall.com may be one of them, or it may earn revenue in a way we can't detect yet.

Keywords {🔍}

unicode, characters, string, utf, strings, ascii, package, python, javascript, hermanschaaf, length, range, year, ago, coderwall, golang, working, byte, doesnt, code, character, rune, ruby, jobs, encoding, chinese, hear, func, comparecharsword, comparecharshello, falsefalsetruefalse, tests, comparechars你好好好, unicodeutf, size, utfdecoderunes, nextr, related, cthom, youre, tip, post, job, privacy, terms, frontend, tools, ios, tips, sign

Topics {✒️}

google privacy policy herman schaaf recommend thin abstraction layer span multiple runes coderwall community unicode/utf8 package front page underlying byte encoding normal ascii strings response cthom06 unicode/utf8 unicode package org/normalization putting unicode org/strings utf-8 package javascript 13 utf8 package unicode strings python 2 unexpectedly strolling extra 4 characters ascii characters juicy trap ascii values naive approach work perfectly found equal dig deep update notifications easy pitfall composed characters nice quote rob pike jobs post terms service apply byte array ascii strings chinese characters äç err range return comparing 好 write tests fresh tip awesome job func comparechars comparechars function []rune conversion code points

Questions {❓}

Err, okay, how did 世界 become äç?
Have a fresh tip?
Now, what if we were saying hello in Chinese?
So, ~~怎么办呢~~ what to do?
Wait, what just happened?
What happens if we get the length of this string?

Schema {🗺️}

Article:
      context:http://schema.org
      id:dealing-with-unicode-in-go
      author:
         type:Person
         name:Herman Schaaf
      dateModified:2020-09-25 14:49:27 UTC
      datePublished:2020-09-25 14:49:27 UTC
      headline:Dealing with Unicode in Go (Example)
      url:/p/k7zvyg/dealing-with-unicode-in-go
      commentCount:4
      keywords:unicode, utf8, golang, go
      articleBody:If Go is normally a walk in the park, working with Unicode in Go can be described as unexpectedly strolling through a minefield. Take, for example, this inconspicuous string from the front page: `"Hello, 世界"`. What happens if we get the length of this string?

          fmt.Println(len("Hello, 世界"))
          >>> 13

      Wait, what just happened? Shouldn't the length be 9? Where did the extra 4 characters come from?

      Under the hood, Go is actually encoding the string as a byte array. While it doesn't make you distinguish between normal ASCII strings and Unicode strings as Python 2.x does, it still doesn't abstract away the underlying byte encoding of the characters. Since these Chinese characters take up three bytes each in UTF-8 while ASCII characters take only one, Go tells you the length is `1*7+3*2=13`.

      This can be really confusing, and a huge, juicy trap for those who only test their code with ASCII values. Take, for example:

          hello := "Hello, 世界"
          for i := range hello {
              fmt.Print(string(hello[i]))
          }
          >>> Hello, äç

      Err, okay, how did 世界 become äç? I can already hear you shouting, "but you can just use the second range return value!" Indeed you can!

          hello := "Hello, 世界"
          for _, c := range hello {
              fmt.Print(string(c))
          }
          >>> Hello, 世界

      Much better! Ah, but we can't always do it that way, can we? As a simple example, suppose we just want to compare a character with the next character in the string. A naive approach might do the following:

          func CompareChars(word string) {
              for i, c := range word {
                  if i < len(word)-1 {
                      fmt.Print(string(word[i+1]) == string(c), ",")
                  }
              }
          }
          ...
          CompareChars("hello")
          >>> false,false,true,false,

      And with tests for only ASCII, it will work perfectly. Now, what if we were saying hello in Chinese?

          CompareChars("你好好好")
          >>> false,false,false,false,

      Oops. Of course, the characters will never be found equal, because we are comparing a whole character such as `好` with a single byte: `word[i+1]` lands in the middle of a three-byte character instead of on the next character. So, ~~怎么办呢~~ what to do?

      Luckily, if you dig deep enough, you will find that Go ships with the [`unicode/utf8`](http://code.google.com/p/go/source/browse/src/pkg/unicode/utf8) package. It doesn't offer much, but let's use this to go back to our first problem: finding the length of the `"Hello, 世界"` string:

          import (
              "fmt"
              "unicode/utf8"
          )
          ...
          fmt.Println(utf8.RuneCountInString("Hello, 世界"))
          >>> 9

      Great, that's the count we expected at first! Now, how about updating our `CompareChars` function so it works with Unicode?

          func CompareChars(word string) {
              s := []byte(word)
              for utf8.RuneCount(s) > 1 {
                  // Decode the first rune and drop its bytes from the slice.
                  r, size := utf8.DecodeRune(s)
                  s = s[size:]
                  // Peek at the following rune without consuming it.
                  nextR, _ := utf8.DecodeRune(s)
                  fmt.Print(r == nextR, ",")
              }
          }
          ...
          CompareChars("hello")
          >>> false,false,true,false,
          CompareChars("你好好好")
          >>> false,true,true,

      It worked! やった！

      **The moral of the story**: Be very careful when working with Unicode in Go, *especially* when looping through strings. Most importantly, **always** write tests that contain both Unicode and ASCII strings, and use the built-in UTF-8 package where appropriate.
      description:A protip by hermanschaaf about unicode, utf8, golang, and go.
      publisher:
         type:Organization
         name:Coderwall
         logo:
            type:ImageObject
            url:/logo.png
         url:https://coderwall.com/
      image:/logo.png
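
The byte-versus-rune distinction described in the article body can be checked with a short, self-contained Go program. This is a minimal sketch, not the author's code: the helper names `ByteAndRuneCounts` and `RuneOffsets` are hypothetical and exist only for illustration, while `len`, `range`, and `utf8.RuneCountInString` are standard Go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ByteAndRuneCounts (hypothetical helper) returns the length of s in bytes
// and in runes (Unicode code points).
func ByteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

// RuneOffsets (hypothetical helper) returns the byte offset at which each
// rune of s starts. This is the first value a range loop over a string yields,
// which is why indexing with it returns a single byte, not a character.
func RuneOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	bytes, runes := ByteAndRuneCounts("Hello, 世界")
	fmt.Println(bytes, runes) // 13 9

	// Each of the two characters occupies three bytes in UTF-8.
	fmt.Println(RuneOffsets("世界")) // [0 3]
}
```

The offsets show why `i+1` in the naive `CompareChars` does not point at the next character: for a three-byte character the next one starts at `i+3`.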
Person:
      name:Herman Schaaf
      name:Corey Thomasson
      name:Egon Elbre
Organization:
      name:Coderwall
      logo:
         type:ImageObject
         url:/logo.png
      url:https://coderwall.com/
ImageObject:
      url:/logo.png
Comment:
      context:http://schema.org
      datePublished:2013-05-23 15:36:31 UTC
      text:When working with unicode you should be converting your strings to []rune. That's why the utf8 package is so sparse, most things are covered by []rune conversion and the unicode package.
      upvoteCount:3
      author:
         type:Person
         name:Corey Thomasson
      about:
         type:Article
         id:dealing-with-unicode-in-go
      context:http://schema.org
      datePublished:2013-05-23 15:53:06 UTC
      text:@cthom06 Yeah, you're right, even the utf8 package itself is like a very thin abstraction layer for using strings as []rune. Putting Unicode in strings and assuming the best is an easy pitfall though, one that took me, being new to Go, slightly by surprise!
      upvoteCount:0
      author:
         type:Person
         name:Herman Schaaf
      about:
         type:Article
         id:dealing-with-unicode-in-go
      context:http://schema.org
      datePublished:2013-05-24 05:44:51 UTC
      text:@unnali I wanted to leave the impression that you can never be too sure what you're going to get in a string, so I'm glad to hear it was unsettling :)
      upvoteCount:0
      author:
         type:Person
         name:Herman Schaaf
      about:
         type:Article
         id:dealing-with-unicode-in-go
      context:http://schema.org
      datePublished:2015-02-13 09:22:19 UTC
      text:Also the code is still wrong. Characters can span multiple runes (code points), basically there are also composed characters (http://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters). (also see http://blog.golang.org/normalization, http://blog.golang.org/strings) And a nice quote by Rob Pike - "In fact, the definition of "character" is ambiguous and it would be a mistake to try to resolve the ambiguity by defining that strings are made of characters."
      upvoteCount:0
      author:
         type:Person
         name:Egon Elbre
      about:
         type:Article
         id:dealing-with-unicode-in-go
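
The first comment's suggestion, converting the string to `[]rune`, might look like the sketch below. It is an illustration under stated assumptions, not code from the article or the commenters: the function name `CompareAdjacentRunes` is hypothetical, and it returns the results as a slice instead of printing them. The last lines of `main` illustrate the caveat from the final comment, that one visible character can span several code points, so even rune-wise comparison is not character-wise comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// CompareAdjacentRunes (hypothetical name) reports, for each pair of
// neighbouring code points in word, whether the two are equal.
func CompareAdjacentRunes(word string) []bool {
	runes := []rune(word) // decode the UTF-8 bytes into code points once
	var equal []bool
	for i := 0; i+1 < len(runes); i++ {
		equal = append(equal, runes[i] == runes[i+1])
	}
	return equal
}

func main() {
	fmt.Println(CompareAdjacentRunes("hello"))    // [false false true false]
	fmt.Println(CompareAdjacentRunes("你好好好")) // [false true true]

	// "e" followed by U+0301 (combining acute accent) is displayed as a
	// single character but consists of two code points.
	composed := "e\u0301"
	fmt.Println(utf8.RuneCountInString(composed)) // 2
}
```

Comparing user-perceived characters would additionally require normalization or grapheme segmentation, which the standard `unicode/utf8` package does not provide.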

Social Networks {👍}(2)

External Links {🔗}(7)

Analytics and Tracking {📊}

Google Analytics
Google Tag Manager
Google Universal Analytics

Emails and Hosting {✉️}

Mail Servers:

ASPMX.L.GOOGLE.com
ALT1.ASPMX.L.GOOGLE.com
ALT2.ASPMX.L.GOOGLE.com
ALT3.ASPMX.L.GOOGLE.com
ALT4.ASPMX.L.GOOGLE.com

Name Servers:

ns1.dnsimple.com
ns2.dnsimple.com
ns3.dnsimple.com
ns4.dnsimple.com

CDN Services {📦}

Jsdelivr
