
Unicode -- Uphill both ways (Ruby Programming pt. 7)

I found this cool article on Unicode (it gets UTF-16 wrong, but that's OK). However, I'm running into a large wall dealing with Unicode in my program, so I'll put it out there in the hope that a solution presents itself.

So far my program checks each line of the file to see if it's ASCII-only text. If it is, the program reverses the line with Ruby's built-in reverse method.
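Here is a minimal sketch of that ASCII-only branch, assuming each line still carries its end-of-line marker. `reverse_ascii_line` is a hypothetical name, not taken from the original program. It strips the line ending before reversing, because a plain `line.reverse` would move the `\n` to the front of the line.

```ruby
# Hypothetical helper illustrating the ASCII-only branch.
def reverse_ascii_line(line)
  body = line.chomp                 # text without the trailing "\n" (or "\r\n")
  ending = line[body.length..-1]    # whatever chomp removed, possibly ""
  raise ArgumentError, "line is not ASCII-only" unless body.ascii_only?

  body.reverse + ending             # reverse the text, keep the ending last
end

puts "hello\n".reverse.inspect             # prints "\nolleh"
puts reverse_ascii_line("hello\n").inspect # prints "olleh\n"
```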

If not, what I want to do is read each byte (or group of bytes) and decide which range the character falls in. In UTF-8, a character at or below U+007F is plain ASCII, so that single byte goes into an array as one element. A character between U+0080 and U+07FF takes a two-byte chunk, one between U+0800 and U+FFFF takes a three-byte chunk, and one between U+10000 and U+10FFFF takes a four-byte chunk; each chunk goes into the array as one element. Then I read the elements of the array First In, Last Out (FILO), remove the end-of-line (\n) marker, and put the elements into another array. Finally, I join that array, add an end-of-line marker, and write it to the file.
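The plan above can be sketched roughly like this, assuming the line is valid UTF-8. `utf8_char_length` and `reverse_utf8_line` are hypothetical helper names. Instead of looking up the code point, the sketch reads the character length from the first (lead) byte, which is how UTF-8 is designed to be decoded: lead bytes below 80 mean one byte, C0–DF two, E0–EF three, and F0 or higher four. The chunks go onto a stack and come back off First In, Last Out.

```ruby
# Sketch of the byte-grouping plan, assuming the line is valid UTF-8.
# utf8_char_length and reverse_utf8_line are hypothetical helper names.

# How many bytes a UTF-8 character uses, judged from its first (lead) byte.
def utf8_char_length(lead)
  if lead < 0x80
    1                     # U+0000..U+007F (plain ASCII)
  elsif lead >= 0xF0
    4                     # U+10000..U+10FFFF
  elsif lead >= 0xE0
    3                     # U+0800..U+FFFF
  elsif lead >= 0xC0
    2                     # U+0080..U+07FF
  else
    # 80..BF only ever appear as the 2nd, 3rd or 4th byte of a character
    raise ArgumentError, format("unexpected continuation byte %02X", lead)
  end
end

def reverse_utf8_line(line)
  bytes = line.chomp.bytes        # drop the end-of-line marker first
  stack = []
  i = 0
  while i < bytes.length
    len = utf8_char_length(bytes[i])
    stack.push(bytes[i, len])     # one whole character per array element
    i += len
  end

  reversed = []
  reversed.push(stack.pop) until stack.empty?   # First In, Last Out
  # Join the byte chunks back into a UTF-8 string and add the end-of-line marker.
  reversed.flatten.pack("C*").force_encoding(Encoding::UTF_8) + "\n"
end

result = reverse_utf8_line("a\u00E9\u20AC\n")
puts result == "\u20AC\u00E9a\n"   # prints true
```

For comparison, since Ruby 1.9 `String#reverse` already works on characters rather than bytes for a UTF-8 string, so `line.chomp.reverse + "\n"` should give the same result; the manual version is mainly useful for seeing what happens at the byte level.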

So the first thing I need to do is find a way of reading the hexadecimal values of the characters. After a lot of looking I found a hex editor plugin for Notepad++, and though it doesn't do exactly what I want, I'll figure something out. The last ASCII character, U+007F, shows up as 7F in the hex view of the file. Notepad++ isn't hiding a 00 byte here: in UTF-8, every code point up to U+007F is stored as a single byte, and byte order (endianness) only matters for encodings with multi-byte code units, such as UTF-16. So that's the one I want to move as a one-element entry into another array. For now I can assume that any character whose first byte is 80 or higher takes two bytes, until I figure out a way of reading the longer ones. It won't be perfect, but if it works it will be a step.
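Ruby can also show the bytes directly, without a hex editor. This is a minimal sketch, assuming the text is UTF-8; `hex_dump` is a hypothetical helper name. It lists each character's code point next to the bytes actually stored for it, which makes it easy to see that U+007F really is the single byte 7F and that characters from U+0080 upward take two or more bytes.

```ruby
# Sketch: show each character's code point and its stored bytes in hex.
# Assumes UTF-8 text; hex_dump is a hypothetical helper name.
def hex_dump(text)
  text.each_char.map do |ch|
    hex = ch.bytes.map { |b| format("%02X", b) }.join(" ")
    format("U+%04X -> %s", ch.ord, hex)
  end
end

puts hex_dump("A\u007F\u00E9\u20AC")
# U+0041 -> 41
# U+007F -> 7F
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
```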
Now to try it out.
