I Need Some SUBSTR() Help, Please!

Last week, I presented a session titled “Unicode Made Easier with SQLite” at the Southwest Fox 2014 conference.

At the end of the session, an attendee asked about manipulating strings, specifically using SUBSTR() on UTF-8 encoded strings.

Last night, I played around with creating my own SUBSTR() function to deal with UTF-8 encoded strings, and I think I may have been successful, but I’m not sure.
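
For anyone wondering what a character-aware SUBSTR() involves, here is a minimal sketch in Python. It is not the code used for this post, and the function name `utf8_substr` and the sample phrase are made up for illustration. It assumes the input is valid UTF-8, uses a 1-based start position as SUBSTR() does, and counts Unicode code points, so a letter followed by a combining mark would count as two characters.

```python
def utf8_substr(data: bytes, start: int, length: int) -> bytes:
    """Return `length` characters of UTF-8 `data`, starting at 1-based character `start`.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if start < 1 or length < 0:
        raise ValueError("start must be >= 1 and length must be >= 0")

    # A character begins at every byte that is NOT a continuation byte (10xxxxxx).
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))  # sentinel: byte offset just past the last character

    n_chars = len(starts) - 1
    first = min(start - 1, n_chars)      # clamp so an out-of-range start yields b""
    last = min(first + length, n_chars)  # clamp so an over-long length stops at the end
    return data[starts[first]:starts[last]]


# Illustrative phrase (an assumption, not the one in the screenshot).
sample = "Привет мир".encode("utf-8")
print(utf8_substr(sample, 2, 3).decode("utf-8"))  # prints: рив
```

A plain byte-based substring over the same data would cut through the middle of two-byte Cyrillic characters, which is why the function works from character boundaries instead of byte positions.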

If any readers can tell me whether the image below shows the correct result for a SUBSTR(2,3) on the Russian phrase in the textbox, I would appreciate it very much.

On the other hand, if you think I’m going down a rabbit hole that doesn’t need to be traveled, please let me know. I would hate to think I’ve wasted time “substringing” a phrase that has no real need for it.

Thank you in advance for any help!

[Image: SUBSTR() on UTF-8 data]

  • Sergey Berezniker

    Yes, it looks right. If I had to do it, I would start with one of the C++ Unicode string copy functions.

    • Kevin Ragsdale

      Thank you Sergey!

  • James Frye

    Hi Kevin,

    That is correct!

    • Kevin Ragsdale

      Thank you James!