6 months training

The power of a raw Lua interpreter to manipulate strings is quite limited. A program can create string literals and concatenate them. But it cannot extract a substring, check its size, or examine its contents. The full power to manipulate strings in Lua comes from its string library.

Some functions in the string library are quite simple: string.len(s) returns the length of a string s. string.rep(s, n) returns the string srepeated n times. You can create a string with 1M bytes (for tests, for instance) with string.rep("a", 2^20). string.lower(s) returns a copy of s with the upper-case letters converted to lower case; all other characters in the string are not changed (string.upper converts to upper case). As a typical use, if you want to sort an array of strings regardless of case, you may write something like

    table.sort(a, function (a, b)
      return string.lower(a) < string.lower(b)
    end)

Both string.upper and string.lower follow the current locale. Therefore, if you work with the European Latin-1 locale, the expression

    string.upper("ação")

results in "AÇÃO".

The call string.sub(s,i,j) extracts a piece of the string s, from the i-th to the j-th character inclusive. In Lua, the first character of a string has index 1. You can also use negative indices, which count from the end of the string: The index -1 refers to the last character in a string, -2 to the previous one, and so on. Therefore, the call string.sub(s, 1, j) gets a prefix of the string s with length j; string.sub(s, j, -1) gets a suffix of the string, starting at the j-th character (if you do not provide a third argument, it defaults to -1, so we could write the last call as string.sub(s, j)); and string.sub(s, 2, -2) returns a copy of the string s with the first and last characters removed:

    s = "[in brackets]"
    print(string.sub(s, 2, -2))   -->  in brackets

Remember that strings in Lua are immutable. The string.sub function, like any other function in Lua, does not change the value of a string, but returns a new string. A common mistake is to write something like

    string.sub(s, 2, -2)

and to assume that the value of s will be modified. If you want to modify the value of a variable, you must assign the new value to the variable:

    s = string.sub(s, 2, -2)

The string.char and string.byte functions convert between characters and their internal numeric representations. The function string.char gets zero or more integers, converts each one to a character, and returns a string concatenating all those characters. The function string.byte(s, i) returns the internal numeric representation of the i-th character of the string s; the second argument is optional, so that a call string.byte(s) returns the internal numeric representation of the first (or single) character of s. In the following examples, we assume that characters are represented in ASCII:

    print(string.char(97))                    -->  a
    i = 99; print(string.char(i, i+1, i+2))   -->  cde
    print(string.byte("abc"))                 -->  97
    print(string.byte("abc", 2))              -->  98
    print(string.byte("abc", -1))             -->  99

In the last line, we used a negative index to access the last character of the string.

The function string.format is a powerful tool when formatting strings, typically for output. It returns a formatted version of its variable number of arguments following the description given by its first argument, the so-called format string. The format string has rules similar to those of the printf function of standard C: It is composed of regular text and directives, which control where and how each argument must be placed in the formatted string. A simple directive is the character `%´ plus a letter that tells how to format the argument: `d´ for a decimal number, `x´ for hexadecimal, `o´ for octal, `f´ for a floating-point number, `s´ for strings, plus other variants. Between the `%´ and the letter, a directive can include other options, which control the details of the format, such as the number of decimal digits of a floating-point number:

    print(string.format("pi = %.4f", PI))     --> pi = 3.1416
    d = 5; m = 11; y = 1990
    print(string.format("%02d/%02d/%04d", d, m, y))
      --> 05/11/1990
    tag, title = "h1", "a title"
    print(string.format("<%s>%s</%s>", tag, title, tag))
      --> <h1>a title</h1>

In the first example, the %.4f means a floating-point number with four digits after the decimal point. In the second example, the %02dmeans a decimal number (`d´), with at least two digits and zero padding; the directive %2d, without the zero, would use blanks for padding. For a complete description of those directives, see the Lua reference manual. Or, better yet, see a C manual, as Lua calls the standard C libraries to do the hard work here.

6 months training

Friday, 5 August 2016

No comments:

Post a Comment