newLISP: You Can Do It Too --- Strings

#############################################################################
# Name:         newLISP: You Can Do It Too --- Streams
# Author:       黄登(winger)
# Project:      http://code.google.com/p/newlisp-you-can-do
# Gtalk:        free.winger@gmail.com
# Gtalk-Group: zen0code@appspot.com
# Blog:         http://my.opera.com/freewinger/blog/
# QQ-Group:     31138659
# The greatest truths are the simplest -- newLISP
#
# Copyright 2012 黄登(winger) All rights reserved.
# Permission is granted to copy, distribute and/or
# modify this document under the terms of the GNU Free Documentation License,
# Version 1.2 or any later version published by the Free Software Foundation;
# with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
#############################################################################

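        Freedom certainly cannot be bought with money, but it can be sold for money.        --- Lu Xun (鲁迅)

    In practice, strings are what people and computers exchange most when they
interact, so much so that most input data ends up being handled as strings.
    If lists are heaven and earth, then strings are surely the streams that run
between them.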

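I. Strings in newLISP code

    newLISP's string handling is undeniably powerful. Every handy blade you could
want is laid out for you, and each one is an essential tool for hacking away at
code from the comfort of home.

    End of commercial, back to business. ~_~

    There are three ways to write a string in newLISP:

    Enclosed in double quotes ; fewer keystrokes, and escape sequences such as "\n" work
    (set 's "this is a string")

    Enclosed in braces ; escape sequences are left unprocessed
    (set 's {this is a string})

    Enclosed in dedicated tags ; same as braces, and the string may be longer than 2048 bytes
    (set 's [text]this is a string[/text])

    Strings written in the first and second forms cannot exceed 2048 bytes.
    Many people wonder why the first form is needed when the second exists.
    Try the following: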

> {\{}

ERR: string token too long : "\\{}"

> "\""
"\""

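    See? What braces buy you is that escape sequences are left alone: inside
braces they have no effect. The price is that braces must balance, so a lone {
cannot appear inside them, whereas \" works fine inside double quotes.
    Now print a string: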

> (print {\n road to freedom})
\n road to freedom"\\n road to freedom"
> (print "\n road to freedom")

road to freedom"\n road to freedom"

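    The escape sequence inside the braces had no effect: no line break was
printed. Of the three forms, only the first can contain its own delimiter, the
double quote (written \").

    The second form, braces, is the one I strongly recommend. Why? Convenience:
you do not have to double every backslash, which is especially handy when writing
regular expressions.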

> (println "\t45")
        45
"\t45"
> (println "\\t45")
\t45
"\\t45"
> (println {\t45})
\t45
"\\t45"

> (regex "\\d" "a9b6c4")
("9" 1 1)

> (regex {\d} "a9b6c4")
("9" 1 1)

    Double-quoted strings support the following escape sequences:

character   description
\"          for a double quote inside a quoted string
\n          for a line-feed character (ASCII 10)
\r          for a return character (ASCII 13)
\t          for a TAB character (ASCII 9)
\nnn        for a three-digit decimal ASCII number (nnn between 000 and 255)
\xnn        for a two-digit hexadecimal ASCII number (nn between 00 and ff)

(set 's "this is a string \n with two lines")
(println s)

this is a string
with two lines

(println "\110\101\119\076\073\083\080") ; decimal ASCII
newLISP

(println "\x6e\x65\x77\x4c\x49\x53\x50") ; hexadecimal ASCII
newLISP

    What if you had to go the other way and write a string out in the numeric
escape forms above?
    Hint: use format and unpack.
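    The third form, [text] ... [/text], is normally used for very long strings
(more than 2048 bytes), such as web pages. newLISP itself also switches to this
form automatically when it outputs long strings.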

(set 'novel (read-file {my-latest-novel.txt}))
;->
[text]
It was a dark and "stormy" night...
...
The End.
[/text]

    Use length to get the length of a string:

(length novel)
;-> 575196

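    newLISP can efficiently handle strings of several million characters.
    length counts bytes. To count the characters of a Unicode string, use
utf8len, which requires the UTF-8 version of newLISP: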

(utf8len (char 955))
;-> 1
(length (char 955))
;-> 2
> (utf8len "个")
4
> (length "个")
2

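    The last two results come from cmd.exe. Its code page is presumably not
UTF-8, so newLISP received a 2-byte encoding of the character and utf8len misread
it; in a UTF-8 terminal the same two calls return 1 and 3. cmd.exe causes many
problems with non-ASCII characters that are almost impossible to solve, but
consoles on non-Win32 systems do not have this issue.

II. Making strings

    There are any number of ways to make strings. Strings here, strings there,
strings everywhere...
    To build a string one character at a time, use char: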

(char 33)
;-> "!"

> (char "a")
97

> (char 0x61)
"a"

> (char 97)
"a"

    char works on a single character: it converts a character to its number, and
a number to its character.

(join (map char (sequence (char "a") (char "z"))))
;-> "abcdefghijklmnopqrstuvwxyz"

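    char gets the ASCII codes of "a" and "z", sequence generates the list of
numbers between them, and map applies char to each number to produce the
corresponding character. Finally join merges the whole list into one string.

    join also accepts an extra argument to use as a separator.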

(join (map char (sequence (char "a") (char "z"))) "-")
;-> "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

    Like join, append can also concatenate strings. (Most list functions work on
strings too.)

(append "con" "cat" "e" "nation")
;-> "concatenation"

    To build a list we use list; to build a string we use string.
    string converts all of its arguments to text and combines them into a single
string.

(define x 42)
(string {the value of } 'x { is } x)
;-> "the value of x is 42"

    For finer control over string output use format, which we will meet shortly.
    dup makes copies of a string:

> (dup "帅锅" 5)
"帅锅帅锅帅锅帅锅帅锅"

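    date returns a string with the current date and time, or, given a number of
seconds since 1970, the date and time for that moment (the result depends on the
local time zone).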

> (date)
"Mon May 14 15:50:34 2012"

> (date 1234567890)
"Sat Feb 14 07:31:30 2009"

III. String surgery

    "Surgery" sounds scary, but it simply means changing a string permanently.

    Many functions operate on strings, and some of them are destructive (in the
manual, these functions are marked with a !).

(set 't "a hypothetical one-dimensional subatomic particle")
(reverse t)
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"

    As mentioned before, to use these functions without destroying the original
data, use copy.

(reverse (copy t))
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "a hypothetical one-dimensional subatomic particle"

    The reverse above changed t permanently, but the case-conversion functions
below do not change the original string.

(set 't "a hypothetical one-dimensional subatomic particle")
(upper-case t)
;-> "A HYPOTHETICAL ONE-DIMENSIONAL SUBATOMIC PARTICLE"
(lower-case t)
;-> "a hypothetical one-dimensional subatomic particle"
(title-case t)
;-> "A hypothetical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

IV. Substrings

    To extract part of a string, use the following:

(set 't "a hypothetical one-dimensional subatomic particle")
(first t)
;-> "a"
(rest t)
;-> " hypothetical one-dimensional subatomic particle"
(last t)
;-> "e"
(t 2)
;-> "h"

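    You will notice how much this resembles the list operations from the previous
chapter. In newLISP most list functions also work on strings, including the
various selection functions.

1: String slices

    slice cuts a new string out of an existing one.

(set 't "a hypothetical one-dimensional subatomic particle")
(slice t 15 13) ; start at index 15 and take 13 characters
;-> "one-dimension"
(slice t -8 8) ; start 8 characters from the end and take 8 characters
;-> "particle"
(slice t 2 -9) ; start at index 2 and stop before the last 9 characters
;-> "hypothetical one-dimensional subatomic"
(slice "schwarzwalderkirschtorte" 19 -1) ; likewise: stop before the last character
;-> "tort"

    Strings can of course be sliced implicitly as well.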

(15 13 t)
;-> "one-dimension"
(0 14 t)
;-> "a hypothetical"

    The strings extracted above are all contiguous. To pick out scattered
characters, use select:

(set 't "a hypothetical one-dimensional subatomic particle")
(select t 3 5 24 48 21 10 44 8)
;-> "yosemite"
(select t (sequence 1 49 12)) ; starting at index 1, take every 12th character
;-> " lime"

> (help select)
syntax: (select <string> <list-selection>)
syntax: (select <string> [<int-index_i> ... ])

     <list-selection> is a list holding the positions of the characters to extract.

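2: Changing the ends of strings

    chop and trim operate on the ends of a string. Neither is destructive: both
return a new string and leave the original untouched.
    Snip, snip, snip...

    chop removes characters from the end of a string...

(chop t) ; by default, the last character
;-> "a hypothetical one-dimensional subatomic particl"
(chop t 9) ; remove the last 9 characters
;-> "a hypothetical one-dimensional subatomic"

    trim removes the specified character from both ends of a string.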

(set 's " centred ")
(trim s) ; defaults to removing spaces
;-> "centred"

(set 's "------centred------")
(trim s "-")
;-> "centred"

(set 's "------centred********")
(trim s "-" "*") ; the characters to trim from the front and from the back can be given separately
;-> "centred"

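3: push and pop work on strings too

    push inserts an element into a string at the given position; pop does the
opposite.
    If no position is given, the default is the start of the string.

(set 't "some ")
(push "this is " t)
(push "text " t -1)
;-> t is now "this is some text "

    pop returns the piece that was popped rather than the target string, which is
convenient when you take a string apart and process the pieces as you go. push,
in newLISP 10, returns the modified target.

> (help pop)
syntax: (pop <str> [<int-index> [<int-length>]])

    [<int-length>] sets the number of characters to pop.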

(set 'version-string (string (sys-info -2)))
; eg: version-string is "10402"
(set 'dev-version (pop version-string -2 2)) ; always two digits
; dev-version is "02", version-string is now "104"
(set 'point-version (pop version-string -1)) ; always one digit
; point-version is "4", version-string is now "10"
(set 'version version-string) ; one or two digits
(println version "." point-version "." dev-version " on " ostype)
10.4.02 on Win32
"Win32"

    ostype holds the name of the operating system.

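V. Modifying strings

    There are two ways to modify a string: by giving an exact position, or by
giving the content to change.

1: Using index numbers in strings

    Long ago there were nth-set and set-nth, but given the confusing variety of
setters and their return values, they have disappeared from current versions.
Instead, implicit indexing together with setf gives access to the element at a
given position.

> (set 'str "thinking newLISP !")
"thinking newLISP !"
> (setf (str 0) "I t")
"I t"
> str
"I thinking newLISP !"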

2: Changing substrings

    Often you do not know the exact index of the characters you want to change,
or finding it out would cost too much.
    That is when replace comes in: it replaces every part of the string that
matches what you ask for...

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

(replace old-string source-string replacement)
So:
(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" t "theor") ; replace every hypoth in the string with theor
;-> "a theoretical one-dimensional subatomic particle"

    replace is destructive. If you do not want to change the original string, use
copy or string:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" (string t) "theor")
;-> "a theoretical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

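3: Regular expressions

    If you browse the manual you will find that many syntax descriptions include
an optional parameter, <int-regex-option>. It is the numeric regular-expression
option; for the meaning of each number, search the manual for "PCRE name". The
most common values are 0 (case-sensitive, the default behaviour) and 1
(case-insensitive).

    newLISP uses Perl-compatible Regular Expressions (PCRE). Besides replace,
the functions directory, find, find-all, parse, search, starts-with and
ends-with all accept regular expressions.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {h.*?l(?# h followed by l but not too greedy)} t {} 0)
;-> "a  one-dimensional subatomic particle"

    When writing a regular expression you can use either double quotes or
braces; the difference was explained earlier. Personally I still recommend
braces...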

(set 'str "\s")
(replace str "this is a phrase" "|" 0) ; does not search for \s (whitespace)
;-> thi| i| a phra|e ; only the letter s was replaced

(set 'str "\\s")
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; success!

(set 'str {\s})
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; better!

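VI. System variables: $0, $1 ...

    Every function that uses a regex binds the results of the match to the
system variables $0, $1 ... $15. You can use them directly, or refer to them
through the $ function.
    If you are new to regular expressions, look for a PCRE tutorial. Do not
worry if the code below is confusing. There is also the manual, there is Code
Patterns, and failing that there is Google: more than one road leads to newLISP.
    My view has always been that "enough to get by" is fine, so if this is hard
to follow, skip ahead. It will come naturally with use. Skill comes from
diligence and is lost through idleness.

(set 'quotation {"I cannot explain." She spoke in a low, eager voice,
with a curious lisp in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the moor again."})

(replace {(.*?),.*?curious\s*(l.*p\W)(.*?)(moor)(.*)}
quotation
(println { $1 } $1 { $2 } $2 { $3 } $3 { $4 } $4 { $5 } $5)
4) ; the string above picked up extra \n line breaks from the page layout,
   ; so option 4 sets PCRE_DOTALL, which makes . match newlines too

$1 "I cannot explain." She spoke in a low $2 lisp $3 in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the $4 moor $5 again."

    The match of each parenthesised group is bound to a system variable, $1
through $5, while $0 holds the part of the string matched by the whole regular
expression. A mouthful, I know, so go and read the code.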

(set 'str "http://newlisp.org:80")
(find "http://(.*):(.*)" str 0) → 0

$0 → "http://newlisp.org:80"
$1 → "newlisp.org"
$2 → "80"
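
    As a small sketch of putting the system variables to work, the same kind of
match can pull the host and the port out of a URL. The names url, host and port
are made up for this illustration, and the stricter (\d+) pattern for the port
is my own choice rather than something taken from the example above.

```newlisp
(set 'url "http://newlisp.org:80")

; find returns nil when nothing matches, so the variables are only set on success
(when (find {http://(.*):(\d+)} url 0)
  (set 'host $1)              ; first group, read directly
  (set 'port (int ($ 2))))    ; second group, read through the $ function

(println host " listens on port " port)
```

    Copying $1 and $2 into ordinary variables straight away is a sensible habit,
because the next regex call overwrites them.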

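1: The replacement expression

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

    <exp-replacement> is the replacement part: whatever matching data is found
gets replaced by the value of this expression. There is no restriction on the
expression; it may even be an operation that changes nothing.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {t[h]|t[aeiou]} t (println $0) 0)
th
ti
to
ti
t
;-> "a hypothetical one-dimensional subatomic particle"

    The purpose of this replace expression is to print every place in the string
where a t is followed by an h or a vowel. <exp-replacement> is (println $0),
which does two jobs. First, it prints the matched text, which some people call a
"side effect". Second, its return value, $0, replaces the matched text in the
original string; since the two are the same, the original string appears
unchanged.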

(replace "a|e|c" "This is a sentence" (upper-case $0) 0)
;-> "This is A sEntEnCE"
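
    The replacement expression can also look something up for each match. Here
is a minimal sketch of filling in a text template; vars and template are
hypothetical names, and the <name> placeholder syntax is simply an assumption
made for the example.

```newlisp
; vars and template are hypothetical names used only for this sketch
(set 'vars '(("name" "newLISP") ("version" "10.4")))
(set 'template "Hello <name>, this is version <version>!")

; $1 holds the word between < and >; lookup returns the value stored under that key
(replace {<(\w+)>} template (lookup $1 vars) 0)

(println template)
```

    Because replace is destructive, template itself now holds the filled-in
text. Every placeholder needs an entry in vars, since lookup returns nil for a
missing key.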

    The following code uses a more complex <exp-replacement>.

(set 't "a hypothetical one-dimensional subatomic particle")
(set 'counter 0)
(replace "o" t
  (begin
    (inc 'counter)
    (println {replacing "} $0 {" number } counter)
    (string counter)) ; the replacement must be a string; this is the value <exp-replacement> returns
  0)
replacing "o" number 1
replacing "o" number 2
replacing "o" number 3
replacing "o" number 4
"a hyp1thetical 2ne-dimensi3nal subat4mic particle"

    begin groups several expressions into one and evaluates them in order; the
value of the last one becomes the value of the group.
    Now let us look at a practical use of replace.
    Suppose a text file, "zhuzhu.txt", has the following contents:

1 a = 15
2 another_variable = "strings"
4 x2 = "another string"
5 c = 25
3x=9

    Now we want to turn it into the following form, to make it look prettier.

10 a                   = 15
20 another_variable    = "strings"
30 x2                  = "another string"
40 c                   = 25
50 x                   = 9

    Save the code below as ft.lsp, then run: newlisp ft.lsp zhuzhu.txt

(set 'file (open ((main-args) 2) "read"))
;(set 'file (open "ni.txt" "read"))
(set 'counter 0)
(while (read-line file)
    (set 'temp
        (replace {^(\d*)(\s*)(.*)} ; rewrite the leading number
            (current-line)
            (string (inc 'counter 10) " " $3 )
            0))
    (println
        (replace {(\S*)(\s*)(=)(\s*)(.*)} ; pick out the useful parts
            temp
            (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
            0)))
(close file)

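    The while loop reads the file one line at a time, and (current-line) returns
the line just read. The first replace rebuilds the leading number:
{^(\d*)(\s*)(.*)} splits the source line into the leading digits, the whitespace
after them, and the rest. Then (string (inc 'counter 10) " " $3 ) throws away
the first two parts and combines the third part with the value of counter into a
new string. counter increases by 10 for every line processed. The resulting
string is assigned to the temporary variable temp.
    The second replace splits temp into five parts with {(\S*)(\s*)(=)(\s*)(.*)}.
    \S stands for any character that is not matched by \s.
    From these we take $1, $3 and $5 and build a new string:
    (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
    For alignment, the distance from the start of $1 to $3 (the equals sign) is
fixed at 20: if $1 is shorter than 20 bytes, dup supplies the missing spaces.

    Regular expressions aren't very easy for the newcomer, but they're very
powerful, particularly with newLISP's replace function, so they're worth
learning and worth practising regularly.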

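VII. Testing and comparing strings

    Various tests can be applied to strings. The comparison operators compare
the strings character by character.

(> {Higgs Boson} {Higgs boson}) ; nil ; B is less than b
(> {Higgs Boson} {Higgs}) ; true
(< {dollar} {euro}) ; true
(> {newLISP} {LISP}) ; true
(= {fred} {Fred}) ; nil ; f and F are different
(= {fred} {fred}) ; true

    Comparison starts at the first character and continues until the result is
decided.
    Comparing several strings at once is no problem either. Thanks to newLISP's
flexible argument handling you do not have to write the iteration yourself.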

(< "a" "c" "d" "f" "h")
;-> true

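    What if only one argument is supplied?
    newLISP supplies a default for the other one: a number is compared against
0, and a string against the empty string ""...

(> 1) ; true - assumes > 0
(> "fred") ; true - assumes > ""

    The following functions make it easy to examine a string and extract
specific content from it:
    member, regex, find-all, starts-with, ends-with.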

(starts-with "newLISP" "new")
;-> true
(ends-with "newLISP" "LISP")
;-> true

    They also accept regular expressions when a regex option is given (usually
0 or 1).

(starts-with {newLISP} {[a-z][aeiou](?#lc followed by lc vowel)} 0)
;-> true
(ends-with {newLISP} {[aeiou][A-Z](?# lc vowel followed by UCase)} 0)
;-> nil

    In PCRE terms, 0 means case-sensitive and 1 means case-insensitive.
    find, find-all, member and regex search the whole string.
    find returns the position of the first match.

(set 't "a hypothetical one-dimensional subatomic particle")
(find "atom" t)
;-> 34
(find "l" t)
;-> 13
(find "L" t)
;-> nil ; case-sensitive

    member checks whether one string is part of another. If it is, member
returns the substring together with everything after it.

(member "rest" "a good restaurant")
;-> "restaurant"

    find and member both accept a regex option.

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find "lisp" quotation) ; no regex
;-> 69 ; at position 69, where the l is

(find {i} quotation 0) ; with regex
;-> 15 ; at position 15

(find {s} quotation 1) ; case-insensitive
;-> 20 ; at position 20

(println "character "
(find {(l.*?p)} quotation 0) ": " $0) ; find an l followed, some characters later, by a p
;-> character 13: lain." She sp

    A reminder: when entering a multi-line statement at the console, first press
Enter on an empty line and then paste the whole statement, or put [cmd] and
[/cmd] on lines of their own before and after it.

    find-all works like find, but instead of returning only the first match it
returns all matching substrings as a list. When applied to strings it uses
regular expressions by default, so the regex option need not be given
explicitly.

> (help find-all)
syntax: (find-all <str-regex-pattern> <str-text> [<exp> [<int-regex-option>]])

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find-all "[aeiou]{2,}" quotation $0) ; substrings of two or more vowels
;-> ("ai" "ea" "oi" "iou" "ou" "oo" "oo" "ai")

    find-all returns the matching text. If you also want positions and lengths,
use regex.
    regex returns the text, start position and length of the first match and of
each of its parenthesised groups. At first sight it looks a little complicated.

(set 'quotation
{She spoke in a low, eager voice, with a curious lisp in her utterance.})

(println (regex {(.*)(l.*)(l.*p)(.*)} quotation 0))
;-->
("She spoke in a low, eager voice, with a curious lisp in
her utterance." 0 70 "She spoke in a " 0 15 "low, eager
voice, with a curious " 15 33 "lisp" 48 4 " in her
utterance." 52 18)

    First comes the string matched by the whole regular expression. It is also
the longest: it starts at 0 and is 70 bytes long. Next comes the match of the
first pair of parentheses, starting at position 0 and 15 bytes long, then the
data of the second group, starting at position 15 and 33 bytes long, and so on.

    All the matched groups are also stored in the system variables.

(for (x 1 4)
  (println {$} x ": " ($ x)))
$1: She spoke in a
$2: low, eager voice, with a curious
$3: lisp
$4: in her utterance.
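
    The flat list that regex returns is easier to read once it is cut into
(text offset length) triples. A minimal sketch, relying on the fact that explode
also works on lists; result and triples are hypothetical names chosen for the
example.

```newlisp
(set 'result (regex {(\w+) (\w+)} "hello world" 0))
; result is one flat list: text, offset, length, text, offset, length, ...

(set 'triples (explode result 3))   ; one (text offset length) list per match or group

(dolist (m triples)
  (println (format "%-13s offset %2d  length %2d" (m 0) (m 1) (m 2))))
```

    The first triple describes the whole match, and the following ones describe
the groups in order, which mirrors $0, $1, $2 and so on.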

VIII. Strings to lists

    Let us start with the "famous" explode, which blows a string apart into
pieces of a given size and returns them all as a list.

(set 't "a hypothetical one-dimensional subatomic particle")
(explode t)

;-> ("a" " " "h" "y" "p" "o" "t" "h" "e" "t" "i" "c" "a" "l"
" " "o" "n" "e" "-" "d" "i" "m" "e" "n" "s" "i" "o" "n" "a"
"l" " " "s" "u" "b" "a" "t" "o" "m" "i" "c" " " "p" "a" "r"
"t" "i" "c" "l" "e")

> (help explode)
syntax: (explode <str> [<int-chunk> [<bool>]])
syntax: (explode <list> [<int-chunk> [<bool>]])

(explode (replace " " t "") 5)
;-> ("ahypo" "theti" "calon" "e-dim" "ensio" "nalsu" "batom" "icpar"
"ticle")

    int-chunk is the size of each piece, and bool decides whether a final piece
shorter than int-chunk is dropped.
    You have the axe that splits the heavens; I have the stone that mends them.
    join does the opposite of explode: it assembles a list whose elements are
all strings into one new string.

> (help join)
syntax: (join list-of-strings [str-joint [bool-trail-joint]])

(set 'lst '("this" "is" "a" "sentence"))

(join lst " ") → "this is a sentence"

(join (map string (slice (now) 0 3)) "-") → "2012-5-16" ; map string turns each number into a string first

(join (explode "keep it together")) → "keep it together"

(join '("A" "B" "C") "-")         → "A-B-C"
(join '("A" "B" "C") "-" true)    → "A-B-C-"

    find-all can also split a string.

(find-all ".{3}" t) ; regex by default; this pattern matches any three characters
;-> ("a h" "ypo" "the" "tic" "al " "one" "-di" "men" "sio"
"nal" " su" "bat" "omi" "c p" "art" "icl")

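IX. Parsing strings

    The next function will move you to tears of joy.
    If you regularly process large amounts of text, parse is a real treasure.
    It takes the pain out of analysing your data. (newLISP also has many
specialised statistics functions built in.)

> (help parse)
syntax: (parse <str-data> [<str-break> [<int-option>]])

    parse splits a string at each occurrence of <str-break>. The occurrences of
<str-break> are swallowed, and the remaining pieces are returned as a list of
substrings.

(parse t) ; no <str-break>: newLISP's own tokenizing rules apply, here splitting at the spaces
;-> ("a" "hypothetical" "one-dimensional" "subatomic" "particle")

    <str-break> can be a single separator character or a longer string.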

(set 'pathname {/System/Library/Fonts/Courier.dfont})
(parse pathname {/})
;-> ("" "System" "Library" "Fonts" "Courier.dfont")

(set 't {spamspamspamspamspamspamspamspam})
;-> "spamspamspamspamspamspamspamspam"
(parse t {am}) ; break on "am"
;-> ("sp" "sp" "sp" "sp" "sp" "sp" "sp" "sp" "")

    We can use filter to drop the empty strings from the result list.

(filter (fn (s) (not (empty? s))) (parse pathname {/}))
;-> ("System" "Library" "Fonts" "Courier.dfont")

    Stripping HTML tags:

(set 'html (read-file "/Users/Sites/index.html"))
(println (parse html {<.*?>} 4)) ; option 4: dot matches newline

    newLISP also provides a dedicated XML parser, xml-parse, which gets a whole
chapter of its own later.

    When no <str-break> is given, parse tokenizes according to newLISP's
internal parsing rules, which work differently from splitting at a given
separator.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t)
;-> ("Eats" "," "shoots" "," "and" "leaves") ; she's gone!

    Because no delimiter was given, everything after the ";" was treated as a
comment.
    To make parse split the data by your own rules, you must supply an explicit
delimiter or a regular expression.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t " ")
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Or:

(parse t "\\s" 0) ; {\s} matches a whitespace character
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Another way to split a string is find-all.

(set 'a "1212374192387562311")
(println (find-all {\d{3}|\d{2}$|\d$} a))
;-> ("121" "237" "419" "238" "756" "231" "1")

; alternatively

(explode a 3)
;-> ("121" "237" "419" "238" "756" "231" "1")

    parse swallows the delimiters, whereas find-all keeps what it matches.

(find-all {\w+} t ) ; \w matches a letter, digit or underscore, i.e. [0-9a-zA-Z_]
;-> ("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(parse t {\w+} 0 ) ; the matches are swallowed as delimiters
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

(parse t {[^\w]+} 0 )
;->("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(append '("") (find-all {[^\w]+} t ) '(""))
;-> ("" ", " ", " " " " ; " " " " " " " " " "")
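
    Since parse was advertised above as a help for statistics, here is a minimal
word-count sketch. The sample sentence and the names text, words, uniq and freq
are invented for the illustration; unique, count and lookup are ordinary
built-in list functions.

```newlisp
(set 'text "the cat and the hat and the bat")

(set 'words (parse text {\s+} 0))   ; split on runs of whitespace
(set 'uniq (unique words))          ; each word once, in order of first appearance
(set 'freq (map list uniq (count uniq words)))   ; pair every word with its count

(println freq)
(println (lookup "the" freq))
```

    For real text you would probably split on {[^\w]+} instead, as in the
example above, so that punctuation does not stick to the words.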

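X. Other string functions

    search looks in a file for a matching string and returns the position of the
first match. By default it leaves the file pointer at the start of the match;
when <bool-flag> is true, the pointer is moved to the end of the match instead.
The next search continues from the current position of the file pointer.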

> (help search )
syntax: (search <int-file> <str-search> [<bool-flag> [<int-options>]])

(set 'f (open {/private/var/log/system.log} {read}))
(search f {kernel})
(seek f (- (seek f) 64)) ; rewind file pointer
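(dotimes (n 3)
  (println (read-line f)))
(close f)

    The code above searches the system log for the string kernel, moves back 64
bytes from the position found, then reads three lines of the log and prints
them.
    For more string functions, search the manual for "String and conversion
functions".

XI. Formatting strings

    Like other languages, newLISP offers an elegant way to format string output:
the format function.
    Suppose we need to print the following: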
folder: Library
file: mach

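    We need string templates like these:

"folder: %s" ; or
" file: %s"

    Give format a text template, followed in order by all the arguments the
template needs.

(format "folder: %s" f) ; or
(format " file: %s" f)

> (help format)
syntax: (format <str-format> [<exp-data-1> <exp-data-2> ... ])
syntax: (format <str-format> <list-data>)

    <str-format> is the string template, and there is exactly one. The arguments
after it are the data for the corresponding format specifiers. Take
(format " file: %s" f): if f is a string the template must contain %s, and if f
is a number the template must contain %d. Eleven format types are currently
supported.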

format description
s    text string
c    character (value 1 - 255)
d    decimal (32-bit)
u    unsigned decimal (32-bit)
x    hexadecimal lowercase
X    hexadecimal uppercase
o    octal (32-bits) (not supported on all compilers)
f    floating point
e    scientific floating point
E    scientific floating point
g    general floating point

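    The types must match, otherwise an error is raised. % acts like an escape
character: its position marks where the corresponding data goes in the string.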

(set 'f "OneLisp")
(format "folder: %s" f)
;-->"folder: OneLisp"

(format "%s folder: " f)
"OneLisp folder: "

(format "%d" "abc")
;-->ERR: data type and format don't match in function format : "abc"

    The code below uses the directory function to print all files and folders in
the current directory.

(dolist (f (directory))
    (if (directory? f)
        (println (format "folder: %s" f))
        (println (format " file: %s" f))))

; output

folder: .
folder: ..
folder: api
 file: cd.dll
 file: cmd-lisp.bat
folder: code
 file: CodePatterns-cn.html
 file: CodePatterns-CN.html.bak
 file: CodePatterns.html
 file: COPYING
 file: demo-stdin.lsp
 file: drag.bat
folder: examples
 file: freetype6.dll
 file: gs.bat
folder: guiserver
 file: guiserver-keyword.txt
...

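    The string template in format allows much finer control over the output.

"%w.pf"

    f is the type flag introduced above; it is required.
    w is the width the data occupies in the output.
    p is the precision of the data in the output.
    w may be preceded by a minus sign (left-align), a plus sign (always show the
sign) or a 0 (pad with zeros). The default is right-aligned.
    Zero padding only has an effect when right-aligned.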

>(format "Result = %05d" 2)
"Result = 00002"

> (format "Result = %+05d" 2)
"Result = +0002"
> (format "Result = %+05d" -2)
"Result = -0002"
> (format "Result = %-05d" -2)
"Result = -2   "
> (format "Result = %05d" -2)
"Result = -0002"

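    Now a more complex example: print every character from 32 to 416 (0x01a0)
along with its decimal, hexadecimal and binary value.
    Because format cannot output binary, a small binary conversion function was
written for the purpose. Nowadays the built-in bits can do the conversion to
binary.

(define (binary x , results)
  (until (<= x 0)
    (push (string (% x 2)) results) ; % gives the remainder, i.e. the next binary digit
    (set 'x (/ x 2))) ; update x
  results)

(for (x 32 0x01a0)
  (println (char x) ; first convert the number to a character with char
    (format "%4d\t%4x\t%10s" ; decimal \t hexadecimal \t binary string
            (list x x (join (binary x))))))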

x 120     78       1111000
y 121     79       1111001
z 122     7a       1111010
{ 123     7b       1111011
| 124     7c       1111100
} 125     7d       1111101
~ 126     7e       1111110
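
    A short sketch that pulls the formatting options together and swaps the
hand-written binary helper for the built-in bits mentioned above. The square
brackets are only there to make the padding visible, and the range 120 to 126 is
chosen to reproduce the tail of the table shown above.

```newlisp
; width, precision and flags
(println (format "[%8.3f]" 3.14159))     ; width 8, 3 decimals, right-aligned
(println (format "[%-8s]" "left"))       ; minus sign: left-aligned
(println (format "[%+d]" 7))             ; plus sign: the sign is always shown

; the binary column, using the built-in bits instead of a hand-written helper
(for (x 120 126)
  (println (char x) (format "%4d\t%4x\t%10s" x x (bits x))))
```

    bits returns a string, so it can be handed straight to a %s specifier
without join.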

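XII. Strings that make newLISP think

    Why this title? Heh, there is a fun example at the end. You could even write
a scrambled-code generator and see what you get.

    The last two functions of this chapter are eval and eval-string.
    Their job is to execute newLISP code.
    As long as the code you supply is valid, they return its result.

    eval accepts an expression: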

(set 'expr '(+ 1 2))
(eval expr)
;-> 3

    eval-string accepts only strings:

(set 'expr "(+ 1 2)")
(eval-string expr)
;-> 3
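
    A minimal sketch of why this matters: code can be assembled as data first
and evaluated later. The names op, expr and source are hypothetical, and the
division by zero is just a convenient way to provoke an error for the last
line.

```newlisp
; op, expr and source are hypothetical names for this sketch
(set 'op '*)
(set 'expr (list op 6 7))        ; builds the list (* 6 7) without evaluating it
(println (eval expr))

(set 'source (string expr))      ; the same expression as text: "(* 6 7)"
(println (eval-string source))

; the third argument of eval-string is evaluated and returned if the code fails
(println (eval-string "(/ 1 0)" MAIN "could not evaluate"))
```

    Supplying that third argument keeps a bad piece of code from stopping the
whole program with an error message.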

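    With these two functions you can execute any newLISP code. They are
implicitly at work in every expression we evaluate: evaluation happens by
default, and you must not forget that it does, or you will end up thoroughly
confused. Whenever symbols, macros or the way expressions get evaluated puzzle
you, come back and reread this sentence and things will fall into place.
    Why does eval matter? Because it gives you a choice: you can execute the
code you need at whatever time and place you need it. You will feel this most
when working with macros.

    Below is an amusing piece of code. It keeps reshuffling a list and calling
eval-string on the result, and stops only once one of the expressions actually
gets evaluated.

(set 'code '(")" "set" "'valid" "true" "("))
(set 'valid nil)
(until valid
    (set 'code (randomize code)) ; randomize shuffles the code list
    (println (join code " "))
    (eval-string (join code " ") MAIN nil))

; output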

) true 'valid ( set
'valid ) ( set true
true set ( 'valid )
'valid true ( set )
'valid ( true set )
) true ( set 'valid
) ( set 'valid true
'valid ) set true (
...
true set ) ( 'valid
true ( 'valid ) set
true 'valid ( set )
true ) 'valid ( set
( set 'valid true )
true

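That more or less covers the basics of newLISP. What comes next goes a little
deeper: contexts and macros.
In newLISP, though, these are clean and simple in how they look, how they are
used and how they work.
Good Luck !!!

A colour version can be downloaded from http://code.google.com/p/newlisp-you-can-do and viewed with scite4newlisp.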

2012-05-14 - 2012-05-17 15:10:29

Reposted from: https://my.oschina.net/darkcode/blog/60385
