正则表达式处理Nginx

Nginx 日志配置格式

1
2
3
4
5
log_format  main
        '[$upstream_addr] $remote_addr [$time_local] "$request" $status '
        '"$request_body" $body_bytes_sent "$http_referer" "$http_user_agent" '
        'RESP:$upstream_response_time '
        'REQ:$request_time';

样例

1
2
3
4
[192.168.1.5:80] 19.78.22.51 [31/Dec/2015:13:59:02 +0800] "POST /api/mbbb/dup_msg_send?pallow_dubbing=0&partner_msgs_id=279&roles_id=23&score=6700&section_id=512&whole_audio=200.m4a&d
evice_id=112222f0fc3&lang=zh-CN&trigger=user&user_id=516487&v=ios_7.0.3 HTTP/1.1" 200 "audio_fragment=00280%22%3A%7B%22pitch%22%3A47%2C%22rhythm%22%3A95%2C%22tone%22%3A75%7D%2C%226800278%22%3A%7B%22pitch%22%3A70%2C%22rhythm%22%3A90%2%7D%2C%226800276%22
%3A%7B%22pitch%22%3A60%2C%22rhythm%22%3A82%2C%22tone%22D&content=hhhhhh%E5%B0%8F%E5%AB%A9%E8%8D%89" 51 "-" "paipao/7.0.3 (iPhone; iOS 9.2; Scale/2.00)" RESP:0.166 REQ:0.167
[192.168.1.5:80, 192.169.1.33:88] 60.12.246.5 [31/Dec/2015:23:59:02 +0800] "GET /api/mppb/notice/list?device_id=112233fe6a6d991f&lang=zh-CN&user_id=6120&v=ios_7.0.3 HTTP/1.1" 200 "-" 54 "-" "paipao/7.0.3 (iPhone; iOS 9.2; Scale/2.00)" RESP:0.006 REQ:0.006

正则表达式

Python版本

1
2
3
4
5
6
7
8
9
10
p = re.compile(
            r"\[\-?[\d.\:]*[\ \,]*?.*?\]\ [\d.\:]*\ \[(\d+)/(\w+)/(\d+)\:(\S+)\ [\S]+\]\ \"(\S+)\ (\S+)\ .*?\"\ (\d+)\ \"(.*?)\"\ (\d+)\ \"([^\"]*)\"\ \".*?\" .*?")
m = re.findall(p, line)
day = m[0][0]
month = m[0][1]
year = m[0][2]
ttime = m[0][3]
method = m[0][4]
request = m[0][5]
status = m[0][6]

Scala版本

1
2
val regex = new Regex( """\[\-?[\d.\:]*[\ \,]*?.*?\]\ [\d.\:]*\ \[(\d+)/(\w+)/(\d+)\:(\S+)\ [\S]+\]\ \"(\S+)\ (\S+)\ .*?\"\ (\d+)\ \"(.*?)\"\ (\d+)\ \"([^\"]*)\"\ \".*?\" .*?""")
val regex(day, month, year, time, method, request, status, postData, bytes, refer) = line

参考文献

  • http://www.jianshu.com/p/5d8c802be13d
  • https://segmentfault.com/a/1190000002727070
  • http://desert3.iteye.com/blog/1001568
  • http://stackoverflow.com/questions/996536/regex-in-python