C's Treacherous Old-Style Rules

skaiuijing

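Introduction

Yesterday, while writing a compiler front end, I ran into a few bugs. They are all basic problems, but they are also the kind of small nuisance that wastes time, so here is a quick record.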

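Evaluation Order

In C, the order in which function arguments are evaluated is not fixed. Java guarantees left-to-right evaluation, but a C compiler may just as well evaluate them from right to left.

Take this function, for example: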

struct Stmt *parser_stmts(struct Parser *p)
{
    if (p->look->tag == RBRACE)
        return Stmt_Null;
    struct Stmt *this = parser_stmt(p);
    struct Stmt *other = parser_stmts(p);
    return (struct Stmt *)seq_new(this, other);
}

But what if it is written like this instead:


struct Stmt *parser_stmts(struct Parser *p)
{
    if (p->look->tag == RBRACE)
        return Stmt_Null;
    return (struct Stmt *)seq_new(parser_stmt(p), parser_stmts(p));
}

Guess what happens?

The program ends up in an infinite loop.

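The reason:

In C, the evaluation order of function arguments is unspecified, so it is not necessarily left to right. If the compiler evaluates parser_stmts(p) before parser_stmt(p), the function calls itself again before any token has been consumed, so it never reaches RBRACE.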
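The fix can be sketched in isolation. The snippet below is only an illustration, not the parser from this post: struct Cursor, parse_one, seq_count, and parse_all are hypothetical stand-ins for struct Parser, parser_stmt, seq_new, and parser_stmts. It shows that splitting the two calls into separate statements forces the token to be consumed before the recursive call.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct Parser: a cursor over one-character tokens. */
struct Cursor {
    const char *tokens; /* e.g. "abc}", where '}' plays the role of RBRACE */
    size_t pos;
};

/* Stand-in for parser_stmt(): consumes one token and reports one statement. */
static int parse_one(struct Cursor *c)
{
    c->pos++;
    return 1;
}

/* Stand-in for seq_new(): combines the head with the rest of the list. */
static int seq_count(int head, int rest)
{
    return head + rest;
}

/* Stand-in for parser_stmts(): counts statements up to the closing brace. */
int parse_all(struct Cursor *c)
{
    if (c->tokens[c->pos] == '}' || c->tokens[c->pos] == '\0')
        return 0;
    /* Each declaration is a full statement, so the token is consumed
     * before the recursive call runs, whatever order the compiler prefers. */
    int head = parse_one(c);
    int rest = parse_all(c);
    return seq_count(head, rest);
}
```

Writing seq_count(parse_one(c), parse_all(c)) instead would leave the order of the two calls up to the compiler, which is exactly the trap described above.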

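Now for another problem:

Implicit Function Declarations

Once lexing, parsing, and the rest were all debugged, I thought the job was done, so I added floating-point support.

To my surprise, adding it immediately produced an error: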

int i; int j; float v; float x; float[100] a; x = 3.14;

And what did my compiler front end parse it into?

REAL: lexeme=<null>, real_val=3.140000
constant_float: v = 2.000000
L1:
SET rhs: node=0x21d02d0, tag=271, REAL real_val=2.000000
        x = 2.000000

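x comes out as 2.00-something instead of 3.14?

My first reaction was memory corruption: an out-of-bounds write trampling the value.

So I set a watchpoint in gdb to automatically print the context whenever this float gets written.

I wrote a general-purpose Python script for it: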

import gdb

class WatchMember(gdb.Command):
    """Watch a specific member of a struct.
Usage: watch-member <expr> <member>
Example: watch-member myStructPtr real_val
"""

    def __init__(self):
        super(WatchMember, self).__init__("watch-member", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        try:
            args = gdb.string_to_argv(arg)
            if len(args) != 2:
                print("Usage: watch-member <expr> <member>")
                return

            expr = args[0]
            member = args[1]

            # Evaluate the expression (pointer to struct)
            struct_ptr = gdb.parse_and_eval(expr)
            # parse_and_eval returns a gdb.Value, never None, so test the pointer value itself
            if int(struct_ptr) == 0:
                print("Error: expression evaluated to NULL")
                return

            # Dereference to get struct
            struct_val = struct_ptr.dereference()

            # Get the member
            member_val = struct_val[member]

            # Compute absolute address of the member
            member_addr = member_val.address

            print("Watching member: %s->%s" % (expr, member))
            print("Member address: %s" % (member_addr))

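            # Set a watchpoint on the member (gdb uses a hardware watchpoint when it can).
            # Format the address as a plain integer so no "<symbol>" suffix leaks into the expression.
            gdb.execute("watch -l *(%s)%#x" % (member_val.type.pointer(), int(member_addr)))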

        except gdb.error as e:
            print("gdb error: %s" % e)
        except Exception as e:
            print("Python error: %s" % e)

WatchMember()

Running it:


Hardware watchpoint 2: -location *(float *)0x60b29c

Old value = <unreadable>
New value = 3.13999987
new_lexer_token_real (v=3.13999987) at /home/skaiuijing/compiler/src/lexer.c:57
57          return t;
(gdb) q
A debugging session is active.

        Inferior 1 [process 8556] will be killed.

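So the token really did receive 3.14. Then why did it turn into 2.0?

I started following the call chain through the code, and when I clicked on the constant_float function, I found that VS Code could not jump to it.

Suddenly it clicked, and I knew what was going on.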

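My floating-point functions were not declared in the header file! With no prototype in scope, C fell back on its old implicit-declaration rules: the function is assumed to return int and a float argument is promoted to double, so the caller and the callee disagreed about how the value was passed.

Declaring the functions in the header fixed it.

In fact, gcc had already reported this in its compile warnings. I just did not notice it, buried as it was in a pile of type-conversion warnings.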
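As a minimal sketch of the fix, assuming a hypothetical function halve_real rather than the real constant_float (whose signature is not shown here), the prototype only has to be visible before the first call. The file names in the comments are likewise illustrative.

```c
/* real.h (hypothetical header): the prototype every caller must see. */
float halve_real(float v);

/* caller.c: with the prototype in scope, the argument is passed as a float.
 * Without it, an old-style compiler would assume `int halve_real()` and
 * promote the float argument to double, mismatching the definition below. */
float parse_constant(void)
{
    return halve_real(3.14f);
}

/* real.c: the actual definition. */
float halve_real(float v)
{
    return v / 2.0f;
}
```

To keep this kind of warning from drowning among the others, gcc can promote it to an error with -Werror=implicit-function-declaration.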

Parsing

Finally, run the compiler front end and look at its parse result:


skaiuijing@ubuntu:~/compiler/build$ ./littleCompiler 
TOKEN: tag=278, str={
TOKEN: tag=257, str=int
TOKEN: tag=264, str=i
TOKEN: tag=281, str=;
TOKEN: tag=257, str=int
TOKEN: tag=264, str=j
TOKEN: tag=281, str=;
TOKEN: tag=257, str=float
TOKEN: tag=264, str=v
TOKEN: tag=281, str=;
TOKEN: tag=257, str=float
TOKEN: tag=264, str=x
TOKEN: tag=281, str=;
TOKEN: tag=257, str=float
TOKEN: tag=297, str=[
TOKEN: tag=269, str=100
TOKEN: tag=298, str=]
TOKEN: tag=264, str=a
TOKEN: tag=281, str=;
TOKEN: tag=264, str=x
TOKEN: tag=289, str==
TOKEN: tag=271, str=3.140000
constant_float: v = 3.140000
TOKEN: tag=281, str=;
TOKEN: tag=274, str=while
TOKEN: tag=276, str=(
TOKEN: tag=273, str=true
TOKEN: tag=277, str=)
TOKEN: tag=278, str={
TOKEN: tag=259, str=do
TOKEN: tag=264, str=i
TOKEN: tag=289, str==
TOKEN: tag=264, str=i
TOKEN: tag=284, str=+
TOKEN: tag=269, str=1
TOKEN: tag=281, str=;
TOKEN: tag=274, str=while
TOKEN: tag=276, str=(
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=i
TOKEN: tag=298, str=]
TOKEN: tag=60, str=#60
TOKEN: tag=264, str=v
TOKEN: tag=277, str=)
TOKEN: tag=281, str=;
TOKEN: tag=259, str=do
TOKEN: tag=264, str=j
TOKEN: tag=289, str==
TOKEN: tag=264, str=j
TOKEN: tag=285, str=-
TOKEN: tag=269, str=1
TOKEN: tag=281, str=;
TOKEN: tag=274, str=while
TOKEN: tag=276, str=(
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=j
TOKEN: tag=298, str=]
TOKEN: tag=62, str=#62
TOKEN: tag=264, str=v
TOKEN: tag=277, str=)
TOKEN: tag=281, str=;
TOKEN: tag=265, str=if
TOKEN: tag=276, str=(
TOKEN: tag=264, str=i
TOKEN: tag=263, str=>=
TOKEN: tag=264, str=j
TOKEN: tag=277, str=)
TOKEN: tag=258, str=break
TOKEN: tag=281, str=;
TOKEN: tag=264, str=x
TOKEN: tag=289, str==
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=i
TOKEN: tag=298, str=]
TOKEN: tag=281, str=;
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=i
TOKEN: tag=298, str=]
TOKEN: tag=289, str==
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=j
TOKEN: tag=298, str=]
TOKEN: tag=281, str=;
TOKEN: tag=264, str=a
TOKEN: tag=297, str=[
TOKEN: tag=264, str=j
TOKEN: tag=298, str=]
TOKEN: tag=289, str==
TOKEN: tag=264, str=x
TOKEN: tag=281, str=;
TOKEN: tag=279, str=}
TOKEN: tag=279, str=}
TOKEN: tag=0, str=#0
L1:
        x = 3.140000
L3:
L4:
        iffalse true goto L2
L6:
        i = i + 1
        if a [ i * 8 ] #60 v goto L6
L5:
L8:
        j = j - 1
        if a [ j * 8 ] #62 v goto L8
L7:
        iffalse i >= j goto L9
L10:
        goto L2
L9:
        x = a [ i * 8 ]
L11:
        a [ i * 8 ] = a [ j * 8 ]
L12:
        a [ j * 8 ] = x
        goto L4
L2: