Jackey's Reflections

Do Research

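>Causes of segmentation faults on Linux and how to debug them

>This article is compiled from several websites. Segmentation faults are common in C/C++ applications on Linux, and the main topic here is how to locate the faulting code quickly and accurately. Two approaches are covered, one for each kind of environment: a development environment where gdb is available, and an embedded or release environment where it is not. Both rely on the program's run-time call stack to locate the fault.
Keywords: segmentation fault, SIGSEGV, gdb, backtrace, objdump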

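I. A brief introduction to segmentation faults
[See http://www.upsdn.net/html/2006-11/775.html]
In short, a segmentation fault occurs when a program accesses memory it is not allowed to access: usually either the process lacks permission for that address, or no physical memory is mapped there at all. Accessing address 0 is an especially common case.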

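More generally, a segmentation fault means the program accessed memory outside the address space the system gave it. On x86 in protected mode, the bounds of each segment come from the global descriptor table (GDT). The gdtr register, which is 48 bits wide, holds the GDT's 32-bit base address and 16-bit limit. A 16-bit segment selector then picks an entry from the table: its upper 13 bits are the index into the table, and its lower 3 bits are a table indicator (GDT or LDT) and the requested privilege level. Each GDT entry is a 64-bit descriptor holding the base address and limit of a code or data segment, together with a present bit, the privilege level, the granularity of the limit, and other attributes. When a program makes an out-of-bounds access, the CPU raises a protection exception, and the result is a segmentation fault. (Linux uses flat segments that cover the whole address space, so in practice the exception usually comes from the paging unit as a page fault, which the kernel reports to the process as SIGSEGV.)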
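
The 13-bit index and the 3 low bits described above are the layout of a 16-bit x86 segment selector. As a rough illustration, the following C sketch splits a selector into those fields and packs them back. The names selector_fields, decode_selector and encode_selector are made up for this example, and the value 0x73 in the note below is only a sample input.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index within the table */
    unsigned ti;    /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level (0..3) */
};

/* Split a selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 0x1u;
    f.rpl = sel & 0x3u;
    return f;
}

/* Pack three fields back into a selector. */
uint16_t encode_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192u); /* the index has 13 bits */
    assert(ti <= 1u);
    assert(rpl <= 3u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

For instance, decode_selector(0x73) yields index 14, table indicator 0 (GDT) and privilege level 3.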

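The following kinds of programming mistakes tend to cause segmentation faults; almost all of them come down to misusing pointers.

1) Accessing system data areas, especially writing to memory addresses that the system protects.
   The most common case is dereferencing a pointer that holds address 0.
2) Out-of-bounds access (array index out of range, mismatched variable types, and so on) that reaches memory that does not belong to the program.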
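
To see cause 1) in action without taking down the calling program, one option is to run the faulty code in a child process and ask the parent which signal ended it. This is only a sketch for a POSIX system; fatal_signal_of, write_to_null and write_to_local are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Cause 1: write through a pointer that holds address 0. */
void write_to_null(void)
{
    volatile unsigned char *ptr = NULL;

    *ptr = 0x00;
}

/* For contrast: a write to memory the process owns. */
void write_to_local(void)
{
    volatile unsigned char byte = 0xff;

    byte = 0x00;
}

/* Run fn in a child process. Returns the number of the signal that
   killed the child, 0 if it exited normally, or -1 on error. */
int fatal_signal_of(void (*fn)(void))
{
    pid_t pid;
    int status = 0;

    assert(fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0); /* reached only if fn did not crash */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Under these assumptions, fatal_signal_of(write_to_null) should return SIGSEGV on Linux, while fatal_signal_of(write_to_local) returns 0.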

II. Solutions

  1. With a GDB debugging environment

Let's start with an example. The file d.c contains the following:
     1  void dummy_function (void)
     2  {
     3          unsigned char *ptr = 0x00;
     4          *ptr = 0x00;
     5  }
     6
     7  int main (void)
     8  {
     9          dummy_function ();
    10
    11          return 0;
    12  }
Line 4 causes the segmentation fault.
gcc -g d.c -o d compiles it into the executable d.
Running it gives the following result:
[chen@localhost seg]$ ./d
Segmentation fault

Debug it with gdb:
[chen@localhost seg]$ gdb ./d
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) r
Starting program: /home/chen/code/seg/d

Program received signal SIGSEGV, Segmentation fault.
0x08048364 in dummy_function () at d.c:4
4              *ptr = 0x00;

(gdb) backtrace
#0  0x08048364 in dummy_function () at d.c:4
#1  0x0804837c in main () at d.c:9

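Following the steps above, we can clearly see that the SIGSEGV signal is raised at line 4 of the C file, in the function dummy_function(), just as we expected, and the call stack printed by backtrace agrees.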

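Next, a C++ example. Calling into the template library makes debugging more awkward, but it is closer to real-world code.
The file iterbug.cpp contains the following: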
     1  #include <iostream>
     2  #include <vector>
     3  #include <iterator>
     4  #include <algorithm>
     5  using namespace std;
     6
     7  void dummy_function(void)
     8  {
     9          vector<int> coll1;
    10          vector<int> coll2;
    11
    12          /*
    13           * RUNTIME ERROR:
    14           * - beginning is behind the end of the range
    15           */
    16          vector<int>::iterator pos = coll1.begin();
    17          reverse (++pos,coll1.end());
    18
    19
    20          for ( int i=1; i<=9 ;++i )
    21                  coll2.push_back(i);
    22
    23          /*
    24           * RUNTIME ERROR:
    25           * - overwriting nonexisting elements
    26           */
    27          copy(coll2.begin(), coll2.end(),
    28                          coll1.begin());
    29
    30          /*
    31           * RUNTIME ERROR:
    32           * - collections mistaken
    33           *   begin() and end() mistaken
    34           */
    35          copy(coll1.begin(), coll2.end(),
    36                          coll1.end());
    37  }
    38  int main()
    39  {
    40          dummy_function();
    41          return 0;
    42  }

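All three runtime errors are marked in comments in the code. The last two are never reached, because the first one already terminates the program.
Compile the program:
 g++ -g iterbug.cpp -o iterbug
Run it: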
[chen@localhost seg]$ ./iterbug
Segmentation fault
[chen@localhost seg]$ gdb ./iterbug
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) r
Starting program: /home/chen/code/seg/iterbug

Program received signal SIGSEGV, Segmentation fault.
0x08048bbf in std::swap<int> (__a=@0x4, __b=@0xfffffffc)
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:97
97            _Tp __tmp = __a;
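At this point we are inside the STL code, and it is hard to tell which part of our own code is at fault, so we use backtrace to inspect the call stack.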
(gdb) backtrace
#0  0x08048bbf in std::swap<int> (__a=@0x4, __b=@0xfffffffc)
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:97
#1  0x08048c03 in std::__iter_swap<true>::iter_swap<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__a={_M_current = 0x4}, __b=
      {_M_current = 0xfffffffc})
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:127
#2  0x08048c22 in std::iter_swap<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__a={_M_current = 0x4}, __b=
      {_M_current = 0xfffffffc})
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algobase.h:163
#3  0x08048c5f in std::__reverse<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__first={_M_current = 0x4}, __last=
      {_M_current = 0xfffffffc})
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:1586
#4  0x08048cc3 in std::reverse<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > > (__first={_M_current = 0x4}, __last=
      {_M_current = 0x0})
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:1611
#5  0x080487bc in dummy_function () at iterbug.cpp:17
#6  0x0804891a in main () at iterbug.cpp:40

Frame #5, 0x080487bc in dummy_function () at iterbug.cpp:17, is the innermost frame in our own code, so the error originates at line 17 of iterbug.cpp. Let's look at the source.
(gdb) l dummy_function()
2       #include <vector>
3       #include <iterator>
4       #include <algorithm>
5       using namespace std;
6
7       void dummy_function(void)
8       {
9               vector<int> coll1;
10              vector<int> coll2;
11
(gdb) l
12              /*
13               * RUNTIME ERROR:
14               * - beginning is behind the end of the range
15               */
16              vector<int>::iterator pos = coll1.begin();
17              reverse (++pos,coll1.end());
18
19
20              for ( int i=1; i<=9 ;++i )
21                      coll2.push_back(i);
Good, this is indeed where the error is.
What remains is to fix these logic errors.

The examples above come from http://www.upsdn.net/html/2006-11/775.html and from The C++ Standard Library: A Tutorial and Reference.

  2. Without a GDB debugging environment

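Many embedded development environments cannot provide gdb because its runtime footprint is too large. They usually do provide tools such as objdump, however, so we can obtain the stack information with backtrace and then use objdump to relate the addresses on the stack to the code.

The description at http://www.gnu.org/software/libc/manual/html_node/Debugging-Support.html#Debugging-Support
is excerpted below: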

33 Debugging support

Applications are usually debugged using dedicated debugger programs. But sometimes this is not possible and, in any case, it is useful to provide the developer with as much information as possible at the time the problems are experienced. For this reason a few functions are provided which a program can use to help the developer more easily locate the problem.

33.1 Backtraces

A backtrace is a list of the function calls that are currently active in a thread. The usual way to inspect a backtrace of a program is to use an external debugger such as gdb. However, sometimes it is useful to obtain a backtrace programmatically from within a program, e.g., for the purposes of logging or diagnostics.

The header file execinfo.h declares three functions that obtain and manipulate backtraces of the current thread.

— Function: int backtrace (void **buffer, int size)

The backtrace function obtains a backtrace for the current thread, as a list of pointers, and places the information into buffer. The argument size should be the number of void * elements that will fit into buffer. The return value is the actual number of entries of buffer that are obtained, and is at most size.

The pointers placed in buffer are actually return addresses obtained by inspecting the stack, one return address per stack frame.

Note that certain compiler optimizations may interfere with obtaining a valid backtrace. Function inlining causes the inlined function to not have a stack frame; tail call optimization replaces one stack frame with another; frame pointer elimination will stop backtrace from interpreting the stack contents correctly.

— Function: char ** backtrace_symbols (void *const *buffer, int size)

The backtrace_symbols function translates the information obtained from the backtrace function into an array of strings. The argument buffer should be a pointer to an array of addresses obtained via the backtrace function, and size is the number of entries in that array (the return value of backtrace).

The return value is a pointer to an array of strings, which has size entries just like the array buffer. Each string contains a printable representation of the corresponding element of buffer. It includes the function name (if this can be determined), an offset into the function, and the actual return address (in hexadecimal).

Currently, the function name and offset can only be obtained on systems that use the ELF binary format for programs and libraries. On other systems, only the hexadecimal return address will be present. Also, you may need to pass additional flags to the linker to make the function names available to the program. (For example, on systems using GNU ld, you must pass -rdynamic.)

The return value of backtrace_symbols is a pointer obtained via the malloc function, and it is the responsibility of the caller to free that pointer. Note that only the return value need be freed, not the individual strings.

The return value is NULL if sufficient memory for the strings cannot be obtained.

— Function: void backtrace_symbols_fd (void *const *buffer, int size, int fd)

The backtrace_symbols_fd function performs the same translation as the backtrace_symbols function. Instead of returning the strings to the caller, it writes the strings to the file descriptor fd, one per line. It does not use the malloc function, and can therefore be used in situations where that function might fail.

The following program illustrates the use of these functions. Note that the array to contain the return addresses returned by backtrace is allocated on the stack. Therefore code like this can be used in situations where the memory handling via malloc does not work anymore (in which case the backtrace_symbols has to be replaced by a backtrace_symbols_fd call as well). The number of return addresses is normally not very large. Even complicated programs rather seldom have a nesting level of more than, say, 50 and with 200 possible entries probably all programs should be covered.

     #include <execinfo.h>
     #include <stdio.h>
     #include <stdlib.h>

     /* Obtain a backtrace and print it to stdout. */
     void
     print_trace (void)
     {
       void *array[10];
       size_t size;
       char **strings;
       size_t i;

       size = backtrace (array, 10);
       strings = backtrace_symbols (array, size);

       printf ("Obtained %zd stack frames.\n", size);

       for (i = 0; i < size; i++)
          printf ("%s\n", strings[i]);

       free (strings);
     }

     /* A dummy function to make the backtrace more interesting. */
     void
     dummy_function (void)
     {
       print_trace ();
     }

     int
     main (void)
     {
       dummy_function ();
       return 0;
     }
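
The manual's example uses backtrace_symbols, which allocates its result with malloc. As a possible variant for situations where malloc cannot be trusted, the sketch below uses backtrace_symbols_fd instead. It assumes glibc's execinfo.h; print_trace_fd and MAX_FRAMES are names chosen for this illustration.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd, one frame per line.
   Returns the number of frames that were captured. */
int print_trace_fd(int fd)
{
    void *frames[MAX_FRAMES]; /* lives on the stack, no malloc needed */
    int count;

    assert(fd >= 0);
    count = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}
```

Calling print_trace_fd(STDERR_FILENO) prints one line per frame to standard error.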

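With this description we have a rough idea of how backtrace and its two companion functions are called and what they do.
For more, see the man page at http://www.kernel.org/doc/man-pages/online/pages/man3/backtrace.3.html
Now let's return to an example given at http://www.upsdn.net/html/2006-11/775.html

Analysis using backtrace and objdump: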
[chen@localhost seg]$ cat -n backtrace.c
     1  #include <stdio.h>
     2  #include <execinfo.h>
     3  #include <stdlib.h>
     4  #include <signal.h>
     5
     6  /*
     7   * A dummy function to make the backtrace more interesting.
     8   */
     9  void
    10  dummy_function(void)
    11  {
    12          unsigned char *ptr = 0x00;
    13          *ptr = 0x00;
    14  }
    15
    16  void dump(int signo)
    17  {
    18          void *array[10];
    19          size_t size;
    20          char **strings;
    21          size_t i;
    22
    23          size = backtrace(array, 10);
    24          strings = backtrace_symbols(array,size);
    25
    26          printf("Obtained %zd stack frames.\n", size);
    27
    28
    29          for ( i = 0; i < size ; ++i )
    30                  printf("%s\n",strings[i]);
    31
    32          free(strings);
    33          exit(0);
    34  }
    35
    36  int
    37  main(void)
    38  {
    39          signal(SIGSEGV, &dump);
    40          dummy_function();
    41
    42          return 0;
    43  }
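As before, the fault occurs at line 13.

Compile it and run it:
gcc -g -rdynamic backtrace.c -o backtrace
The man page notes that -rdynamic is needed at link time so that backtrace_symbols can resolve function names; without it the names may be unavailable.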

[chen@localhost seg]$ ./backtrace
Obtained 5 stack frames.
./backtrace(dump+0x19) [0x80486c2]
[0x192420]
./backtrace(main+0x2a) [0x8048756]
/lib/libc.so.6(__libc_start_main+0xdc) [0x3e1dec]
./backtrace [0x80485e1]
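This prints the call stack.
Now we use objdump to find out what addresses such as 0x8048756 actually correspond to:
 objdump -d -S backtrace > backtrace.dump
Open backtrace.dump and search for the hexadecimal addresses above.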

backtrace:     file format elf32-i386

Disassembly of section .init:

    …

08048694 <dummy_function>:
 * A dummy function to make the backtrace more interesting.
 */
void
dummy_function(void)
{
 8048694:    55                       push   %ebp
 8048695:    89 e5                    mov    %esp,%ebp
 8048697:    83 ec 10                 sub    $0x10,%esp
    unsigned char *ptr = 0x00;
 804869a:    c7 45 fc 00 00 00 00     movl   $0x0,0xfffffffc(%ebp)
    *ptr = 0x00;
 80486a1:    8b 45 fc                 mov    0xfffffffc(%ebp),%eax
 80486a4:    c6 00 00                 movb   $0x0,(%eax)
}
 80486a7:    c9                       leave 
 80486a8:    c3                       ret   

080486a9 <dump>:

void dump(int signo)
{
 80486a9:    55                       push   %ebp
 80486aa:    89 e5                    mov    %esp,%ebp
 80486ac:    83 ec 48                 sub    $0x48,%esp
    void *array[10];
    size_t size;
    char **strings;
    size_t i;

    size = backtrace(array, 10);
 80486af:    c7 44 24 04 0a 00 00     movl   $0xa,0x4(%esp)
 80486b6:    00
 80486b7:    8d 45 cc                 lea    0xffffffcc(%ebp),%eax
 80486ba:    89 04 24                 mov    %eax,(%esp)
 80486bd:    e8 c6 fe ff ff           call   8048588 <backtrace@plt>
 80486c2:    89 45 f4                 mov    %eax,0xfffffff4(%ebp)
    strings = backtrace_symbols(array,size);
 80486c5:    8b 45 f4                 mov    0xfffffff4(%ebp),%eax
 80486c8:    89 44 24 04              mov    %eax,0x4(%esp)
 80486cc:    8d 45 cc                 lea    0xffffffcc(%ebp),%eax
 80486cf:    89 04 24                 mov    %eax,(%esp)
 80486d2:    e8 91 fe ff ff           call   8048568 <backtrace_symbols@plt>
 80486d7:    89 45 f8                 mov    %eax,0xfffffff8(%ebp)

    printf("Obtained %zd stack frames.\n", size);
 80486da:    8b 45 f4                 mov    0xfffffff4(%ebp),%eax
 80486dd:    89 44 24 04              mov    %eax,0x4(%esp)
 80486e1:    c7 04 24 40 88 04 08     movl   $0x8048840,(%esp)
 80486e8:    e8 8b fe ff ff           call   8048578 <printf@plt>

    for ( i = 0; i < size ; ++i )
 80486ed:    c7 45 fc 00 00 00 00     movl   $0x0,0xfffffffc(%ebp)
 80486f4:    eb 17                    jmp    804870d <dump+0x64>
        printf("%s\n",strings[i]);
 80486f6:    8b 45 fc                 mov    0xfffffffc(%ebp),%eax
 80486f9:    c1 e0 02                 shl    $0x2,%eax
 80486fc:    03 45 f8                 add    0xfffffff8(%ebp),%eax
 80486ff:    8b 00                    mov    (%eax),%eax
 8048701:    89 04 24                 mov    %eax,(%esp)
 8048704:    e8 8f fe ff ff           call   8048598 <puts@plt>
 8048709:    83 45 fc 01              addl   $0x1,0xfffffffc(%ebp)
 804870d:    8b 45 fc                 mov    0xfffffffc(%ebp),%eax
 8048710:    3b 45 f4                 cmp    0xfffffff4(%ebp),%eax
 8048713:    72 e1                    jb     80486f6 <dump+0x4d>

    free(strings);
 8048715:    8b 45 f8                 mov    0xfffffff8(%ebp),%eax
 8048718:    89 04 24                 mov    %eax,(%esp)
 804871b:    e8 38 fe ff ff           call   8048558 <free@plt>
    exit(0);
 8048720:    c7 04 24 00 00 00 00     movl   $0x0,(%esp)
 8048727:    e8 7c fe ff ff           call   80485a8 <exit@plt>

0804872c <main>:
}

int
main(void)
{
 804872c:    8d 4c 24 04              lea    0x4(%esp),%ecx
 8048730:    83 e4 f0                 and    $0xfffffff0,%esp
 8048733:    ff 71 fc                 pushl  0xfffffffc(%ecx)
 8048736:    55                       push   %ebp
 8048737:    89 e5                    mov    %esp,%ebp
 8048739:    51                       push   %ecx
 804873a:    83 ec 14                 sub    $0x14,%esp
    signal(SIGSEGV, &dump);
 804873d:    c7 44 24 04 a9 86 04     movl   $0x80486a9,0x4(%esp)
 8048744:    08
 8048745:    c7 04 24 0b 00 00 00     movl   $0xb,(%esp)
 804874c:    e8 d7 fd ff ff           call   8048528 <signal@plt>
    dummy_function();
 8048751:    e8 3e ff ff ff           call   8048694 <dummy_function>

    return 0;
 8048756:    b8 00 00 00 00           mov    $0x0,%eax
}
 
    …

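The addresses from the backtrace can be found in the listing above (they were highlighted in red in the original post).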
[chen@localhost seg]$ ./backtrace
Obtained 5 stack frames.
./backtrace(dump+0x19) [0x80486c2]
[0x192420]
./backtrace(main+0x2a) [0x8048756]
/lib/libc.so.6(__libc_start_main+0xdc) [0x3e1dec]
./backtrace [0x80485e1]
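Let's analyze these addresses.
The innermost frame is ./backtrace(dump+0x19) [0x80486c2], which is the return address of the call to backtrace inside dump.
./backtrace(main+0x2a) [0x8048756] is the last place in our own program before things went wrong.
8048756:    b8 00 00 00 00           mov    $0x0,%eax
This corresponds to return 0. A backtrace records return addresses, so the instruction that actually led to the fault is the one just before 8048756, namely
8048751:    e8 3e ff ff ff           call   8048694 <dummy_function>
So even without gdb we can determine that the segmentation fault occurred in the function dummy_function, although we cannot tell exactly which line.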
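
The address arithmetic in this analysis can be checked mechanically. The sketch below assumes 32-bit x86 and a 5-byte near call (opcode e8 followed by a little-endian 32-bit displacement), as seen in the objdump listing; call_site and call_target are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_OPCODE 0xe8u
#define CALL_REL32_LENGTH 5u /* opcode byte + 4-byte displacement */

/* Address of the call instruction that pushed return_addr, assuming
   it was a 5-byte near call. */
uint32_t call_site(uint32_t return_addr)
{
    return return_addr - CALL_REL32_LENGTH;
}

/* Target of a near call, given its 5 encoded bytes and its address. */
uint32_t call_target(const uint8_t insn[5], uint32_t insn_addr)
{
    uint32_t rel;

    assert(insn != NULL);
    assert(insn[0] == CALL_REL32_OPCODE);
    /* The displacement is little-endian and is relative to the
       address of the instruction that follows the call. */
    rel = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
          (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    return insn_addr + CALL_REL32_LENGTH + rel; /* wraps modulo 2^32 */
}
```

With the bytes e8 3e ff ff ff at 0x8048751, call_site(0x8048756) gives 0x8048751 and call_target gives 0x8048694, the start of dummy_function. The symbol offsets check out the same way: dump starts at 0x80486a9 and 0x80486a9 + 0x19 = 0x80486c2, while main starts at 0x804872c and 0x804872c + 0x2a = 0x8048756.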

If you are interested in this topic, the following are worth reading.
http://www.upsdn.net/html/2006-11/775.html
http://www.gnu.org/software/libc/manual/html_node/Debugging-Support.html#Debugging-Support
http://www.kernel.org/doc/man-pages/online/pages/man3/backtrace.3.html
<End>
