numatop(1m) man page


Name
     numatop - A tool for memory access locality characterization
     and analysis.

Synopsis
     numatop [-s sampling_precision]
             [-l log_level] [-f log_file] [-d dump_file] [-h]

Description
     Most modern systems use a Non-Uniform Memory Access (NUMA)
     design for multiprocessing. In NUMA systems, memory and
     processors are organized so that some parts of memory are
     closer to a given processor while other parts are farther
     from it. A processor can access memory that is closer to it
     much faster than memory that is farther from it. Hence, the
     latency between the processors and different portions of
     the memory in a NUMA machine may differ significantly.


     numatop is an observation tool for the runtime memory
     locality characterization and analysis of processes and
     threads running on a NUMA system. It helps the user
     characterize the NUMA behavior of processes and threads and
     identify where the NUMA-related performance bottlenecks
     reside. The tool can be used to:

         o    Characterize the locality of all running  processes
              and  threads  to  identify  those  with the poorest
              locality in the system.

         o    Identify the `hot' memory areas, report average
              memory access latency, and provide the location
              where accessed memory is allocated. A `hot' memory
              area is one that the process or its threads access
              most frequently. numatop has a metric called
              ACCESS% that specifies what percentage of memory
              accesses are attributable to each memory area.

         Note -

           numatop records only the memory  accesses  which  have
           latencies greater than a predefined threshold.

         o    Provide the call-chain(s) when  the  process/thread
              generates  certain  counter  events, such as Remote
              Memory Access (RMA),  Local  Memory  Access  (LMA),
              Instruction  Retired  (IR), and CPU cycles (CYCLE).
              The call-chains help the  user  locate  the  source
              code that generates the events.

         o    Provide per-node  statistics  for  memory  and  CPU
              utilization.  A node is a region of memory in which
              every byte has the same distance from each CPU.

         o    Show, using a user-friendly interface, the list of
              processes/threads sorted by a chosen metric (by
              default, CPU utilization), with the top process
              having the highest CPU utilization in the system
              and the bottom one having the lowest. Users can
              also use hotkeys to re-sort the output by these
              metrics: RMA, LMA, RMA/LMA, CPU cycles per
              instruction (CPI), and CPU utilization (CPU%).
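
     The sort behavior described above can be sketched as follows.
     This is a minimal illustration, not numatop's implementation:
     the row dictionaries, the field names, and the sort_rows helper
     are hypothetical, and descending order for every metric is an
     assumption (this page states it only for CPU%). The hotkey
     numbers are the ones listed under WIN1.

```python
# Hypothetical mapping from the WIN1 sort hotkeys to row fields.
SORT_FIELDS = {"1": "rma", "2": "lma", "3": "rma_lma", "4": "cpi", "5": "cpu_pct"}


def sort_rows(rows, hotkey="5"):
    """Return rows ordered by the metric selected with a hotkey.

    Descending order is assumed for every metric; the default ("5")
    mirrors the documented default of sorting by CPU%.
    """
    field = SORT_FIELDS[hotkey]
    return sorted(rows, key=lambda row: row[field], reverse=True)


# Made-up sample rows, for illustration only.
rows = [
    {"pid": 101, "rma": 250.0, "lma": 1000.0, "rma_lma": 0.25, "cpi": 1.5, "cpu_pct": 12.0},
    {"pid": 202, "rma": 900.0, "lma": 300.0, "rma_lma": 3.0, "cpi": 2.8, "cpu_pct": 7.5},
]
by_cpu = [row["pid"] for row in sort_rows(rows)]       # default: CPU%
by_rma = [row["pid"] for row in sort_rows(rows, "1")]  # hotkey 1: RMA
```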


     numatop is an interactive, terminal-based tool that
     periodically tracks and analyzes the NUMA activity of
     processes and threads and displays useful metrics. Users can
     scroll with the up and down keys to navigate in the current
     window, and can use the hotkeys shown at the bottom of the
     window to switch between windows or to change the running
     state of the tool. For example, hotkey R refreshes the data
     in the current window.


     The tool supports the Intel Westmere-EX and Sandy  Bridge-EP
     platforms.


     Below is a detailed description of the various display
     windows and the data items that numatop displays:

     WIN1 - Monitoring processes and threads
         Get the locality characterization of all processes. This
         is the first window shown at startup and is numatop's
         Home window. It displays a list of processes. The top
         process has the highest system CPU utilization (CPU%),
         while the bottom process has the lowest. Generally, a
         memory-intensive process is also CPU-intensive, so the
         processes shown in WIN1 are sorted by CPU% by default.
         The user can use hotkeys 1, 2, 3, 4, or 5 to re-sort the
         output by RMA, LMA, RMA/LMA, CPI, or CPU%, respectively.

           [KEY METRICS]:
           RMA(K): number of remote memory accesses, in thousands.
           RMA(K) = RMA / 1000
           LMA(K): number of local memory accesses, in thousands.
           LMA(K) = LMA / 1000
           RMA/LMA: ratio of RMA / LMA.
           CPI: CPU cycles per instruction.
           CPU%: System CPU utilization (busy time across all CPUs).
           [HOTKEY]:
           `Q': Quit the application.
           `H': Refresh WIN1.
           `R': Refresh to show the latest data.
           `I': Show the normalized data.
           `N': Show the per-node statistics.
           <Enter>: Switch to WIN3 for the selected process.
           `1': Sort by `RMA'.
           `2': Sort by `LMA'.
           `3': Sort by `RMA/LMA'.
           `4': Sort by `CPI'.
           `5': Sort by `CPU%'.
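
         The WIN1 key metrics follow directly from the raw counter
         values. The sketch below applies the formulas listed above;
         the Counters container, the win1_metrics helper, and the
         choice to report 0.0 when a denominator is zero are
         assumptions made for illustration, not part of numatop.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Raw per-process counts (hypothetical container, not a numatop type)."""
    rma: int     # remote memory accesses
    lma: int     # local memory accesses
    cycles: int  # CPU cycles
    ir: int      # instructions retired


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def win1_metrics(c):
    """Derive the WIN1 columns from raw counts."""
    return {
        "RMA(K)": c.rma / 1000,          # RMA(K) = RMA / 1000
        "LMA(K)": c.lma / 1000,          # LMA(K) = LMA / 1000
        "RMA/LMA": ratio(c.rma, c.lma),  # remote-to-local ratio
        "CPI": ratio(c.cycles, c.ir),    # CPU cycles per instruction
    }


# Made-up counter values, for illustration only.
metrics = win1_metrics(Counters(rma=250_000, lma=1_000_000,
                                cycles=3_000_000, ir=2_000_000))
```

         A high RMA/LMA value indicates that a process reaches
         remote memory more often than local memory, which is the
         kind of poor locality WIN1 is meant to surface.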



     WIN2 - Monitoring processes and threads (normalized)
         Get the  normalized  locality  characterization  of  all
         processes.

           [KEY METRICS]:
           RPI(K): RMA normalized by 1000 instructions.
           RPI(K) = RMA / (IR / 1000)
           LPI(K): LMA normalized by 1000 instructions.
           LPI(K) = LMA / (IR / 1000)
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `N': Show the per-node statistics.
           <Enter>: Switch to WIN3 for the selected process.
           `1': Sort by `RPI'.
           `2': Sort by `LPI'.
           `3': Sort by `RMA/LMA'.
           `4': Sort by `CPI'.
           `5': Sort by `CPU%'.
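
         The normalization used in WIN2 can be illustrated with a
         short sketch. The function name is hypothetical, and
         returning 0.0 when no instructions were retired is an
         assumption; the arithmetic is the RPI(K)/LPI(K) formula
         given above.

```python
def per_thousand_instructions(events, instructions_retired):
    """Normalize an event count by every 1000 instructions retired.

    Implements X / (IR / 1000), the formula given for RPI(K) and LPI(K).
    """
    if instructions_retired == 0:
        return 0.0  # assumption: avoid dividing by zero
    return events / (instructions_retired / 1000)


# Made-up counts, for illustration only.
rpi_k = per_thousand_instructions(250_000, 2_000_000)    # RMA normalized
lpi_k = per_thousand_instructions(1_000_000, 2_000_000)  # LMA normalized
```

         Normalizing by instructions retired makes processes that
         ran for different amounts of work comparable, which raw
         RMA and LMA counts do not.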



     WIN3 - Monitoring the process
         Get the locality characterization with node affinity  of
         a specified process.

           [KEY METRICS]:
           NODE: the node ID.
           CPU%: per-node CPU utilization.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `N': Show the per-node statistics.
           `L': Show the latency information.
           `C': Show the call-chain.
           <Enter>: Switch to WIN4 for the specified process.



     WIN4 - Monitoring all threads
         Get the locality characterization of all  threads  in  a
         specified process.

           [KEY METRICS]:
           CPU%: per-CPU CPU utilization.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `N': Show the per-node statistics.



     WIN5 - Monitoring the thread
         Get the locality characterization with node affinity  of
         a specified thread.

           [KEY METRICS]:
           CPU%: per-CPU CPU utilization.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `N': Show the per-node statistics.
           `L': Show the latency information.
           `C': Show the call-chain.



     WIN6 - Monitoring memory areas
         Get the memory area use, with the associated access
         latency, of a specified process/thread.

           [KEY METRICS]:
           ADDR: starting address of the memory area.
           SIZE: size of the memory area (K/M/G bytes).
           ACCESS%: percentage of memory accesses that are to this memory area.
           LAT(ns): the average latency (in nanoseconds) of memory accesses.
           DESC: description of memory area (from /proc/<pid>/maps).
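
           Since DESC is taken from /proc/<pid>/maps, the ADDR, SIZE,
           and DESC columns can be pictured as fields derived from one
           line of that file. The sketch below parses the standard
           Linux maps line layout (address range, permissions, offset,
           device, inode, optional pathname). The helper names and the
           size formatting are assumptions for illustration; numatop's
           own parsing and formatting may differ.

```python
def parse_maps_line(line):
    """Extract ADDR, SIZE, and DESC from one /proc/<pid>/maps line."""
    fields = line.split(maxsplit=5)
    start_text, end_text = fields[0].split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    # The pathname is absent for anonymous mappings.
    desc = fields[5].strip() if len(fields) > 5 else ""
    return {"ADDR": start, "SIZE": end - start, "DESC": desc}


def format_size(nbytes):
    """Render a byte count with a K/M/G suffix (formatting is assumed)."""
    for unit, factor in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
        if nbytes >= factor:
            return f"{nbytes / factor:.1f}{unit}"
    return f"{nbytes}B"


# Made-up maps lines, for illustration only.
area = parse_maps_line("00400000-0040b000 r-xp 00000000 08:01 123456 /bin/cat")
anon = parse_maps_line("7f0000000000-7f0000021000 rw-p 00000000 00:00 0")
```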

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `D': Show the memory access node distribution.
           `M': Recalculate the address mapping.
           <Enter>: Break down the memory area into physical memory on each node.



     WIN7 - Memory access node distribution overview
         Get the percentage of memory accesses originating from
         the process/thread that go to each node.

           [KEY METRICS]:
           NODE: the node ID.
           ACCESS%: percentage of memory accesses that are to this node.
           LAT(ns): the average latency (in nanoseconds) of memory
               accesses to this node.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `M': Recalculate the address mapping.
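
         The per-node ACCESS% and LAT(ns) values can be thought of
         as a grouping of sampled accesses by target node. The
         sketch below is an illustration under assumptions: the
         (node, latency) sample tuples and the node_distribution
         helper are hypothetical, and the percentages are computed
         over the sampled accesses only, in line with the note that
         numatop records only accesses above a latency threshold.

```python
from collections import defaultdict


def node_distribution(samples):
    """Summarize (node_id, latency_ns) samples per node.

    Returns {node_id: {"ACCESS%": ..., "LAT(ns)": ...}} ordered by node ID.
    An empty sample list yields an empty result.
    """
    latencies_by_node = defaultdict(list)
    for node_id, latency_ns in samples:
        latencies_by_node[node_id].append(latency_ns)

    total = len(samples)
    return {
        node_id: {
            "ACCESS%": 100.0 * len(latencies) / total,
            "LAT(ns)": sum(latencies) / len(latencies),
        }
        for node_id, latencies in sorted(latencies_by_node.items())
    }


# Made-up samples: three accesses to node 0, one to node 1.
dist = node_distribution([(0, 100), (0, 140), (1, 300), (0, 120)])
```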



     WIN8 - Break down the memory area into physical memory on
     node
         Break down the memory area into its physical mapping on
         each node, with the associated access latency of a
         process/thread.

           [KEY METRICS]:
           NODE: the node ID.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `M': Recalculate the address mapping.



     WIN9 - Call-chain when process/thread generates the
     specified event
         Shows the call-chains when the  process  generates  RMA,
         LMA, CYCLE, or IR.

           [KEY METRICS]:
           Call-chain list: a list of call-chains.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.
           `1': Show the call-chain for RMA.
           `2': Show the call-chain for LMA.
           `3': Show the call-chain for CYCLE.
           `4': Show the call-chain for IR.



     WIN10 - Node Overview
         Shows the basic per-node statistics for this system.

           [KEY METRICS]:
           LG: node id of this node.
           MEM.ALL: total physical memory in this node.
           MEM.FREE: free physical memory in this node.
           CPU%: per-node CPU utilization.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window (WIN).
           `R': Refresh to show the latest data.
           <Enter>: Show the information of the specified node.



     WIN11 - Information of the node
         Shows the memory use and CPU utilization for the
         specified node.

           [KEY METRICS]:
           CPU: array of logical CPUs which belong to this node.
           CPU%: per-node CPU utilization.
           Other metrics remain the same.

           [HOTKEY]:
           `Q': Quit the application.
           `H': Switch to WIN1.
           `B': Back to previous window.
           `R': Refresh to show the latest data.

Options
     The following options are supported:

     -s sampling_precision
                              Specifies the sampling precision.
                              The valid values are:

                              normal
                                        Balance precision and
                                        overhead (default).


                              high
                                        High sampling precision
                                        (high overhead).


                              low
                                        Low sampling precision,
                                        suitable for a heavily
                                        loaded system.



     -l log_level
                              Specifies the level of  logging  in
                              the log file. The valid values are:

                                   None (default)
                                   Unknown (reserved)
                                   All



     -f log_file
                              Specifies the log file where output
                              will be written.


     -d dump_file
                              Specifies the dump file  where  the
                              screen data will be written.


     -h
                              Displays the command's usage.
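
     For scripted use, the options above can be assembled into an
     argument list before launching the tool. The sketch below is
     illustrative only: the build_command helper and its parameter
     names are hypothetical, and it uses only the flags documented
     on this page. It builds the argument list and does not start
     numatop.

```python
VALID_PRECISIONS = ("normal", "high", "low")


def build_command(precision=None, log_level=None, log_file=None, dump_file=None):
    """Build a numatop argument list from the documented options."""
    argv = ["numatop"]
    if precision is not None:
        if precision not in VALID_PRECISIONS:
            raise ValueError(f"unsupported sampling precision: {precision!r}")
        argv += ["-s", precision]
    if log_level is not None:
        argv += ["-l", str(log_level)]
    if log_file is not None:
        argv += ["-f", log_file]
    if dump_file is not None:
        argv += ["-d", dump_file]
    return argv


# High sampling precision, with screen data dumped to a file.
command = build_command(precision="high", dump_file="/tmp/dump.log")
```

     The resulting list can be passed to a process launcher such
     as subprocess.run; root privileges are still required.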

Examples

     Example 1 Launching numatop With Default Behavior


     The following command launches the tool with default  values
     for the supported options:


       # numatop



     Example 2 Launching numatop With High Sampling Precision


     The following command launches the tool with  high  sampling
     precision:


       # numatop -s high



     Example 3 Specifying a Log File


     The following command sets the log file to  /tmp/numatop.log
     and dumps all warning messages into it.


       # numatop -l 2 -f /tmp/numatop.log



     Example 4 Specifying a Dump File


     The following command sets the dump  file  to  /tmp/dump.log
     and dumps all screen data into it.


       # numatop -d /tmp/dump.log

Exit Status
     The following exit values are returned:

     0
                    Successful operation.


     Other Value
                    An error occurred.
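
     A wrapper script can rely on this convention by treating any
     non-zero status as a failure. The sketch below is a minimal
     illustration: the succeeded helper is hypothetical, and the
     Python interpreter stands in for numatop so that the example
     runs on systems where the tool is not installed.

```python
import subprocess
import sys


def succeeded(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


# Stand-in commands; in practice argv would be a numatop invocation
# such as ["numatop", "-d", "/tmp/dump.log"].
ok = succeeded([sys.executable, "-c", "raise SystemExit(0)"])
failed = succeeded([sys.executable, "-c", "raise SystemExit(1)"])
```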

Usage
     You must have root privileges to run numatop.

Attributes
     See attributes(5) for descriptions of the following
     attributes:



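     +---------------------+--------------------+
     | ATTRIBUTE TYPE      | ATTRIBUTE VALUE    |
     +---------------------+--------------------+
     | Architecture        | x86                |
     +---------------------+--------------------+
     | Availability        | diagnostic/numatop |
     +---------------------+--------------------+
     | Interface Stability | Committed          |
     +---------------------+--------------------+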
Copyright of the man page content belongs to the man page authors.