Work and Note

Linux Us

Follow me on GitHub

basics

  • errno:
    • thread local
    • never cleared until actual error

system limits

<unistd.h> // compile time
sysconf()
pathconf() fpathconf()

file

  • execution bit:
    • for file, become executable
    • for directory, you can access its content (read allow you to ls)
    • chmod +X: only apply x for folder
  • hard link: create a file with the same inode number
    • inode is unique on one device. Thus, hard link does not work across drives.
    • also, reference counting is necessary
  • symbol link: create a symbolic file that points to another file
    • ln -s, -r, --relative
  • bind mount: make a folder available at another position
    • there are possible drawbacks when having the same folder appear twice in the system
    • rbind: bind recursively
  • loop mount: make a file accessible as a block device
    • if the file contains an entire file system, it can then be mounted as a disk device

uid

  • RUID (real)
    • actual owner of a process
  • EUID (effective)
    • authority of a process is determined according to EUID
  • SUID (set)
    • When a command with SUID bit is run, its EUID become that of the owner of the file, rather than that of the user who is running it.
    • rws for SUID + x, rwS for SUID only
  • chmod +s
    • for file: apply SUID
    • for directory: the file created in this directory will have same group as this directory, instead of have user’s group
  • access check if RUID has capability
    • by default, will follow symbolic link
  • suid will not work for things which need a interpreter ex. script
  • ltrace does not work because of security

sticky bit

  • used for shared directories
  • can create, read and execute files owned by other users
  • but cannot remove files owned by other users
  • chmod +t

acl

# change default acl
setfacl -d -m g:basic:rwx .
# remove entry from acl
setfacl -x
# remove acl
setfacl -bn

attribute

chattr

  • A: do not update atime record
  • S: changes are synchronously write to disk
  • a: only open in append mode
  • i: cannot be modified (no renaming, no symbolic link, no execution)
  • j: write journal before content
  • t: no tail-merging (multiple tail block can be merged to reduce disk usage)

Reference

fd

  • file status flags:
    • O_SYNC:
      • when fd close fail, data is not guaranteed to be written
    • O_APPEND:
      • cannot lseek -> gurantee that write is sequential
  • file descriptor flags: fnctl
    • FD_CLOEXEC: if set, fd is closed when exec()
      • prevent fd leak to child process
  • shared access:
    • fd can point to same file -> dup()
    • file can be open multiple times
    • lseek() + read() / write is not atomic
    • pread(), pwrite()
  • umask: set file permission for various syscall
    • mode & ~umask
    • can be read from /proc/pid/status and /proc/self/status
  • additional action
    • fnctl, ioctl
  • stdio can be treated the same way as other fd
    • we can pass /proc/self/ /dev/fd/
    • some program use - to differentiate read from stdin
      • such as cat

io mode

  • blocking
  • non-blocking: O_NONBLOCK
    • polling mode
    • partial success
    • EAGAIN
  • multiplexing
    • select: FDSET
    • poll: fd array
    • epoll: change fds while waiting
  • signal based io: SIGIO
    • check if able to read / write, do it after signal
  • async: aio_
    • read / write first, check result later
    • EINPROGRESS

stdio

  • input / output stream
  • buffer
    • line: flush when receiving line end
      • associated with terminal
    • full: flush when buffer is full
      • associated with disk file
      • pipe
    • none
  • buffer can cause problem
    • leftover buffer after fork
    • unexpected buffer where no one flushes

process

the kernel starts as a dummy process 0 the first process to run is init as process 1 SysV systemd

  • fork: duplicate the current process context
    • will copy the address space
  • exec: replace current context with new process
  • wait: wait for process return
  • clone: similar to fork, but share the address space

subprocess

  • if a parent does not read its child process exit state (wait),
    • the child become zombie after it finished execution
    • if we do not want zombie
      • we can fork twice so that child is orphaned instead.
      • if we ignore child state change signal, the terminal state will not be saved
        • SIG_IGN SIGCLD
    • ps aux | grep Z
  • if a parent exit before its child process,
    • the child is transfered to init (orphaned)
    • init will clean up periodically
    • orphaned process get SIGHUP and SIGCONT
  • daemon process
    • let child process become its own session

session

  • parent process id ppid
  • process group pgid
    • getpgid() / setpgid()
    • pipelined process can be placed into one group
  • session sid
    • setsid() create new session
      • the process become the session leader
      • network hang up is sent to session leader
    • a session can have one controlling terminal
  • controlling terminal: some tty device
    • /dev/tty: current tty device
    • foreground / background process group
      • terminal send signal to the foreground pg tpgid
      • tcgetpgrp / tcsetpgrp
      • bg can talk to /dev/tty
  • job control
    • when reading in background, process get SIGTTIN
      • if orphaned, read gets EIO
    • when writing in background, process get SIGTTOU

signal

signal, pause, setjmp, longjmp

  • signal is reset to default after trigger
  • signal cannot be turned off
  • system call will be interrupted
    • might auto restart
  • reentrant function: some func cannot be used because their use of global struct

sigaction, sigsuspend, sigsetjmp, siglongjmp,

  • sigaction does not reset
  • sigaction allow mask
    • automatically masked unless SA_NODEFER
    • mask is inherit through fork so we need to make sure that we only attach handler when necessary
  • sa_flags:
    • SA_RESTART, SA_INTERRUPT: default to restart
    • SA_SIGINFO: more info for signal (also, enable signal queue)
  • sigsuspend is atomic:
    • race between signal set and pause
  • jmp needs to make sure that mask is cleared

signal note

  • async signal safe != thread safe
    • thread is in band, thread safe can be achieved with mutex
    • signal is out of band, reentrant cannot have broken global state
    • handle signal in special thread
  • when ignoring SIGCLD / SIGCHLD, will not generate zombie
    • SIGCLD:
      • if child can be waited, signal is triggered immediately
        • only set signal after wait
    • SIGCHLD:
      • generate when child state actually changes?
  • SIGHUP: does not make sense for daemon process
    • so they are usually used to reload config

pthread

pthread_atfork()

strerror_r()

ipc

memory mapped io

  • mmap()
  • shm_open()
    • create fd which is not back by actual file on disk
    • on linux, equivalent to create file in /dev/shm
  • memfd_create()
    • similar to shm_open, but the name does not refer to actual file

hugetlbfs https://github.com/libhugetlbfs/libhugetlbfs

https://linux.die.net/man/2/get_mempolicy

https://www.kernel.org/doc/Documentation/vm/numa_memory_policy.txt

LD_DEBUG

seccomp

limit syscall

seccomp_init
seccomp_rule_add
seccomp_load

syscall tables - strace

https://fedora.juszkiewicz.com.pl/syscalls.html https://github.com/hrw/syscalls-table

reference

APUE

https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md

https://rigtorp.se/low-latency-guide/

ipc

https://blog.cloudflare.com/scalable-machine-learning-at-cloudflare/

https://github.com/cloudflare/mmap-sync

https://github.com/goldsborough/ipc-bench