rl
state, action, policy, reward (short term), value / return (long term)
dqn
This post shows that dqn works by training a Q-network, which predicts the maximum expected discounted future return for taking each action a in state s (assuming optimal behaviour afterwards). When deployed, the Q-network gives the optimal action at each step by taking the argmax over actions.
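For concreteness, here is a minimal PyTorch sketch of those two points; the network size, state dimension, and action count are made up for illustration and are not from the post:

```python
import torch
import torch.nn as nn

# Hypothetical Q-network: maps a 4-d state to one Q-value per action (2 actions here).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

state = torch.randn(1, 4)
q_values = q_net(state)            # predicted discounted return for each action
action = q_values.argmax(dim=1)    # at deployment: take the greedy (optimal) action

# During training, the regression target for Q(s, a) comes from the Bellman equation:
#   target = r + gamma * max_a' Q(s', a')   (just r at terminal steps)
```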
policy gradient
This post shows that policy gradients work by leaving the reward out of the differentiable path entirely: during inference the action is simply sampled at random from the policy's output distribution, and the observed return is fed back to the network as a training signal only after it can be determined (the estimator is written out below).
- policy gradients require some positive feedback (at least a few rollouts that happen to get rewarded), which might be hard to achieve.
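Written out, the estimator being described is the standard score-function (REINFORCE) gradient: the action \( a \) is sampled from the policy during the rollout, and the return \( R \), known only afterwards, weights the gradient of its log-probability:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, R \right]
\]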
implications
Notice that the reward does not have to be differentiable, so we could use rl to help train a network that has non-differentiable components, e.g. perform the action randomly and reward the samples with good outcomes (a minimal sketch follows the references below).
- Gradient Estimation Using Stochastic Computation Graphs
- Reinforcement Learning Neural Turing Machines - Revised
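A minimal sketch of that idea (toy problem; names and sizes are mine): the reward below is an arbitrary, non-differentiable Python function, yet the network still trains because gradients only flow through \( \log \pi(a) \), never through the reward.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(8, 10)                        # logits over 10 discrete choices
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

def black_box_reward(action: int) -> float:
    # Non-differentiable: could just as well be a simulator or a human label.
    return 1.0 if action == 3 else 0.0

for step in range(200):
    x = torch.randn(1, 8)
    dist = torch.distributions.Categorical(logits=net(x))
    action = dist.sample()                    # perform the action randomly
    reward = black_box_reward(action.item())  # reward the good outcomes
    loss = -dist.log_prob(action) * reward    # score-function / REINFORCE loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```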
fusion
bn with conv
\[
\begin{gather}
s = \frac{\gamma_{bn}}{\sqrt{\mathrm{var}_{bn} + \epsilon}} \\
W_c \leftarrow W_c \cdot s \\
B_c \leftarrow (B_c - m_{bn}) \cdot s + B_{bn}
\end{gather}
\]
where \( \gamma_{bn}, B_{bn}, m_{bn}, \mathrm{var}_{bn} \) are the BN scale, bias, running mean, and running variance, \( W_c, B_c \) are the conv weight and bias, and \( s \) is applied per output channel.
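As a sanity check, the fold above can be written directly against torch.nn modules. This is a minimal sketch assuming a Conv2d immediately followed by a BatchNorm2d in eval mode; the helper name is mine, not a library API:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    s = bn.weight / torch.sqrt(bn.running_var + bn.eps)      # per-channel scale
    fused.weight.copy_(conv.weight * s.reshape(-1, 1, 1, 1))
    b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((b - bn.running_mean) * s + bn.bias)
    return fused

# Quick check: the fused conv should match conv -> bn in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```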
- known bn problems:
  - some channels might have values around zero
  - the gamma term might be negative
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py
https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L2041
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Normalization.cpp
imagenet preparation
mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
A copy of the script is kept here
Notice that there are some problematic images in ImageNet.
moving average
- simple moving average
- \( m = \frac{1}{n} \sum_{i=0}^{n-1} p_{-i} \)
- cumulative moving average
- \( m_{n+1} = \frac{x_{n+1} + n m_n}{n+1} = m_n + \frac{x_{n+1} - m_n}{n+1} \)
- exponential moving average
- \( m_{n+1} = m_n + \alpha (x_{n+1} - m_n) \)
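A direct transcription of the three recurrences in plain Python (function names are mine):

```python
def simple_moving_average(xs, n):
    # Mean of the last n points.
    return sum(xs[-n:]) / n

def cumulative_moving_average(m, x, n):
    # m is the mean of the first n points; fold in the (n+1)-th point x.
    return m + (x - m) / (n + 1)

def exponential_moving_average(m, x, alpha):
    # Same shape of update, but with a fixed weight alpha.
    return m + alpha * (x - m)

# e.g. BatchNorm-style running statistics use the exponential form
m = 0.0
for x in [1.0, 2.0, 3.0, 4.0]:
    m = exponential_moving_average(m, x, alpha=0.1)
```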
detection
- box size + position vs ground truth box: box loss
- box confidence score vs ground truth iou: obj loss
- classification loss
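Per matched prediction, the three terms roughly look like the sketch below. This is a hedged, YOLO-style illustration only: anchor matching is assumed to have already happened, and real implementations weight the terms and often use IoU-based box losses instead of MSE.

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_box, pred_obj, pred_cls, gt_box, gt_iou, gt_cls):
    box_loss = F.mse_loss(pred_box, gt_box)                          # box size + position vs gt box
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, gt_iou)  # confidence vs gt iou
    cls_loss = F.cross_entropy(pred_cls, gt_cls)                     # classification
    return box_loss + obj_loss + cls_loss

N, C = 8, 20   # 8 matched predictions, 20 classes (made-up sizes)
loss = detection_loss(torch.randn(N, 4), torch.randn(N), torch.randn(N, C),
                      torch.rand(N, 4), torch.rand(N), torch.randint(0, C, (N,)))
```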
nms
- many proposals can come out of the network
- nms: remove proposals that duplicate others (sketched in code below)
  - accept the proposal with the highest confidence
  - a proposal is rejected if it has a high iou with any accepted proposal
  - repeat until no proposals remain
- soft-nms: instead of removing an overlapping box completely, decay its confidence score by the amount of iou overlap
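A numpy sketch of the greedy procedure above (boxes as [x1, y1, x2, y2]; the threshold value is arbitrary):

```python
import numpy as np

def iou(box, boxes):
    # IoU of one box against an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]   # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]                # accept the most confident proposal
        keep.append(int(best))
        rest = order[1:]
        # reject everything with high overlap against the accepted proposal
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # keeps boxes 0 and 2; box 1 is suppressed
```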
todo section
genetic
https://towardsdatascience.com/reinforcement-learning-without-gradients-evolving-agents-using-genetic-algorithms-8685817d84f
parameter initialization
asymmetry in kernel? http://cs231n.github.io/neural-networks-2/#init
batch size vs number of iteration
small batches -> noisier gradient estimates, so smaller steps toward convergence https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network
regularizer
https://stackoverflow.com/questions/46615623/do-we-need-to-add-the-regularization-loss-into-the-tot
adam
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/