Intro to Model Compression

Model Optimization 2021. 11. 22. 15:38
1. Efficient architecture design; AutoML, Neural Architecture Search

Automated search can find modules that outperform what human intuition would design.

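2. Network Pruning; shrinking the model that was found

Removes parameters with low importance.

Defining and finding a good importance measure is one of the main research topics (for example, treating a parameter as important if its L2 norm is large, or if its loss gradient is large).

Pruning is divided into structured and unstructured pruning.

Structured Pruning: prunes parameters in groups (channel, filter, layer, etc.). The remaining network is still dense, so it suits dense computation.

Unstructured Pruning: prunes parameters individually, so the matrices inside the network gradually become sparse. It suits software or hardware optimized for sparse computation.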
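The difference between the two styles can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pruner: it assumes weight magnitude as the importance measure for unstructured pruning and the row-wise L2 norm for structured pruning, and the function names, the `sparsity` and `n_remove` parameters, and the matrix `W` are all hypothetical.

```python
import math

def unstructured_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of individual weights with the smallest |w|.

    The matrix keeps its shape and simply becomes sparse.
    Assumption: magnitude is the importance measure; ties at the threshold
    are all pruned, so slightly more than the requested fraction may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)            # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]                  # largest magnitude that gets pruned
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

def structured_prune(weights, n_remove):
    """Drop the `n_remove` whole rows (e.g. output channels) with the smallest L2 norm.

    The result is a smaller but still dense matrix.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:len(weights) - n_remove])   # preserve original row order
    return [weights[i][:] for i in keep]

# Hypothetical 3x3 weight matrix; each row stands for one output channel.
W = [[0.9, -0.05, 0.4],
     [0.01, 0.02, -0.03],
     [-0.7, 0.6, 0.1]]

sparse_W = unstructured_prune(W, 0.5)   # same 3x3 shape, small entries set to 0.0
small_W = structured_prune(W, 1)        # 2x3 dense matrix, weakest row removed
```

Here `sparse_W` needs sparse kernels to run faster, while `small_W` is simply a smaller dense layer.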

3. Knowledge distillation

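A method that uses a trained large network (teacher) to assist the training of a small network (student).

Soft targets, i.e. the teacher's output probabilities, carry more information than the ground truth labels.

The loss consists of the cross entropy between the student network's output and the ground truth (GT) label, plus a KL divergence (KLD) loss between the inference results of the teacher network and the student network.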
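The two-term loss described above can be sketched as follows. This is a minimal pure-Python version under stated assumptions: the mixing weight `alpha`, the softmax `temperature`, and the `temperature ** 2` scaling of the soft term are a commonly used convention that the notes above do not specify, and all function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Hard-label term: cross entropy between the student prediction and the GT class index."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    """alpha * CE(student, GT) + (1 - alpha) * T^2 * KL(teacher || student).

    alpha and temperature are assumed hyperparameters, not values from the notes.
    """
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

# Hypothetical logits for a 3-class problem whose GT class is 0.
teacher = [3.0, 1.0, 0.2]
student = [0.1, 1.0, 2.0]
loss = distillation_loss(student, teacher, label=0)
```

When the student reproduces the teacher's logits exactly, the KLD term is zero and only the cross entropy term remains.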

4. Matrix/Tensor decomposition

Representing a single tensor as a combination (sums, products) of operations on smaller tensors.

CP decomposition: approximates a tensor as a sum of rank-1 tensors, each of which is the outer product of vectors.
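To make the CP form concrete, the sketch below rebuilds a 3-way tensor from its rank-1 components and compares parameter counts. It only shows the reconstruction direction; actually fitting the factors (for example with alternating least squares) is not covered here. The function names and the example factors are hypothetical.

```python
def outer3(a, b, c):
    """Outer product of three vectors: a rank-1 tensor of shape len(a) x len(b) x len(c)."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

def cp_reconstruct(factors):
    """Sum the rank-1 tensors given as a list of (a, b, c) vector triples."""
    I, J, K = (len(v) for v in factors[0])
    T = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for a, b, c in factors:
        component = outer3(a, b, c)
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    T[i][j][k] += component[i][j][k]
    return T

def cp_param_count(shape, rank):
    """Return (parameters of the full tensor, parameters of a rank-`rank` CP form)."""
    I, J, K = shape
    return I * J * K, rank * (I + J + K)

# Hypothetical rank-2 factors for a 2 x 3 x 2 tensor.
factors = [
    ([1.0, 2.0], [1.0, 0.0, -1.0], [3.0, 1.0]),
    ([0.5, 0.0], [2.0, 2.0, 2.0], [1.0, -1.0]),
]
T = cp_reconstruct(factors)

# For a larger tensor the saving becomes visible:
full, compressed = cp_param_count((64, 64, 9), rank=8)   # 36864 vs 1096
```

The saving only appears when the rank is small relative to the tensor dimensions; for the tiny 2 x 3 x 2 example the CP form is actually larger than the full tensor.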

5. Network Quantization

Converts the computation of a network, which normally uses the float32 data type, to a smaller data type and performs the computation in that type.
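As one possible instance of this idea, the sketch below maps float values to int8 with an affine (scale and zero point) scheme and back. The notes do not say which scheme is meant, so the choice of asymmetric per-tensor int8 quantization, along with the function names, is an assumption made for illustration.

```python
def quantize_int8(values):
    """Affine quantization of floats to integers in [-128, 127].

    Returns (quantized values, scale, zero_point).
    Assumption: one scale and zero point for the whole list (per-tensor).
    """
    lo = min(min(values), 0.0)   # the range must contain 0 so that
    hi = max(max(values), 0.0)   # zero is represented exactly
    scale = (hi - lo) / 255.0
    if scale == 0.0:             # all values are zero
        scale = 1.0
    zero_point = round(-128 - lo / scale)
    quantized = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]

# Hypothetical float32-style weights.
weights = [-1.0, -0.25, 0.0, 0.5, 2.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)   # close to `weights`, not identical
```

Each value now fits in 1 byte instead of 4, at the cost of a rounding error on the order of `scale`.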

6. Network Compiling

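Compiling a trained network so that inference can run on the target hardware it will be deployed to.

Among the techniques listed here, this one has the largest impact on speed.

TensorRT (NVIDIA), TFLite (TensorFlow), TVM (Apache)

Performance differs from one compile library to another, because each applies its own optimizations, such as layer fusion, during compilation.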
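A well-known example of layer fusion is folding a batch normalization layer into the linear (or convolution) layer that precedes it, so that inference runs one operation instead of two. The sketch below shows the arithmetic in plain Python; it is an illustration of the general idea, not a description of what any particular compiler listed above does internally, and all names are hypothetical.

```python
import math

def linear(W, b, x):
    """y = W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fuse_linear_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the linear layer: returns (fused_W, fused_b)."""
    fused_W, fused_b = [], []
    for row, bi, g, bt, m, v in zip(W, b, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)            # per-output-channel scale
        fused_W.append([s * w for w in row])
        fused_b.append(s * (bi - m) + bt)
    return fused_W, fused_b

# Hypothetical 2-output layer and batch norm statistics.
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

two_ops = batchnorm(linear(W, b, x), gamma, beta, mean, var)
fused_W, fused_b = fuse_linear_bn(W, b, gamma, beta, mean, var)
one_op = linear(fused_W, fused_b, x)          # matches two_ops up to float rounding
```

The fused layer produces the same output while skipping the separate normalization pass, which is the kind of saving a compiler can gain by fusing layers.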