Technical Writings and Posts
A collection of my blog articles related to computer science.
Reward Hacking in Reinforcement Learning
Reward shaping in RL is challenging. Reward hacking occurs when an RL agent exploits flaws or ambiguities in the reward function to obtain high rewards without genuinely learning the intended behaviors or completing the task as designed. In recent years, several related concepts have been proposed, all referring to some form of reward hacking.
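A minimal sketch of the failure mode (not from the article; the two-action bandit, its reward values, and the `exploit_bug` action are invented for illustration): an epsilon-greedy agent trained on a flawed proxy reward learns to exploit the flaw rather than do the task.

```python
# Toy illustration of reward hacking: the agent maximizes a buggy *proxy*
# reward and ends up preferring an exploit over genuinely doing the task.
import random

ACTIONS = ["do_task", "exploit_bug"]

def proxy_reward(action: str) -> float:
    # The reward the agent actually sees: the exploit games a flaw
    # in the reward function and scores higher than honest work.
    return 1.0 if action == "do_task" else 10.0

def true_reward(action: str) -> float:
    # What the designer intended: only real task completion counts.
    return 1.0 if action == "do_task" else 0.0

q = {a: 0.0 for a in ACTIONS}   # running value estimates
counts = {a: 0 for a in ACTIONS}

random.seed(0)
for step in range(1000):
    # Epsilon-greedy action selection against the proxy reward.
    a = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    counts[a] += 1
    # Incremental mean update of the value estimate.
    q[a] += (proxy_reward(a) - q[a]) / counts[a]

print(q)       # the agent values "exploit_bug" far above "do_task"
print(counts)  # and spends almost all steps exploiting the flaw
```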
Read article →

Constructing and Optimizing a C Compiler in Rust
Building a compiler in Rust involves parsing source code into an AST, performing semantic analysis, and generating optimized machine code through techniques like constant folding, dead code elimination, and loop unrolling. The optimization phase transforms the intermediate representation to produce faster executables while preserving program behavior.
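As a rough sketch of the first of those passes, here is constant folding on a toy expression AST in Python (the article works in Rust on C code; the node types below are invented for illustration):

```python
# Constant folding: evaluate constant subexpressions at compile time,
# replacing them with their literal value in the AST.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str      # "+", "-", or "*"
    left: object
    right: object

def fold(node):
    """Recursively replace constant subtrees with a single Num node."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            ops = {"+": lambda a, b: a + b,
                   "-": lambda a, b: a - b,
                   "*": lambda a, b: a * b}
            return Num(ops[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node  # Num and Var are already in normal form

# (2 * 3) + x folds to 6 + x; the variable blocks further folding.
expr = BinOp("+", BinOp("*", Num(2), Num(3)), Var("x"))
print(fold(expr))  # BinOp(op='+', left=Num(value=6), right=Var(name='x'))
```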
Read article →

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
In this post, I’ll iteratively optimize an implementation of matrix multiplication written in CUDA. My goal is not to build a cuBLAS replacement, but to deeply understand the most important performance characteristics of the GPUs that are used for modern deep learning. This includes coalescing global memory accesses, shared memory caching and occupancy optimizations, among others.
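To see why shared memory caching matters so much, here is a back-of-the-envelope arithmetic-intensity calculation in Python (matrix size and tile width are assumptions for illustration, not figures from the post):

```python
# Shared-memory tiling raises arithmetic intensity: instead of each
# thread streaming full rows/columns from global memory, a BLOCK x BLOCK
# tile is loaded once and reused across the whole tile.
N = 4096       # square matrix dimension (assumed for illustration)
BLOCK = 32     # shared-memory tile width (a common choice)

flops = 2 * N**3  # one multiply + one add per inner-loop iteration

# Naive kernel: every thread reads a full row of A and column of B
# from global memory => 2*N float32 loads per output element.
naive_bytes = 2 * N * N * N * 4

# Tiled kernel: each element of A and B is fetched from global memory
# only N/BLOCK times, once per tile it participates in.
tiled_bytes = 2 * N * N * (N // BLOCK) * 4

print(f"naive arithmetic intensity: {flops / naive_bytes:.2f} FLOP/byte")
print(f"tiled arithmetic intensity: {flops / tiled_bytes:.2f} FLOP/byte")
# Tiling improves FLOP/byte by a factor of BLOCK (32x here), moving the
# kernel from memory-bound toward compute-bound.
```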
Read article →

Optimizing Neural Network Performance
Numpy can multiply two 1024x1024 matrices on a 4-core Intel CPU in ~8ms. This is incredibly fast, considering this boils down to 18 FLOPs / core / cycle, with a cycle taking a third of a nanosecond. Numpy does this using a highly optimized BLAS implementation.
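That claim is easy to sanity-check with a small script along these lines (core count and clock speed below are assumptions about the hardware; your numbers will differ):

```python
# Time a 1024x1024 float32 matmul and convert to FLOPs/core/cycle.
import time
import numpy as np

n, cores, ghz = 1024, 4, 3.0   # assumed 4 cores at ~3 GHz
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so BLAS thread pools and caches are initialized
start = time.perf_counter()
a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3              # multiply-adds in an n^3 matmul
cycles = elapsed * ghz * 1e9  # cycles elapsed on each core
print(f"{elapsed * 1e3:.1f} ms, "
      f"{flops / (cycles * cores):.1f} FLOPs/core/cycle")
```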
Read article →