this is a space to document and distill my thoughts on publications and articles within the research community. literature reviews are personal and messy; blog posts are intended for a wider audience (marked with a dot).
i write for my own edification, and posts are subject to my own interest in the topic at the time of writing.
September 28, 2024
research
- abliteration; probing; what makes a model good?
- entropix code
- more open source contributions. but what projects? i want to write c++, torch, and jax.
- cuda; gpu programming.
- jax: shard_map. xmap and pmap, which i've used extensively, are deprecated; let's learn modern spmd (sketch below).
- implement deepseek mla. why hasn't this caught on yet?
- evaluate contextual retrieval on mindex. use local oss model for context. prompt caching on local models?
- colpali to handle pdfs in mindex.
life
- go on a ski trip...
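a minimal shard_map sketch of what i mean by modern spmd; the mesh axis name 'x', the toy shapes, and the matmul are made up for illustration, not from any real project:

```python
# a minimal sketch, not a tutorial: contraction-dim-sharded matmul with an
# explicit psum. axis name 'x' and the toy shapes are illustrative.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

n_dev = len(jax.devices())
mesh = Mesh(np.array(jax.devices()), axis_names=('x',))

def partial_matmul(a_blk, b_blk):
    # each device sees only its shard of the contraction dimension,
    # so the collective is written out explicitly (unlike pmap's implicit style)
    return jax.lax.psum(a_blk @ b_blk, axis_name='x')

f = shard_map(
    partial_matmul,
    mesh=mesh,
    in_specs=(P(None, 'x'), P('x', None)),  # shard a's cols and b's rows over 'x'
    out_specs=P(None, None),                # result replicated on every device
)

a = jnp.arange(8 * 4 * n_dev, dtype=jnp.float32).reshape(8, 4 * n_dev)
b = jnp.ones((4 * n_dev, 16), dtype=jnp.float32)
print(jnp.allclose(jax.jit(f)(a, b), a @ b))  # True
```

the appeal over pmap/xmap, as far as i can tell: sharding and collectives are spelled out per-device rather than inferred.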
September 17, 2024
I spent the last few days trying to reverse engineer the process behind o1 and I'd like to share what I think o1 is, and why it's important for the future of LLMs. I feel pretty confident saying that the people who don't understand o1 are chalking this model up to glorified CoT, while the people I respect recognize it as a paradigm shift. It all goes back to what Karpathy talks about in this lengthy tweet: > "RLHF is that it is just barely RL, in a...
August 29, 2024
i read two posts [1, 2] which led to some notes, which led to this post. --- Most people are resource-constrained, and even those who aren't still need to utilize their resources effectively. Optimizing deep learning systems is therefore crucial, especially as models grow larger and more complex. To do this effectively, we need to understand the kinds of constraints that our system may suffer under, and how these constraints interact with different aspects of model architecture and training pipelines. Typically, the accelerator of choice here is a GPU, and...
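A back-of-envelope sketch of the kind of constraint check I mean: comparing a kernel's arithmetic intensity against an accelerator's ridge point. The peak numbers below are illustrative A100-style figures, not exact, and the helper is made up:

```python
# illustrative roofline check; peak numbers are rough A100-style figures,
# not measurements, and only HBM traffic for the operands is counted.
PEAK_FLOPS = 312e12   # bf16 tensor-core peak, FLOP/s (assumed)
PEAK_BW = 2.0e12      # HBM bandwidth, bytes/s (assumed)
BYTES_PER_EL = 2      # bf16

def matmul_intensity(m, k, n):
    """FLOPs per byte moved for C = A @ B, reading A and B and writing C."""
    flops = 2 * m * k * n
    bytes_moved = BYTES_PER_EL * (m * k + k * n + m * n)
    return flops / bytes_moved

ridge = PEAK_FLOPS / PEAK_BW  # intensity needed to be compute-bound (~156 FLOP/byte here)

for shape in [(4096, 4096, 4096), (1, 4096, 4096)]:  # big matmul vs. batch-1 matvec
    ai = matmul_intensity(*shape)
    bound = "compute-bound" if ai > ridge else "memory-bound"
    print(f"{shape}: {ai:.1f} FLOP/byte -> {bound}")
```

A large square matmul clears the ridge point comfortably (compute-bound), while the batch-1 matvec moves roughly one byte per FLOP and is firmly memory-bound.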
July 24, 2024
Richard Sutton's The Bitter Lesson is a fantastic piece that I highly recommend you read. He posits that progress in AI research over the past 70 years fundamentally boils down to two principles: - Develop general methods, discarding assumptions and attempts at modeling intelligence. - Leverage computation by scaling data and compute resources. This approach has proven successful across prominent ML fields, including computer vision, reinforcement learning, and speech recognition. The latest example is the astounding progress in NLP. As available compute increases at an extraordinary scale, leveraging it consistently...