Hacker News

Matrix Orthogonalization Improves Memory in Recurrent Models (ayushtambde.com)

70 points by at2005 11 hours ago | 19 comments

hasley 22 minutes ago

I suspect that by "orthogonalization" they mean finding vectors that form an orthogonal basis for the same subspace as the vectors in the source matrix.

I wonder what the result would be if they used the orthogonal matrix closest to the source matrix. Usually closeness is measured with the Frobenius norm (the square root of the sum of all squared matrix entries). Maybe one could even try another norm that gives a sparser matrix.
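The two readings in the comment above give different matrices. Gram-Schmidt produces an orthonormal basis for the same column space (and its result depends on column order), while the orthogonal matrix closest to A in the Frobenius norm is the polar factor U V^T from the SVD A = U Σ V^T. A minimal pure-Python sketch of both follows, computing the polar factor with the Newton-Schulz iteration instead of an explicit SVD. Nothing here claims the linked article uses either method; the function names and the example matrix are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def frobenius(a: Matrix) -> float:
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def distance(a: Matrix, b: Matrix) -> float:
    return frobenius([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])


def gram_schmidt(a: Matrix) -> Matrix:
    """Orthonormal basis for the column space of a (assumes full column rank)."""
    basis: list[list[float]] = []
    for col in transpose(a):
        w = list(col)
        for q in basis:  # remove the component along each earlier basis vector
            proj = sum(x * y for x, y in zip(w, q))
            w = [x - proj * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return transpose(basis)


def nearest_orthogonal(a: Matrix, steps: int = 60) -> Matrix:
    """Polar factor U V^T of a = U S V^T via the Newton-Schulz iteration.

    Dividing by the Frobenius norm moves every singular value into (0, 1],
    inside the iteration's convergence region, without changing U V^T.
    Assumes a has full column rank.
    """
    scale = frobenius(a)
    x = [[v / scale for v in row] for row in a]
    n = len(a[0])
    for _ in range(steps):
        xtx = matmul(transpose(x), x)
        # X <- X (3I - X^T X) / 2 pushes each singular value toward 1
        m = [[(3.0 * (i == j) - xtx[i][j]) / 2.0 for j in range(n)] for i in range(n)]
        x = matmul(x, m)
    return x


A = [[2.0, 1.0], [0.5, 1.5]]
Q_gs = gram_schmidt(A)
Q_polar = nearest_orthogonal(A)
print(distance(A, Q_gs), distance(A, Q_polar))
```

Both results are orthogonal, but only the polar factor is guaranteed to minimize the Frobenius distance to A; for this A it lands measurably closer than the Gram-Schmidt basis.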

imurray 2 hours ago

Here is a PyTorch optimizer that can keep a matrix orthogonal throughout optimization:

https://github.com/adrianjav/pogo — POGO: A Proximal One-step Geometric Orthoptimizer

https://arxiv.org/abs/2602.14656 — An Embarrassingly Simple Way to Optimize Orthogonal Matrices at Scale; Adrián Javaloy, Antonio Vergari
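For comparison, one textbook way to keep a matrix orthogonal during optimization is Riemannian gradient descent: project the ordinary gradient onto the tangent space of the orthogonal group, take a step, then map back onto the group with a retraction. The sketch below uses a QR (Gram-Schmidt) retraction on a toy objective ||Q - B||_F^2. It is a generic baseline, not the POGO algorithm from the links above, and the function names and target matrix B are made up for illustration.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def qr_retract(a: Matrix) -> Matrix:
    """Q factor of a = QR (Gram-Schmidt on the columns, so R has a positive diagonal)."""
    basis: list[list[float]] = []
    for col in transpose(a):
        w = list(col)
        for q in basis:
            proj = sum(x * y for x, y in zip(w, q))
            w = [x - proj * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return transpose(basis)


def objective(q: Matrix, target: Matrix) -> float:
    """Toy loss f(Q) = ||Q - target||_F^2 (hypothetical, for illustration only)."""
    return sum((x - y) ** 2 for rq, rt in zip(q, target) for x, y in zip(rq, rt))


def riemannian_step(q: Matrix, target: Matrix, lr: float) -> Matrix:
    n = len(q)
    # Euclidean gradient of the toy loss
    g = [[2.0 * (q[i][j] - target[i][j]) for j in range(n)] for i in range(n)]
    # Project onto the tangent space at q: Q * skew(Q^T G)
    m = matmul(transpose(q), g)
    skew = [[(m[i][j] - m[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    rgrad = matmul(q, skew)
    # Step along the tangent direction, then retract back onto the orthogonal matrices
    return qr_retract([[q[i][j] - lr * rgrad[i][j] for j in range(n)] for i in range(n)])


def optimize(target: Matrix, steps: int = 200, lr: float = 0.1) -> Matrix:
    n = len(target)
    q = [[float(i == j) for j in range(n)] for i in range(n)]  # start at the identity
    for _ in range(steps):
        q = riemannian_step(q, target, lr)
    return q


B = [[0.0, -1.2], [0.9, 0.1]]
Q = optimize(B)
print(objective(Q, B))
```

Every iterate is orthogonal up to rounding because each step ends with the retraction; a plain gradient step without it would drift off the set of orthogonal matrices.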

BirbSingularity 9 hours ago

I can't help but think of orthogonal frequency-division multiplexing and its use in encoding data on multiple carrier frequencies, and it makes me wonder what other parallels we will discover between digital transmission technology and cross-domain work like this.
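For readers who haven't met OFDM: the subcarriers are complex exponentials that are mutually orthogonal over one symbol period, so an inverse DFT can place one symbol on each subcarrier and a forward DFT recovers them exactly. A bare-bones sketch, assuming an ideal channel with no noise, no cyclic prefix, and a naive O(N^2) DFT instead of an FFT (real systems handle all three); the function names are illustrative.

```python
import cmath


def ofdm_modulate(symbols: list[complex]) -> list[complex]:
    """Inverse DFT: symbol k rides on subcarrier exp(2*pi*i*k*n/N)."""
    n_sub = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * n / n_sub) for k, s in enumerate(symbols)) / n_sub
            for n in range(n_sub)]


def ofdm_demodulate(samples: list[complex]) -> list[complex]:
    """Forward DFT: correlating with each subcarrier recovers its symbol,
    because distinct subcarriers are orthogonal over one symbol period."""
    n_sub = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_sub) for n, x in enumerate(samples))
            for k in range(n_sub)]


def subcarrier_inner_product(k: int, m: int, n_sub: int) -> complex:
    """<e_k, e_m> over N samples: N if k == m (mod N), otherwise 0."""
    return sum(cmath.exp(2j * cmath.pi * (k - m) * n / n_sub) for n in range(n_sub))


symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]  # one QPSK symbol per subcarrier
waveform = ofdm_modulate(symbols)
recovered = ofdm_demodulate(waveform)
```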

dapperdrake 9 hours ago

Not even cross-domain. (Nor cross-co-domain.)

Trigonometric polynomials are also polynomials. And linear spaces are all "the same". That is what the definition is for. Even the transpose-mapping is linear.

hgoel 1 hour ago

I feel like this is an inverted interpretation? Transmission tech uses those methods because the math shows the desired properties.

Linear algebra is used everywhere; orthogonalization, SVD, eigenvalues, etc. are valuable because the resulting properties are very useful in many places.

BirbSingularity 1 hour ago

Yeah, I could have used a better word choice. I was thinking of domains in the general sense, such as signal processing and wireless communication being applicable to the domain of artificial intelligence. In reality, you are correct that it's all tied together under the domain of applied math or computer science.

chimpanzee2 5 hours ago

I have this strange sensation that I can't put into words that somehow we are on the brink of unveiling an entirely new paradigm of AI, or perhaps even of combining AI with classical algorithms so that they rapidly iterate with each other (and with sensor data), in a way that will instantly 10x or 100x current capabilities.

Anyone else feel this?

digdugdirk 4 hours ago

I think part of it is the feeling of false understanding that comes from using LLMs regularly. They let you operate at a higher conceptual level, and they paper over enough of the actual details that your conceptual model might not actually be correct.

I'm a mechanical engineer by training, and I get similar vibes from the similarities I see between LLM training and metallurgy. I could probably put together a formal concept for these vibes at this point, but is there actually a "there" there? I have no idea. And it would take me years to actually dive in and learn everything to gain the deep understanding required to know whether I'm just experiencing my own brand of AI psychosis.

It's a brave new world, that's for sure.

seanhunter 3 hours ago

Andrej Karpathy said something along the lines of “while you can use LLMs to outsource some of your thinking, you can’t use them to outsource your understanding.”

cyanydeez 4 hours ago

No. We're approaching a sigmoid. AI is a bloated carcass, and we're tweaking the size of the models and the speed at which they'll run on smaller hardware.

I think to feel what you're feeling, you've bought into "all we need is more context". I think evolution demonstrates that's not really true.

geysersam 39 minutes ago

They said "there are algorithmic changes that remain to be discovered" and you said they bought into the idea that "all we need is more context". Seems like opposites to me.

chimpanzee2 4 hours ago

Would you really bet that this is it? That there is nothing beyond this?

It reminds me of the famous anecdote about a 19th-century physics professor who said "there is nothing left to be discovered in physics, only minor corrections".

Then came Einstein...

seanhunter 2 hours ago

That wasn’t just a physics professor; that was William Thomson, aka Lord Kelvin (the dude the temperature unit is named after and one of the most important mathematical physicists of the 19th century [1]), who also said that heavier-than-air flight was physically impossible not long before the Wright brothers flew (and presumably in spite of having seen a bird at least once in his lifetime). Proof that you can be both very smart and simultaneously a bit of a jackass.

[1] https://en.wikipedia.org/wiki/Lord_Kelvin

cyanydeez 49 minutes ago

I love these arguments "You know, we thought we couldn't cross the ocean, and now we did!"

This means we can just jump over to Mars, then explore other planets, etc., etc.

We know tons of regimes where there is non-continuous progress. Finding a smart dude with an anecdote does not invalidate the breadth and width of all human experience with non-continuous systems.

Some dude thought all fluids were Newtonian, and then we discovered non-Newtonian fluids. They do exactly what you don't expect. That basically demonstrates that physics is complex, but it still doesn't mean progress is fluid.

cyanydeez 52 minutes ago

See, I don't need to "bet" on this; the inverse is true: the people placing large bets are either going to get their AGI or fail miserably.

I don't need to bet anything. I'm not a sociopath who thinks the AI god needs to be built, appeased, etc. That's the torment nexus.

So it's pretty easy to see realistically whether you are satisfied with local models and how they affect what you actually do.

I can see the POV of a software engineer who isn't specialized in any specific topic being replaced by various models.

But again, I see the sigmoid, not the "AGI" or the "this baby has grown very big in 1 year, surely it'll become a giant in 5".

phkahler 3 hours ago

If it can be made orthogonal, can you go a step further and diagonalize it? The storage and performance improvement from that would be huge.
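On diagonalizing: a real orthogonal matrix is normal, so it can always be diagonalized by a unitary change of basis, but only over the complex numbers, because its eigenvalues lie on the unit circle. A 2x2 rotation by θ is the smallest example: its eigenvalues are e^{iθ} and e^{-iθ}, so it has no real diagonal form unless θ is a multiple of π. The sketch below checks this numerically; the helper name and the angle are arbitrary choices.

```python
import cmath
import math


def eigenvalues_2x2(m: list[list[float]]) -> tuple[complex, complex]:
    """Roots of the characteristic polynomial t^2 - tr(M) t + det(M)."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)  # complex square root: negative discriminant is allowed
    return (tr + disc) / 2, (tr - disc) / 2


theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
lam1, lam2 = eigenvalues_2x2(rotation)  # expected: e^{+i*theta} and e^{-i*theta}
```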

bee_rider 1 hour ago

I don’t know AI, but, weight matrices aren’t square in general, right? My first guess for something like this would be to take the SVD instead, since you can always do that, but I’m sure that’s been tried already.

harveyrook 3 hours ago

Now I’m wondering: what is the eigenspace of an LLM? If I take a set of LLMs with the same number of parameters, what are the eigenvectors? Do they have different personalities?

bee_rider 1 hour ago

Neural networks are non-linear, so I think you wouldn’t be able to compute typical eigenvalues. You could compute the eigenvalues and/or singular values of the individual weight matrices (I’m sure this has been studied). SVDs are very conventional for making low-rank approximations, so it must have been studied.

The concept of nonlinear eigenvalues exists, but it is a bit more exotic.
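To make the SVD point concrete: an SVD exists for any real matrix, square or not, and by the Eckart-Young theorem keeping only the top k singular triplets gives the best rank-k approximation in the Frobenius norm. The sketch below computes just the top triplet (rank 1) by power iteration on A^T A in plain Python. It is an illustration only; in practice one would call a library routine such as torch.linalg.svd or numpy.linalg.svd. The function names and example matrix are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def top_singular_triplet(a: Matrix, iters: int = 200, seed: int = 0) -> tuple[float, list[float], list[float]]:
    """Largest singular value sigma and unit vectors u, v with A v = sigma u.

    Power iteration on A^T A; assumes the top singular value is strictly
    larger than the second, otherwise convergence is not guaranteed.
    """
    rows, cols = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def rank1_approx(a: Matrix) -> Matrix:
    """Best rank-1 approximation sigma * u v^T (Eckart-Young, Frobenius norm)."""
    sigma, u, v = top_singular_triplet(a)
    return [[sigma * ui * vj for vj in v] for ui in u]


W = [[3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]  # non-square 2x3 example "weight matrix"
W1 = rank1_approx(W)
```

Keeping more singular triplets (rank k) trades storage of k(m + n + 1) numbers against the m x n entries of the full matrix, which is the usual motivation for low-rank factorizations of weight matrices.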
