
GSoC 2025 ‐ Aayush Khanna

Aayush Khanna edited this page Aug 26, 2025 · 38 revisions

About me

Hey there! I'm Aayush Khanna from Noida, Uttar Pradesh, India. I am a third-year undergraduate studying civil engineering at the Indian Institute of Technology (BHU), Varanasi. I am interested in all things tech, and more recently I've been trying to learn how interpreters work. I am also a huge football enthusiast :)

Project overview

My project aims to advance the state of LAPACK routines in stdlib by extending the conventional LAPACK APIs so that they are easily compatible with stdlib ndarrays and support both row-major (C-style) and column-major (Fortran-style) storage layouts. The project covers both lower-level helper routines and higher-level user-facing routines. To further optimize these routines, techniques such as loop reordering and loop tiling were used. A lot of time was also spent on benchmarking and testing these routines against the actual LAPACK implementations. The initial goal was to cover all LAPACK routines up to dgeev, but unfortunately we didn't quite make it there.

Project recap

I started off by parsing the LAPACK source code into a directed graph in which each node represents a LAPACK routine and each edge represents a dependency between two routines. To decide which routines to work on first, I performed a topological sort and started with the routines that have no dependencies. I've documented the process in this repository!
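To make the ordering step concrete, here is a minimal sketch of a topological sort using Kahn's algorithm. The graph, the routine names and the function name `topologicalOrder` are all hypothetical placeholders; this is not the script from the repository, only an illustration of the idea. It assumes every dependency also appears as a key in the graph.

```javascript
// Hypothetical dependency graph: each key is a routine and each value lists
// the routines it calls. These edges are placeholders, not real LAPACK data.
var deps = {
    'routineA': [],
    'routineB': [],
    'routineC': [ 'routineA' ],
    'routineD': [ 'routineB', 'routineC' ]
};

// Returns the routines ordered so that every routine appears after all of
// its dependencies (Kahn's algorithm).
function topologicalOrder( graph ) {
    var dependents = {}; // routine -> routines that call it
    var pending = {};    // routine -> number of unmet dependencies
    var names = Object.keys( graph );
    var ready = [];      // routines whose dependencies are all satisfied
    var order = [];
    var name;
    var next;
    var i;
    var j;

    for ( i = 0; i < names.length; i++ ) {
        name = names[ i ];
        pending[ name ] = graph[ name ].length;
        dependents[ name ] = [];
    }
    for ( i = 0; i < names.length; i++ ) {
        name = names[ i ];
        for ( j = 0; j < graph[ name ].length; j++ ) {
            dependents[ graph[ name ][ j ] ].push( name );
        }
        if ( pending[ name ] === 0 ) {
            ready.push( name );
        }
    }
    while ( ready.length > 0 ) {
        name = ready.shift();
        order.push( name );
        for ( i = 0; i < dependents[ name ].length; i++ ) {
            next = dependents[ name ][ i ];
            pending[ next ] -= 1;
            if ( pending[ next ] === 0 ) {
                ready.push( next );
            }
        }
    }
    if ( order.length !== names.length ) {
        throw new Error( 'dependency cycle detected' );
    }
    return order;
}

var order = topologicalOrder( deps );
console.log( order.join( ' -> ' ) );
```

Routines with no dependencies come out first, which matches the idea of starting with the leaves of the graph.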

After that, I started implementing these routines one by one. My workflow consisted of first writing a base API that took strides and offsets as input parameters; this was kept private and not exported. I would then write the ndarray wrapper over that, along with another API that was consistent with the LAPACK function signature.
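The layering described above might look roughly like the following sketch. The routine here (filling an M-by-N matrix with a constant) and the names `base`, `fillNdarray` and `fill` are hypothetical and chosen only to keep the example short; the real stdlib packages also validate their arguments, which is mostly omitted here.

```javascript
// Private base kernel: works for any layout because it only sees strides
// and an offset. In this sketch it would not be exported.
function base( M, N, alpha, A, strideA1, strideA2, offsetA ) {
    var i;
    var j;
    for ( i = 0; i < M; i++ ) {
        for ( j = 0; j < N; j++ ) {
            A[ offsetA + ( i*strideA1 ) + ( j*strideA2 ) ] = alpha;
        }
    }
    return A;
}

// Strided ("ndarray") wrapper: exposes strides and offsets to the caller.
function fillNdarray( M, N, alpha, A, strideA1, strideA2, offsetA ) {
    return base( M, N, alpha, A, strideA1, strideA2, offsetA );
}

// LAPACK-style wrapper: takes a layout and a leading dimension and
// translates them into strides for the base kernel.
function fill( order, M, N, alpha, A, LDA ) {
    if ( order === 'row-major' ) {
        return base( M, N, alpha, A, LDA, 1, 0 );
    }
    if ( order === 'column-major' ) {
        return base( M, N, alpha, A, 1, LDA, 0 );
    }
    throw new TypeError( 'invalid order: ' + order );
}

// Zero out the bottom-right 2x2 block of a 3x3 row-major matrix by starting
// at offset 4 (row 1, column 1).
var A = new Float64Array( [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ] );
fillNdarray( 2, 2, 0.0, A, 3, 1, 4 );

// Fill a full 2x3 row-major matrix with ones.
var B = new Float64Array( 6 );
fill( 'row-major', 2, 3, 1.0, B, 3 );
```

Keeping the loop logic in one base function means both public entry points share the same, separately tested kernel.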

Testing and benchmarking these routines was a very important part of the project. For testing, I would compare the outputs of our implementation against those of the actual LAPACK routine across a variety of cases, storing the expected results as JSON fixtures and writing the tests with tape. This process soon became very tedious, so I ended up writing a script to auto-generate the ndarray test fixtures for a routine given the standard inputs. This script can be found here. It saved us a lot of time and helped us move faster.

The existing reference LAPACK implementation is Fortran-based and hence follows a column-major layout by default. However, in JavaScript, we can give the user the freedom to choose whether they want to pass the matrix in row-major or column-major order. This flexibility is important because matrices in JavaScript are represented as flat arrays, which keeps their elements in contiguous memory.

We did this by representing a matrix in linear memory using strides and offsets. For example:

  A   = [ 1, 2, 3 ]
        [ 4, 5, 6 ]
        [ 7, 8, 9 ] (3X3)
A_row = [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ] (row-major)
A_col = [ 1, 4, 7, 2, 5, 8, 3, 6, 9 ] (column-major)

Here, we define two strides, strideA1 and strideA2, to iterate over the two dimensions of the matrix. For a row-major matrix, strideA1 is greater than strideA2 (3 and 1 for A_row above); for a column-major matrix, it is the other way around (1 and 3 for A_col). Note that swapping the strides and the dimensions of a matrix also gives us its transpose. This allows for various optimizations which I'll talk about later.
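As a small illustration of the stride arithmetic, the sketch below reads element (i, j) from both A_row and A_col using the same indexing formula. The helper name `get` is hypothetical and exists only for this example.

```javascript
var A_row = [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]; // row-major
var A_col = [ 1, 4, 7, 2, 5, 8, 3, 6, 9 ]; // column-major

// Hypothetical helper: returns element (i, j) of a matrix stored in linear
// memory, given its two strides and an offset.
function get( A, strideA1, strideA2, offsetA, i, j ) {
    return A[ offsetA + ( i*strideA1 ) + ( j*strideA2 ) ];
}

// Element (1, 2) of A is 6 in both layouts; only the strides differ.
var fromRow = get( A_row, 3, 1, 0, 1, 2 );
var fromCol = get( A_col, 1, 3, 0, 1, 2 );

// Swapping the strides reads the transpose without moving any data:
// element (1, 2) of the transpose is element (2, 1) of A, which is 8.
var fromTranspose = get( A_row, 1, 3, 0, 1, 2 );

console.log( fromRow, fromCol, fromTranspose );
```

Because the transpose costs nothing more than swapping two integers, a routine can hand a transposed view to another routine without copying the matrix.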

To ensure consistency with the LAPACK function signatures, we had to use a single-element typed array to pass scalars by reference where needed. For example, the LAPACK API for dlacn2 is:

SUBROUTINE DLACN2( N, V, X, ISGN, EST, KASE, ISAVE )

where KASE is an integer value that changes between successive calls to dlacn2. Hence, to pass such variables by reference, we used single-element typed arrays, which makes our JavaScript API:

function dlacn2( N, V, strideV, offsetV, X, strideX, offsetX, ISGN, strideISGN, offsetISGN, EST, offsetEST, KASE, offsetKASE, ISAVE, strideISAVE, offsetISAVE )

and KASE[ offsetKASE ] represents the element that we're passing by reference.
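The pattern can be sketched as follows. The function `advance` and its three-state cycle are hypothetical and do not reproduce the logic of dlacn2; the point is only that a write to KASE[ offsetKASE ] inside the callee is visible to the caller, which a plain number argument would not allow.

```javascript
// Hypothetical routine that updates its state through a single-element
// typed array. The cycle 0 -> 1 -> 2 -> 0 is invented for illustration.
function advance( KASE, offsetKASE ) {
    KASE[ offsetKASE ] = ( KASE[ offsetKASE ] + 1 ) % 3;
}

// The caller owns the storage and passes it in on every call.
var KASE = new Int32Array( [ 0 ] );
var calls = 0;

// Keep calling until the routine signals completion by resetting KASE to 0.
do {
    advance( KASE, 0 );
    calls += 1;
} while ( KASE[ 0 ] !== 0 );

console.log( 'finished after ' + calls + ' calls' );
```

The offset parameter also lets several such scalars share one typed array if the caller prefers.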

Completed work

LAPACK routines

Cleanup and maintenance

Current state

I'm still working on dlaqr5 and dorm2r but other than that, the JavaScript implementations of all routines that are listed above are complete.

What remains

There are around 1700 LAPACK routines in all. While we obviously cannot cover them all in a 12-week project, my work provides a proof of concept for various types of LAPACK routines.

A lot of LAPACK routines are yet to be implemented. The order in which to implement them follows from the topological sort that I performed.

Apart from that, I would love to work on adding the C and Fortran implementations of these routines, as well as high-level ndarray wrappers, to make these routines production ready!

Challenges and lessons learned

The biggest challenge that I faced was trying to keep up with the weekly targets that I had set in the proposal. I massively underestimated the time it takes to implement a LAPACK routine as per stdlib standards. I communicated this to my mentor early on, and I will continue chipping away at these routines one by one after the GSoC period as well!

Dealing with the intricacies of row-major and column-major storage layouts is also something that I had to get used to. It took me a while to build the intuition to spot loops that can be optimised further, or places where a matrix can be transposed and passed to another routine.

The biggest lesson that I've learnt from this project is to be patient while solving a problem. Many times while working on complicated routines like dlarfb or dlatrs, I felt the urge to give up, but just then something would click. Working through these routines has really helped me step out of my comfort zone and go deeper into the technical weeds of numerical computing, which is why I chose this project in the first place.

I could also have done a better job of estimating the timeline for the project, but I was not as familiar with implementing LAPACK routines then as I am now, which is why I ended up with a very unsustainable weekly plan.

Conclusion

This was a very fruitful project overall. It taught me a lot about various optimizations and was a very nice introduction to high-performance numerical computing. It has also helped me sharpen my problem-solving ability and step outside my comfort zone. I would like to thank the org admins, Athan Reines and Philipp Burckhardt, for the opportunity, guidance and support throughout this journey. I would also like to thank Karan Anand, Gunj Joshi, Gururaj Gurram and Shabareesh Shetty for their support. It's been almost a year since I opened my first PR to stdlib, and this is a wonderful reminder of how far I've come, and yet I've only scratched the surface. stdlib is a wonderful project that I hold very close to my heart, and the people that I've met while contributing here are wonderful. I will definitely remain active in the community after the GSoC period as well.
