Description
Full name
Hemang Choudhary
University status
Yes
University name
M.B.M University | IIT Madras
University program
B.E. in Information Technology | B.S. in Data Science
Expected graduation
2027
Short biography
I am a current undergraduate student who is working towards earning two degrees in Information Technology and Data Science. In my years of learning, I have identified numerical computing, web development, and statistical analysis as my areas of interest. I am most interested in C, C++, JavaScript, Python, and Java programming languages, especially in backend and frontend development using Node.js, React, SQL, MongoDB, Flask, and Pytest.
In my Data Science studies, I have tried statistics and found it quite interesting, especially when it comes to its application in the real world in data analysis and decision making.
In my free time, I am an active member of developer communities. I am a member of AGNU (Alpha Gamma Nu) and DEVx Engineers, where I participate in group projects and group learning. I also head AceX Developers, a community at IIT Madras, where we discuss development techniques and participate in open source projects.
I like working on projects that enhance efficiency and performance and would like to contribute to stdlib by incorporating useful statistical functions that will be useful to developers and researchers.
From my experience with Python’s statistical libraries such as scipy, I have seen the value of well-documented, efficient, and best practice compliant implementations. I am particularly interested in contributing to the stdlib repository since I believe that my experience in programming as well as statistical analysis will enable me to make a real contribution.
Timezone
Indian Standard Time (IST), UTC +5:30
Contact details
email:[email protected], email:[email protected]
Platform
Windows
Editor
I prefer VSCode because of its simplicity, versatility, and strong integration with multiple programming languages. It offers intuitive UI, powerful debugging tools, and built-in Git support, making development seamless.
With extensions for various languages, it allows me to easily switch between different programming environments like JavaScript, Python, C, C++, and SQL without any hassle.
Its lightweight yet powerful nature makes it an ideal choice for both web development and numerical computing tasks. The user-friendly interface and built-in terminal further enhance productivity, making coding smoother and more efficient.
Programming experience
I have extensive programming experience across multiple languages and frameworks, with particular focus on web development, system-level programming, and software engineering.
Languages & Technologies
- Primary Languages: Python, Java, C++, JavaScript
- Web Technologies: HTML/CSS/Tailwind, React, Next.js, DaisyUI
- Other Technologies: SQL, PHP, Linux, GitHub, Selenium, WordPress
Notable Projects
Domino's Website Reimagine
I completed a frontend project where I reimagined the Domino's Pizza website using HTML, CSS, and JavaScript. This project focused on:
- Improving UI/UX consistency across the site
- Implementing responsive design principles
- Enhancing the overall user experience
- Applying modern web design techniques
Project Link: https://dominos-ohoh.netlify.app/
Custom Git Implementation
In December 2024, I engineered a version control system in JavaScript that replicates 90% of core Git functionalities, including:
- Repository initialization and object storage mechanisms
- Commit structuring similar to Git's internal architecture
- Implementation of JavaScript hashing libraries for efficient storage and retrieval
- Integration of compression and encryption techniques using libraries like zlib
Project Link: github/hemang111/My-Own-git
Custom POSIX Shell
I constructed a custom POSIX shell using C++ that incorporated 90% of standard shell functionalities:
- Engineered modular header file structures for better scalability
- Implemented string streams for robust command parsing
- Created an efficient command handling system
- Adhered to POSIX standards for system-level programming compatibility
Project Link: github/hemang111/My-shell
Note: both hemang111 and coehemang are my profiles. I switched to this profile (coehemang) because of an issue with the GitHub Student Developer Pack.
These projects demonstrate my ability to work across the stack, from frontend interfaces to system-level implementations, while applying software engineering best practices and exploring complex technical concepts.
JavaScript experience
I have extensive experience with JavaScript, having used it in various projects involving web development, backend services, and system programming. I have worked with Node.js, React, MongoDB, SQL, and ORMs, building full-stack applications and implementing features like authentication, real-time updates, and RESTful APIs. Additionally, I have explored Deno for secure and modern JavaScript/TypeScript runtime.
Favorite Feature of JavaScript
One of my favorite features of JavaScript is its asynchronous nature with Promises and async/await. This makes handling concurrent tasks and API calls much smoother, reducing callback hell and improving code readability. It's particularly useful in web applications, where non-blocking operations enhance performance and user experience.
The async/await syntax introduced in ECMAScript 2017 has been a game-changer for writing asynchronous code. For example, instead of chaining multiple .then() calls, I can write code that looks synchronous but executes asynchronously:
```javascript
async function fetchProducts() {
  try {
    const response = await fetch("https://api.example.com/products");
    if (!response.ok) {
      throw new Error(`HTTP error: ${response.status}`);
    }
    const data = await response.json();
    return data;
  } catch (error) {
    console.error(`Could not get products: ${error}`);
  }
}
```
Least Favorite Feature of JavaScript
One of the most frustrating aspects of JavaScript is its type coercion and weak typing system. Implicit type conversion can sometimes lead to unexpected behaviors, such as:
```javascript
console.log([] + []); // "" (empty string)
console.log([] + {}); // "[object Object]"
console.log(1 == "1"); // true (due to type coercion)
```
These behaviors can introduce subtle bugs that are difficult to debug. For instance, the + operator has dual purposes in JavaScript: it can perform arithmetic addition or string concatenation depending on the operands. This ambiguity can lead to unexpected results:
```javascript
console.log("5" + 5); // Output: "55" (string concatenation)
console.log(5 + "5"); // Output: "55" (string concatenation)
console.log("10" - 2); // Output: 8 (numeric subtraction)
```
Another example is comparing strings lexicographically rather than numerically:
```javascript
console.log('23' < '3'); // Output: true
```
This returns true because string comparison in JavaScript compares characters one by one, and '2' comes before '3' in lexicographical ordering.
Node.js experience
I have extensive experience using Node.js to build robust backend systems.
In my project My Own Git, I used Node.js to manage file system operations (reading, writing, and tracking changes), which allowed me to implement core version control functionality. The built-in zlib module (deflate and inflate) handled compression, and the crypto module handled encrypting the data.
The asynchronous, event-driven nature of Node.js enabled smooth handling of file operations and ensured high performance. Additionally, while developing the backend for a course portal, I used Node.js with Express.js to build a secure and scalable API. This backend handled user authentication and managed dynamic content delivery. Overall, Node.js has been instrumental in building high-performance, reliable applications across my projects.
C/Fortran experience
Thanks to my course curriculum, I learned C as my first programming language, which provided me with a solid foundation in programming fundamentals. This early exposure to C helped me gain a deep understanding of:
- Core concepts like data types, loops, and control structures
- Memory management and pointer manipulation
- Low-level system interactions
While I don't have direct experience with Fortran, my strong background in C and other languages gives me confidence in my ability to learn new programming languages quickly. I'm always eager to expand my skill set, and if Fortran knowledge becomes necessary for a project, I'm prepared to invest the time to learn it effectively. However, based on my understanding of the project's primary focus on C and JavaScript, Fortran expertise is likely not a critical requirement for contributing.
Interest in stdlib
I like stdlib because it adds a lot of useful math and stats functions to JavaScript that I've always missed when coming from other languages.
What caught my attention was how it brings some of the functionality I used to rely on Python libraries for, right into the JavaScript ecosystem. The statistical functions are probably what I find most valuable - being able to work with probability distributions and run statistical tests without switching languages is really convenient.
I also appreciate how they've organized everything in a modular way. You can just pull in the specific functions you need rather than importing a massive library. The documentation is pretty solid too, which makes a big difference when you're trying to figure out how to use a new function.
Version control
Yes
Contributions to stdlib
I am new to the stdlib repository, so my PRs are still open and pending review.
stdlib showcase
A visualizer built with stdlib functions that plots three of the main trigonometric functions.
Goals
My goal is to implement a multivariate normal distribution utility for stdlib-js that supports PDF, log-PDF, CDF, log-CDF, and random sampling, primarily in JavaScript and, if possible, also in C and Fortran 95. To achieve this, I will integrate LAPACK/BLAS routines for critical matrix operations like Cholesky decomposition and linear system solving. This project will provide JavaScript developers with a fast, reliable tool for multivariate statistics while maintaining simplicity and ease of use.
Goals
- Core Features:
  - Implement PDF/log-PDF for density evaluation.
  - Develop CDF/log-CDF using numerical integration (low dimensions) and Monte Carlo methods (high dimensions).
- LAPACK/BLAS Integration: Use optimized routines for matrix operations to ensure speed and accuracy.
- Accessibility: Provide clear documentation and examples for users unfamiliar with advanced linear algebra.
Key LAPACK/BLAS Routines
| Routine | Purpose | Dependencies |
|---|---|---|
| `dpotrf` | Cholesky decomposition (covariance matrix validation) | `dgemm`, `dtrsm`, `dsyrk` |
| `dpotrs` | Solve linear systems (quadratic term in PDF) | `dtrsm` |
| `dgemm` | Matrix multiplication (sampling and CDF) | None (core BLAS) |
| `dgetrf`/`dgetri` | LU decomposition/inversion (fallback for invalid matrices) | `dgemm`, `dtrsm` |
Dependency Diagram
Fulfilled Dependency: dgemm (@stdlib/blas/base), xerbla (level 2 blas error handler) (@stdlib/blas/base)
```mermaid
graph TD
    A[Multivariate Normal] --> B[dpotrf]
    A --> C[dpotrs]
    B --> E[dgemm]
    B --> F[dtrsm]
    B --> G[dsyrk]
    C --> F
```
Implementation Plan
1. Core Features
- PDF/Log-PDF
  - Use `dpotrf` to decompose the covariance matrix (Σ) and compute determinants.
- CDF/Log-CDF
  - For low dimensions (≤3): Use Genz’s algorithm (numerical integration).
  - For high dimensions: Use Monte Carlo sampling with `dgemm` for matrix operations.
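To make the high-dimensional path concrete, here is a minimal sketch of a plain Monte Carlo CDF estimate. It assumes a lower-triangular Cholesky factor `L` of Σ has already been computed; the names `mulberry32`, `randn`, and `mvnCdfMonteCarlo` are hypothetical and not part of stdlib. A batched version would instead form all samples with a single `dgemm`-style product.

```javascript
// Small seeded PRNG so the sketch is reproducible (an illustrative choice only).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function randn(rng) {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Estimates P(X_1 <= b_1, ..., X_n <= b_n) for X ~ N(mu, L * L^T),
// where L is a lower-triangular Cholesky factor supplied by the caller.
function mvnCdfMonteCarlo(b, mu, L, numSamples, rng) {
  const n = b.length;
  const z = new Array(n);
  let hits = 0;
  for (let s = 0; s < numSamples; s++) {
    for (let i = 0; i < n; i++) {
      z[i] = randn(rng);
    }
    let inside = true;
    for (let i = 0; i < n && inside; i++) {
      let xi = mu[i];
      for (let k = 0; k <= i; k++) {
        xi += L[i][k] * z[k]; // x = mu + L z, one component at a time
      }
      if (xi > b[i]) {
        inside = false;
      }
    }
    if (inside) {
      hits++;
    }
  }
  return hits / numSamples;
}
```

For a standard bivariate normal, `mvnCdfMonteCarlo([0, 0], [0, 0], [[1, 0], [0, 1]], 20000, mulberry32(42))` should land close to the exact value 0.25; the error shrinks only as the inverse square root of the sample count, which is why quasi-Monte Carlo variants are attractive.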
2. LAPACK/BLAS Workflow
- Step 1: Validate Σ with `dpotrf`.
- Step 2: Compute PDF using `dpotrs` for linear solves.
- And more...
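The two steps above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: `cholesky` and `mvnLogPdf` are hypothetical names, matrices are arrays of arrays, and the hand-written loops stand in for what `dpotrf` and a triangular solve would do in the real implementation. The log-density is -0.5 * (n * log(2π) + log|Σ| + (x - μ)ᵀ Σ⁻¹ (x - μ)), and with Σ = L Lᵀ both the determinant and the quadratic term come from L.

```javascript
// Lower-triangular Cholesky factor of a symmetric matrix.
// info mirrors the dpotrf convention: 0 on success, k > 0 when the
// leading minor of order k is not positive definite.
function cholesky(sigma) {
  const n = sigma.length;
  const L = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let j = 0; j < n; j++) {
    let d = sigma[j][j];
    for (let k = 0; k < j; k++) {
      d -= L[j][k] * L[j][k];
    }
    if (!(d > 0)) {
      return { L, info: j + 1 };
    }
    L[j][j] = Math.sqrt(d);
    for (let i = j + 1; i < n; i++) {
      let s = sigma[i][j];
      for (let k = 0; k < j; k++) {
        s -= L[i][k] * L[j][k];
      }
      L[i][j] = s / L[j][j];
    }
  }
  return { L, info: 0 };
}

// Log-density of N(mu, sigma) at x; returns NaN when sigma is not positive definite.
function mvnLogPdf(x, mu, sigma) {
  const n = x.length;
  const { L, info } = cholesky(sigma);
  if (info !== 0) {
    return NaN;
  }
  const y = new Array(n);
  let quad = 0;
  let logDet = 0;
  for (let i = 0; i < n; i++) {
    // Forward substitution: solve L y = x - mu.
    let s = x[i] - mu[i];
    for (let k = 0; k < i; k++) {
      s -= L[i][k] * y[k];
    }
    y[i] = s / L[i][i];
    quad += y[i] * y[i]; // accumulates (x - mu)^T sigma^{-1} (x - mu)
    logDet += 2 * Math.log(L[i][i]); // log|sigma| = 2 * sum(log L_ii)
  }
  return -0.5 * (n * Math.log(2 * Math.PI) + logDet + quad);
}
```

Working in log space and summing `log(L[i][i])` avoids forming the determinant directly, which can underflow or overflow for larger matrices.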
3. Error Handling
- Detect non-positive-definite matrices via `dpotrf` error codes.
- Provide user-friendly error messages (e.g., “Covariance matrix is invalid. Try adding a small diagonal matrix to stabilize it.”).
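A rough sketch of how this error handling could look is shown below. The function names (`choleskyInfo`, `assertValidCovariance`, `addJitter`) are hypothetical, and the factorization loop is a stand-in for a real `dpotrf` call; only the convention of reporting a positive `info` value for a failing leading minor is taken from LAPACK.

```javascript
// Returns 0 when sigma is positive definite; otherwise returns the 1-based
// order of the first leading minor that fails, as dpotrf does with info > 0.
function choleskyInfo(sigma) {
  const n = sigma.length;
  const L = sigma.map((row) => row.slice()); // work on a copy
  for (let j = 0; j < n; j++) {
    let d = L[j][j];
    for (let k = 0; k < j; k++) {
      d -= L[j][k] * L[j][k];
    }
    if (!(d > 0)) {
      return j + 1;
    }
    L[j][j] = Math.sqrt(d);
    for (let i = j + 1; i < n; i++) {
      let s = L[i][j];
      for (let k = 0; k < j; k++) {
        s -= L[i][k] * L[j][k];
      }
      L[i][j] = s / L[j][j];
    }
  }
  return 0;
}

// Translates a failing info code into a user-facing error.
function assertValidCovariance(sigma) {
  const info = choleskyInfo(sigma);
  if (info !== 0) {
    throw new RangeError(
      'Covariance matrix is invalid (leading minor of order ' + info +
      ' is not positive definite). Try adding a small diagonal matrix to stabilize it.'
    );
  }
}

// Returns sigma + eps * I without mutating the input.
function addJitter(sigma, eps) {
  return sigma.map((row, i) => row.map((v, j) => (i === j ? v + eps : v)));
}
```

For example, `assertValidCovariance([[1, 1], [1, 1]])` throws because the matrix is singular, while `assertValidCovariance(addJitter([[1, 1], [1, 1]], 1e-6))` passes. Whether jitter should be applied automatically or left to the caller is a design decision still to be discussed with mentors.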
Why LAPACK/BLAS?
- Speed: BLAS Level 3 routines (`dgemm`, `dtrsm`) are optimized for large matrices and can run substantially faster than naive JavaScript code.
- Accuracy: LAPACK’s `dpotrf` ensures numerical stability when decomposing matrices.
- Reusability: These routines are industry-standard and battle-tested in libraries like NumPy and R.
Testing & Validation
- Accuracy Tests:
  - Compare PDF/CDF results with SciPy (Python) and `mvtnorm` (R).
- Performance Benchmarks:
  - Measure time for 100x100 matrix operations against pure JavaScript.
- Edge Cases:
  - Test degenerate matrices (e.g., zero variance in one dimension).
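Alongside fixtures generated from SciPy and `mvtnorm`, some reference values can be derived in closed form. The sketch below is one such check for the bivariate case, using hypothetical helper names (`normalPdf`, `bivariateNormalPdf`, `isClose`): with zero correlation the joint density must equal the product of the two univariate densities, and a correlation of ±1 is a degenerate case that should be rejected.

```javascript
// Univariate normal density.
function normalPdf(x, mu, sd) {
  const t = (x - mu) / sd;
  return Math.exp(-0.5 * t * t) / (sd * Math.sqrt(2 * Math.PI));
}

// Closed-form bivariate normal density with standard deviations s1, s2
// and correlation rho. Returns NaN for degenerate parameters.
function bivariateNormalPdf(x, mu, s1, s2, rho) {
  const q = 1 - rho * rho;
  if (!(s1 > 0 && s2 > 0 && q > 0)) {
    return NaN;
  }
  const a = (x[0] - mu[0]) / s1;
  const b = (x[1] - mu[1]) / s2;
  const quad = (a * a - 2 * rho * a * b + b * b) / q;
  return Math.exp(-0.5 * quad) / (2 * Math.PI * s1 * s2 * Math.sqrt(q));
}

// Relative-tolerance comparison for test assertions.
function isClose(actual, expected, rtol) {
  return Math.abs(actual - expected) <= rtol * Math.abs(expected);
}

// Independence check: rho = 0 must factor into univariate densities.
const joint = bivariateNormalPdf([0.3, -1.2], [0, 0], 1.5, 0.7, 0);
const product = normalPdf(0.3, 0, 1.5) * normalPdf(-1.2, 0, 0.7);
console.log(isClose(joint, product, 1e-12));
```

The same pattern extends to any diagonal covariance matrix, which gives the general implementation an exact target that does not depend on another library.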
Documentation
- API Docs: Explain inputs/outputs for all functions.
```javascript
/**
 * Computes the multivariate normal PDF.
 * @param {Array} x - Input vector
 * @param {Array} mu - Mean vector
 * @param {Array} sigma - Covariance matrix
 * @returns {number} PDF value
 */
function multivariateNormalPDF(x, mu, sigma) { ... }
```
Why this project?
This project excites me because it combines my interests in programming, mathematics, and open-source contributions. Implementing the multivariate normal distribution in stdlib-js will help bridge the gap between JavaScript and scientific computing, enabling developers to perform advanced statistical tasks directly in JavaScript/C without relying on external tools like Python or R.
I am particularly excited about learning and applying numerical computing techniques, such as Cholesky decomposition and matrix operations, while working with industry-standard libraries like LAPACK and BLAS. The opportunity to contribute to an open-source project that simplifies complex statistical operations for developers motivates me to deliver high-quality work that has a lasting impact.
This project aligns perfectly with my academic background and career aspirations, giving me a chance to grow as a developer while creating something meaningful for the community.
Qualifications
Academic Background
I have completed two foundational courses at IITM BS that are highly relevant to this project:
- BSMA1002: Basic Statistics and Probability
  - Covered probability distributions, including the normal distribution, and statistical inference.
  - Learned techniques for parameter estimation and validation, which are directly applicable to the multivariate normal distribution.
- BSMA1004: Applied Linear Algebra
  - Studied key matrix operations like decomposition, inversion, and determinants.
  - Gained a strong understanding of numerical methods for solving linear systems, critical for implementing Cholesky decomposition and other matrix operations required for this project.
Why I Am Suited for This Project
- Strong Mathematical Foundation: My coursework in statistics and linear algebra has equipped me with the theoretical knowledge required to understand and implement the multivariate normal distribution.
- Focused Learning: I have studied LAPACK/BLAS routines such as `dpotrf` (Cholesky decomposition), `dpotrs` (linear system solving), and `dgemm` (matrix multiplication) to prepare for this project.
- Alignment with Goals: This project aligns perfectly with my academic background and interests in applying mathematics to real-world programming problems.
I am confident that my academic training and technical skills will enable me to successfully implement this project while learning new concepts along the way.
Prior art
The multivariate normal distribution has been widely implemented in various programming languages and libraries, each leveraging efficient numerical methods for operations like sampling, PDF/CDF evaluation, and matrix computations. Below is a summary of how others have achieved the goals of this project:
1. Python (NumPy and SciPy)
- Implementation:
  - NumPy provides tools for matrix operations (e.g., `numpy.linalg.cholesky`) and random sampling (`numpy.random.multivariate_normal`).
  - SciPy extends this with functions for PDF evaluation (`scipy.stats.multivariate_normal.pdf`) and CDF computation using numerical approximations.
- Key Techniques:
  - A factorization of the covariance matrix is used for validation and sampling (NumPy's sampler defaults to SVD and offers Cholesky as an option).
  - Numerical integration or Monte Carlo methods are used for CDF evaluation.
2. R (mvtnorm Package)
- Implementation:
  - The `mvtnorm` package provides functions like `dmvnorm` (PDF), `pmvnorm` (CDF), and `rmvnorm` (sampling).
  - It uses FORTRAN-based algorithms, including Alan Genz’s method for CDF computation.
- Key Techniques:
  - Cholesky decomposition for efficient sampling.
  - Numerical integration for low-dimensional CDFs and quasi-Monte Carlo methods for higher dimensions.
3. MATLAB (Statistics Toolbox)
- Implementation:
  - Functions like `mvnpdf`, `mvnrnd`, and `mvncdf` handle PDF, random sampling, and CDF computations, respectively.
- Key Techniques:
  - Matrix operations are performed using MATLAB’s highly optimized linear algebra backend.
  - The CDF is computed using numerical integration or Monte Carlo methods.
4. C++ (NORMAL_DATASET Program)
- Implementation:
  - The `NORMAL_DATASET` program generates random samples from a multivariate normal distribution using Cholesky decomposition.
  - Available in C++, FORTRAN90, and MATLAB versions.
- Key Techniques:
  - Efficient use of Cholesky decomposition to transform standard normal samples into multivariate samples.
  - Focused on generating large datasets efficiently.
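The Cholesky-based transform mentioned above is short enough to sketch directly. Assuming a lower-triangular factor `L` with Σ = L Lᵀ and a vector `z` of independent standard normal draws (both supplied by the caller), the sample is x = μ + L z. The function name `mvnTransform` is hypothetical.

```javascript
// Maps a vector z of independent standard normal draws to a draw from
// N(mu, L * L^T), where L is lower triangular.
function mvnTransform(mu, L, z) {
  const n = mu.length;
  const x = new Array(n);
  for (let i = 0; i < n; i++) {
    let s = mu[i];
    for (let k = 0; k <= i; k++) {
      s += L[i][k] * z[k]; // only the lower triangle of L contributes
    }
    x[i] = s;
  }
  return x;
}

// Example: sigma = [[4, 2], [2, 2]] has Cholesky factor [[2, 0], [1, 1]].
console.log(mvnTransform([1, 2], [[2, 0], [1, 1]], [1, -1])); // [ 3, 2 ]
```

Keeping the random draws outside the transform makes the function deterministic and easy to test; generating many samples at once turns the inner loops into a single triangular matrix product.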
5. IMSL Library (C)
- Implementation:
- Provides functions like
imsls_d_multivariate_normal_cdfto evaluate the multivariate normal CDF. - Supports high-dimensional distributions with efficient numerical algorithms.
- Provides functions like
- Key Techniques:
- Numerical integration methods for CDF evaluation.
- Optimized matrix operations for performance.
Challenges that I might face
One challenge will be deciding whether to create a separate utility for a function that could instead be implemented directly inside the main MVN package or one of its dependencies. To inform this decision, I have been exploring packages like dgemm in stdlib, where the isame utility is not used and the logic is implemented directly. Secondly, the MVN computation can be tricky because of its many edge cases, but I have plenty of reference material to draw on, from SciPy to R's mvtnorm and Netlib itself. I am also exploring options such as modified Cholesky decompositions or singular value decomposition for cases where Cholesky can fail, mainly when the covariance matrix is near singular or its diagonal elements are small enough to cause rounding errors.
Key Takeaways for My Project
- Most implementations rely on Cholesky decomposition (`dpotrf`) for covariance matrix validation, determinant computation, and sampling.
- Numerical integration (Genz’s algorithm) or Monte Carlo methods are used for CDF evaluation in higher dimensions.
- Libraries like R’s `mvtnorm` and SciPy leverage highly optimized FORTRAN/C routines, which I plan to replicate using LAPACK/BLAS in JavaScript.
By studying these implementations, I aim to bring similar functionality to JavaScript through stdlib-js, ensuring accuracy, performance, and ease of use while leveraging LAPACK/BLAS routines for numerical efficiency.
Commitment
I am fully committed to dedicating the necessary time and effort to successfully complete this project. I plan to invest 25 hours per week during the Google Summer of Code program. This will allow me to make steady progress on the project while ensuring high-quality implementation and thorough testing.
During GSoC
- I will dedicate 25 hours per week to coding, testing, and documenting the project.
- My schedule is flexible, and I will adjust my time allocation as needed to meet milestones and deadlines.
After GSoC
- I plan to remain involved in maintaining and improving the multivariate normal distribution utility.
- I am committed to addressing feedback from users and contributors, adding enhancements, and helping with related features in `stdlib-js`.
Other Commitments
I do not have any other major commitments during the GSoC period.
With 25 hours per week dedicated to this project, along with my enthusiasm for learning and contributing, I am confident in my ability to deliver a successful implementation within the program timeline.
Schedule
Community Bonding Period
Week 1:
- Familiarize myself with the `stdlib-js` codebase, contribution guidelines, and repository structure.
- Study existing statistical utilities in `stdlib-js` and explore how they are implemented.
- Research LAPACK/BLAS routines (`dpotrf`, `dpotrs`, `dgemm`, etc.) and their JavaScript and C bindings.
Week 2:
- Engage with the community through discussions, meetings, and forums to understand expectations and gather feedback on the project proposal.
- Finalize the design and architecture of the multivariate normal distribution utility.
- Set up the development environment and identify dependencies.
Week 3:
- Prepare a detailed technical document outlining implementation strategies for:
- Probability Density Function (PDF)
- Cumulative Distribution Function (CDF)
- Random sampling
- Matrix operations
- Create a detailed timeline for deliverables during the coding phase.
12-Week GSoC Coding Phase Schedule
Week 1
- Implement basic matrix operations required for the project, such as Cholesky decomposition using `dpotrf`.
- Write unit tests to validate these operations.
Week 2
- Develop the core functionality for computing the Probability Density Function (PDF) and Log-PDF.
- Integrate `dpotrs` for solving linear systems as part of PDF computation.
- Test edge cases and validate results against reference implementations (e.g., SciPy, R).
Week 3
- Begin implementing Cumulative Distribution Function (CDF) and Log-CDF using numerical integration techniques.
- Optimize performance by leveraging `dgemm` for matrix multiplications.
Week 4
- Finalize and test PDF/Log-PDF functionality.
- Begin working on error handling for invalid inputs (e.g., singular covariance matrices).
Week 5
- Complete implementation of CDF/Log-CDF, including Monte Carlo methods for higher dimensions.
- Validate results for accuracy using statistical test cases.
Week 6 (Midterm Evaluation)
- Submit progress report summarizing completed work (PDF, Log-PDF, CDF/Log-CDF implementations).
- Demonstrate test results and benchmarks to mentors.
- Address feedback from mentors and community to refine implementation.
Week 7
- Implement random sampling from a multivariate normal distribution using Cholesky decomposition.
- Write tests to ensure randomness properties and consistency with theoretical expectations.
Week 8
- Focus on error handling and edge case scenarios, such as singular covariance matrices or invalid inputs.
- Add detailed logging and debugging support for easier troubleshooting.
Week 9
- Optimize all implemented functions for performance, focusing on LAPACK/BLAS routine efficiency.
Week 10
- Benchmark performance against existing libraries like SciPy or R's `mvtnorm`.
Week 11
- Write comprehensive documentation for all implemented functions, including usage examples.
Week 12
- Conduct final testing to ensure robustness, accuracy, and performance.
- Submit final code, documentation, and a detailed report summarizing the work completed during GSoC.
Notes:
- The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
- Usually, even week 1 deliverables include some code.
- By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
- By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
- During the final week, you'll be submitting your project.
Related issues
related to issue #11
Checklist
- I have read and understood the Code of Conduct.
- I have read and understood the application materials found in this repository.
- I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
- I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
- I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
- The issue name begins with `[RFC]:` and succinctly describes your proposal.
- I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.