[RFC]: implementing the multinomial distribution in stdlib #126

@Taaha-Tariq

Full name

Muhammad Taaha Tariq

University status

Yes

University name

National University of Sciences and Technology, Islamabad

University program

BSCS

Expected graduation

2028

Short biography

I am currently pursuing a BS in Computer Science and have good programming skills. I have over two years of experience in C/C++ and JavaScript, along with other languages. My primary interest is mathematics, so I tend to self-study different fields of it, which is how I got into probability theory through the "Introduction to Probability" course offered by Harvard. Apart from that, I am a full-stack PERN developer and work with different JavaScript frameworks and Node.js.

Timezone

GMT +5

Contact details

[email protected]

Platform

Windows

Editor

For JavaScript and C/C++, I prefer VSCode; for Java, I use IntelliJ. VSCode is a lightweight, easy-to-use code editor with extensive support for JavaScript (through extensions) and C/C++. It also integrates with Git, GitHub, and the GitHub bot, and it is highly customizable. As for Java, IntelliJ streamlines building a Java project with either Maven or Gradle.

Programming experience

I built many projects and wrote many algorithms in C while learning the language, such as quicksort and the Unix implementation of fopen, which helped me become familiar with its more complicated concepts. Apart from that, I made a game in Python and a scientific calculator with a GUI in C++. I have also built many websites using different technologies (React, Express, Node). My favorite was the implementation of fopen, because I had to work with the lower-level libraries of C and learned how the language interacts with the operating system to request file access.

JavaScript experience

I have been working with JavaScript for more than two years and have used several of its frameworks and tools (React, Node, Express) for full-stack development. I recently built a React website that uses the Spotify API to request access to songs and create a playlist, and I am currently working on a full-stack application for my semester project. My favorite feature of JavaScript is async/await: when I first learned about it, I was amazed by how it relies on an event loop to handle asynchronous tasks in a single-threaded environment. My least favorite feature would be the iterator functions: although they simplify working with arrays, I find them overly complicated and hard to grasp, especially when first learning about them.

Node.js experience

I mainly use Node.js for web development and have many projects whose servers are implemented in Node.js with Express.js. Besides that, when I was first learning JavaScript, I made a DNA simulator in JS.

C/Fortran experience

I have worked with C for two years and have familiarized myself with most of the language. I have built projects such as a Reverse Polish calculator, which uses a stack and Reverse Polish notation to perform calculations. This notation was used in the first calculators, and is still used today, because it is more efficient from a hardware/computational point of view. Other than that, I have written many algorithms in C, all of which can be found on my GitHub. As for Fortran, I have never worked with it.

Interest in stdlib

I recently came across this library and was amazed to see something like stdlib existing in JS, since I had always heard that only Python had extensive support and libraries for statistics and other math-related work. This is what intrigued me about stdlib and is one of the primary reasons why I want to work on this project: so that we can provide this functionality in the JS environment. As of now, my favorite features are the distributions that are already implemented and the linear algebra algorithms that exist in the library.

Version control

Yes

Contributions to stdlib

I haven't made any contributions to the library yet, but I plan to make some in the near future.

stdlib showcase

I haven't worked with the library yet, since I only recently discovered it, but I will try to use it in my upcoming projects.

Goals

I plan to implement the multinomial distribution in the library. It generalizes the binomial, Bernoulli, and categorical distributions, so it can stand in for each of them, and it is used extensively for modelling discrete probabilistic problems. For example, in a poll with ten candidates and two hundred votes, it gives the probability that the Xth candidate receives Y votes. In short, it will allow users to model discrete problems with n trials in which each outcome falls into one of several categories.
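For reference, the standard probability mass function of the multinomial distribution is given below. The symbols are chosen here for illustration and are not taken from any existing stdlib API: n is the number of trials, k the number of categories, p_i the probability of category i, and x_i the count observed in category i.

```latex
P(X_1 = x_1, \ldots, X_k = x_k)
  = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i},
\qquad \sum_{i=1}^{k} x_i = n,
\quad \sum_{i=1}^{k} p_i = 1 .
```

Setting k = 2 recovers the binomial distribution, n = 1 the categorical distribution, and n = 1 with k = 2 the Bernoulli distribution.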

Why this project?

What excites me about this project is the wide applicability of this distribution. It can be used to model discrete problems on the web and in the Node environment without relying on Python libraries. Besides that, it will give me an opportunity to work on my two interests: probability and statistics, and programming.

Qualifications

I completed the "Introduction to Probability" course offered by Harvard, so I believe I have ample knowledge of probability distributions and statistical models. As for the programming and implementation side, my major is CS, and in my free time I like to learn different programming languages and concepts, which has given me a strong foundation in programming. For these reasons, I believe I am a perfect fit for this project.

Prior art

This distribution has already been implemented in the SciPy library, which documents it well, and it is also part of the R programming language. Both implementations take vectors as arguments, since the distribution requires k inputs (one per category).
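To illustrate what a vector-based interface could look like in JavaScript, here is a minimal sketch of the PMF. The function names and the argument order `( x, n, p )` are hypothetical and are not an existing stdlib API. The log-factorial helper uses direct summation to keep the example self-contained; a real implementation would presumably rely on a log-gamma routine instead.

```javascript
'use strict';

// Hypothetical helper: ln(n!) by direct summation (adequate for small n).
function logFactorial( n ) {
	var s = 0.0;
	var i;
	for ( i = 2; i <= n; i++ ) {
		s += Math.log( i );
	}
	return s;
}

// Hypothetical signature:
//   x: array of counts, one per category
//   n: number of trials
//   p: array of category probabilities (assumed to sum to one)
function multinomialPMF( x, n, p ) {
	var total = 0;
	var lnP;
	var i;
	if ( x.length !== p.length ) {
		return NaN;
	}
	// Work in log space to avoid overflowing the factorials.
	lnP = logFactorial( n );
	for ( i = 0; i < x.length; i++ ) {
		if ( !Number.isInteger( x[ i ] ) || x[ i ] < 0 ) {
			return 0.0;
		}
		total += x[ i ];
		if ( x[ i ] > 0 ) {
			// Skipping zero counts treats 0 * ln(0) as 0.
			lnP += x[ i ] * Math.log( p[ i ] );
		}
		lnP -= logFactorial( x[ i ] );
	}
	// Counts that do not add up to n lie outside the support.
	if ( total !== n ) {
		return 0.0;
	}
	return Math.exp( lnP );
}
```

For instance, `multinomialPMF( [ 1, 2, 3 ], 6, [ 0.2, 0.3, 0.5 ] )` evaluates to 0.135 up to floating-point rounding, since 6!/(1! 2! 3!) = 60 and 0.2 × 0.3² × 0.5³ = 0.00225.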

Commitment

I have no other commitments this summer, so I will be able to dedicate most of my time to this project (about 40 hours a week). Besides this, I plan on self-studying mathematics and improving my coding skills this summer.

Schedule

Assuming a 12-week schedule:

  • Community Bonding Period:

    • I plan on learning more about the distribution.

  • Week 1-3:

    • I plan on working on the documentation for the API, identifying its different use cases, and identifying the challenges that may arise during implementation. This will also give me a chance to refine the planned implementation to make it more efficient.

  • Week 4-7:

    • I will focus on implementing the basic structure of the API and the algorithms for the entropy, the covariance/correlation matrix, and the mean.

  • Week 8:

    • Testing the implementation done during weeks 4-7.

  • Week 9-11:

    • I will focus on implementing the probability mass function (PMF) and the moment generating function (MGF), which are the most important parts of the API.

  • Week 12:

    • Thoroughly testing the PMF and MGF implemented in the previous weeks, using SciPy as a reference, to ensure they work correctly.

  • Final Week:

    • I will focus on wrapping up the implementation and making sure that everything is in order, from documentation to implementation, followed by submission.
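As a rough illustration of the quantities named in the schedule, the mean, covariance matrix, and MGF all have closed forms: E[X_i] = n p_i, Var(X_i) = n p_i (1 - p_i), Cov(X_i, X_j) = -n p_i p_j for i ≠ j, and M(t) = (Σ p_i e^{t_i})^n. The sketch below implements these directly; the function names and signatures are hypothetical and only meant to show the shape of the computations, not a finished API.

```javascript
'use strict';

// Mean vector: E[X_i] = n * p_i.
function multinomialMean( n, p ) {
	return p.map( function scale( pi ) {
		return n * pi;
	});
}

// Covariance matrix as an array of rows:
//   diagonal entries:     n * p_i * (1 - p_i)
//   off-diagonal entries: -n * p_i * p_j
function multinomialCovariance( n, p ) {
	return p.map( function row( pi, i ) {
		return p.map( function entry( pj, j ) {
			return ( i === j ) ? n * pi * ( 1.0 - pi ) : -n * pi * pj;
		});
	});
}

// Moment generating function: M(t) = ( sum_i p_i * exp(t_i) )^n.
function multinomialMGF( t, n, p ) {
	var s = 0.0;
	var i;
	for ( i = 0; i < p.length; i++ ) {
		s += p[ i ] * Math.exp( t[ i ] );
	}
	return Math.pow( s, n );
}
```

Two quick sanity checks follow from the formulas: every row of the covariance matrix sums to zero (because the counts always add up to n), and the MGF equals one at t = 0.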

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

No response

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.

Metadata

Assignees

No one assigned

    Labels

    2025: 2025 GSoC proposal.
    received feedback: A proposal which has received feedback.
    rfc: Project proposal.
