Project lead: Andrew Hallam ([email protected])
This project uses a combination of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) techniques to create algorithms that crack different CAPTCHA systems.
Deliverables:
- AI/ML Models that can crack CAPTCHAs of different forms
- Documentation showcasing the development of the model (documentation should also track discontinued models, so that they are not re-attempted)
- Hand-over Documentation
- (Long term) Software that interacts with browsers to detect a CAPTCHA and crack it.
Data Engineering/Data Scraping:
Lead: Hugh Wan ([email protected])
- GitHub Responsibilities (maintaining, approving merge requests, etc.)
- Data Responsibilities (obtaining, integrity, maintaining)
- Data Pipeline
- Real-time website scraping
Project Team A (Image based CAPTCHA)
Lead: Cecilia Sammut ([email protected])
- Research (both CAPTCHA and Model research)
- Model development
- Model Review
Project Team B (Text based CAPTCHA)
Lead: William Tan Yoon Lok ([email protected])
- Research (both CAPTCHA and Model research)
- Model development
- Model Review
I have compiled some of the findings from the posts on MS Teams; they are listed below:
- Kaggle Captcha version 2 Image by fournierp
- Kaggle Cracking captcha by aakashnain
- Learning resource about Python, ML & AI by Angus
- CAPTCHA breaking with Deep Learning: This research paper mentions that the authors used PyCaptcha, a Python package for CAPTCHA generation, to build a custom CAPTCHA image dataset. The package offers several degrees of freedom, such as font style, distortion and noise, which can be exploited to increase the diversity of the data and the difficulty of the recognition task.
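To illustrate how varying those degrees of freedom diversifies a synthetic dataset, here is a minimal sketch that only samples the generation parameters for each image. It does not use PyCaptcha's API. The names `CaptchaSpec`, `sample_specs`, `FONTS` and the parameter ranges are assumptions made for this example, and the actual rendering would be handed to whichever generator the team picks.

```python
import random
import string
from dataclasses import dataclass

ALPHABET = string.ascii_uppercase + string.digits
FONTS = ["sans", "serif", "mono"]  # hypothetical font names, not real font files


@dataclass(frozen=True)
class CaptchaSpec:
    """Parameters for one synthetic CAPTCHA image (illustrative only)."""
    text: str          # ground-truth label the model should predict
    font: str          # font style used to draw the text
    distortion: float  # 0.0 = no warp, 1.0 = strongest warp (assumed scale)
    noise: float       # fraction of pixels to perturb (assumed scale)


def sample_specs(n: int, length: int = 5, seed: int = 0) -> list[CaptchaSpec]:
    """Draw n random specs; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    specs = []
    for _ in range(n):
        specs.append(CaptchaSpec(
            text="".join(rng.choices(ALPHABET, k=length)),
            font=rng.choice(FONTS),
            distortion=round(rng.uniform(0.0, 1.0), 2),
            noise=round(rng.uniform(0.0, 0.3), 2),
        ))
    return specs


if __name__ == "__main__":
    for spec in sample_specs(3):
        print(spec)
```

Widening the ranges for `distortion` and `noise` would make the recognition task harder, while narrowing them gives an easier dataset for early experiments.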
GitHub is a hosting platform built on Git, a version control system, and it allows team members to collaborate on a project. It is also a publishing platform that lets all members view the changes made to the structure of the project or the code. It enables us to keep track of the codebase, save our project as it develops, and revisit prior points of the project should that be required.
You can use GitHub through either its web version or its desktop application. Branches are the central mechanism GitHub uses for collaboration. They allow us to have different versions of a repository simultaneously without making any change to the main source of code. Work done on a branch does not show up on the main branch until you merge it, which leaves room for experimentation with the code.
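The isolation that branches provide can also be checked locally with command-line Git. The sketch below is one way to do that from Python, assuming `git` is installed and on the PATH. It builds a throwaway repository in a temporary folder, so the repository contents, the branch name `update-readme` and the helper names `git` and `demo_branch_workflow` are all made up for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its trimmed stdout."""
    cmd = [
        "git",
        "-c", "user.name=Example User",       # placeholder identity for the demo
        "-c", "user.email=user@example.com",
        "-c", "commit.gpgsign=false",         # avoid signing prompts in the demo
        *args,
    ]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def demo_branch_workflow() -> dict:
    """Edit a file on a branch, then merge the branch into main."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        # Name the first branch 'main' regardless of the local Git default.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")

        readme = repo / "README.md"
        readme.write_text("CAPTCHA project\n")
        git(repo, "add", "README.md")
        git(repo, "commit", "-m", "Initial commit")

        # Work on a separate branch; main is untouched until the merge.
        git(repo, "checkout", "-b", "update-readme")
        readme.write_text("CAPTCHA project\nNotes on text-based CAPTCHAs\n")
        git(repo, "commit", "-am", "Add notes to README")

        git(repo, "checkout", "main")
        before_merge = readme.read_text()   # main still has the old content

        git(repo, "merge", "update-readme")
        after_merge = readme.read_text()    # main now contains the branch's work

        git(repo, "branch", "-d", "update-readme")  # delete the merged branch
        branches = git(repo, "branch", "--format=%(refname:short)").splitlines()

    return {"before_merge": before_merge, "after_merge": after_merge, "branches": branches}


if __name__ == "__main__":
    print(demo_branch_workflow())
```

The file on `main` only gains the new line after the merge, which is the behaviour described above.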
On the web version of GitHub, the workflow is as follows:
- Select the branch you want to work on, be it 'main' or another branch, then select the edit icon.
- After making your changes to the code or documentation, scroll down to the bottom and select 'Create a new branch for this commit and start a pull request'. Rename the branch any way you'd like and click 'Propose changes'.
- The following page gives you an option to 'leave a comment'. Describe the exact nature of your changes there so that other members can understand them quickly, then click 'Create pull request'.
- You are then taken to the pull request page, from which your changes are merged into the selected branch. It again gives you an option to 'leave a comment'. Once you have explained the changes, select 'Merge pull request' and then 'Confirm merge'. This merges all your changes into the branch you selected at the start.
- Finally, select 'Delete branch' to delete the branch that was created for your changes.
- Go back to the main page to view your changes.
For a step-by-step guide on how to use GitHub, see this video: https://www.youtube.com/watch?v=RGOj5yH7evk. It provides a good foundation for understanding not only GitHub but also Git.
You can contact us if you have any questions regarding this repository:
- Hugh ([email protected])
- Rayvinder Athwal ([email protected])
This documentation will be updated regularly, so keep an eye out!
Last edited: 21/03/2022 6:36PM