Refactor Bug Count Notebook Data Schema Architecture #386

@john-a-flinn

As-is

These are the full tables currently used by the Bug Count module.

Image

Task List

Overview of the schema.

Image

bug_count.Rmd Overview

bug_count takes a parsed Jira issue table and a parsed git log and merges them into one table, matched on issue id. It can also count the bugs associated with each file in the repo. It uses the generic node and edge functions to generate graphs from this data.

The tables in bug_count go through many transformations and filters, but these change only the number of rows, not the columns. For that reason, any step that leaves the columns unchanged is omitted here.

Git

project_git

project_git is the initial parse of a repo, with all the columns the notebook needs to run. It is identical to gitlog's project_git (#358).

project_git <- parse_gitlog(perceval_path,git_repo_path)

| author_name_email | author_datetimetz | commit_hash | committer_name_email | committer_datetimetz | commit_message | file_pathname | lines_added | lines_removed | file_pathname_renamed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| John Doe #@apache.org | 1999-08-17 15:59:33 | cc3fca20f700829592add524e705d525fca287f2 | John Doe #@apache.org | Tue Aug 17 15:59:33 1999 +0000 | Initial revision git-svn-id: https://svn.apache.org/repos/asf/apr/apr/trunk@59151 13f79535-47bb-0310-9956-ffa450edef68 | Makefile.in | 71 | 0 | NA |

Image

transform_commit_message_id_to_network

transform_commit_message_id_to_network takes a parsed repo and a regex that matches the commit message id, and turns them into the generic node list and edgelist.

project_commit_message_id_edgelist <- transform_commit_message_id_to_network(project_git_slice,commit_message_id_regex = issue_id_regex)

nodes

| name | type | color |
| --- | --- | --- |
| R/parsers.R | FALSE | #f4 |

edgelist

| from | to | weight |
| --- | --- | --- |
| R/metric.R | R/parsers.R | 12 |

Image

project_git (parse_commit_message_id)

Extracts only the detected issue id from the commit message and adds it as a separate column, commit_message_id.

project_git <- parse_commit_message_id(project_git, issue_id_regex)

Image
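The extraction step can be illustrated on toy data. This is only a sketch of the idea using base R regex functions, not the implementation of parse_commit_message_id; the table project_git_toy and the pattern issue_id_regex_toy are made up for the example.

```r
library(data.table)

# Toy stand-in for project_git; only the columns needed here (hypothetical data).
project_git_toy <- data.table(
  commit_hash = c("a1", "b2", "c3"),
  commit_message = c("KAFKA-14666: fix connector restart",
                     "Initial revision",
                     "Refactor parser (see KAFKA-101)")
)

# Assumed pattern: an uppercase project key, a dash, and a number.
issue_id_regex_toy <- "[A-Z]+-[0-9]+"

# Locate the first match in each message; regmatches() returns only the hits.
match_pos <- regexpr(issue_id_regex_toy, project_git_toy$commit_message)
has_id <- grepl(issue_id_regex_toy, project_git_toy$commit_message)

# Start with NA everywhere, then fill in the rows whose message matched.
ids <- rep(NA_character_, nrow(project_git_toy))
ids[has_id] <- regmatches(project_git_toy$commit_message, match_pos)

# Add the extracted id as a new column, leaving the other columns untouched.
project_git_toy[, commit_message_id := ids]
```

Commits whose message carries no issue id keep NA in commit_message_id, so they can still take part in a later left join without matching any issue.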

Jira

jira_issues

Extracts and parses a custom Jira table using a given path.

jira_issues <- parse_jira(jira_issues_path)[["issues"]]

| issue_key | issue_summary | issue_type | issue_status | issue_componets | issue_description | issue_affect_version | issue_fix_version | issue_lables | issue_created_datetimetz | issue_resolution_datetimz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| "KAFKA-14666" | string | "bug" | "done" | "connect" | string | "3.2.0" | "3.3.0" | "" | 2023-01-31T19:41:42.00+0000 | 2023-01-31T19:41:42.00+0002 |

Image

Merge

project_git

Performs a left join using the extracted key of the git log against the issue key from the issue data.

project_git <- merge(project_git,jira_issues,all.x=TRUE,by.x="commit_message_id",

| commit_message_id | author_name_email | author_datetimetz | commit_hash | committer_name_email | committer_datetimetz | commit_message | file_pathname | lines_added | lines_removed | file_pathname_renamed | issue_summary | issue_type | issue_status | issue_componets | issue_description | issue_affect_version | issue_fix_version | issue_lables | issue_created_datetimetz | issue_resolution_datetimz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | John Doe #@apache.org | 1999-08-17 15:59:33 | cc3fca20f700829592add524e705d525fca287f2 | John Doe #@apache.org | Tue Aug 17 15:59:33 1999 +0000 | Initial revision git-svn-id: https://svn.apache.org/repos/asf/apr/apr/trunk@59151 13f79535-47bb-0310-9956-ffa450edef68 | Makefile.in | 71 | 0 | NA | string | "bug" | "done" | "connect" | string | "3.2.0" | "3.3.0" | "" | 2023-01-31T19:41:42.00+0000 | 2023-01-31T19:41:42.00+0002 |

Image
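A small, self-contained sketch of this left join on toy data may help show which columns survive. The tables git_toy and jira_toy are hypothetical, and joining on by.y = "issue_key" is an assumption taken from the description above, not from the notebook's actual call.

```r
library(data.table)

# Toy git log rows with the extracted issue id (hypothetical data).
git_toy <- data.table(
  commit_message_id = c("KAFKA-14666", NA, "KAFKA-101"),
  commit_hash = c("a1", "b2", "c3"),
  file_pathname = c("R/parsers.R", "Makefile.in", "R/metric.R")
)

# Toy Jira issues; KAFKA-200 has no matching commit.
jira_toy <- data.table(
  issue_key = c("KAFKA-14666", "KAFKA-200"),
  issue_type = c("bug", "task")
)

# Left join: all.x = TRUE keeps every git row, matched or not.
# Assumption: the Jira side is keyed on issue_key.
merged_toy <- merge(git_toy, jira_toy, all.x = TRUE,
                    by.x = "commit_message_id", by.y = "issue_key")
```

In this sketch the key column keeps the name from the git side (commit_message_id) and issue_key does not appear in the result, which is consistent with the merged schema shown above. Git rows without a matching issue get NA in the issue columns, and issues without a commit are dropped.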

file_bug_count

Counts, for each file, the commits that fix bugs.

| file_pathname | bug_count |
| --- | --- |
| string | int |

Image

file_bug_count <- project_git[,.(bug_count=.N),by = "file_pathname"]
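The per-file count can be tried on toy data as follows. This is a sketch under stated assumptions: merged_toy is a made-up stand-in for the merged project_git, and the filter on issue_type == "bug" is assumed to be one of the row filters applied before the count; it is not shown in this issue.

```r
library(data.table)

# Toy merged table: one row per (commit, file) pair (hypothetical data).
merged_toy <- data.table(
  commit_hash = c("a1", "b2", "c3", "d4"),
  file_pathname = c("R/parsers.R", "R/parsers.R", "R/metric.R", "Makefile.in"),
  issue_type = c("bug", "bug", "bug", NA)
)

# Assumed filter: keep only rows linked to an issue of type "bug".
bug_rows <- merged_toy[!is.na(issue_type) & issue_type == "bug"]

# Same aggregation as above: .N is the number of rows in each group.
file_bug_count_toy <- bug_rows[, .(bug_count = .N), by = "file_pathname"]
```

Files that never appear in a bug-linked row (Makefile.in here) are absent from the result rather than listed with a count of 0.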

Task List

Target Files: All .Rmd files

Proposed updates

TBD

1. Schema

  • TBD

2. Function signatures

  • TBD

3. Renamed column

  • TBD
