Refactor Depends/Scitools Notebook Data Schema Architecture  #372

@john-a-flinn

Description

As-is

These are the current tables for the full Depends/Scitools modules.

Image Image

Task List

Overview of the schema.

Image

depends_showcase.Rmd Overview

depends_showcase takes files and a sparse matrix of calls that relate only to file dependencies, and turns them into a table of unique nodes and a generic edge structure. It is coarser than understands.Rmd, so it needs less data and less time to run, which makes it good for a quick overview of the dependencies of one or more repos. depends_showcase also contains filters and transformations, but they only change the number of rows or the row format, so they are left out here. All unique tables (those with different columns) are included.

project_dependencies

project_dependencies takes a node (file) and a sparse matrix of calls that relate only to file dependencies.

```{r}
project_dependencies <- parse_dependencies(depends_jar_path, git_repo_path, language = language)
```

node

| filepath |
| --- |
| atomic/netware/apr_atomic.c |

edgelist

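| src_filepath | dest_filepath | Import | Parameter | Contain | Use | Implement | ImplLink | Return | Cast | Call |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| util-misc/apr_rmm.c | include/apr_strings.h | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |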

project_file_network

project_file_network then takes the node (file), with a color specified in the conf, and transforms the matrix into a generic edge list. It assigns each edge a weight based on the sum of its row in the sparse matrix.

```{r}
project_file_network <- transform_dependencies_to_network(project_dependencies_slice,
                                                          weight_types = keep_dependencies_type)
```

nodes

| name | color |
| --- | --- |
| atomic/netware/apr_atomic.c | #f4dbb5 |

edgelists

| from | to | weight |
| --- | --- | --- |
| util-misc/apr_rmm.c | include/apr_strings.h | 1 |
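
The weighting step described above can be sketched in base R. This is only an illustration of "weight = row sum over the kept dependency types", not the actual body of transform_dependencies_to_network. The second row, the reduced set of dependency columns, and the `keep_types` vector (standing in for keep_dependencies_type) are assumptions made for the example.

```r
# Edgelist in the shape shown above. The first row mirrors the example row;
# the second row is hypothetical. Only four dependency columns are kept for brevity.
edgelist <- data.frame(
  src_filepath  = c("util-misc/apr_rmm.c", "util-misc/apr_rmm.c"),
  dest_filepath = c("include/apr_strings.h", "include/apr_errno.h"),
  Import = c(1, 1),
  Use    = c(0, 2),
  Return = c(0, 5),
  Call   = c(0, 3),
  stringsAsFactors = FALSE
)

# Assumed stand-in for keep_dependencies_type: the dependency types that count
# towards the edge weight. Return is deliberately left out here.
keep_types <- c("Import", "Use", "Call")

# Weight of an edge = sum of the kept dependency-type counts on its row.
weights <- rowSums(edgelist[, keep_types, drop = FALSE])

# Generic edge list with the from / to / weight columns.
network_edgelist <- data.frame(
  from   = edgelist$src_filepath,
  to     = edgelist$dest_filepath,
  weight = unname(weights),
  stringsAsFactors = FALSE
)

print(network_edgelist)
```

With these inputs the first edge keeps the weight of 1 shown in the table above, because its only non-zero count is Import.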

understands.Rmd Overview

understands takes a specialized (non-generic) node_list (file) and a specialized edge_list that can be any entity (variable, function, class), and creates a network based on their connections using Scitools Understand. understands also contains filters and transformations, but they only change the number of rows or the row format, so they are left out here. All unique tables (those with different columns) are included.

```{r}
file_dependencies <- parse_understand_dependencies(dependencies_path = file_dependencies_path)
```

node_list

| node_label | id | long_name |
| --- | --- | --- |
| HelixAdminWebApp.java | 9469 | /home/john/kaiaulu/rawdata/helix/git_repo/Helix/helix-admin-webapp/src/main/java/org/apache/heli... |

edge_list

| label_from | label_to | id_from | id_to | dependency_kind |
| --- | --- | --- | --- | --- |
| HelixRestServer.java | AuditLogFilter.java | 109067 | 1 | Import |
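
As a rough illustration of how the two tables relate, the base R sketch below checks that every edge endpoint resolves to a row in node_list. It assumes that id_from and id_to refer to node_list$id, which the tables suggest but this issue does not state. The long_name values are hypothetical placeholders, not real paths.

```r
# Hypothetical node_list and edge_list in the column layout shown above.
node_list <- data.frame(
  node_label = c("HelixRestServer.java", "AuditLogFilter.java"),
  id         = c(109067, 1),
  long_name  = c("path/to/HelixRestServer.java", "path/to/AuditLogFilter.java"),
  stringsAsFactors = FALSE
)

edge_list <- data.frame(
  label_from      = "HelixRestServer.java",
  label_to        = "AuditLogFilter.java",
  id_from         = 109067,
  id_to           = 1,
  dependency_kind = "Import",
  stringsAsFactors = FALSE
)

# Assumption: id_from / id_to are keys into node_list$id.
# Collect any endpoint id that has no matching node.
missing_ids <- setdiff(c(edge_list$id_from, edge_list$id_to), node_list$id)

# Number of edges per dependency kind, as a named integer vector.
kind_counts <- table(edge_list$dependency_kind)

print(missing_ids)
print(kind_counts)
```

An empty `missing_ids` means the graph can be built from the two tables alone; a non-empty one would point to edges whose files were filtered out of node_list.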

Task List

Target Files: All .Rmd files

Proposed updates

All functions will be kept, since each one is essential to running the notebooks and to the connections with other modules.

Rename src_filepath and dest_filepath to fall in line with the current naming conventions in this and other modules (gitlog).

1. Schema

Image Image

2. Function signatures

depends_showcase.Rmd

All functions are needed to take an initial sparse matrix to a graph.

  • project_dependencies_nodes
  • project_dependencies_edgelist
  • project_file_network_nodes
  • project_file_network_edgelist

understands.Rmd

The notebook cannot function without both the node list and the edge list, since a node-edge graph needs both.

  • node_list
  • edge_list

3. Renamed column

We need to standardize src_filepath and dest_filepath with the other notebooks, which use from and to respectively.

  • project_dependencies_edgelist.src_filepath to project_dependencies_edgelist.from
  • project_dependencies_edgelist.dest_filepath to project_dependencies_edgelist.to
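
A minimal base R sketch of the rename, assuming the edgelist is a plain data.frame. The one-row table and the `rename_map` helper are illustrative only; the real change would live wherever the parser builds the edgelist.

```r
# One-row edgelist with the current column names (subset of columns for brevity).
project_dependencies_edgelist <- data.frame(
  src_filepath  = "util-misc/apr_rmm.c",
  dest_filepath = "include/apr_strings.h",
  Import        = 1,
  stringsAsFactors = FALSE
)

# Hypothetical mapping: old column name -> new column name.
rename_map <- c(src_filepath = "from", dest_filepath = "to")

# Rename only the columns listed in the mapping and leave the rest untouched.
cols <- names(project_dependencies_edgelist)
hits <- cols %in% names(rename_map)
cols[hits] <- unname(rename_map[cols[hits]])
names(project_dependencies_edgelist) <- cols

print(names(project_dependencies_edgelist))
```

Renaming by name rather than by position keeps the change safe if the dependency-type columns are reordered or filtered first.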

Keep the non-normalized names for understands.Rmd, since its node and edge structure is too different from the generic ones.
