
Commit 12f7d28

Move config into YAML file, update related docs
1 parent 6e90a21 commit 12f7d28

4 files changed: +253 -84 lines changed


README.md

Lines changed: 85 additions & 51 deletions
@@ -10,16 +10,16 @@ Running Our Software
1010
The main prerequisites to running the [PHP][php], [Java][java], and [Rascal][rascal] code
1111
used to implement our analysis are:
1212

13-
* Java JDK version 11: Java 11 is available from the [Java download][java] page. We recommend using the Eclipse Temurin release to avoid any licensing issues.
13+
* Java JDK version 17: Java 17 is available from the [Java download][java] page. We recommend using the Eclipse Temurin release to avoid any licensing issues.
1414
* PHP version 8: Although you can download the sourcecode for PHP from the [PHP download][php] page, it's often easier to use a precompiled version. For macOS, you can use [Homebrew][homebrew] and the [Homebrew PHP Formula][homebrew-php] to easily install a working version of PHP. For Windows, [XAMPP][xampp] provides a working version of PHP. For Linux, you should be able to use your package manager to install the newest version. Note that PHP no longer is included on macOS with the developer tools.
1515

16-
To edit and run Rascal code, please see the relevant information in the [online Rascal documentation][run-rascal]. You can use the command line, Eclipse, or VScode. All included code should work with the newest release of Rascal.
16+
To edit and run Rascal code, please see the relevant information in the [online Rascal documentation][run-rascal]. You can use the command line, Eclipse, or VScode (recommended). All included code should work with the newest release of Rascal.
1717

1818
To parse PHP code, we are using a fork of an open-source PHP
1919
Parser. This is also available in our Github repository, and
2020
is named [PHP-Parser][phpp]. You will want to clone this project to a convenient location.
2121

22-
[java]: https://adoptium.net/temurin/releases/?version=11
22+
[java]: https://adoptium.net/temurin/releases/?version=17
2323
[rascal]: http://www.rascal-mpl.org
2424
[php]: http://www.php.net/downloads.php
2525
[phpp]: https://github.com/cwi-swat/PHP-Parser
@@ -52,71 +52,105 @@ Second, you will want to add PHP AiR as a plugin dependency in the pom.xml file.
5252
<dependency>
5353
<groupId>org.rascalmpl</groupId>
5454
<artifactId>php-analysis</artifactId>
55-
<version>0.2.1-SNAPSHOT</version>
55+
<version>0.3.1</version>
5656
</dependency>
5757
```
5858

59+
The `version` will depend on whether you also have PHP AiR installed locally. If you are also making changes to the PHP AiR project, you will want to use the version from the `pom.xml` file in that project. If not, you should select a version available to download as a dependency.
60+
5961
You will now be able to import libraries from PHP AiR in your project and in a Rascal console created in the context of your project.
6062

6163
Configuring PHP AiR
6264
-------------------
6365

64-
Before you first use PHP AiR, you need to create a file named Config.rsc that will be in the folder src/main/rascal/lang/php/config. This will be the module `lang::php::config::Config`. You will do this either directly in the PHP AiR project or within your own project that imports PHP AiR as a dependency.
66+
Before you first use PHP AiR, you need to set the values of the configuration variables in a YAML file and set the environment variable `PHP_AIR_CONFIG` to point to this file. For instance, to point it at a file named `config.yaml` in the folder `Projects/php-analysis/php-analysis` under your home directory, you would use the command `export PHP_AIR_CONFIG=$HOME/Projects/php-analysis/php-analysis/config.yaml`. You can also set the variable when using the `code` command to start VSCode, for example `PHP_AIR_CONFIG=$HOME/Projects/php-analysis/php-analysis/config.yaml code .` to launch VSCode in the directory of the PHP AiR project.
6567

66-
An example of this configuration file is shown below. Note that we assume, for this README, that all code being analyzed, and all intermediate results, are stored in a directory named `PHPAnalysis` in the user's home directory (i.e., `~/PHPAnalysis` on a Mac or Linux machine). Note also that there is an existing file under src/main/rascal/lang/php.config.rsc-dist. This file contains a template that you can copy to create your config file, and is the easiest way to create a new one. This file also contains Rascal `@doc` comments for each item. These comments are not shown below.
68+
An example of this configuration file, based on one used by one of the contributors to this project, is shown below. This file is also included in the root of the repository. Note that we assume, for this README, that all code being analyzed, and all intermediate results, are stored in a directory named `PHPAnalysis` in the user's home directory (i.e., `~/PHPAnalysis` on a Mac or Linux machine).
6769

6870
```
69-
module lang::php::config::Config
70-
71-
public bool usePhpParserJar = false;
72-
73-
public loc phploc = |file:///usr/local/php5/bin/php|;
74-
75-
public loc parserLoc = |file:///Users/hillsma/Projects/phpsa/PHP-Parser|;
76-
77-
public loc analysisLoc = |file:///Users/hillsma/Projects/phpsa/rascal/php-analysis/|;
78-
79-
public str parserMemLimit = "1024M";
80-
81-
public str astToRascal = "lib/Rascal/AST2Rascal.php";
82-
83-
public loc parserWorkingDir = (parserLoc + astToRascal).parent;
84-
85-
public loc baseLoc = |home:///PHPAnalysis|;
86-
87-
public loc parsedDir = baseLoc + "serialized/parsed";
88-
89-
public loc statsDir = baseLoc + "serialized/stats";
90-
91-
public loc corpusRoot = baseLoc + "systems";
92-
93-
public loc countsDir = baseLoc + "serialized/counts";
94-
95-
public bool useBinaries = false;
96-
97-
public bool includePhpDocs = false;
98-
public bool includeLocationInfo = false;
99-
public bool resolveNamespaces = false;
100-
101-
public int logLevel = 2;
71+
# Main PHP Analysis configuration settings.
72+
php-air:
73+
# The location of the PHP executable in Rascal location format.
74+
phpLoc: "file:///opt/homebrew/bin/php"
75+
# The debugging level for log statements.
76+
# 0 means disable logging
77+
# 1 means typical logging statements
78+
# 2 means debug-level logging
79+
logLevel: 2
80+
# The location of the cloc tool, used for source lines of code,
81+
# in Rascal location format.
82+
clocLoc: "file:///opt/homebrew/bin/cloc"
83+
84+
# Settings related to parsing PHP code.
85+
parsing:
86+
# Indicates whether to use the parser contained in a distributed jar
87+
# file or from the directory given as parserLoc. By default, this should
88+
# be false unless you have such a file (e.g., a Java-based parsing library
89+
# for PHP).
90+
usePhpParserJar: false
91+
# The base install location for the PHP-Parser project, in Rascal location format.
92+
parserLoc: "file:///Users/hillsma/Projects/php-analysis/PHP-Parser"
93+
# The memory limit for PHP when the parser is run. This may need to
94+
# be increased if the parser runs out of memory, e.g., because of an
95+
# especially large or deeply-nested script.
96+
parserMemLimit: "1024M"
97+
# The name of the AST to Rascal conversion script. This should not be
98+
# modified unless you have created your own version of this.
99+
astToRascal: "AST2Rascal.php"
100+
# The working directory for when the parser runs, in Rascal location format.
101+
parserWorkingDir: "file:///Users/hillsma/Projects/php-analysis/PHP-Parser"
102+
103+
# Analysis-related settings.
104+
analysis:
105+
# The base location for the corpus and any serialized files, in Rascal
106+
# location format. You would normally put code to analyze under this folder,
107+
# but this isn't required. Any serialized data will be stored under this folder.
108+
baseLoc: "home:///PHPAnalysis"
109+
# The base install location for the php-analysis project. This is only
110+
# needed if you are working directly on the project, versus using it as
111+
# a dependency, since this is needed to run tests. This is given in
112+
# Rascal location format.
113+
analysisLoc: "file:///Users/hillsma/Projects/php-analysis/php-analysis/"
114+
# Where to put the binary representations of parsed systems, in Rascal
115+
# location format.
116+
parsedDir: "home:///PHPAnalysis/serialized/parsed"
117+
# Where to put the binary representations of extracted statistics, in
118+
# Rascal location format.
119+
statsDir: "home:///PHPAnalysis/serialized/stats"
120+
# Where to put extracted counts (e.g., SLOC), in Rascal location format.
121+
countsDir: "home:///PHPAnalysis/serialized/counts"
122+
# Where the PHP sources for the corpus reside. This is for systems given
123+
# with each system version in a separate directory. This is in Rascal
124+
# location format.
125+
corpusRoot: "home:///PHPAnalysis/systems"
126+
# This should only ever be true if we don't have source, i.e., we only have the
127+
# extracted binaries for parsed systems. This should normally be false.
128+
useBinaries: false
102129
```
103130

104131
The configurable values in this file are as follows:
105132

106-
* phploc, of type `loc`, points to the location of the php binary
107-
* parserLoc, of type `loc`, points to the location of the PHP-Parser project
108-
* analysisLoc, of type `loc`, points to the location of the php-analysis project itself
109-
* astToRascal, of type `str`, points to the location of file AST2Rascal.php inside PHP-Parser
110-
* parserWorkingDir, of type `loc`, points to the location of the working directory for when the parser runs
111-
* baseLoc, of type `loc`, provides the base location for a number of files created as part of
133+
* `phpLoc` points to the location of the php binary
134+
* `logLevel` indicates how much debugging information will be seen, and can be set to
135+
0 to turn output of this information off
136+
* `clocLoc` is the location of the `cloc` tool, which is used to compute metrics about source code
137+
* `usePhpParserJar` should generally be `false`, and is mainly present for historical reasons
138+
* `parserLoc` points to the location of the PHP-Parser project
139+
* `parserMemLimit` gives the memory limit value to pass to PHP, and can be increased if the parser is running out of memory
140+
* `astToRascal` should not be changed unless a file other than `AST2Rascal.php` is being used to build Rascal ASTs
141+
* `parserWorkingDir` points to the location of the working directory for when the parser runs
142+
* `baseLoc` provides the base location for a number of files created as part of
112143
the parsing and extraction process, including the directory where parsed
113144
files are stored and the root of the corpus; the remaining directories are
114145
subdirectories of this
115-
* logLevel indicates how much debugging information will be seen, and can be set to
116-
0 to turn output of this information off
117-
118-
Make sure that `useBinaries` is false; this should only be true in cases where you
119-
have the binaries built and no longer have the source. Due to some recent changes, this may not work correctly in all cases.
146+
* `analysisLoc` points to the location of the php-analysis project itself
147+
* `parsedDir` indicates where serialized versions of parsed PHP systems should be stored
148+
* `statsDir` contains computed stats for PHP systems
149+
* `countsDir` contains computed counts for PHP systems
150+
* `corpusRoot` is the location of the systems under analysis; systems do not need to be stored there, but this allows
151+
some of the built-in functionality for counting, finding, and parsing systems to be used
152+
* `useBinaries` should generally be `false`, and is only needed when you have the serialized parse trees but no longer have access
153+
to the source code you want to analyze
120154

121155
To check to ensure that the directories are properly set up, you can run the following:
122156

@@ -130,7 +164,7 @@ Parsing Older Code
130164
------------------
131165

132166
We currently support PHP code up to version 8, including new features such
133-
as nullable annotations on types. Because of the evolution of the language,
167+
as nullable annotations on types and property hooks. Because of the evolution of the language,
134168
some older code does not parse, though, which means those scripts will be
135169
represented as a special type of _error script_ using the `errscript`
136170
`Script` constructor. The main issues we are aware of are the following:
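
The README changes in this commit ask the user to export `PHP_AIR_CONFIG` before starting Rascal. A minimal shell sketch of a pre-flight check is shown here. It mirrors the three conditions that `loadConfig` in `Config.rsc` tests (variable set, path exists, path is a regular file). The function name `check_php_air_config` is hypothetical and is not part of PHP AiR. The sketch also treats an empty value as unset, which may differ from the Rascal check.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PHP AiR: checks PHP_AIR_CONFIG the
# same way loadConfig does before it tries to read the YAML file.
check_php_air_config() {
    # Assumption: an empty value is treated the same as an unset variable.
    if [ -z "${PHP_AIR_CONFIG:-}" ]; then
        echo "PHP_AIR_CONFIG environment variable is not set" >&2
        return 1
    fi
    # The path must exist ...
    if [ ! -e "$PHP_AIR_CONFIG" ]; then
        echo "The file $PHP_AIR_CONFIG does not exist" >&2
        return 1
    fi
    # ... and must be a regular file, not a directory.
    if [ ! -f "$PHP_AIR_CONFIG" ]; then
        echo "$PHP_AIR_CONFIG is not a file" >&2
        return 1
    fi
    echo "Using config file $PHP_AIR_CONFIG"
}
```

One possible use is to run `export PHP_AIR_CONFIG=$HOME/Projects/php-analysis/php-analysis/config.yaml` and then `check_php_air_config && code .`, so that VSCode is only launched once the variable points at a readable file.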

config.yaml

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Main PHP Analysis configuration settings.
2+
php-air:
3+
# The location of the PHP executable in Rascal location format.
4+
phpLoc: "file:///opt/homebrew/bin/php"
5+
# The debugging level for log statements.
6+
# 0 means disable logging
7+
# 1 means typical logging statements
8+
# 2 means debug-level logging
9+
logLevel: 2
10+
# The location of the cloc tool, used for source lines of code,
11+
# in Rascal location format.
12+
clocLoc: "file:///opt/homebrew/bin/cloc"
13+
14+
# Settings related to parsing PHP code.
15+
parsing:
16+
# Indicates whether to use the parser contained in a distributed jar
17+
# file or from the directory given as parserLoc. By default, this should
18+
# be false unless you have such a file (e.g., a Java-based parsing library
19+
# for PHP).
20+
usePhpParserJar: false
21+
# The base install location for the PHP-Parser project, in Rascal location format.
22+
parserLoc: "file:///Users/hillsma/Projects/php-analysis/PHP-Parser"
23+
# The memory limit for PHP when the parser is run. This may need to
24+
# be increased if the parser runs out of memory, e.g., because of an
25+
# especially large or deeply-nested script.
26+
parserMemLimit: "1024M"
27+
# The name of the AST to Rascal conversion script. This should not be
28+
# modified unless you have created your own version of this.
29+
astToRascal: "AST2Rascal.php"
30+
# The working directory for when the parser runs, in Rascal location format.
31+
parserWorkingDir: "file:///Users/hillsma/Projects/php-analysis/PHP-Parser"
32+
33+
# Analysis-related settings.
34+
analysis:
35+
# The base location for the corpus and any serialized files, in Rascal
36+
# location format. You would normally put code to analyze under this folder,
37+
# but this isn't required. Any serialized data will be stored under this folder.
38+
baseLoc: "home:///PHPAnalysis"
39+
# The base install location for the php-analysis project. This is only
40+
# needed if you are working directly on the project, versus using it as
41+
# a dependency, since this is needed to run tests. This is given in
42+
# Rascal location format.
43+
analysisLoc: "file:///Users/hillsma/Projects/php-analysis/php-analysis/"
44+
# Where to put the binary representations of parsed systems, in Rascal
45+
# location format.
46+
parsedDir: "home:///PHPAnalysis/serialized/parsed"
47+
# Where to put the binary representations of extracted statistics, in
48+
# Rascal location format.
49+
statsDir: "home:///PHPAnalysis/serialized/stats"
50+
# Where to put extracted counts (e.g., SLOC), in Rascal location format.
51+
countsDir: "home:///PHPAnalysis/serialized/counts"
52+
# Where the PHP sources for the corpus reside. This is for systems given
53+
# with each system version in a separate directory. This is in Rascal
54+
# location format.
55+
corpusRoot: "home:///PHPAnalysis/systems"
56+
# This should only ever be true if we don't have source, i.e., we only have the
57+
# extracted binaries for parsed systems. This should normally be false.
58+
useBinaries: false

src/main/rascal/lang/php/config/Config.rsc

Lines changed: 107 additions & 31 deletions
@@ -16,6 +16,8 @@ import lang::php::util::Option;
1616

1717
import IO;
1818
import Exception;
19+
import String;
20+
import util::SystemAPI;
1921
import lang::yaml::Model;
2022

2123
public data Exception
@@ -51,41 +53,115 @@ public Config getConfig() {
5153
return c;
5254
}
5355

54-
private Config loadConfig() {
55-
bool usePhpParserJar = false;
56-
loc phpLoc = |file:///opt/homebrew/bin/php|;
57-
loc parserLoc = |file:///Users/hillsma/Projects/php-analysis/PHP-Parser|;
58-
loc analysisLoc = |file:///Users/hillsma/Projects/php-analysis/php-analysis/|;
59-
str parserMemLimit = "1024M";
60-
str astToRascal = "AST2Rascal.php";
61-
loc parserWorkingDir = (parserLoc + astToRascal).parent;
62-
loc baseLoc = |home:///PHPAnalysis|;
63-
loc parsedDir = baseLoc + "serialized/parsed";
64-
loc statsDir = baseLoc + "serialized/stats";
65-
loc corpusRoot = baseLoc + "systems";
66-
loc countsDir = baseLoc + "serialized/counts";
67-
bool useBinaries = false;
68-
int logLevel = 2;
69-
loc clocLoc = |file:///opt/homebrew/bin/cloc|;
56+
public Option[str] findStringValueInMappingByKey(Node yml, str key) {
57+
for ( /Node m:mapping(_) := yml, Node k <- m.\map, scalar(key) := k) {
58+
if (scalar(str s) := m.\map[k]) {
59+
return some(s);
60+
}
61+
}
62+
return none();
63+
}
64+
65+
public Option[loc] findLocValueInMappingByKey(Node yml, str key) {
66+
for ( /Node m:mapping(_) := yml, Node k <- m.\map, scalar(key) := k) {
67+
if (scalar(str s) := m.\map[k]) {
68+
int sepPosition = findFirst(s, "://");
69+
return some(|<s[..sepPosition]>://<s[sepPosition+3..]>|);
70+
}
71+
}
72+
return none();
73+
}
7074
75+
public Option[int] findIntValueInMappingByKey(Node yml, str key) {
76+
for ( /Node m:mapping(_) := yml, Node k <- m.\map, scalar(key) := k) {
77+
if (scalar(int n) := m.\map[k]) {
78+
return some(n);
79+
}
80+
}
81+
return none();
82+
}
83+
84+
public Option[bool] findBoolValueInMappingByKey(Node yml, str key) {
85+
for ( /Node m:mapping(_) := yml, Node k <- m.\map, scalar(key) := k) {
86+
if (scalar(bool b) := m.\map[k]) {
87+
return some(b);
88+
}
89+
}
90+
return none();
91+
}
92+
93+
private Config loadConfig() {
7194
Config c = config(
72-
some(usePhpParserJar),
73-
some(phpLoc),
74-
some(parserLoc),
75-
some(analysisLoc),
76-
some(parserMemLimit),
77-
some(astToRascal),
78-
some(parserWorkingDir),
79-
some(baseLoc),
80-
some(parsedDir),
81-
some(statsDir),
82-
some(corpusRoot),
83-
some(countsDir),
84-
some(useBinaries),
85-
some(logLevel),
86-
some(clocLoc)
95+
none(),
96+
none(),
97+
none(),
98+
none(),
99+
none(),
100+
none(),
101+
none(),
102+
none(),
103+
none(),
104+
none(),
105+
none(),
106+
none(),
107+
none(),
108+
none(),
109+
none()
87110
);
88111
112+
senv = getSystemEnvironment();
113+
if ("PHP_AIR_CONFIG" notin senv) {
114+
throw configMissing("", "PHP_AIR_CONFIG environment variable is not set");
115+
} else {
116+
configPath = senv["PHP_AIR_CONFIG"];
117+
configFile = |file://<configPath>|;
118+
if (!exists(configFile)) {
119+
throw configMissing("", "The file <configPath> does not exist");
120+
} else if (!isFile(configFile)) {
121+
throw configMissing("", "<configPath> is not a file");
122+
} else {
123+
try {
124+
yml = loadYAML(readFile(configFile));
125+
126+
Option[loc] phpLoc = findLocValueInMappingByKey(yml, "phpLoc");
127+
Option[int] logLevel = findIntValueInMappingByKey(yml, "logLevel");
128+
Option[loc] clocLoc = findLocValueInMappingByKey(yml, "clocLoc");
129+
130+
Option[bool] usePhpParserJar = findBoolValueInMappingByKey(yml, "usePhpParserJar");
131+
Option[loc] parserLoc = findLocValueInMappingByKey(yml, "parserLoc");
132+
Option[str] parserMemLimit = findStringValueInMappingByKey(yml, "parserMemLimit");
133+
Option[str] astToRascal = findStringValueInMappingByKey(yml, "astToRascal");
134+
Option[loc] parserWorkingDir = findLocValueInMappingByKey(yml, "parserWorkingDir");
135+
136+
Option[loc] analysisLoc = findLocValueInMappingByKey(yml, "analysisLoc");
137+
Option[loc] baseLoc = findLocValueInMappingByKey(yml, "baseLoc");
138+
Option[loc] parsedDir = findLocValueInMappingByKey(yml, "parsedDir");
139+
Option[loc] statsDir = findLocValueInMappingByKey(yml, "statsDir");
140+
Option[loc] corpusRoot = findLocValueInMappingByKey(yml, "corpusRoot");
141+
Option[loc] countsDir = findLocValueInMappingByKey(yml, "countsDir");
142+
Option[bool] useBinaries = findBoolValueInMappingByKey(yml, "useBinaries");
143+
144+
c = config(usePhpParserJar,
145+
phpLoc,
146+
parserLoc,
147+
analysisLoc,
148+
parserMemLimit,
149+
astToRascal,
150+
parserWorkingDir,
151+
baseLoc,
152+
parsedDir,
153+
statsDir,
154+
corpusRoot,
155+
countsDir,
156+
useBinaries,
157+
logLevel,
158+
clocLoc);
159+
} catch Exception e: {
160+
throw configMissing("", "The config file did not load correctly: <e>");
161+
}
162+
}
163+
}
164+
89165
return c;
90166
}
91167
