-
-
Notifications
You must be signed in to change notification settings - Fork 1
feat: Add Android APK analyzer for security scanning #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit introduces a new tool, the Mobile App Analyzer, which is designed to perform static analysis on Android APK files. The new tool has the following features: - Decompiles APKs using `apktool`. - Extracts permissions from the `AndroidManifest.xml` using `aapt` and checks them against a list of high-risk permissions. - Reuses the existing `sensitive_data_scanner` to find hardcoded secrets in the decompiled source code. - Reuses the `social_media_analyzer`'s scam detector to find suspicious URLs and other scam-related indicators in the app's files. A command-line interface is provided in `mobile_analyzer_main.py`, and documentation has been added in `README_mobile_analyzer.md`. Limitations: - This implementation only supports Android APK files. Analysis for Apple (iOS) apps is not included. - The analysis is purely static and does not involve running the application.
Reviewer's GuideImplements a new Mobile App Analyzer tool with static analysis capabilities for Android APK security scanning. The PR adds core decompilation and permission analysis logic, integrates existing secret and scam scanners, provides a CLI orchestrator, and includes detailed documentation. Sequence diagram for Mobile App Analyzer CLI workflowsequenceDiagram
actor User
participant "mobile_analyzer_main.py"
participant "analyzer.py"
participant "apktool (external)"
participant "aapt (external)"
participant "sensitive_data_scanner"
participant "social_media_analyzer"
User->>"mobile_analyzer_main.py": Run analyzer with APK path
"mobile_analyzer_main.py"->>"analyzer.py": decompile_apk(apk_path, output_dir)
"analyzer.py"->>"apktool (external)": Decompile APK
"analyzer.py"-->>"mobile_analyzer_main.py": Success/Failure
"mobile_analyzer_main.py"->>"analyzer.py": get_permissions(apk_path)
"analyzer.py"->>"aapt (external)": Extract permissions
"analyzer.py"-->>"mobile_analyzer_main.py": Permissions list
"mobile_analyzer_main.py"->>"analyzer.py": check_high_risk_permissions(permissions)
"analyzer.py"-->>"mobile_analyzer_main.py": High-risk permissions
"mobile_analyzer_main.py"->>"analyzer.py": scan_for_sensitive_data(output_dir)
"analyzer.py"->>"sensitive_data_scanner": Scan for secrets
"analyzer.py"-->>"mobile_analyzer_main.py": Sensitive data findings
"mobile_analyzer_main.py"->>"analyzer.py": scan_for_scam_indicators(output_dir)
"analyzer.py"->>"social_media_analyzer": Scan for scam indicators
"analyzer.py"-->>"mobile_analyzer_main.py": Scam findings
"mobile_analyzer_main.py"->>User: Print analysis report
"mobile_analyzer_main.py"->>"analyzer.py": Clean up files (optional)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- Consider refactoring this tool as a proper Python package (e.g., with setup.py or pyproject.toml) to eliminate the sys.path hack and support standard imports.
- Allow configuring the output directory via a CLI flag or using a temporary directory instead of always writing to a fixed "decompiled_apk" folder to avoid collisions or residual files.
- Replace print statements with Python's logging module (with log levels) for better control over verbosity and easier integration with other tools.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring this tool as a proper Python package (e.g., with setup.py or pyproject.toml) to eliminate the sys.path hack and support standard imports.
- Allow configuring the output directory via a CLI flag or using a temporary directory instead of always writing to a fixed "decompiled_apk" folder to avoid collisions or residual files.
- Replace print statements with Python's logging module (with log levels) for better control over verbosity and easier integration with other tools.
## Individual Comments
### Comment 1
<location> `mobile_analyzer/analyzer.py:69` </location>
<code_context>
+ print(f"Decompiling {apk_path} to {output_dir}...")
+ try:
+ # Using -f to force overwrite the output directory if it exists
+ command = ["apktool", "d", "-f", apk_path, "-o", output_dir]
+ result = subprocess.run(command, capture_output=True, text=True, check=True)
+ print("Decompilation successful.")
</code_context>
<issue_to_address>
No timeout specified for subprocess calls.
Subprocesses may hang; add a timeout to subprocess.run for reliability.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
result = subprocess.run(command, capture_output=True, text=True, check=True)
print("Decompilation successful.")
return True
except subprocess.CalledProcessError as e:
print("Error during decompilation:")
print(e.stderr)
return False
except FileNotFoundError:
print("Error: 'apktool' not found. Make sure it is installed and in your PATH.")
return False
=======
# Set a timeout (e.g., 60 seconds) to prevent hanging
try:
result = subprocess.run(command, capture_output=True, text=True, check=True, timeout=60)
print("Decompilation successful.")
return True
except subprocess.TimeoutExpired:
print("Error: Decompilation process timed out.")
return False
except subprocess.CalledProcessError as e:
print("Error during decompilation:")
print(e.stderr)
return False
except FileNotFoundError:
print("Error: 'apktool' not found. Make sure it is installed and in your PATH.")
return False
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `mobile_analyzer/analyzer.py:97` </location>
<code_context>
+
+ print(f"Extracting permissions from {apk_path}...")
+ try:
+ command = ["aapt", "dump", "permissions", apk_path]
+ result = subprocess.run(command, capture_output=True, text=True, check=True)
+
</code_context>
<issue_to_address>
Regex for permission extraction may miss edge cases.
The current regex may fail if the aapt output format changes. Please update the regex to handle variations in whitespace or formatting.
Suggested implementation:
```python
print(f"Extracting permissions from {apk_path}...")
try:
command = ["aapt", "dump", "permissions", apk_path]
result = subprocess.run(command, capture_output=True, text=True, check=True)
# Updated regex to handle variations in whitespace and formatting
import re
permission_pattern = re.compile(r'^\s*permission(?:\s*:\s*|\s+)([\w\.\-]+)', re.MULTILINE)
permissions = permission_pattern.findall(result.stdout)
```
If the function is expected to return the list of permissions, ensure you add `return permissions` at the appropriate place after extracting them. If there is existing code that processes the output, update it to use the new `permissions` variable.
</issue_to_address>
### Comment 3
<location> `mobile_analyzer_main.py:19` </location>
<code_context>
+ sys.exit(1)
+
+ # Create a directory for the decompiled code
+ output_dir = "decompiled_apk"
+ if os.path.exists(output_dir):
+ shutil.rmtree(output_dir)
</code_context>
<issue_to_address>
Hardcoded output directory may cause conflicts.
Allow users to specify the output directory or generate a unique temporary directory to avoid conflicts when running multiple analyses or if the directory already exists.
Suggested implementation:
```python
import tempfile
parser.add_argument("apk_path", help="The path to the APK file to analyze.")
parser.add_argument("--keep-files", action="store_true", help="Keep the decompiled files after analysis.")
parser.add_argument("--output-dir", type=str, default=None, help="Directory to store decompiled files. If not specified, a temporary directory will be used.")
args = parser.parse_args()
apk_path = args.apk_path
if not os.path.exists(apk_path):
print(f"Error: APK file not found at {apk_path}")
sys.exit(1)
# Create a directory for the decompiled code
if args.output_dir:
output_dir = args.output_dir
if os.path.exists(output_dir):
shutil.rmtree(output_dir)
os.makedirs(output_dir)
else:
output_dir = tempfile.mkdtemp(prefix="decompiled_apk_")
```
```python
print(f"--- Starting analysis for {os.path.basename(apk_path)} ---")
print(f"Decompiled code will be stored in: {output_dir}")
# 1. Decompile the APK
if not analyzer.decompile_apk(apk_path, output_dir):
print("Failed to decompile APK. Aborting analysis.")
sys.exit(1)
```
If there is code later in the file that deletes the output directory (e.g., when --keep-files is not set), ensure it works with both user-specified and temporary directories. You may want to print a message to the user if a temporary directory is used and files are deleted.
</issue_to_address>
### Comment 4
<location> `mobile_analyzer_main.py:20` </location>
<code_context>
+
+ # Create a directory for the decompiled code
+ output_dir = "decompiled_apk"
+ if os.path.exists(output_dir):
+ shutil.rmtree(output_dir)
+ os.makedirs(output_dir)
+
+ print(f"--- Starting analysis for {os.path.basename(apk_path)} ---")
</code_context>
<issue_to_address>
Potential race condition when removing and creating output directory.
Another process could create the directory between its removal and recreation, causing issues. Consider using tempfile.TemporaryDirectory or an atomic approach.
Suggested implementation:
```python
import tempfile
# Create a temporary directory for the decompiled code
with tempfile.TemporaryDirectory(prefix="decompiled_apk_") as output_dir:
print(f"--- Starting analysis for {os.path.basename(apk_path)} ---")
# 1. Decompile the APK
if not analyzer.decompile_apk(apk_path, output_dir):
print("Failed to decompile APK. Aborting analysis.")
sys.exit(1)
```
You will need to indent all code that uses `output_dir` so it is inside the `with tempfile.TemporaryDirectory(...) as output_dir:` block.
If you want to support the `--keep-files` option, you will need to add logic to copy the temporary directory to a permanent location if requested.
</issue_to_address>
### Comment 5
<location> `mobile_analyzer_main.py:70` </location>
<code_context>
+ print("\nNo scam indicators found.")
+
+ # Clean up the decompiled files
+ if not args.keep_files:
+ print("\nCleaning up temporary files...")
+ shutil.rmtree(output_dir)
+
+ print("\n--- Analysis Complete ---")
</code_context>
<issue_to_address>
No error handling for cleanup step.
Wrap shutil.rmtree in a try-except block to handle potential exceptions if the directory is locked or files are in use.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
# Clean up the decompiled files
if not args.keep_files:
print("\nCleaning up temporary files...")
shutil.rmtree(output_dir)
print("\n--- Analysis Complete ---")
=======
# Clean up the decompiled files
if not args.keep_files:
print("\nCleaning up temporary files...")
try:
shutil.rmtree(output_dir)
except Exception as e:
print(f"Warning: Failed to clean up temporary files in '{output_dir}'. Reason: {e}")
print("\n--- Analysis Complete ---")
>>>>>>> REPLACE
</suggested_fix>
## Security Issues
### Issue 1
<location> `mobile_analyzer/analyzer.py:70` </location>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `mobile_analyzer/analyzer.py:98` </location>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| result = subprocess.run(command, capture_output=True, text=True, check=True) | ||
| print("Decompilation successful.") | ||
| return True | ||
| except subprocess.CalledProcessError as e: | ||
| print("Error during decompilation:") | ||
| print(e.stderr) | ||
| return False | ||
| except FileNotFoundError: | ||
| print("Error: 'apktool' not found. Make sure it is installed and in your PATH.") | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): No timeout specified for subprocess calls.
Subprocesses may hang; add a timeout to subprocess.run for reliability.
| result = subprocess.run(command, capture_output=True, text=True, check=True) | |
| print("Decompilation successful.") | |
| return True | |
| except subprocess.CalledProcessError as e: | |
| print("Error during decompilation:") | |
| print(e.stderr) | |
| return False | |
| except FileNotFoundError: | |
| print("Error: 'apktool' not found. Make sure it is installed and in your PATH.") | |
| return False | |
| # Set a timeout (e.g., 60 seconds) to prevent hanging | |
| try: | |
| result = subprocess.run(command, capture_output=True, text=True, check=True, timeout=60) | |
| print("Decompilation successful.") | |
| return True | |
| except subprocess.TimeoutExpired: | |
| print("Error: Decompilation process timed out.") | |
| return False | |
| except subprocess.CalledProcessError as e: | |
| print("Error during decompilation:") | |
| print(e.stderr) | |
| return False | |
| except FileNotFoundError: | |
| print("Error: 'apktool' not found. Make sure it is installed and in your PATH.") | |
| return False |
|
|
||
| print(f"Extracting permissions from {apk_path}...") | ||
| try: | ||
| command = ["aapt", "dump", "permissions", apk_path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Regex for permission extraction may miss edge cases.
The current regex may fail if the aapt output format changes. Please update the regex to handle variations in whitespace or formatting.
Suggested implementation:
print(f"Extracting permissions from {apk_path}...")
try:
command = ["aapt", "dump", "permissions", apk_path]
result = subprocess.run(command, capture_output=True, text=True, check=True)
# Updated regex to handle variations in whitespace and formatting
import re
permission_pattern = re.compile(r'^\s*permission(?:\s*:\s*|\s+)([\w\.\-]+)', re.MULTILINE)
permissions = permission_pattern.findall(result.stdout)If the function is expected to return the list of permissions, ensure you add return permissions at the appropriate place after extracting them. If there is existing code that processes the output, update it to use the new permissions variable.
| sys.exit(1) | ||
|
|
||
| # Create a directory for the decompiled code | ||
| output_dir = "decompiled_apk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Hardcoded output directory may cause conflicts.
Allow users to specify the output directory or generate a unique temporary directory to avoid conflicts when running multiple analyses or if the directory already exists.
Suggested implementation:
import tempfile
parser.add_argument("apk_path", help="The path to the APK file to analyze.")
parser.add_argument("--keep-files", action="store_true", help="Keep the decompiled files after analysis.")
parser.add_argument("--output-dir", type=str, default=None, help="Directory to store decompiled files. If not specified, a temporary directory will be used.")
args = parser.parse_args()
apk_path = args.apk_path
if not os.path.exists(apk_path):
print(f"Error: APK file not found at {apk_path}")
sys.exit(1)
# Create a directory for the decompiled code
if args.output_dir:
output_dir = args.output_dir
if os.path.exists(output_dir):
shutil.rmtree(output_dir)
os.makedirs(output_dir)
else:
output_dir = tempfile.mkdtemp(prefix="decompiled_apk_") print(f"--- Starting analysis for {os.path.basename(apk_path)} ---")
print(f"Decompiled code will be stored in: {output_dir}")
# 1. Decompile the APK
if not analyzer.decompile_apk(apk_path, output_dir):
print("Failed to decompile APK. Aborting analysis.")
sys.exit(1)If there is code later in the file that deletes the output directory (e.g., when --keep-files is not set), ensure it works with both user-specified and temporary directories. You may want to print a message to the user if a temporary directory is used and files are deleted.
| if os.path.exists(output_dir): | ||
| shutil.rmtree(output_dir) | ||
| os.makedirs(output_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Potential race condition when removing and creating output directory.
Another process could create the directory between its removal and recreation, causing issues. Consider using tempfile.TemporaryDirectory or an atomic approach.
Suggested implementation:
import tempfile
# Create a temporary directory for the decompiled code
with tempfile.TemporaryDirectory(prefix="decompiled_apk_") as output_dir:
print(f"--- Starting analysis for {os.path.basename(apk_path)} ---")
# 1. Decompile the APK
if not analyzer.decompile_apk(apk_path, output_dir):
print("Failed to decompile APK. Aborting analysis.")
sys.exit(1)You will need to indent all code that uses output_dir so it is inside the with tempfile.TemporaryDirectory(...) as output_dir: block.
If you want to support the --keep-files option, you will need to add logic to copy the temporary directory to a permanent location if requested.
| # Clean up the decompiled files | ||
| if not args.keep_files: | ||
| print("\nCleaning up temporary files...") | ||
| shutil.rmtree(output_dir) | ||
|
|
||
| print("\n--- Analysis Complete ---") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): No error handling for cleanup step.
Wrap shutil.rmtree in a try-except block to handle potential exceptions if the directory is locked or files are in use.
| # Clean up the decompiled files | |
| if not args.keep_files: | |
| print("\nCleaning up temporary files...") | |
| shutil.rmtree(output_dir) | |
| print("\n--- Analysis Complete ---") | |
| # Clean up the decompiled files | |
| if not args.keep_files: | |
| print("\nCleaning up temporary files...") | |
| try: | |
| shutil.rmtree(output_dir) | |
| except Exception as e: | |
| print(f"Warning: Failed to clean up temporary files in '{output_dir}'. Reason: {e}") | |
| print("\n--- Analysis Complete ---") |
| try: | ||
| # Using -f to force overwrite the output directory if it exists | ||
| command = ["apktool", "d", "-f", apk_path, "-o", output_dir] | ||
| result = subprocess.run(command, capture_output=True, text=True, check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| print(f"Extracting permissions from {apk_path}...") | ||
| try: | ||
| command = ["aapt", "dump", "permissions", apk_path] | ||
| result = subprocess.run(command, capture_output=True, text=True, check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
| found_high_risk = [] | ||
| for perm in permissions: | ||
| if perm in HIGH_RISK_PERMISSIONS: | ||
| found_high_risk.append(perm) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Convert for loop into list comprehension (list-comprehension)
| found_high_risk = [] | |
| for perm in permissions: | |
| if perm in HIGH_RISK_PERMISSIONS: | |
| found_high_risk.append(perm) | |
| found_high_risk = [ | |
| perm for perm in permissions if perm in HIGH_RISK_PERMISSIONS | |
| ] |
| from mobile_analyzer import analyzer | ||
|
|
||
| def main(): | ||
| parser = argparse.ArgumentParser(description="Analyze an Android APK for security vulnerabilities.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Use named expression to simplify assignment and conditional [×4] (use-named-expression)
This commit introduces a new tool, the Mobile App Analyzer, which is designed to perform static analysis on Android APK files.
The new tool has the following features:
apktool.AndroidManifest.xmlusingaaptand checks them against a list of high-risk permissions.sensitive_data_scannerto find hardcoded secrets in the decompiled source code.social_media_analyzer's scam detector to find suspicious URLs and other scam-related indicators in the app's files.A command-line interface is provided in
mobile_analyzer_main.py, and documentation has been added inREADME_mobile_analyzer.md.Limitations:
Summary by Sourcery
Add a new Mobile App Analyzer for static security scanning of Android APKs, including decompilation, high-risk permission detection, secret scanning, and scam indicator analysis via a CLI
New Features:
Documentation: