Commit 0769feb

Commit files
0 parents  commit 0769feb

31 files changed: +3023 -0 lines changed

.github/workflows/ci.yml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
name: CI

on:
  push:
    branches: [ main, master ]
  pull_request:
    branches: [ main, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10', '3.11', '3.12']

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -e ".[dev]"

    - name: Lint with ruff
      run: |
        pip install ruff
        ruff check src/ tests/ --exit-zero

    - name: Run tests
      run: |
        pytest tests/ -v --tb=short

  build:
    runs-on: ubuntu-latest
    needs: test

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.11'

    - name: Install build tools
      run: |
        python -m pip install --upgrade pip
        pip install build

    - name: Build package
      run: python -m build

    - name: Upload artifacts
      uses: actions/upload-artifact@v4
      with:
        name: dist
        path: dist/

.gitignore

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# macOS
.DS_Store

# Project specific
realcite.yaml
*.log
output/
reports/
.cache/

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 RealCite Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Makefile

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
.PHONY: install dev test test-fast lint format build clean demo demo-sample help

# Default target
help:
	@echo "RealCite - academic citation authenticity checker"
	@echo ""
	@echo "Available commands:"
	@echo "  make install    Install the project"
	@echo "  make dev        Install development dependencies"
	@echo "  make test       Run the tests"
	@echo "  make lint       Lint the code"
	@echo "  make format     Format the code"
	@echo "  make build      Build the distribution packages"
	@echo "  make clean      Remove temporary files"
	@echo "  make demo       Run the demo"
	@echo ""

# Install
install:
	pip install -e .

# Install development dependencies
dev:
	pip install -e ".[dev]"

# Run the tests
test:
	pytest tests/ -v

# Fast tests (stop at the first failure, previously failed tests first)
test-fast:
	pytest tests/ -v -x --ff

# Lint
lint:
	ruff check src/ tests/

# Format
format:
	black src/ tests/
	ruff check --fix src/ tests/

# Build the distribution packages
build:
	python -m build

# Remove temporary files
clean:
	rm -rf build/ dist/ *.egg-info
	rm -rf .pytest_cache/ .ruff_cache/
	find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true

# Run the demo
demo:
	@echo "=== Validating real references ==="
	realcite tests/fixtures/valid.bib -v
	@echo ""
	@echo "=== Validating fabricated references ==="
	realcite tests/fixtures/fake.bib -v
	@echo ""
	@echo "=== Validating mixed references ==="
	realcite tests/fixtures/mixed.bib -v

# Validate the sample file
demo-sample:
	realcite examples/sample.bib -v -o examples/report.md
	@echo ""
	@echo "Report saved to examples/report.md"

README.md

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
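# RealCite

A tool for checking the authenticity of academic citations - it detects fabricated references produced by AI hallucination

[English](README_EN.md) | Chinese

## What does this tool do?

When you use AI tools such as ChatGPT to help write a paper, the AI may "make up" references that do not exist (this is known as AI hallucination). RealCite automatically checks whether the citations in your BibTeX file actually exist.

## Getting started from scratch (3 steps)

### Step 1: Create a Python environment

```bash
# Create a new environment with conda (Python 3.9+ recommended)
conda create -n realcite python=3.11 -y
conda activate realcite
```

### Step 2: Install RealCite

```bash
# Enter the project directory (replace this path with the location of your own checkout)
cd /data/250010040/workspace/project/realcite

# Install (development mode)
pip install -e .
```

### Step 3: Run a validation

```bash
# Validate a BibTeX file
realcite your_references.bib

# Or try one of the bundled test files
realcite tests/fixtures/valid.bib -v
```

That's all there is to it!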

## Usage examples

### Command line

```bash
# Simplest usage: validate directly
realcite references.bib

# Save the report to a file
realcite references.bib -o report.md

# Output JSON (convenient for programmatic processing)
realcite references.bib -f json -o report.json

# Verbose mode (show the validation process)
realcite references.bib -v

# Use only specific validation sources
realcite references.bib --sources dblp,arxiv
```

### Using RealCite from Python

```python
from realcite import RealCite

# Create a validator
checker = RealCite()

# Validate a BibTeX file
report = checker.validate("references.bib")

# Inspect the results
print(f"Total: {report.total_count} citations")
print(f"Verified: {report.verified_count}")
print(f"Suspicious: {report.suspicious_count}")
print(f"Not found: {report.not_found_count}")

# Export the report
checker.export_report(report, "report.md", format="markdown")
```
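
The counts on the report object lend themselves to a simple pass/fail gate, for example in a CI job. The following is a minimal sketch that relies only on the four count attributes used in the example; `ReportCounts` and `exit_code_for` are hypothetical names introduced for illustration and are not part of RealCite.

```python
from dataclasses import dataclass


@dataclass
class ReportCounts:
    """Hypothetical stand-in exposing the count attributes used in the README example."""
    total_count: int
    verified_count: int
    suspicious_count: int
    not_found_count: int


def exit_code_for(report, allow_suspicious: bool = True) -> int:
    """Return 0 if the bibliography looks clean, 1 otherwise."""
    if report.not_found_count > 0:
        return 1
    if not allow_suspicious and report.suspicious_count > 0:
        return 1
    return 0


if __name__ == "__main__":
    counts = ReportCounts(total_count=10, verified_count=8,
                          suspicious_count=2, not_found_count=0)
    print(exit_code_for(counts))                          # lenient: suspicious entries tolerated
    print(exit_code_for(counts, allow_suspicious=False))  # strict: suspicious entries fail
```

In practice the object returned by `checker.validate(...)` could be passed in place of `ReportCounts`, provided it exposes the same attributes.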
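
## Output statuses

| Status | Meaning |
|------|------|
| ✅ Verified (verified) | A matching publication was found in an academic database |
| ⚠️ Suspicious (suspicious) | A partial match was found, but the confidence is not high enough; manual verification is recommended |
| ❌ Not found (not_found) | No match was found in any database; the citation is very likely fabricated |
| ⛔ Error (error) | An error occurred during validation (e.g., the title is too short to verify) |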
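
One way to read the status table is as a mapping from a match confidence to a status. The following is a minimal sketch of that idea, not RealCite's actual logic: the score range, the default threshold of 0.8, the `suspicious_floor` cutoff of 0.5, and the 10-character minimum title length are all assumptions chosen for illustration.

```python
from enum import Enum
from typing import Optional


class Status(str, Enum):
    VERIFIED = "verified"
    SUSPICIOUS = "suspicious"
    NOT_FOUND = "not_found"
    ERROR = "error"


def classify(title: str, best_score: Optional[float],
             threshold: float = 0.8, suspicious_floor: float = 0.5,
             min_title_len: int = 10) -> Status:
    """Map the best match score across all sources to a status.

    best_score is None when no source returned any candidate.
    All numeric defaults are illustrative assumptions.
    """
    if len(title.strip()) < min_title_len:
        return Status.ERROR          # too short to verify reliably
    if best_score is None or best_score < suspicious_floor:
        return Status.NOT_FOUND
    if best_score >= threshold:
        return Status.VERIFIED
    return Status.SUSPICIOUS         # partial match: needs a human look
```

Under these assumptions, lowering `threshold` moves borderline entries from "suspicious" to "verified" without affecting the "not found" cutoff.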
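
## Validation sources

RealCite queries the following academic databases in turn:

| Database | Description | Best suited for |
|--------|------|----------|
| DBLP | The most authoritative index for computer science | CS conference and journal papers |
| Semantic Scholar | Comprehensive coverage of AI research | AI/ML/NLP papers |
| arXiv | The authoritative source for preprints | arXiv preprints |
| CrossRef | Official DOI database | Any publication with a DOI |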
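
Querying sources "in turn" can be sketched as a loop that stops at the first confident match. This is an illustration under assumptions, not the code in `core.py`: the `Validator` callable interface, the early-exit rule, and the 0.8 threshold are hypothetical, and the stub validators return canned scores instead of calling any real API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: a validator takes a title and returns a match
# score in [0, 1], or None when the source has no candidate at all.
Validator = Callable[[str], Optional[float]]


def query_in_order(title: str,
                   validators: Sequence[Tuple[str, Validator]],
                   threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Try each source in order; stop early once a score reaches the threshold.

    Returns (source_name, score) for the best match seen, or (None, 0.0).
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, validate in validators:
        score = validate(title)
        if score is None:
            continue                      # this source found nothing
        if score > best_score:
            best_name, best_score = name, score
        if score >= threshold:
            break                         # confident match: skip remaining sources
    return best_name, best_score


if __name__ == "__main__":
    # Stub validators with canned scores, standing in for real API clients.
    stubs = [
        ("dblp", lambda t: None),
        ("semantic_scholar", lambda t: 0.9),
        ("arxiv", lambda t: 1.0),         # never reached in this run
    ]
    print(query_in_order("Some paper title", stubs))
```

Stopping early keeps the number of API requests down, which matters given the per-source rate limits.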

## Project structure

```
realcite/
├── src/realcite/           # Source code
│   ├── main.py             # Command-line entry point
│   ├── core.py             # Core validation logic
│   ├── parser.py           # BibTeX parsing
│   ├── matcher.py          # Matching algorithm
│   ├── reporter.py         # Report generation
│   └── validators/         # Per-database validators
│       ├── dblp.py
│       ├── semantic_scholar.py
│       ├── arxiv.py
│       └── crossref.py
├── tests/                  # Tests
│   └── fixtures/           # BibTeX files used by the tests
│       ├── valid.bib       # Real references
│       ├── fake.bib        # Fabricated references
│       └── mixed.bib       # A mix of both
├── pyproject.toml          # Project configuration
└── README.md               # This file
```
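
## FAQ

**Q: Why are some real references flagged as "suspicious"?**

A: Possible reasons:
- The title is spelled slightly differently from the database record
- The publication is too new and has not been indexed yet
- The publication is outside computer science (DBLP only indexes CS)

You can try lowering the threshold: `realcite references.bib --threshold 0.6`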
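
To see why a small spelling difference can push a real reference below the threshold, here is a minimal sketch of threshold-based title matching. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the algorithm in `matcher.py` is not shown here and may differ, and `normalize`, `title_similarity`, and `is_match` are hypothetical names.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Accept the candidate when similarity reaches the threshold."""
    return title_similarity(a, b) >= threshold


if __name__ == "__main__":
    cited = "Attention is all you need"
    indexed = "Attention Is All You Need."
    print(title_similarity(cited, indexed))   # identical after normalization
    print(is_match(cited, "Attention is all we need", threshold=0.6))
```

A lower threshold accepts looser matches, so fewer real references are flagged, at the cost of a higher chance of accepting a fabricated reference whose title resembles a real one.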
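
**Q: Why is validation slow?**

A: Each citation requires queries to several APIs, and each API enforces a minimum interval between requests. Validating 100 citations takes roughly 3-5 minutes.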
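
As a rough back-of-the-envelope check on that figure, the total time scales with citations × sources × per-request interval. The sketch below assumes four sources and an interval of 0.5 to 0.75 seconds per request; those intervals are illustrative guesses, not values taken from RealCite.

```python
def estimate_seconds(n_citations: int, n_sources: int, interval_s: float) -> float:
    """Worst-case wall time if every citation hits every source sequentially."""
    return n_citations * n_sources * interval_s


if __name__ == "__main__":
    for interval in (0.5, 0.75):  # assumed per-request intervals, in seconds
        minutes = estimate_seconds(100, 4, interval) / 60
        print(f"interval={interval}s -> about {minutes:.1f} min")
```

Under these assumptions the estimate lands between about 3.3 and 5 minutes for 100 citations, which is in line with the 3-5 minute range quoted in the answer.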

**Q: Do I need an API key?**

A: Not for basic use. For large validation workloads, however, it is worth requesting a Semantic Scholar API key to raise your rate limit.

## Running the tests

```bash
# Install the test dependencies
pip install pytest pytest-mock

# Run the tests
pytest tests/ -v
```

## License

MIT License
