Commit 672f471

capotej and charmcrush authored
Add paper management for PDF articles with arXiv support (#73)
* Add implementation plan for PDF paper feature

* working PDFs and links

* Restrict paper edit/delete functionality to authenticated users

  - Updated PapersController to use allow_unauthenticated_access for all public actions
  - Modified papers index view to only show edit/delete icons for authenticated users
  - Added edit/delete icons to paper show view, visible only to authenticated users
  - Follows the same pattern as links controller/views

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Add arXiv URL support for PDF paper creation

  * Detect arXiv abstract URLs (arxiv.org/abs/) and treat them as PDF URLs
  * Convert arXiv abstract URLs to PDF URLs (replacing /abs/ with /pdf/)
  * Extract title and description from arXiv abstract pages using MetaInspector
  * Improve PDF download error handling with HTTP status code checking
  * Remove unnecessary .pdf extension from arXiv URLs

  This enhancement allows users to submit arXiv abstract URLs which are automatically converted to PDF URLs, with metadata extracted from the abstract page.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* previews and admin controls

* only show add paper if logged in

* fix styling

* remove papers/show, fix styling

* Link to arXiv PDF directly instead of downloading from Active Storage

  When rendering papers in papers/index, if the paper is for an arXiv link, make the title/thumbnail link to the pdf version of the arXiv URL, instead of downloading it from Active Storage.

  This change:
  1. Adds helper methods to the Paper model to identify arXiv papers and generate PDF URLs
  2. Updates the papers index view to link directly to arXiv PDFs when appropriate
  3. Maintains Active Storage download links for non-arXiv papers
  4. Uses the Active Storage preview functionality regardless of the link target

  The solution implements the logic in the model where it belongs, keeping the view clean and maintainable.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Remove redundant "View on arXiv" links

  Now that we link directly to arXiv PDFs from the title and thumbnail, the separate "View on arXiv" links are redundant and have been removed.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Fix redirects to non-existent papers#show

  - Changed redirect in papers#update from @paper to papers_path
  - Changed cancel link in papers/edit from @paper to papers_path
  - Removed references to papers#show since there's no show route

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Implement browser rendering for non-arxiv papers and prevent Turbo prefetching

  - Add view action to PapersController that renders papers in browser instead of downloading
  - Add view_url method to Paper model to determine appropriate viewing URL
  - Update papers index view to use view_url for non-arxiv papers
  - Add data-turbo attributes to prevent prefetching of PDF links
  - Add new view route for papers

  This allows non-arxiv papers to be viewed directly in the browser rather than forcing a download, while maintaining existing behavior for arxiv papers. Also prevents performance issues caused by Turbo prefetching large PDF files when navigating back to the papers index.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Refactor papers viewing logic into helper and add tests

  - Move paper viewing logic from model to PapersHelper
  - Remove view_url method from Paper model
  - Create paper_view_path helper that handles all paper viewing cases:
    * arxiv papers -> arxiv PDF URL
    * papers with attachments -> inline blob path
    * papers with external URLs -> external URL
  - Add comprehensive tests for the helper
  - Update papers index view to use the new helper
  - Maintain download action in controller for backward compatibility

  This refactoring centralizes the paper viewing logic in a helper and removes the need for a separate view action while maintaining all existing functionality.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Remove unused download action from papers controller and routes

  - Remove download action from PapersController since it's no longer used
  - Remove download route since all paper viewing is now handled by the helper
  - All papers now either render inline or link directly to external URLs

  This simplifies the codebase by removing unused functionality while maintaining all existing user-facing behavior.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Position external link icon inline with paper titles

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* remove PLAN.md

* Fix links controller tests by properly mocking HTTP requests

  - Add webmock to test helper for HTTP request stubbing
  - Update links controller tests to properly stub both HEAD and GET requests
  - Fix string literal style issues

  These changes ensure that the links controller tests properly handle both PDF and non-PDF URL submissions with appropriate HTTP request mocking.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Add webmock gem for HTTP request stubbing in tests

  - Add webmock gem to Gemfile
  - Update Gemfile.lock with webmock dependencies

  This enables proper HTTP request mocking in controller tests.

  💘 Generated with Crush
  Co-Authored-By: Crush <[email protected]>

* Refactor PDF processing logic into PdfPaperCreator service

  Moved the PDF downloading and analysis logic from LinksController to a dedicated PdfPaperCreator service object. This improves code organization and separation of concerns. Added tests for the new service.

* fix paper/link links

* install erb_lint and lint all ERB/html files

* fix styling of links/papers heading

* fix paper path

* fix edit/delete buttons for papers/index

* add papers partial

* lint

---------

Co-authored-by: Crush <[email protected]>
1 parent 5cab851 commit 672f471

30 files changed: +583 -38 lines changed

Gemfile

Lines changed: 2 additions & 1 deletion
@@ -52,7 +52,7 @@ group :development, :test do
 
   gem "ruby-lsp"
 
-  gem 'erb_lint', require: false
+  gem "erb_lint", require: false
 end
 
 group :development do
@@ -64,6 +64,7 @@ group :test do
   # Use system testing [https://guides.rubyonrails.org/testing.html#system-testing]
   gem "capybara"
   gem "selenium-webdriver"
+  gem "webmock"
 end
 
 gem "tailwindcss-rails", "3.3.2"

Gemfile.lock

Lines changed: 9 additions & 0 deletions
@@ -104,6 +104,9 @@ GEM
       xpath (~> 3.2)
     concurrent-ruby (1.3.5)
     connection_pool (2.5.3)
+    crack (1.0.0)
+      bigdecimal
+      rexml
     crass (1.0.6)
     date (3.4.1)
     debug (1.11.0)
@@ -162,6 +165,7 @@ GEM
       raabro (~> 1.4)
     globalid (1.2.1)
       activesupport (>= 6.1)
+    hashdiff (1.2.0)
     http-cookie (1.0.8)
       domain_name (~> 0.5)
     i18n (1.14.7)
@@ -452,6 +456,10 @@ GEM
       activemodel (>= 6.0.0)
       bindex (>= 0.4.0)
       railties (>= 6.0.0)
+    webmock (3.25.1)
+      addressable (>= 2.8.0)
+      crack (>= 0.3.2)
+      hashdiff (>= 0.4.0, < 2.0.0)
     websocket (1.2.11)
     websocket-driver (0.8.0)
       base64
@@ -508,6 +516,7 @@ DEPENDENCIES
   turbo-rails
   tzinfo-data
   web-console
+  webmock
 
 BUNDLED WITH
    2.6.2

app/controllers/links_controller.rb

Lines changed: 16 additions & 4 deletions
@@ -22,11 +22,14 @@ def feed
   end
 
   def create
-    @link = Link.new(link_params)
-    if @link.save
-      redirect_to links_path
+    url = link_params[:url]
+
+    # Check if the URL points to a PDF and create a paper if it does
+    paper = PdfPaperCreator.create_from_url(url)
+    if paper
+      redirect_to papers_path
     else
-      render :new, status: :unprocessable_content
+      create_link
     end
   end
 
@@ -49,4 +52,13 @@ def destroy
   def link_params
     params.expect(link: [ :title, :description, :url ])
   end
+
+  def create_link
+    @link = Link.new(link_params)
+    if @link.save
+      redirect_to links_path
+    else
+      render :new, status: :unprocessable_content
+    end
+  end
 end
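
The new `create` flow above tries the PDF path first and falls back to ordinary link creation. A minimal plain-Ruby sketch of that dispatch, outside Rails, might look like the following. The method `dispatch_create` and the lambdas `pdf_only` and `make_link` are hypothetical stand-ins for the controller action, `PdfPaperCreator.create_from_url`, and `create_link`; they are not part of the commit.

```ruby
# Hypothetical stand-in for LinksController#create: try the paper creator
# first, and fall back to the link creator when it returns a falsy value.
def dispatch_create(url, paper_creator:, link_creator:)
  paper = paper_creator.call(url)
  if paper
    [:papers, paper]                 # the controller redirects to papers_path here
  else
    [:links, link_creator.call(url)] # the controller calls create_link here
  end
end

# Toy creators for illustration only: the real service checks the
# Content-Type via a HEAD request rather than the file extension.
pdf_only  = ->(url) { url.end_with?(".pdf") ? { url: url } : false }
make_link = ->(url) { { link: url } }

dispatch_create("https://example.com/a.pdf", paper_creator: pdf_only, link_creator: make_link)
dispatch_create("https://example.com/post", paper_creator: pdf_only, link_creator: make_link)
```

One consequence of this shape, as far as the diff shows: `PdfPaperCreator#create` returns `false` both when the URL is not a PDF and when `paper.save` fails, so a PDF URL that fails validation (for example a duplicate `url`) would fall through to `create_link`.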
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+class PapersController < ApplicationController
+  allow_unauthenticated_access only: %i[ index ]
+
+  def index
+    @papers = Paper.order(created_at: :desc).page(params[:page])
+  end
+
+  def edit
+    @paper = Paper.find(params[:id])
+  end
+
+  def update
+    @paper = Paper.find(params[:id])
+    if @paper.update(paper_params)
+      redirect_to papers_path, notice: "Paper was successfully updated."
+    else
+      render :edit, status: :unprocessable_content
+    end
+  end
+
+  def destroy
+    @paper = Paper.find(params[:id])
+    @paper.destroy
+    redirect_to papers_path, notice: "Paper was successfully deleted."
+  end
+
+  private
+
+  def paper_params
+    params.require(:paper).permit(:title, :description)
+  end
+end

app/helpers/papers_helper.rb

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+module PapersHelper
+  include Rails.application.routes.url_helpers
+
+  def paper_view_path(paper)
+    if paper.arxiv?
+      paper.arxiv_pdf_url
+    elsif paper.pdf.attached?
+      rails_blob_path(paper.pdf, disposition: "inline")
+    elsif paper.url.present?
+      paper.url
+    else
+      nil
+    end
+  end
+end
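
The precedence in `paper_view_path` (arXiv first, then attachment, then external URL) can be exercised without Rails. In this sketch, `FakePaper`, its `pdf_path` field, and `paper_view_target` are hypothetical names: `pdf_path` stands in for the `rails_blob_path(paper.pdf, disposition: "inline")` call, and the arXiv methods mirror those in the `Paper` model from this commit.

```ruby
# FakePaper is a hypothetical stand-in for the Paper model. pdf_path replaces
# the Active Storage attachment and its inline blob path.
FakePaper = Struct.new(:url, :pdf_path, keyword_init: true) do
  def arxiv?
    url.to_s.include?("arxiv.org")
  end

  def arxiv_pdf_url
    return nil unless arxiv?

    if url.include?("arxiv.org/abs/")
      url.gsub("arxiv.org/abs/", "arxiv.org/pdf/")
    elsif url.include?("arxiv.org/pdf/")
      url
    end
  end
end

# Same branch order as the helper: arXiv, then attachment, then external URL.
def paper_view_target(paper)
  if paper.arxiv?
    paper.arxiv_pdf_url
  elsif paper.pdf_path
    paper.pdf_path
  elsif !paper.url.to_s.empty?
    paper.url
  end
end

paper_view_target(FakePaper.new(url: "https://arxiv.org/abs/1706.03762"))
paper_view_target(FakePaper.new(url: "https://example.com/a.pdf", pdf_path: "/blobs/a.pdf"))
```

Because the arXiv branch is checked first, a URL that contains `arxiv.org` but is neither an `/abs/` nor a `/pdf/` link yields `nil` even when an attachment exists. Whether that case can occur in practice depends on how papers are created.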

app/models/paper.rb

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+require "net/http"
+require "uri"
+
+class Paper < ApplicationRecord
+  validates_presence_of :url
+  validates :url, uniqueness: true
+
+  has_one_attached :pdf
+
+  before_create :set_title_and_desc
+
+  paginates_per 15
+
+  def arxiv?
+    url&.include?("arxiv.org")
+  end
+
+  def arxiv_pdf_url
+    return nil unless arxiv?
+
+    if url.include?("arxiv.org/abs/")
+      url.gsub("arxiv.org/abs/", "arxiv.org/pdf/")
+    elsif url.include?("arxiv.org/pdf/")
+      url
+    else
+      nil
+    end
+  end
+
+  def display_url
+    arxiv? ? arxiv_pdf_url : nil
+  end
+
+  private
+
+  def set_title_and_desc
+    # Try to get title and description from the PDF metadata or URL
+    self.title ||= extract_title_from_url
+    self.description ||= "PDF document from #{url}"
+  end
+
+  def extract_title_from_url
+    # Extract filename from URL if possible
+    uri = URI.parse(url)
+    filename = File.basename(uri.path, ".*")
+
+    # If filename is meaningful, use it as title
+    if filename.present? && filename != "index" && filename.length > 3
+      filename.gsub(/[-_]/, " ").titleize
+    else
+      "Untitled Paper"
+    end
+  end
+end
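
The fallback title logic in `extract_title_from_url` is easy to try in isolation. The sketch below is plain Ruby, so it assumes a simplified `simple_titleize` in place of ActiveSupport's `String#titleize` and `empty?` in place of `present?`; both `simple_titleize` and `title_from_url` are hypothetical names, and the example URLs are made up.

```ruby
require "uri"

# Simplified stand-in for ActiveSupport's String#titleize, which is not
# available outside Rails; it only capitalizes each whitespace-separated word.
def simple_titleize(text)
  text.split.map(&:capitalize).join(" ")
end

# Mirrors the model's logic: take the last path segment without its
# extension and use it as a title when it looks meaningful.
def title_from_url(url)
  filename = File.basename(URI.parse(url).path, ".*")
  if !filename.empty? && filename != "index" && filename.length > 3
    simple_titleize(filename.gsub(/[-_]/, " "))
  else
    "Untitled Paper"
  end
end

puts title_from_url("https://example.com/docs/attention-is-all-you-need.pdf")
puts title_from_url("https://example.com/index.html")
```

Note that `File.basename(path, ".*")` strips whatever follows the last dot, so an extensionless arXiv path such as `/pdf/1706.03762` is reduced to `1706`. In this commit the service assigns a title before saving, so the `||=` fallback would not normally run for those URLs.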

app/services/pdf_paper_creator.rb

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+class PdfPaperCreator
+  require "net/http"
+  require "uri"
+  require "digest"
+  require "stringio"
+
+  def initialize(url)
+    @url = url
+  end
+
+  def self.create_from_url(url)
+    new(url).create
+  end
+
+  def create
+    return false unless pdf_url?(@url)
+
+    paper = Paper.new(url: pdf_url)
+
+    # Set title and description
+    if @url.include?("arxiv.org/abs/")
+      # Use metainspector to get title and description from the abstract page
+      page = MetaInspector.new(@url)
+      paper.title = page.best_title
+      paper.description = page.best_description
+    else
+      paper.title = "PDF Document"
+      paper.description = "PDF document from #{@url}"
+    end
+
+    if paper.save
+      # Download and attach the PDF
+      download_and_attach_pdf(paper, pdf_url)
+      paper
+    else
+      false
+    end
+  end
+
+  private
+
+  def pdf_url?(url)
+    return false unless url
+
+    # Special handling for arxiv URLs
+    if url.include?("arxiv.org/abs/")
+      return true
+    end
+
+    uri = URI.parse(url)
+    return false unless uri.is_a?(URI::HTTP)
+
+    # Make a HEAD request to check content type
+    response = Net::HTTP.new(uri.host, uri.port)
+    response.use_ssl = uri.scheme == "https"
+    begin
+      head_response = response.request_head(uri.request_uri)
+      content_type = head_response["content-type"]
+      content_type&.include?("application/pdf")
+    rescue
+      false
+    end
+  end
+
+  def pdf_url
+    if @url.include?("arxiv.org/abs/")
+      @url.gsub("arxiv.org/abs/", "arxiv.org/pdf/")
+    else
+      @url
+    end
+  end
+
+  def download_and_attach_pdf(paper, url)
+    # Download the PDF and attach it to the paper
+    begin
+      uri = URI.parse(url)
+      response = Net::HTTP.new(uri.host, uri.port)
+      response.use_ssl = uri.scheme == "https"
+      http_response = response.get(uri.request_uri)
+
+      # Check if the response is successful
+      if http_response.code == "200"
+        paper.pdf.attach(
+          io: StringIO.new(http_response.body),
+          filename: "#{Digest::SHA2.hexdigest(url)}.pdf",
+          content_type: "application/pdf"
+        )
+      else
+        Rails.logger.error "Failed to download PDF: HTTP #{http_response.code} for #{url}"
+      end
+    rescue => e
+      Rails.logger.error "Failed to download PDF: #{e.message}"
+    end
+  end
+end
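
The service decides whether a URL is a PDF by checking that the `Content-Type` header from a HEAD request includes `application/pdf`. As an illustration only, here is a slightly stricter variant of that check that parses the media type instead of using a substring match. This is an alternative sketch, not what the commit does, and `pdf_content_type?` is a hypothetical name.

```ruby
# Hypothetical stricter variant of the Content-Type check: compare only the
# media type, ignoring parameters such as "; charset=binary" and letter case.
def pdf_content_type?(header_value)
  return false if header_value.nil?

  media_type = header_value.split(";").first.to_s.strip.downcase
  media_type == "application/pdf"
end

pdf_content_type?("application/pdf")
pdf_content_type?("Application/PDF; charset=binary")
pdf_content_type?("text/html; charset=utf-8")
```

The substring check in the service would also accept a header that merely contains `application/pdf` somewhere in its value, and it is case-sensitive. Whether either difference matters depends on the servers being linked to.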

app/views/links/index.html.erb

Lines changed: 24 additions & 8 deletions
@@ -14,13 +14,29 @@ Interesting Links | <%= Rails.application.config.site_name %>
 
 <div class="max-w-2xl mx-auto">
   <header class="mb-8">
-    <h1 class="text-3xl font-bold text-gray-900 dark:text-white">
-      Interesting Links
-    </h1>
+    <div class="flex justify-between items-center">
+      <h1 class="text-3xl font-bold text-gray-900 dark:text-white">
+        Interesting Links
+      </h1>
+      <% if @links.any? && authenticated? %>
+        <%= link_to "Add Link", new_link_path, class: "inline-flex items-center px-4 py-2 border border-transparent text-sm font-medium rounded-md text-white bg-indigo-600 hover:bg-indigo-700 dark:bg-indigo-500 dark:hover:bg-indigo-600" %>
+      <% end %>
+    </div>
   </header>
 
-  <div class="space-y-8">
-    <%= render @links %>
-  </div>
-  <%= paginate @links %>
-</div>
+  <% if @links.any? %>
+    <div class="space-y-8">
+      <%= render @links %>
+    </div>
+    <%= paginate @links %>
+  <% else %>
+    <div class="text-center py-12">
+      <p class="text-gray-500 dark:text-gray-400">No links available yet.</p>
+      <% if authenticated? %>
+        <div class="mt-4">
+          <%= link_to "Add your first link", new_link_path, class: "inline-flex items-center px-4 py-2 border border-transparent text-sm font-medium rounded-md text-white bg-indigo-600 hover:bg-indigo-700 dark:bg-indigo-500 dark:hover:bg-indigo-600" %>
+        </div>
+      <% end %>
+    </div>
+  <% end %>
+</div>
