Awesome—here’s a practical, research-backed upgrade plan for a GitHub-based CV/website, with copy-paste snippets you can drop into your repo right now.
A. Keep it ultra-simple (static HTML only).
Fastest and fine for a one-pager CV like the file I gave you. You can still add SEO/social tags and a print-to-PDF style. GitHub Pages will publish plain HTML without any build step. If you ever add a .nojekyll file, it simply disables Jekyll’s processing. (GitHub Docs)
B. Jekyll + a ready CV theme (zero-code structure).
- modern-resume-theme — minimal CV layout that runs on GitHub Pages. Great if you want a clean resume quickly. (GitHub)
- al-folio — academic-style site with publications, projects, and blog. Heavier than a pure CV, but perfect if you want an “academic homepage” with BibTeX-driven pubs. (GitHub)

GitHub Pages supports Jekyll and themes out of the box, and GitHub now recommends deploying Pages via Actions (which lets you use any theme/plugin, and other SSGs). (GitHub Docs)
C. Build with another static site generator (Hugo/Eleventy/Astro) and deploy with GitHub Actions. Pages can be published from any generator if you use a small workflow (example Actions exist for Hugo). (GitHub Docs)
Add the Jekyll SEO Tag plugin (if you use Jekyll). Enable the plugin and add {% seo %} to your layout’s <head>; it then injects canonical URLs, JSON-LD, Open Graph tags, and more. (GitHub)
Add Open Graph + Twitter Card tags (works for any site).
Put these in <head> so links look great on LinkedIn/Twitter/Slack:
<meta property="og:title" content="Dr. Muhammad Fahad Arshad — CV">
<meta property="og:description" content="Postdoctoral researcher in catalysis, hydrogen, CO₂ valorization.">
<meta property="og:type" content="website">
<meta property="og:url" content="https://YOUR-DOMAIN/">
<meta property="og:image" content="https://YOUR-DOMAIN/assets/og-image.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Dr. Muhammad Fahad Arshad — CV">
<meta name="twitter:description" content="Postdoctoral researcher in catalysis, hydrogen, CO₂ valorization.">
<meta name="twitter:image" content="https://YOUR-DOMAIN/assets/og-image.jpg">OG is documented at ogp.me; Twitter Cards are similar meta tags. (Open Graph Protocol)
Add structured data (Person via JSON-LD).
Search engines understand your CV better with a <script type="application/ld+json"> block:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Muhammad Fahad Arshad",
  "jobTitle": "Post-Doctoral Researcher",
  "email": "mailto:m.fahad1448@gmail.com",
  "telephone": "+8618801225727",
  "sameAs": [
    "https://scholar.google.com/citations?user=fwiNZasAAAAJ",
    "https://orcid.org/0000-0003-1828-9458",
    "https://www.linkedin.com/in/mfahadarshad/"
  ],
  "url": "https://YOUR-DOMAIN/"
}
</script>

(Use schema.org Person; you can expand it easily.) (Schema.org)
If you go with Jekyll, put your resume content in a data file like _data/resume.yml, then loop over it in your layout. This keeps the HTML clean and makes updates painless. (Bonus: this also makes it easy to generate a one-page print and a long “portfolio” page from the same data.)
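For example, a minimal sketch of _data/resume.yml (the field names are illustrative, not a required schema):

experience:
  - title: "Post-Doctoral Researcher"
    org: "Your Institute"   # placeholder
    years: "2023–present"
    summary: "Catalysis, hydrogen, CO₂ valorization."

…and the matching loop in your layout:

{% for job in site.data.resume.experience %}
  <h3>{{ job.title }} — {{ job.org }} ({{ job.years }})</h3>
  <p>{{ job.summary }}</p>
{% endfor %}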
- Best: use jekyll-scholar to generate publications from BibTeX (_bibliography/references.bib). Note: GitHub Pages doesn’t allow most custom Jekyll plugins unless you build with Actions (or build locally) and push the generated _site. (GitHub)
- Lightweight fallback: use a Google Scholar CSV include that works on stock GitHub Pages. (GitHub)
- Run Lighthouse (Chrome DevTools → Lighthouse) and fix whatever it flags (Core Web Vitals, SEO basics, etc.); you can also run it from the command line — see the snippet after this list. Aim for mobile first. (Chrome for Developers)
- Mark up structure with proper landmarks (<header>, <main>, <nav>, <footer>), good contrast, keyboard focus states, and alt text. Use the WCAG 2.2 quick reference as your checklist; MDN’s landmark guidance is a nice companion. (W3C)
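If you prefer the command line (e.g., to audit in CI), the Lighthouse npm package runs the same checks; this assumes Node.js is installed and the site is already live:

npx lighthouse https://YOUR-DOMAIN/ --view   # opens the HTML report when done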
Custom domain + HTTPS (e.g., fahadarshad.dev).
Set a CNAME and point DNS to GitHub Pages in repo → Settings → Pages. Enforce HTTPS once DNS resolves. (GitHub Docs)
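As a sketch, the records at your registrar typically look like this (the four A-record IPs are GitHub Pages’ documented apex addresses; YOUR-USERNAME is a placeholder):

A      @     185.199.108.153
A      @     185.199.109.153
A      @     185.199.110.153
A      @     185.199.111.153
CNAME  www   YOUR-USERNAME.github.io.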
Analytics without cookies: add Plausible with one script tag (privacy-friendly). (Plausible Analytics)
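The tag is a single line in <head> (swap your real domain into data-domain):

<script defer data-domain="YOUR-DOMAIN" src="https://plausible.io/js/script.js"></script>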
Contact form on a static site: use Formspree (works on GitHub Pages, no server). (DEV Community)
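A minimal sketch of the markup, where YOUR_FORM_ID is the ID from your Formspree dashboard:

<form action="https://formspree.io/f/YOUR_FORM_ID" method="POST">
  <input type="email" name="email" placeholder="Your email" required>
  <textarea name="message" placeholder="Message" required></textarea>
  <button type="submit">Send</button>
</form>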
One-click PDF export: keep great print CSS (what you already have), and optionally auto-generate a fresh PDF on every push with a Puppeteer GitHub Action. (GitHub)
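One hedged sketch of that idea, driving Puppeteer directly from a workflow step rather than a specific marketplace action (assumes your CV is index.html at the repo root; page.pdf() applies your print CSS by default):

- name: Render CV to PDF with Puppeteer
  run: |
    npm install puppeteer
    cat > render-pdf.js <<'EOF'
    const puppeteer = require('puppeteer');
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      // Load the local page, wait for it to settle, then print to PDF
      await page.goto('file://' + process.cwd() + '/index.html',
                      { waitUntil: 'networkidle0' });
      await page.pdf({ path: 'cv.pdf', format: 'A4', printBackground: true });
      await browser.close();
    })();
    EOF
    node render-pdf.js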
.github/workflows/pages.yml
name: Build and Deploy CV
on:
  push:
    branches: [ "main" ]
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # If using Jekyll + custom plugins (assumes a Gemfile in the repo):
      - uses: ruby/setup-ruby@v1
        with: { ruby-version: '3.2' }
      - run: |
          bundle install
          bundle exec jekyll build -d _site
      # If using Hugo, replace the Ruby/Jekyll steps above with a Hugo build step.
      - uses: actions/upload-pages-artifact@v3
        with:
          path: _site
  deploy:
    runs-on: ubuntu-latest
    needs: build
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4

(Official Pages + custom workflows flow.) (GitHub Docs)
_config.yml
title: "Dr. Muhammad Fahad Arshad — CV"
url: "https://YOUR-DOMAIN"
plugins:
  - jekyll-seo-tag

In your HTML layout, inside <head>:

{% seo %}

(For the JSON-LD, use the script snippet from section 2.) (Schema.org)
(For OG/Twitter, use the meta tags from section 2.) (Open Graph Protocol)
My concrete recommendation:

- Keep your current clean single-file CV but add: an SEO tag block (or raw meta tags if staying pure HTML), a JSON-LD Person block, OG/Twitter tags, and a wide 1200×630 OG image. (Open Graph Protocol)
- Publish on a custom domain and turn on HTTPS. (GitHub Docs)
- Add a contact form via Formspree (no backend). (DEV Community)
- Run Lighthouse and fix any accessibility/SEO issues it reports. (Chrome for Developers)
- If you want an academic homepage, migrate to al-folio or add jekyll-scholar via the Actions workflow so publications auto-render from BibTeX. (GitHub)
- Optional: auto-generate a PDF on each push with a Puppeteer action. (GitHub)
If you like, I can generate:
- a ready-to-use pages.yml workflow for your exact setup,
- a minimal _config.yml (with jekyll-seo-tag), and
- the <head> block (JSON-LD + OG/Twitter) wired to your links.
Short answer: yes—let’s wire your GitHub Pages CV to real DOI links and auto-refresh your publications & citation counts. The trick is: don’t rely on scraping Google Scholar (it has no public API); instead, use open bibliographic APIs (OpenAlex, Crossref, optional Semantic Scholar) and schedule a GitHub Action to rebuild your publications list regularly. Google Scholar can still be linked for humans to click, but data sync should come from those APIs. (Google for Developers)
- Can automate (recommended):
- Pull your works by ORCID from OpenAlex, including titles, venues, years, DOIs, OA links, and cited_by_count. (docs.openalex.org)
- For each DOI, look up metadata and a canonical landing URL in Crossref (and is-referenced-by-count as a Crossref citation metric). (GitHub)
- Resolve DOIs via doi.org (redirects to the publisher page). (DOI)
- (Optional) Enrich with Semantic Scholar fields (requires a free API key). (api.semanticscholar.org)
- Rebuild your publications section on a schedule with GitHub Actions (daily/weekly). (GitHub Docs)
- Can’t/shouldn’t automate directly:
- Google Scholar scraping. There’s no official public API, and automated queries can be blocked, so treat your Scholar page as a human-clickable link, not a data source. (Google for Developers)
- Data fetch (build-time): a Python script (scripts/update_publications.py, shown below) grabs works from OpenAlex using your ORCID (0000-0003-1828-9458), validates/enriches each with Crossref, and writes a JSON file plus an HTML fragment.
- Scheduled update: a GitHub Action runs weekly (or daily) and commits any diffs as a PR so you can review changes. (GitHub)
- Display: your index.html includes the generated HTML (or loops over the JSON). For plain HTML sites, the fragment approach is simplest.
#!/usr/bin/env python3
import json
import pathlib
import re
import urllib.parse
from datetime import datetime

import requests

ORCID = "0000-0003-1828-9458"
OPENALEX_BASE = "https://api.openalex.org"
CROSSREF_BASE = "https://api.crossref.org/works/"
HEADERS = {"User-Agent": "cv-updater/1.0 (mailto:m.fahad1448@gmail.com)"}

OUT_JSON = pathlib.Path("data/publications.json")
OUT_HTML = pathlib.Path("data/publications.html")
MANUAL = pathlib.Path("data/manual_overrides.json")  # optional, create if needed

def openalex_works(orcid):
    # Filter by author ORCID; cursor pagination fetches every page
    # (OpenAlex returns meta.next_cursor until the result set is exhausted)
    params = {
        "filter": f"authorships.author.orcid:{orcid}",
        "per_page": 200,
        "sort": "publication_year:desc",
        "cursor": "*",
    }
    works = []
    while params["cursor"]:
        r = requests.get(f"{OPENALEX_BASE}/works", params=params, headers=HEADERS, timeout=60)
        r.raise_for_status()
        data = r.json()
        works.extend(data.get("results", []))
        params["cursor"] = data.get("meta", {}).get("next_cursor")
    return works

def crossref_by_doi(doi):
    # Crossref returns metadata including URL and is-referenced-by-count
    r = requests.get(CROSSREF_BASE + urllib.parse.quote(doi), headers=HEADERS, timeout=60)
    if r.status_code == 404:
        return {}
    r.raise_for_status()
    return r.json().get("message", {})

def doi_to_publisher_url(doi):
    # doi.org redirects to the publisher page; we prefer Crossref's URL when available
    return f"https://doi.org/{doi}"

def best_link_from_openalex(work):
    # Prefer the publisher landing page if present; fall back to OA landing/PDF
    loc = work.get("primary_location") or {}
    landing = (loc.get("landing_page_url") or
               (work.get("best_oa_location") or {}).get("landing_page_url"))
    pdf = (work.get("best_oa_location") or {}).get("pdf_url")
    return landing or pdf

def normalize_authors(work):
    authors = []
    for a in work.get("authorships", []):
        person = a.get("author") or {}
        authors.append(person.get("display_name"))
    return [x for x in authors if x]

def clean_text(s):
    return re.sub(r"\s+", " ", s or "").strip()

def build_record(work, overrides):
    doi = (work.get("doi") or "").replace("https://doi.org/", "")
    cr = crossref_by_doi(doi) if doi else {}
    title = clean_text(work.get("title") or (cr.get("title") or [None])[0])
    year = work.get("publication_year") or (cr.get("issued", {}).get("date-parts", [[None]])[0][0])
    # Guard against a primary_location whose "source" is null
    venue = ((work.get("primary_location") or {}).get("source") or {}).get("display_name") \
        or (cr.get("container-title") or [None])[0]
    url = overrides.get("links", {}).get(doi) if overrides else None
    if not url:
        url = cr.get("URL") or best_link_from_openalex(work) or (doi_to_publisher_url(doi) if doi else None)
    return {
        "title": title,
        "year": year,
        "venue": venue,
        "doi": doi or None,
        "url": url,
        "openalex_id": work.get("id"),
        "cited_by_openalex": work.get("cited_by_count"),
        "cited_by_crossref": cr.get("is-referenced-by-count"),
        "authors": normalize_authors(work),
        "updated": datetime.utcnow().isoformat(timespec="seconds") + "Z"
    }

def to_html(records):
    lines = ['<ol class="pub-list">']
    for r in records:
        parts = []
        if r["authors"]:
            parts.append(", ".join(r["authors"]))
        if r["year"]:
            parts.append(f"({r['year']})")
        title = r["title"] or "Untitled"
        if r["url"]:
            title_html = f'<a href="{r["url"]}" target="_blank" rel="noopener">{title}</a>'
        else:
            title_html = title
        if r["venue"]:
            parts.append(f"<em>{r['venue']}</em>")
        doi_html = f' <a href="https://doi.org/{r["doi"]}">doi:{r["doi"]}</a>' if r.get("doi") else ""
        cites = r.get("cited_by_openalex") or r.get("cited_by_crossref")
        cite_html = f' <span class="cites">· citations: {cites}</span>' if cites is not None else ""
        lines.append(f"<li>{title_html} — {' '.join(parts)}{doi_html}{cite_html}</li>")
    lines.append("</ol>")
    return "\n".join(lines)

def main():
    overrides = {}
    if MANUAL.exists():
        with open(MANUAL, "r", encoding="utf-8") as f:
            overrides = json.load(f)
    works = openalex_works(ORCID)
    records = [build_record(w, overrides) for w in works]
    # Keep only items with a title and year
    records = [r for r in records if r["title"] and r["year"]]
    OUT_JSON.parent.mkdir(parents=True, exist_ok=True)
    with open(OUT_JSON, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    html = to_html(records)
    with open(OUT_HTML, "w", encoding="utf-8") as f:
        f.write(html)
    print(f"Wrote {len(records)} records to {OUT_JSON} and {OUT_HTML}")

if __name__ == "__main__":
    main()

Notes:
- OpenAlex’s filter by author ORCID, cursor pagination, and fields like cited_by_count are documented. (docs.openalex.org)
- The Crossref works endpoint and is-referenced-by-count are documented. (GitHub)
- DOI links via doi.org resolve to publisher pages. (DOI)
<section class="card">
<h2>Publications (auto-updated)</h2>
<!-- This file is generated by the scheduled Action -->
<!-- If you don’t use Jekyll, just include it with an iframe OR paste its contents during build -->
<div id="pubs">
<!-- build step will inline data/publications.html into your index.html -->
</div>
</section>If you’re using plain HTML, the Action below can inline data/publications.html into index.html (simple string replacement), or you can just copy the generated HTML into your page once.
Create data/manual_overrides.json for special links or title fixes:
{
  "links": {
    "10.1002/kin.21708": "https://onlinelibrary.wiley.com/doi/10.1002/kin.21708"
  }
}

.github/workflows/publications.yml
name: Refresh publications
on:
  schedule:
    - cron: "0 6 * * 1"  # every Monday 06:00 UTC
  workflow_dispatch: {}
permissions:
  contents: write
  pull-requests: write  # needed so the action below can open a PR
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - name: Install deps
        run: python -m pip install --upgrade pip requests
      - name: Update publications
        run: python scripts/update_publications.py
      # (Optional) inline the generated HTML into index.html between markers.
      # Python does the replacement because the HTML routinely contains
      # characters (/, ", \) that break naive sed/perl quoting.
      - name: Inline publications into index.html
        run: |
          python - <<'EOF'
          import pathlib, re
          start, end = "<!-- PUBS:START -->", "<!-- PUBS:END -->"
          pubs = pathlib.Path("data/publications.html").read_text(encoding="utf-8")
          page = pathlib.Path("index.html").read_text(encoding="utf-8")
          pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.S)
          page = pattern.sub(lambda m: f"{start}\n{pubs}\n{end}", page)
          pathlib.Path("index.html").write_text(page, encoding="utf-8")
          EOF
      - name: Create PR with changes
        uses: peter-evans/create-pull-request@v6
        with:
          commit-message: "chore(pubs): auto-refresh publications"
          title: "Auto-refresh publications"
          body: "This PR updates publications.json/html from OpenAlex/Crossref."
          branch: "auto/refresh-pubs"
          delete-branch: true

This uses cron-scheduled Actions and creates a PR when your publications or counts change. (GitHub Docs)
- Real links: every item uses Crossref’s URL when available; otherwise we fall back to the best OpenAlex OA/publisher link, or to https://doi.org/<doi> to resolve. (GitHub)
- Citation counts: we display OpenAlex’s cited_by_count (consistent and fast); optionally also show Crossref’s is-referenced-by-count. Counts will differ from Google Scholar (different coverage). (docs.openalex.org)
- No scraping risk: we keep a normal “View my Google Scholar” link for humans, but all data comes from public APIs. (Google Scholar)
- Fails safe: if an API is down, your site still builds; the previous JSON/HTML remains.
- Swap to Jekyll + jekyll-scholar and generate the list from a BibTeX file produced by the script. This needs building via Actions (custom Jekyll plugin) instead of stock Pages. (GitHub)
- Add Semantic Scholar enrichments (fields like influential citations) if you add a free API key; see the sketch after this list. (api.semanticscholar.org)
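A minimal sketch of that enrichment, assuming the key lives in an S2_API_KEY environment variable (the endpoint and field names follow the public Graph API; the s2_enrich helper is our own naming):

import os
import urllib.parse

import requests

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper/DOI:"

def s2_enrich(doi):
    # Fetch citation metrics for one DOI; return {} on any failure (fails safe)
    headers = {"x-api-key": os.environ.get("S2_API_KEY", "")}
    fields = "citationCount,influentialCitationCount"
    r = requests.get(S2_BASE + urllib.parse.quote(doi) + "?fields=" + fields,
                     headers=headers, timeout=60)
    return r.json() if r.ok else {}

You could call this from build_record and attach the two counts to each record; if the key is missing or the API is down, the record simply lacks those fields.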
If you want, I can tailor these snippets to your current repo layout (e.g., add the inlining markers into your index.html for you and prefill with your Scholar/ORCID links).