Potential Security Risks of Open Source Blogs: How to Protect Personal Information from Leaks

Overview

GitHub Pages is a popular way to host a blog because it is free and convenient. On the GitHub Free plan, however, a Pages site can only be published from a public repository, so everything in the blog's source repository is visible to anyone and may leak information in unexpected ways.

Even if the articles themselves contain nothing sensitive, the blog’s source repository (drafts, configuration files, commit history) may inadvertently expose personal information. This article explores these risks and offers practical solutions.

🔍 Common Types of Information Leaks

Chinese Sensitive Terms

The following Chinese terms often appear alongside sensitive personal information; search for them before committing:

  • 密码 (Password)
  • 账号 (Account)
  • 身份证 (ID Card)
  • 银行卡 (Bank Card)
  • 支付宝 (Alipay)
  • 微信 (WeChat)
  • 手机号 (Phone Number)
  • 家庭住址 (Home Address)
  • 工作单位 (Workplace)
  • 社保卡 (Social Security Card)
  • 驾驶证 (Driver’s License)
  • 护照 (Passport)
  • 信用卡 (Credit Card)

English Keywords

For English-language content, pay special attention to the following keywords:

  • username
  • password
  • account
  • key
  • ini
  • credential
  • card
  • bank
  • alipay
  • wechat
  • passport
  • id
  • phone
  • address
  • company

Using Regular Expressions for Detection

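You can use the following regular expression to scan the repository for potentially sensitive information. Expect false positives: broad terms such as card, address, or company also match ordinary text, so review each hit manually.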

(密码|账号|身份证|银行卡|支付宝|微信|手机号|家庭住址|工作单位|社保卡|驾驶证|护照|信用卡|username|password|passwd|account|key\s*:|\.ini|credential|card|bank|alipay|wechat|passport|id\s*:|phone|address|company)
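The same pattern can also be applied outside an editor. The following is a minimal Python sketch, not a definitive implementation: the function names `find_hits` and `scan_tree` are hypothetical, case-insensitive matching is an assumption added here, and files that are not valid UTF-8 are simply skipped.

```python
import os
import re

# Pattern copied from the article; IGNORECASE is an assumption made for this sketch
SENSITIVE = re.compile(
    r"(密码|账号|身份证|银行卡|支付宝|微信|手机号|家庭住址|工作单位|社保卡|驾驶证|护照|信用卡"
    r"|username|password|passwd|account|key\s*:|\.ini|credential|card|bank"
    r"|alipay|wechat|passport|id\s*:|phone|address|company)",
    re.IGNORECASE,
)


def find_hits(text):
    """Return (line_number, matched_term, line) for every match in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in SENSITIVE.finditer(line):
            hits.append((number, match.group(0), line.strip()))
    return hits


def scan_tree(root="."):
    """Yield (path, line_number, matched_term, line) for text files under root."""
    for folder, subdirs, files in os.walk(root):
        # Skip Git internals; the commit history needs a separate scan
        subdirs[:] = [d for d in subdirs if d != ".git"]
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            for number, term, line in find_hits(text):
                yield path, number, term, line
```

Running `for hit in scan_tree("."): print(hit)` from the repository root lists each candidate line with its file and line number for manual review.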

Scanning in VSCode

If you use VSCode as your blog editor, you can scan the whole repository for sensitive information by following these steps:

  1. Open the blog repository folder in VSCode
  2. Use the shortcut Ctrl+Shift+F (Windows/Linux) or Cmd+Shift+F (Mac) to open the global search
  3. Enter the above regular expression in the search box
  4. Enable regex mode (click the .* icon next to the search box)
  5. Click search and check the results for potential sensitive information

[Screenshot: VSCode regex search example]

🕰️ Information Leaks in Git History

Git keeps every committed version of every file, so sensitive information in deleted or overwritten files remains in the history. Even if the current code contains nothing sensitive, the content may still be recoverable from earlier commits.

Scanning Git History

A simple script can scan every historical commit of the blog repository to check for leaked information.
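The article does not show its script, so the following is only one possible sketch under stated assumptions. It runs `git log --all -p` and checks the lines each commit added. The keyword list is deliberately shortened, and the function names `scan_patch_text` and `scan_history` are hypothetical. Only added lines are inspected, because any line present in history was added by some commit. By default `git log -p` prints no patch for merge commits, so text introduced only while resolving a merge is not covered.

```python
import re
import subprocess

# Shortened, hypothetical keyword list; swap in the article's full pattern as needed
SENSITIVE = re.compile(r"(密码|手机号|password|passwd|credential|key\s*:)", re.IGNORECASE)
COMMIT_HEADER = re.compile(r"^commit ([0-9a-f]{40,64})$")


def scan_patch_text(log_text):
    """Return (commit_hash, added_line) pairs for added lines that match SENSITIVE."""
    findings = []
    commit = None
    for line in log_text.splitlines():
        header = COMMIT_HEADER.match(line)
        if header:
            commit = header.group(1)
        elif line.startswith("+") and not line.startswith("+++"):
            # "+++" marks the file header of a patch, not an added line
            added = line[1:]
            if SENSITIVE.search(added):
                findings.append((commit, added.strip()))
    return findings


def scan_history(repo_path="."):
    """Run git log over every ref and scan the lines each commit added."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--unified=0",
         "--no-color", "--pretty=tformat:commit %H"],
        capture_output=True, text=True, encoding="utf-8", errors="replace", check=True,
    )
    return scan_patch_text(result.stdout)
```

Calling `scan_history(".")` inside the repository returns the commits to inspect more closely with `git show`.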

Cleaning Git History

If you confirm the need to clean sensitive information from the Git history, you can use the following method:

⚠️ Important Reminder: Performing the following operations will permanently delete the Git history. Please be sure to back up important data and ensure you fully understand the meaning of the commands.

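# Move the branch back to the first commit; the index and working tree keep the current files
# (replace <first-commit-hash> with the hash shown by: git rev-list --max-parents=0 HEAD)
git reset --soft <first-commit-hash>

# Fold the current files into that first commit so the branch holds a single commit
git commit --amend -m "Initial commit"

# Force push to overwrite the history on the remote repository
git push -f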

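Note: If you need to preserve the complete commit history, do not use the method above. Rewriting history also cannot recall copies that were already cloned or forked, so change any password or key that was exposed.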

In addition to manual checks, dedicated secret-scanning tools provide more comprehensive coverage:

TruffleHog

TruffleHog is a powerful tool for discovering, validating, and analyzing leaked credentials.


Features:

  • GitHub Stars: 17.2k (at the time of writing)
  • Forks: 1.7k (at the time of writing)
  • Supports multiple scanning modes
  • Can detect deeply nested sensitive information

🔒 Alternative Solutions for Secure Blog Publishing

If you are concerned about the security risks posed by public repositories, consider the following alternatives:

1. Use GitHub Pro

  • GitHub Pro supports publishing Pages from private repositories
  • Cost: Approximately $4 per month
  • Advantage: Maintains source code privacy while enjoying the convenience of GitHub Pages

2. Use Cloudflare Pages

  • Set the repository to private
  • Deploy via Cloudflare Pages
  • Advantage: Free plan available, supports private repositories

3. Dual Repository Strategy

  • Private Repository: Stores articles being edited and drafts
  • Public Repository: Stores only the final published articles
  • Advantage: Maximizes protection for drafts and unpublished content
  • Note: If using comment systems like giscus that depend on GitHub, a public repository is still required
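The dual repository strategy needs some way to move finished posts from the private repository to the public one. The following is a minimal sketch under stated assumptions: posts are Markdown files in a single folder, a draft is marked with `draft: true` in its front matter, and the function names `is_draft` and `publish` are hypothetical. Adapt it to your own generator's conventions.

```python
import shutil
from pathlib import Path


def is_draft(text):
    """Treat a post as a draft if its front matter has 'draft: true' (assumed convention)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no front matter, so nothing marks it as a draft
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        if line.replace(" ", "").lower() == "draft:true":
            return True
    return False


def publish(private_posts, public_posts):
    """Copy every non-draft Markdown post into the public repository's post folder."""
    src, dst = Path(private_posts), Path(public_posts)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for post in sorted(src.glob("*.md")):
        if is_draft(post.read_text(encoding="utf-8")):
            continue
        shutil.copy2(post, dst / post.name)
        copied.append(post.name)
    return copied
```

After `publish(...)` runs, only the public repository needs to be committed and pushed, so drafts never enter its history.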

📝 Best Practice Recommendations

  1. Regular Audits: Periodically use scanning tools to check the repository for sensitive information.
  2. Pre-Commit Checks: Check for sensitive information before each code commit.
  3. Use .gitignore: Properly configure the .gitignore file to exclude sensitive files.
  4. Environment Variables: Store sensitive configurations in environment variables, not in the code repository.
  5. Draft Management: Consider using a specialized draft management system to prevent drafts from being accidentally committed.
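As an illustration of the environment variable recommendation, a build or deploy script can read its secret at run time instead of from a tracked file. This is a sketch only: `load_secret` and the variable name `BLOG_DEPLOY_TOKEN` are hypothetical.

```python
import os


def load_secret(name):
    """Read a secret from the environment instead of a file tracked by Git."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than fall back to a value hard-coded in the repository
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

A script would call `load_secret("BLOG_DEPLOY_TOKEN")`, with the value set in the shell or in the CI service's secret settings rather than in the repository.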

🎯 Summary

While open-source blogs are convenient, they do carry the risk of information leakage. By using appropriate tools and methods, we can effectively mitigate these risks. Choosing the right publishing platform and strategy allows us to enjoy the convenience of open source while protecting personal privacy and security.

Remember, information security is an ongoing process that requires constant vigilance.
