Breaking & Scraping Pahe.in

Posted on September 3, 2021 By RoofMan

From 2 years ago, i started a hacking project for challenge in WebScraping domain to scrape the whole Pahe.ph website which leaks movies, series, anime & more.

History

At first Pahe website was plain Self-Hosted Offshore WordPress site secured with Surcuri WAF System & links protected using SoraLink WordPress Plugin and there was no other implemented security measures on the website (like Cloudflare, Content Obfuscation & etc).

At this stage the scraper was working fine until they switched to Cloudflare Fronting. After this i needed a way to bypass Cloudflare security & captcha. and after some reconnaissance and probing i figured out a way to bypass Cloudflare in automated manners, and the scraper came back online.

After that they implemented content obfuscation to obfuscate the links section in the website using custom JS algorithm which i reversed and was able break and extract the links.

Results

So at the end i developed a Webscraper capable to scrape the whole website and address these issues

Bypass Surcuri WAF
Handle SoraLink & Extract Direct Links
Bypass Cloudflare Fronting
Extract Posts in Full Details
Decode Download Section Obfuscation
Stateful Scraper which support Resume, Failsafe, Looping operations.

Dependency Diagram

Used Technologies

Public Release

I decided to released the scraper to the internet on GitHub, and the project is in active development right now from my side for maintenance and enhancement for a while, so check the active log on GitHub to know the latest To-Do points.

Build Releases

PaheScraper Alpha 2.5 (Windows) – Latest

How To Use ?

Simpe, just start the application from “PaheScrapper.exe”
You gonna find multiple modes (Continue, Resync, New, Loop, State)
- Continue: continue previous scraping loaded from manager state file.
- Resync: force the scraper to rescan all the manager state file and resync with pahe.in website to the last state.
- New: start new and clean scraping and discard old manager state if exist.
- Loop: start the scraper in loop mode to scrape and then resync forever.
- State: get the current scraper state.
After the scraper finish the website, you gonna find output file with the website context in the scraper directory or always you can check the manager state file to get the non-finished scraping.

Note: Scraper takes long time for the first run to build the manager state (week or more depend on your configuration) to scrape the whole website and takes small amount of time to resync and i’m working and testing the resync to make sure it fast enough to resync with the website daily cause they keep changing download links frequently.

Info: “PaheScrapper.exe.config” is the configuration for the scraper so leave default if you don’t know what you are doing.

Scraper in Action

Category: Technology Tags: cloudflare, decoding, github, hacking, obfuscation, pahe.in, pahe.ph, scraper, soralink, surcuri

15 Comments on “Breaking & Scraping Pahe.in”

Sunil
September 24, 2021

Amazing 🥰
Hope u provide either provide the scrap data or tutorial on how to run this locally by ourselves.

LikeLike

Reply
- RoofMan
  September 25, 2021
  
  Just Download the Alpha Version and Run the Executable File
  https://github.com/roofman2008/Pahe.ph-Scraper/releases/tag/Alpha
  
  All the scrapped data stored in Json file in the program directory.
  
  LikeLike
  
  Reply
kemo
November 15, 2021

i really dont know how to use the scraper any help or tutorial

LikeLike

Reply
- RoofMan
  November 20, 2021
  
  I added some in this article in section “How To Use ?”, thanks for reporting.
  
  LikeLike
  
  Reply
kemo
November 15, 2021

and why only support google chrome version 92

LikeLike

Reply
- RoofMan
  November 20, 2021
  
  Fixed in the new version, thanks for reporting.
  
  LikeLike
  
  Reply
Sunilroy007
November 24, 2021

🔥🔥🔥🔥

LikeLike

Reply
kemo
November 24, 2021

thank you thats make sense now
and thank you for this amazing tool

LikeLike

Reply
Amin
November 26, 2021

Hey, do you plan to implement auto-start for this? So this scraper will auto-start when windows start and continue the scraping. I want to use this in an RDP
Also thanks for this, i think it’s a good tool, i’m in process of scraping

LikeLike

Reply
- RoofMan
  November 26, 2021
  
  I will implement this feature after the Alpha version, but now for monitored testing i’m leaving it that way.
  
  LikeLike
  
  Reply
Ghost
November 27, 2021

Is there any post processing 9f scrapped data as i have completed the scrapping but nothing in the links section. Used alpha 2.5.

LikeLike

Reply
- RoofMan
  November 27, 2021
  
  Because Pahe.ph changed the download obfuscation mechanism, i’m upgrading the scraper with new set of capabilities to process these changes.
  
  Check this Issue
  https://github.com/roofman2008/Pahe.ph-Scraper/issues/11
  
  Please when you have a problem, post in the Issue section on GitHub
  
  LikeLike
  
  Reply
Mex B
April 30, 2022

Error: session not created: This version of ChromeDriver only supports Chrome version 96
Current browser version is 101.0.4951.41

LikeLike

Reply
Yogesh Pandey
December 16, 2023

i am thinking about skipping the steps involved in order to get final drive or mega link , is it possible ? i was able to get to final intercelestial link but there’s some level low javascript thing happening with eval

LikeLike

Reply
- RoofMan
  December 20, 2023
  
  let me clarify the process of you, first you scan the website to extract cloaked links for the media files.
  these files are enough for you if you don’t want the direct link.
  if you want the direct link you need to solve the cloaked links.
  you have 2 options, one option is to use browser automation like me to solve the issue, it’s buggy from browser side or option two, solve the javascript that protect the website natively.
  
  Natively is better and faster in terms of performance and without any browser issues but it requires a lot of reverse engineering for now. I like that approach but i don’t have time. if you can invest time in it, i can support you by consultation freely. but it will be you job cowboy.
  
  LikeLike
  
  Reply

RoofMan Official Blog

"If we teach today as we taught yesterday, we rob our children of tomorrow."

RoofMan Official Blog

"If we teach today as we taught yesterday, we rob our children of tomorrow."

Breaking & Scraping Pahe.in

History

Results

Used Technologies

Public Release

Build Releases

How To Use ?

Scraper in Action

15 Comments on “Breaking & Scraping Pahe.in”

Leave a comment Cancel reply

Subscribe to Blog via Email

RSS

Categories

RoofMan Official Blog

"If we teach today as we taught yesterday, we rob our children of tomorrow."

RoofMan Official Blog

"If we teach today as we taught yesterday, we rob our children of tomorrow."

Breaking & Scraping Pahe.in

History

Results

Used Technologies

Public Release

Build Releases

How To Use ?

Scraper in Action

Rate this:

Share this:

Related

15 Comments on “Breaking & Scraping Pahe.in”

Leave a comment Cancel reply

Subscribe to Blog via Email

RSS

Categories