mirror of
https://git.bakhai.co.in/FbIN/4Get.git
synced 2025-11-05 04:21:31 +05:30
Synced
greppr fix (319640cd77)
fix wikipedia crash (cdf958d293)
fix MDN answers not rendering properly (2d63475b07)
these comments were too long (ae31274db9)
re-added stackoverflow instant answers (20ef5b3e3a)
doc config changes number twoo (362cf61508)
remove backtick (2ca8fb0006)
add more hacks (da1ea1d6e8)
this was so much pain to figure out (dea8b0a362)
i always forget the fucking config (4215f2678d)
fix syntax highlighter (2c2bd28a9f)
fix #2 for real this time (7c970031d0)
added cara.app (acd02d83d4)
This commit is contained in:
parent 145c1e1388
commit 29e1532be0
7 changed files with 1557 additions and 430 deletions
@@ -1,4 +1,4 @@
# 4Get configuation options
# 4get configuation options

Welcome! This guide assumes that you have a working 4get instance. This will help you configure your instance to the best it can be!

@@ -9,37 +9,67 @@ Welcome! This guide assumes that you have a working 4get instance. This will hel
4. The captcha font is located in `data/fonts/captcha.ttf`

# Cloudflare bypass (TLS check)
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed.**
>These instructions have been updated to work with Debian 13 Trixie.

Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** & the **Mwmbl** search engines. Please be aware that APT will fight against you and will re-install the openSSL-version of curl constantly when updating.
**Note: this only allows you to bypass the browser integrity checks. Captchas & javascript challenges will not be bypassed by this program!**

First, follow these instructions. Only install the Firefox modules:
Configuring this lets you fetch images sitting behind Cloudflare and allows you to scrape the **Yep** search engine.

https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build

Once you did this, you should be able to run the following inside your terminal:
To come up with this set of instructions, I used [this guide](https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#native-build) as a reference, but trust me you probably want to stick to what's written on this page.

First, compile curl-impersonate (the firefox flavor).
```sh
$ curl_ff117 --version
curl 8.1.1 (x86_64-pc-linux-gnu) libcurl/8.1.1 NSS/3.92 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 nghttp2/1.56.0
Release-Date: 2023-05-23
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe UnixSockets zstd
```
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine (on my debian install) is located at `/usr/local/lib/libcurl-impersonate-ff.so`.

Find the `libcurl.so.4` file used by your current installation of curl. For me, this file is located at `/usr/lib/x86_64-linux-gnu/libcurl.so.4`

Now comes the sketchy part: replace `libcurl.so.4` with `libcurl-impersonate-ff.so`. You can do this in the following way:
```sh
sudo rm /usr/lib/x86_64-linux-gnu/libcurl.so.4
sudo cp /usr/local/lib/libcurl-impersonate-ff.so /usr/lib/x86_64-linux-gnu/libcurl.so.4
git clone https://github.com/lwthiker/curl-impersonate/
cd curl-impersonate
sudo apt install build-essential pkg-config cmake ninja-build curl autoconf automake libtool python3-pip libnss3 libnss3-dev
mkdir build
cd build
../configure
make firefox-build
sudo make firefox-install
sudo ldconfig
```

Make sure to restart your webserver and/or PHP daemon, otherwise it will keep using the old library. You should now be able to bypass Cloudflare's shitty checks!!
Now, after compiling, you should have a `libcurl-impersonate-ff.so` sitting somewhere. Mine is located at `/usr/local/lib/libcurl-impersonate-ff.so`. Do some patch fuckery:

```sh
sudo su
LD_PRELOAD=/usr/local/lib/libcurl-impersonate-ff.so
CURL_IMPERSONATE=firefox117
patchelf --set-soname libcurl.so.4 /usr/local/lib/libcurl-impersonate-ff.so
ldconfig
```

From here, you will have a broken curl:
```sh
root@fuckedmachine:/# curl --version
curl: /usr/local/lib/libcurl.so.4: no version information available (required by curl)
curl: symbol lookup error: curl: undefined symbol: curl_global_trace, version CURL_OPENSSL_4
```

Or not... During testing, I've seen that sometimes curl still works for some reason. What really matters is the output of this command:
```
root@fuckedmachine:/# php -r 'print_r(curl_version());' | grep ssl_version
[ssl_version_number] => 0
[ssl_version] => NSS/3.92
```

It **MUST** say NSS, otherwise it didn't work. There's also the option of using the [forked project](https://github.com/lexiforest/curl-impersonate), but that garbage doesn't support NSS. I'm kind of against impersonating chrome cause you never know when Google is gonna add more fingerprinting bullshit.
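
If you want to script the NSS check, here is a minimal sketch. It is not part of 4get: `is_nss_backend` is a made-up helper name, and the sketch assumes the `php` CLI loads the same curl extension as your webserver's PHP, which may not hold on every setup.

```sh
# Hypothetical helper: succeeds only when the given ssl_version string names NSS.
is_nss_backend() {
    case "$1" in
        NSS/*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Ask PHP which TLS library its curl extension is linked against.
# Stays empty if php or its curl extension is missing.
ssl_version=$(php -r 'echo curl_version()["ssl_version"];' 2>/dev/null) || true

if is_nss_backend "$ssl_version"; then
    echo "OK: PHP curl uses $ssl_version"
else
    echo "FAIL: PHP curl uses '${ssl_version:-unknown}', expected NSS" >&2
fi
```

Run it again after restarting your webserver and/or PHP daemon, since a long-running PHP process may have loaded its library before the swap.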

Appendix: If you want a functioning `curl` command line utility again in case it doesn't work anymore, you can do the following hack:

```
sudo apt remove curl
sudo ln -s /usr/local/bin/curl-impersonate-ff /usr/bin/curl
```

# Robots.txt
Make sure you configure this right to optimize your search engine presence! Head over to `/robots.txt` and change the `4g.flossboxin.org.in` domain to your own domain.
Make sure you configure this right to optimize your search engine presence! Head over to `/robots.txt` and change the 4get.ca domain to your own domain.
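
As a rough sketch, the domain swap can be done with `sed` instead of by hand. `set_robots_domain` is a made-up helper, not something 4get ships, and `search.example.org` in the usage line is a placeholder for your own domain.

```sh
# Hypothetical helper: replace one domain with another inside a robots.txt file.
# $1 = path to robots.txt, $2 = domain currently in the file, $3 = your domain
set_robots_domain() {
    # Escape the dots so sed matches the old domain literally.
    old_escaped=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Write to a temporary file first, then move it over the original.
    sed "s/${old_escaped}/$3/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (run from the 4get root directory):
# set_robots_domain robots.txt 4get.ca search.example.org
```

Pass whichever domain your copy of `robots.txt` currently contains as the second argument.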

# Server listing
To be listed on https://4get.ca/instances , you must contact *any* of the people in the server list and ask them to add you to their list of instances in their configuration. The instance list is distributed, and I don't have control over it.

If you see spammy entries in your instances list, simply remove the instance from your list that pushes the offending entries.

# Proxies
4get supports rotating proxies for scrapers! Configuring one is really easy.

@@ -60,4 +90,4 @@ Make sure you configure this right to optimize your search engine presence! Head
Done! The scraper you chose should now be using the rotating proxies. When asking for the next page of results, it will use the same proxy to avoid detection!

## Important!
If you ever test out a `socks5` proxy locally on your machine and find out it works but doesn't on your server, try supplying the `socks5_hostname` protocol instead. Hopefully this tip can save you 3 hours of your life!
If you ever test out a `socks5` proxy locally on your machine and find out it works but doesn't on your server, try supplying the `socks5_hostname` protocol instead. Hopefully this tip can save you 3 hours of your life!
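
To reproduce the difference from a terminal, the sketch below maps the two protocol names to the proxy URL schemes that command-line curl understands: `socks5://` resolves hostnames on the machine running curl, while `socks5h://` hands the hostname to the proxy. `curl_scheme_for` is a hypothetical helper and `127.0.0.1:1080` is a placeholder proxy address. Neither comes from 4get.

```sh
# Hypothetical helper: map a proxy protocol name to the curl proxy URL scheme.
# socks5  -> DNS is resolved locally, by the machine running curl
# socks5h -> DNS is resolved by the proxy itself
curl_scheme_for() {
    case "$1" in
        socks5)          echo "socks5" ;;
        socks5_hostname) echo "socks5h" ;;
        *)               echo "unsupported protocol: $1" >&2; return 1 ;;
    esac
}

# Usage, with a placeholder proxy address:
# proxy="127.0.0.1:1080"
# curl --proxy "$(curl_scheme_for socks5)://$proxy" https://example.org/
# curl --proxy "$(curl_scheme_for socks5_hostname)://$proxy" https://example.org/
```

If only the second request succeeds on your server, local DNS resolution is the likely culprit.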