Automatically Validate HTTP Proxies



Let's say you downloaded a long list of Web proxy servers. Now you are stuck with the task of weeding out the proxies that are dead, slow, fake, or otherwise unusable. Some applications claim to validate proxy servers, but they share a common problem: they are excruciatingly slow. They also tend to hang once in a while, and if your list of proxies is long enough, they may crash altogether because of memory leaks and other such examples of fine programming.

I would like to draw your attention to the following (hopefully useful) script, which can go through a very long proxy list in just a minute or two and get rid of the trash. A few words about how it works are in order. I created a simple HTML page on my Web server (see the $pvcurl variable below). This page contains a unique text string (the $pvcstring variable). A proxy that returns this page with the string intact is really relaying requests; a dead, misconfigured, or fake proxy returns an error, a different page, or nothing at all.
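The heart of the validation is a literal string search on whatever the proxy sends back. Here is a minimal sketch, assuming the fetched page body is already held in a shell variable; the function name page_is_genuine and the sample pages are hypothetical and not part of the original script.

```shell
#!/bin/sh
# page_is_genuine BODY MARKER
# Succeeds only if BODY contains MARKER as a literal (fixed) string.
# An error page, login portal, or rewritten page will not contain the
# marker, so a proxy that returns one of those fails this check.
page_is_genuine() {
    # -F: match the marker literally; -q: report via exit status only
    printf '%s\n' "$1" | grep -qF -- "$2"
}

marker="191628769290432845414226"
good="<html><body>$marker</body></html>"
bad="<html><body>Access denied</body></html>"

page_is_genuine "$good" "$marker" && echo "good page: accepted"
page_is_genuine "$bad" "$marker" || echo "bad page: rejected"
```

Relying on the exit status of grep -q, rather than counting matching lines and comparing the count to exactly 1, also avoids rejecting a page in which the marker happens to appear on more than one line.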

The first step is to ping the proxy and see whether it responds within five seconds. The ping commands are launched in the background to speed up the process. If the proxy does respond, the next step is to use wget, with the proxy set in the http_proxy environment variable, to download $pvcurl and check that the page contains $pvcstring. If everything checks out, the proxy is added to the final list of good proxies. Like the ping commands, the wget processes run in the background, each with a 30-second timeout.
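The same fan-out idea can also be expressed with the shell's built-in wait instead of polling ps and calling killall. This is a sketch under stated assumptions: check_proxy is a hypothetical stand-in for the real ping/wget test (here it simply treats port 0 as bad), and jobs are started in fixed-size batches, with each batch finishing before the next one begins.

```shell
#!/bin/sh
# Sketch: run one check per host:port line in the background, at most
# max_jobs at a time, and append the entries that pass to an output file.

# Hypothetical placeholder for the real ping/wget check: any entry whose
# port is not 0 counts as "good". Replace the body with a real test.
check_proxy() {
    [ "$2" != "0" ]
}

# run_batched INFILE OUTFILE MAX_JOBS
run_batched() {
    infile=$1
    outfile=$2
    max_jobs=$3
    : > "$outfile"              # start with an empty result file
    running=0
    while IFS=: read -r host port
    do
        [ -n "$host" ] || continue
        (
            if check_proxy "$host" "$port"
            then
                echo "$host:$port" >> "$outfile"
            fi
        ) &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]
        then
            wait                # block until the current batch is done
            running=0
        fi
    done < "$infile"
    wait                        # collect the last, possibly partial, batch
}
```

Because wait only tracks the shell's own child processes, this approach never touches unrelated wget or ping processes running on the same machine. The trade-off is that one slow check holds up its whole batch, so a real check should still carry its own timeout (for example, wget's --timeout option).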

#!/bin/ksh

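configure() {
    # Test page on a server you control, and the unique string it contains
    pvcurl="http://www.krazyworks.com/pvc.html"
    pvcstring="191628769290432845414226"
    wget_timeout=30

    # Input: one proxy per line in host:port format
    proxyin="/tmp/proxylist.in"

    if [ ! -f "$proxyin" ]
    then
        echo "Proxy list $proxyin not found. Exiting..."
        exit 1
    fi

    # Output: proxies that passed both checks (old results are removed)
    proxyout="/root/proxylist.out"
    rm -f "$proxyout"
}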

cleanup() {
    # Note: killall stops every wget process on the system, not only
    # the ones started by this script.
    killall wget 2>/dev/null
    for i in 1 2 3 4 5
    do
        rm -f "/tmp/proxy_verify.tmp$i"
    done
}

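wgetrun() {
    # Fetch the test page through the proxy in $http_proxy. The proxy is
    # good only if the page comes back containing the unique string.
    if wget -q --timeout="$wget_timeout" --tries=1 -O - "$pvcurl" | grep -q "$pvcstring"
    then
        echo "${proxy}:${port}" >> "$proxyout"
    fi
}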

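pingrun() {
    # One ICMP echo request with a 5-second deadline (-W is in seconds on
    # Linux iputils ping). Proxies that block ICMP are skipped here even
    # if they would otherwise work.
    if ping -q -c 1 -W 5 "$proxy" >/dev/null 2>&1
    then
        wgetrun &
    fi
}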

verify() {
    # Remove duplicate entries from the input list
    sort -u "$proxyin" > "/tmp/proxy_verify.tmp1"
    mv "/tmp/proxy_verify.tmp1" "$proxyin"
    proxy_total=$(wc -l "$proxyin" | awk '{print $1}')

    i=1
    j=1
    # Split each host:port line directly, without a cat pipeline
    while IFS=: read -r proxy port
    do
        echo "Processing proxy $i of $proxy_total"
        # Exported so that the background wget uses this proxy
        export http_proxy="http://${proxy}:${port}/"
        (( i = i + 1 ))

        pingrun &

        # Every 100 proxies: if more than 100 wget processes are running,
        # give them one timeout period, then kill the stragglers
        if [ $j -eq 100 ]
        then
            if [ "$(ps -ef | grep -c '[w]get')" -gt 100 ]
            then
                sleep "$wget_timeout"
                killall wget
            fi
            j=1
        else
            (( j = j + 1 ))
        fi
    done < "$proxyin"

    echo "Waiting for threads to finish ($wget_timeout seconds)..."
    while [ "$(ps -ef | grep -Ec '[w]get|[p]ing')" -gt 0 ]
    do
        sleep 5
    done
}

# RUNTIME

configure
cleanup
verify
cleanup
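The script expects /tmp/proxylist.in to hold one host:port entry per line, since verify splits each line on the colon. Lists downloaded from the Web are rarely that clean, so a pre-processing step may help. This is a hedged sketch: the function name normalize_proxy_list and its filtering rules (numeric port of up to five digits, host made of letters, digits, dots, and hyphens) are assumptions, not part of the original script.

```shell
#!/bin/sh
# normalize_proxy_list IN OUT
# Hypothetical clean-up step for the validator's input file:
#  - strips Windows line endings (CR) and surrounding whitespace,
#  - keeps only lines shaped like host:port with a numeric port,
#  - removes duplicates (the same effect as sort | uniq).
normalize_proxy_list() {
    tr -d '\r' < "$1" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -E '^[A-Za-z0-9.-]+:[0-9]{1,5}$' |
        sort -u > "$2"
}
```

For example, running normalize_proxy_list raw.txt /tmp/proxylist.in before the validator turns a messy download into the one-entry-per-line format the script reads.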


About the Author:
http://www.krazyworks.com



Article Originally Published On: http://www.articlesnatch.com


Copyright 2005-2011 ArticleSnatch, LLC - All Rights Reserved.