Programme or addon to list all hosts accessed by webpage



  182FF with cup packs
I'm looking for either an application or a Firefox addon which will list all the hosts referenced by a webpage in embedded objects/iframes/whatever.

For instance, a customer needed Twitter allowed for a couple of users, but even if you allow any pages from twitter.com to be accessed, that doesn't actually make the site viewable, as in this case most of the elements are stored on the host twimg.com.

There's the long-winded way I do it currently, which is:

1) using Live HTTP Headers for Firefox, access the page,
2) save the output of all the request/response headers to a text file,
3) grep the text file for any lines that contain "Host:" and pipe these to another text file,
4) load that into Excel and then filter for unique matches.

Then I have a list of all the hosts. (On a *nix box, steps 3 and 4 boil down to the one-liner below.)
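Something like this should do it, assuming the Live HTTP Headers capture was saved as headers.txt:

Code:
# pull out the Host: lines, strip the prefix and de-duplicate
grep "Host:" headers.txt | sed 's/^Host: //' | sort -u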

Unfortunately, doing any sort of Google search for "host header addon firefox" gives me about 19 million results, none of which have any relevance.

I need it to be fairly idiot-proof as well, as I will be giving it to the 1st line guys to use, so ideally Windows, but I'd also take some Unix scripting pointers, as I'm sure I could do something using wget, grep, regex etc.

Any ideas greatly appreciated; then maybe I can stop having to placate customers when they ask for a webpage to be unblocked and the 1st liners only bother to unblock the URL of the site itself and none of the subdomains and/or other hosts.
 
  182FF with cup packs
Oh well, it took me all afternoon, but I've written a *nix script to do it.

Code:
#!/bin/sh
# whathosts - list the hosts a web page pulls content from
# Usage: ./whathosts <url>

# fetch the page; --convert-links makes wget log the name of the file it saved
wget --convert-links -o wget.log "$1"

# grab the saved filename from the "Converting <file>..." line in the log
WGOTFILE=`grep Converting wget.log | sed 's/Converting //g' | sed 's/\.\.\..*//g'`

clear
echo
echo ==========Requested Domain and any redirected domain==========
grep Connecting wget.log | sed 's/Connecting to //g' | sed 's/\:.*//g' | sort | uniq
echo
echo ==========Domains for any embedded content \(src= tags\)==========
cat $WGOTFILE | tr " " "\n" | grep "src=\"" | sed 's/^src=\".*\:\/\///g' | sed 's/\/.*//g' | sort | uniq
echo
echo ==========Domains for any links \(a href= tags\)==========
cat $WGOTFILE | tr " " "\n" | grep "href=\".*\:" | sed 's/^href=\".*\:\/\///g' | sed 's/[\/|\"].*//g' | sort | uniq
echo
# tidy up the downloaded file and the log
rm $WGOTFILE
rm wget.log
You run it by specifying the domain name:

Code:
./whathosts www.facebook.com
and it outputs:
Code:
==========Requested Domain and any redirected domain==========
www.facebook.com

==========Domains for any embedded content (src= tags)==========
static.ak.fbcdn.net

==========Domains for any links (a href= tags)==========
developers.facebook.com
static.ak.fbcdn.net
www.apple.com
www.facebook.com
www.getfirefox.com
www.microsoft.com

Win. ;)
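
If the 1st liners just want one flat list of hosts to punch into the filter, the section banners can be stripped and everything merged, along these lines (assuming the script above is saved as whathosts in the current directory):

Code:
# keep only the lines that look like hostnames (they all contain a dot) and de-duplicate
./whathosts www.facebook.com | grep "\." | sort -u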
 

