<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog.forret.com &#187; Linux</title>
	<atom:link href="http://blog.forret.com/categories/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.forret.com</link>
	<description>Tango, photography and whatever&#039;s bleeding edge</description>
	<lastBuildDate>Fri, 20 Jan 2012 20:50:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Track your (Synology) NAS when it&#8217;s stolen</title>
		<link>http://blog.forret.com/2011/04/track-your-synology-nas-when-its-stolen/</link>
		<comments>http://blog.forret.com/2011/04/track-your-synology-nas-when-its-stolen/#comments</comments>
		<pubDate>Sat, 16 Apr 2011 12:34:54 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[hardware]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[nas]]></category>
		<category><![CDATA[synology]]></category>
		<category><![CDATA[theft]]></category>

		<guid isPermaLink="false">http://blog.forret.com/?p=1236</guid>
		<description><![CDATA[When a friend of mine recently got his MacBook stolen, I quickly verified if I had installed Prey Project on each laptop/desktop PC I have. For those who do not know Prey: Prey lets you keep track of your phone or laptop at all times, and will help you find it if it ever gets [...]]]></description>
			<content:encoded><![CDATA[<p>When a friend of mine recently got his MacBook stolen, I quickly verified if I had installed <a href="http://preyproject.com/">Prey Project</a> on each laptop/desktop PC I have. For those who do not know Prey:</p>
<blockquote><p><em>Prey lets you keep track of your phone or laptop at all times, and will help you find it if it ever gets lost or stolen. It&#8217;s lightweight, open source software, and free for anyone to use. And it just works.</em></p>
<p><em><img class="alignnone" src="http://preyproject.com/wp-content/themes/prey/i/prey-logo-big.png" alt="" width="350" height="70" /></em></p></blockquote>
<p>Yes, I had Prey running on each PC. And then I looked at my <a href="http://www.synology.com">Synology NAS</a> (DS410, 4 disks, 8TB raw storage). It could be stolen too. And it&#8217;s basically a Linux box. And Prey is available for Linux &#8230;</p>
<p><img class="alignright" title="Synology DS410" src="http://blog.irelandjoe.com/wp-content/uploads/2010/06/synology410.jpg" alt="" width="200" height="200" />So I figured out how to install Prey on a Synology box:</p>
<ol>
<li>log in as root via SSH</li>
<li>install the ipkg package manager (&#8216;Bootstrap&#8217;) on your NAS server (see <a href="http://forum.synology.com/wiki/index.php/How_to_Install_Bootstrap">forum.synology.com</a>); this page lists the right <a href="http://tools.forret.com/synology/bootstrap.php">bootstraps for the right Synology model</a>.</li>
<li>install bash shell &#8211; &#8220;<code>ipkg install bash</code>&#8221; (from <a href="http://forum.synology.com/enu/viewtopic.php?f=27&amp;t=7800#p33062">forum.synology.com</a>)</li>
<li>install textutils &#8211; &#8220;<code>ipkg install textutils</code>&#8221; (from <a href="http://forum.synology.com/enu/viewtopic.php?f=90&amp;t=24679#p99275">forum.synology.com</a>)</li>
<li>go to /usr/share, download the latest Linux version of Prey (<code>wget http://preyproject.com/releases/...linux.zip</code>) and unzip it</li>
<li>create an account on Prey and get your <em>API key</em> from your <a href="http://control.preyproject.com/profile">Account profile</a>.</li>
<li>create a new device (e.g. &#8216;NAS8TB (Syn410)&#8217;), indicate OS as Debian (it&#8217;s close enough) and get the <em>device key</em>.</li>
<li>edit the <code>/usr/share/prey/config</code> file and fill in the API and device key<br />
<code># you can get both of these from Prey's web service<br />
api_key='yyyyyyyyyy'<br />
device_key='xxxxxx'</code></li>
<li>now run &#8220;<code>bash /usr/share/prey/prey.sh</code>&#8221; once by hand &#8211; you should get a &#8220;<code>-- Got status code 200! -- Nothing to worry about. :) -- Cleaning up!</code>&#8221; response.</li>
<li>now edit <code>/etc/crontab</code> and add the following line, which runs Prey at minutes 5, 25 and 45 of every hour and keeps the output of the last run in <code>lastrun.log</code>:<br />
<code>5-55/20 * * * * root /opt/bin/bash /usr/share/prey/prey.sh &gt;  /usr/share/prey/lastrun.log</code></li>
<li>Now restart the cron daemon in the following (non-standard-Linux) way (from <a href="http://forum.synology.com/wiki/index.php/How_to_backup_the_Synology_Server_to_Amazon_S3">forum.synology.com</a>):<br />
<code>/usr/syno/etc.defaults/rc.d/S04crond.sh stop<br />
/usr/syno/etc.defaults/rc.d/S04crond.sh start<br />
</code></li>
<li>And it&#8217;s running! When your Synology is stolen, you set its status in your Prey account to &#8216;Missing&#8217; and you will start getting email reports every 20 minutes. Because it&#8217;s a NAS, there is no webcam and no screenshots can be taken, but the external IP address will let you see where the device turns up.<br />
<code>Remote IP: 78.29.245.xxx<br />
Private IP: 192.168.0.108<br />
Gateway IP: 192.168.0.1<br />
MAC Address: xx:xx:xx:xx:xx:xx</code></li>
</ol>
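<p>Filling in the two keys (step 8) can also be scripted. This is a minimal sketch, not part of Prey itself: <code>set_prey_keys</code> is a hypothetical helper, and it assumes the config file contains <code>api_key='...'</code> and <code>device_key='...'</code> lines as shown above, and that both keys consist of letters and digits only (slashes or quotes would need escaping in the sed expressions).</p>

```shell
#!/bin/sh
# Sketch: write the Prey API key and device key into the config file.
# Assumes lines of the form api_key='...' and device_key='...' exist,
# and that the keys are plain letters/digits (no sed escaping is done).
# set_prey_keys is a hypothetical helper name, not a Prey command.
set_prey_keys() {
    config=$1
    api=$2
    device=$3
    # keep a backup copy next to the original before changing anything
    cp "$config" "$config.bak" || return 1
    # rewrite both key lines from the backup into the real config file
    sed -e "s/^api_key=.*/api_key='$api'/" \
        -e "s/^device_key=.*/device_key='$device'/" \
        "$config.bak" > "$config"
}

# usage: set_prey_keys /usr/share/prey/config yyyyyyyyyy xxxxxx
```

<p>The previous version stays around as <code>config.bak</code>, so a mistyped key is easy to undo.</p>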
<p>Would this work on a <a href="http://www.qnap.com/">QNAP server</a>? I&#8217;m guessing, yes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2011/04/track-your-synology-nas-when-its-stolen/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Redirecting with Apache&#8217;s .htaccess</title>
		<link>http://blog.forret.com/2007/07/redirecting-with-apaches-htaccess/</link>
		<comments>http://blog.forret.com/2007/07/redirecting-with-apaches-htaccess/#comments</comments>
		<pubDate>Wed, 04 Jul 2007 13:09:27 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2007/07/redirecting-with-apaches-htaccess/</guid>
		<description><![CDATA[When you migrate web sites from one place to another, and the URLS change, you don&#8217;t want to lose visitors that still use the old links. If your &#8216;old&#8217; website ran on Apache, you can use its mod_alias/mod_rewrite functionality to automatically redirect to the new URL. This involves adding redirect rules to the .htaccess file [...]]]></description>
			<content:encoded><![CDATA[<p>When you migrate web sites from one place to another, and the URLs change, you don&#8217;t want to lose visitors that still use the old links. If your &#8216;old&#8217; website ran on Apache, you can use its mod_alias/mod_rewrite functionality to automatically redirect to the new URL. This involves adding redirect rules to the <code>.htaccess</code> file in the folder the redirects apply to (usually the root folder of the old site). Some examples:</p>
<p><b>Generic structure of the .htaccess redirects</b><br />
<code><br />
<strong><a href="http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirect">Redirect</a></strong> permanent /(old url) (new url)<br />
Redirect ... (add all your one-2-one redirects here)<br />
RedirectMatch permanent ^/old_stuff/.*html$ http://www.example.com/<br />
RedirectMatch ... (add your catch-all redirects here)<br />
<br />
<strong>RewriteEngine</strong> on<br />
<strong>RewriteBase</strong> /blog/<br />
<strong><a href="http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteRule">RewriteRule</a></strong> ^([regex])$ http://blog.example.com/$1   [R=301,L]<br />
RewriteRule ... (add all your variable redirects here)</code></p>
<p><b>EXAMPLE: old Blogger site (on your own server) to new WordPress site</b><br />
I&#8217;ve done <a href="http://blog.forret.com/2005/12/migrating-from-blogspot-to-a-real-blog/">a migration from a blog published by Blogger (via FTP) onto my own webspace</a>, to a blog run by WordPress. I&#8217;ve used the following rules to handle the redirections.<br />
* HOMEPAGE:<br />
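redirect / and /index.html (and nothing else) to your new blog URL. Note that <code>Redirect</code> matches URL <em>prefixes</em>, so <code>Redirect permanent / ...</code> would catch every request on the old site, including the archive and post pages below; an anchored <code>RedirectMatch</code> avoids that:<br />
<code>RedirectMatch permanent ^/(index\.html)?$ http://blog.example.com/</code></p>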
<p>* FEED:<br />
redirect e.g. /atom.xml to your Feedburner feed<br />
<code>Redirect permanent /atom.xml http://feeds.feedburner.com/(exampleblog)</code></p>
<p>* ARCHIVES:<br />
redirect e.g. /archive/2005_03_posts.html to the new WordPress archives<br />
<code>RedirectMatch permanent /archive/([0-9][0-9][0-9][0-9])_([0-9][0-9])_.*$ http://blog.example.com/$1/$2/</code></p>
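<p>Before uploading a rule like this, it can help to preview what it does to a few old URLs. A minimal sketch, assuming GNU or BSD <code>sed -E</code> (which uses the same extended regex syntax, with <code>\1</code>/<code>\2</code> instead of <code>$1</code>/<code>$2</code>); <code>archive_target</code> is a hypothetical helper that only simulates the mapping, Apache still does the real redirect.</p>

```shell
#!/bin/sh
# Sketch: simulate the ARCHIVES RedirectMatch rule with sed -E.
# Same regex as the Apache rule; back-references are \1 and \2 in sed.
# blog.example.com is the placeholder domain used throughout this post.
archive_target() {
    echo "$1" | sed -E 's#/archive/([0-9][0-9][0-9][0-9])_([0-9][0-9])_.*$#http://blog.example.com/\1/\2/#'
}

# usage: archive_target /archive/2005_03_posts.html
```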
<p>* POST PAGES:<br />
This is tricky, because Blogger and WordPress do not use exactly the same rules for constructing the text-like URL (the &#8216;post slug&#8217;). E.g. a post called <em>how-to-podcast-with-blogger-and.html</em> on my old Blogger site became <em>how-to-podcast-with-blogger-and-smartcast/</em> on the new WordPress one. So I used two types of rules:<br />
a) redirecting individual pages<br />
<code>Redirect permanent /2004/10/how-to-podcast-with-blogger-and.html http://blog.example.com/2004/10/how-to-podcast-with-blogger-and-smartcast/</code><br />
b) a generic rule for the others (this uses RewriteRule instead of RedirectMatch!): each page is redirected to a search on the WordPress blog, within the correct month, for the first two words of the title:<br />
<code>RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4  [R,L]</code><br />
This method is far from perfect, but will bring visitors a lot closer to the right page. If you use pretty distinctive words for titles (e.g. &#8220;<a href="http://blog.forret.com/2006/10/myspace-bulletin-and-other-spam/">Myspace: bulletin and other spam</a>&#8220;), chances are the right page will show up first. If you start all your posts with &#8220;The ten best ways to &#8230;&#8221; then you will need a more sophisticated rule, e.g. one using the 6th and 7th words:<br />
<code>RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4  [R,L]</code></p>
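<p>The two-word search rule can be previewed the same way. A minimal sketch, assuming <code>sed -E</code>; note that in a <code>.htaccess</code> file the RewriteRule pattern sees the path without its leading slash, so the sample input has none. <code>search_target</code> is a hypothetical helper, and the example also shows the weak spot of the method: for this post the search words are just &#8220;how&#8221; and &#8220;to&#8221;.</p>

```shell
#!/bin/sh
# Sketch: simulate the generic POST PAGES RewriteRule with sed -E.
# Input is the per-directory path as mod_rewrite sees it (no leading slash).
# \3 and \4 are the first two words of the old post slug.
search_target() {
    echo "$1" | sed -E 's#^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([a-z0-9]*)-([a-z0-9]*).*$#http://blog.example.com/\1/\2/?s=\3+\4#'
}

# usage: search_target 2006/10/myspace-bulletin-and-other-spam.html
```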
<p><b>Not losing the querystring</b><br />
Redirect and RedirectMatch cannot redirect to a URL with a querystring (e.g. to <code>newpage.php?param1=val1&#038;param2=val2</code>). For that you will need to use the RewriteRule. An example: redirect all links like test.asp?param=value on the old domain to the new domain while keeping all querystring parameters:<br />
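<code>RewriteRule ^tools/test\.asp$  http://web.example.com/tools/test.asp [R=301,L,QSA]</code><br />
RewriteRule only matches the URL path, never the querystring, so there is no point in trying to capture <code>?param=value</code> in the pattern. When the target URL has no querystring of its own, mod_rewrite passes the original one through unchanged; QSA (query string append) makes sure it is also kept, appended with <code>&#038;</code>, when the target does add parameters of its own. R=301 makes it a permanent redirect, and L (last rule) stops looking further for rule matches.</p>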
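<p>The querystring handling described above can be summed up in a small model. This is a sketch of the documented RewriteRule behaviour written in shell for illustration, not Apache code; <code>target_url</code> is a hypothetical helper, and it ignores special cases such as a target ending in a bare <code>?</code>.</p>

```shell
#!/bin/sh
# Sketch: model (not Apache code) of how mod_rewrite builds the querystring
# of a redirect target:
#   target without "?"       -> the original querystring is passed through
#   target with "?", no QSA  -> the target's querystring replaces the original
#   target with "?", QSA     -> the original querystring is appended with "&"
# target_url is a hypothetical helper name.
target_url() {
    target=$1    # substitution URL, possibly with its own ?querystring
    orig_qs=$2   # querystring of the incoming request (may be empty)
    flags=$3     # "QSA" if that flag is set, empty otherwise
    if [ -z "$orig_qs" ]; then
        echo "$target"
        return
    fi
    case $target in
        *\?*)
            if [ "$flags" = "QSA" ]; then
                echo "$target&$orig_qs"
            else
                echo "$target"
            fi ;;
        *)
            echo "$target?$orig_qs" ;;
    esac
}
```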
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2007/07/redirecting-with-apaches-htaccess/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Convert Bind DNS zone into PTR records</title>
		<link>http://blog.forret.com/2005/06/convert-bind-dns-zone-into-ptr-records/</link>
		<comments>http://blog.forret.com/2005/06/convert-bind-dns-zone-into-ptr-records/#comments</comments>
		<pubDate>Wed, 15 Jun 2005 13:32:00 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2005/06/convert-bind-dns-zone-into-ptr-records/</guid>
		<description><![CDATA[The following script I made in order to convert the forward DNS records in a /var/named/db.[domain] file into the correct format for a reverse DNS db.[subnet prefix] file. #!/bin/sh (...) DNSROOT=/var/named PREFIX=$1 DOMAIN=$2 shift 2 DNSPRE=$DNSROOT/db.$PREFIX DNSDOM=$DNSROOT/db.$DOMAIN echo "; save this in $DNSPRE" ( if [ -f $DNSDOM ] ; then cat $DNSDOM &#124; grep [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote the following script to convert the forward DNS records in a /var/named/db.[domain] file into the correct format for a reverse DNS db.[subnet prefix] file.<br />
<code><br />
#!/bin/sh<br />
# (...)<br />
DNSROOT=/var/named<br />
PREFIX=$1<br />
DOMAIN=$2<br />
shift 2<br />
DNSPRE=$DNSROOT/db.$PREFIX<br />
DNSDOM=$DNSROOT/db.$DOMAIN<br />
echo "; save this in $DNSPRE"<br />
(<br />
# forward zone: "host IN A prefix.N" becomes "N IN PTR host.domain."<br />
if [ -f "$DNSDOM" ] ; then<br />
cat "$DNSDOM" |<br />
grep "$PREFIX" |<br />
grep -w "A" |<br />
sed "s/$PREFIX\.//g" |<br />
gawk -v dom="$DOMAIN" -v src="$(basename "$DNSDOM")" 'BEGIN {OFS = "\t"} {print $4, "IN", "PTR", $1 "." dom ".", ";; FROM " src}'<br />
fi<br />
# existing reverse zone: keep its PTR records as they are<br />
if [ -f "$DNSPRE" ] ; then<br />
cat "$DNSPRE" |<br />
grep -w "PTR" |<br />
gawk -v src="$(basename "$DNSPRE")" 'BEGIN {OFS = "\t"} {print $1, $2, $3, $4, ";; FROM " src}'<br />
fi<br />
) |<br />
sort -n |<br />
uniq --check-chars=3<br />
</code></p>
<p>You would call it as follows:<br />
<code>revdns.sh 192.168.110 internal.example.com &gt; new.db.192.168.110</code> and then replace the records of the original db.192.168.110 with the records of the new file. The script still requires manual intervention (you cannot pipe the result straight into a live Bind config file) but saves a lot of typing!</p>
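<p>To try the conversion on a few sample records before touching /var/named, the core step can be isolated. A minimal sketch, assuming A records of the form <code>host IN A a.b.c.d</code>; <code>a_to_ptr</code> is a hypothetical helper that reads zone lines on stdin and, unlike the <code>grep</code>/<code>sed</code> chain above, matches the record type and the subnet prefix exactly instead of by regex.</p>

```shell
#!/bin/sh
# Sketch: core A-to-PTR conversion, reading zone lines on stdin.
# Assumes A records look like "host IN A a.b.c.d" (type in field 3,
# address in field 4). a_to_ptr is a hypothetical helper name.
a_to_ptr() {
    prefix=$1   # e.g. 192.168.110
    domain=$2   # e.g. internal.example.com
    awk -v pre="$prefix." -v dom="$domain" '
        $3 == "A" {
            # only addresses inside the subnet; keep just the last octet
            if (index($4, pre) == 1)
                print substr($4, length(pre) + 1) "\tIN\tPTR\t" $1 "." dom "."
        }'
}

# usage: a_to_ptr 192.168.110 internal.example.com < /var/named/db.internal.example.com
```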
<p>Example of the output:<br />
<code><br />
201     IN      PTR     james.internal.example.com.  ;; FROM db.internal.example.com<br />
202     IN      PTR     wilbur.internal.example.com. ;; FROM db.internal.example.com<br />
216     IN      PTR     appprd1.internal.example.com.   ;; FROM db.192.168.110<br />
217     IN      PTR     appprd2.internal.example.com.   ;; FROM db.192.168.110<br />
218     IN      PTR     appprd3.internal.example.com.   ;; FROM db.192.168.110<br />
219     IN      PTR     appprd4.internal.example.com.   ;; FROM db.192.168.110<br />
220     IN      PTR     appprd5.internal.example.com.   ;; FROM db.192.168.110<br />
221     IN      PTR     appprd6.internal.example.com.   ;; FROM db.192.168.110<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2005/06/convert-bind-dns-zone-into-ptr-records/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Installing NTP (time synchronisation)</title>
		<link>http://blog.forret.com/2005/05/installing-ntp-time-synchronisation/</link>
		<comments>http://blog.forret.com/2005/05/installing-ntp-time-synchronisation/#comments</comments>
		<pubDate>Thu, 19 May 2005 14:09:00 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[ntp]]></category>
		<category><![CDATA[synchronisation]]></category>
		<category><![CDATA[time]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2005/05/installing-ntp-time-synchronisation/</guid>
		<description><![CDATA[Set timezone (optional) create symbolical link from /usr/share/zoneinfo/... to /etc/localtime: ln -sf /usr/share/zoneinfo/Europe/Brussels /etc/localtime Set UTC mode (optional) if your hardware clock runs in UTC (Universal Coordinated Time) mode, add UTC=true to the /etc/sysconfig/clock file Make sure ntpd is not running Use service ntpd stop to stop it. Choose the NTP server you will get [...]]]></description>
			<content:encoded><![CDATA[<dl>
<dt><strong>Set timezone (optional)</strong> </dt>
<dd>make <code>/etc/localtime</code> a symbolic link to the right file under <code>/usr/share/zoneinfo/...</code>:<br />
<code>ln -sf /usr/share/zoneinfo/Europe/Brussels /etc/localtime</code> </dd>
<dt><strong>Set UTC mode (optional)</strong> </dt>
<dd>if your hardware clock runs in <a href="http://www.worldtimeserver.com/current_time_in_UTC.aspx">UTC (Coordinated Universal Time)</a> mode, add<br />
<code>UTC=true</code><br />
to the <code>/etc/sysconfig/clock</code> file</dd>
<dt><strong>Make sure <code>ntpd</code> is not running</strong>  </dt>
<dd>Use <code>service ntpd stop</code> to stop it.</dd>
<dt>Choose the <strong>NTP server</strong> you will get your time from  </dt>
<dd>it can be an internal server that has the NTP service open for clients, or a <a href="http://ntp.isc.org/bin/view/Servers/StratumTwoTimeServers">public NTP server</a>. For redundancy, use at least 2 servers. To check if you can reach one, run <code>ntpdate timeserver.ntp.ch</code> </dd>
<dt><strong>Edit the <code>/etc/ntp.conf</code> file</strong>  </dt>
<dd>Rename the current file to <code>ntp.bak.conf</code> and make a small new one:<br />
<code>restrict default ignore<br />
server   timeserver.ntp.ch  # Swiss time<br />
server ntp.ucsd.edu       # Univ of California, San Diego<br />
restrict timeserver.ntp.ch mask 255.255.255.255 nomodify notrap noquery<br />
restrict ntp.ucsd.edu      mask 255.255.255.255 nomodify notrap noquery<br />
server  127.127.1.0     # local clock<br />
fudge   127.127.1.0 stratum 10 #so it only takes over if the rest fails<br />
restrict 127.0.0.1<br />
driftfile /etc/ntp/drift<br />
broadcastdelay 0.008<br />
authenticate no</code> </dd>
<dt><strong>Set your system clock right</strong>  </dt>
<dd>Run the following command a couple of times:<br />
<code>ntpdate -u timeserver.ntp.ch # or whatever server you want to use</code><br />
You will see the initial difference in time go away after the 2nd or 3rd time. </dd>
<dt><strong>Set hardware clock</strong>  </dt>
<dd> <code>/sbin/hwclock --systohc</code> </dd>
<dt><strong>Run the <code>ntpd</code> daemon</strong>  </dt>
<dd> <code>service ntpd start</code> </dd>
<dt><strong>Add <code>ntpd</code> to the services started at boot time</strong> </dt>
<dd><code>chkconfig ntpd on</code></dd>
<dt><strong>Check the NTP results</strong> </dt>
<dd> <code>ntpq -p</code><br />
will show you what the difference is between your clock and that of the servers you added. You are looking for lines like<br />
<code><br />
remote           refid      st t when poll reach   delay   offset  jitter<br />
==========================================================================<br />
LOCAL            LOCAL      10 l   30   64  377    0.000    0.000   0.004 *<br />
192.168.246.107 192.168.246.88   3 u  41  128  177 0.313    5.598   0.345</code><br />
and not lines like<br />
<code><br />
remote           refid      st t when poll reach   delay   offset  jitter<br />
==========================================================================<br />
192.168.246.126 LOCAL        11 u   37  128  375    0.204  6082.02 6069.84</code><br />
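Offset and jitter are both thousands of milliseconds, and this server only follows its own local clock (refid LOCAL, stratum 11): far too unreliable to sync against! </dd>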
</dl>
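<p>Checking that output by eye works, but it can also be scripted, e.g. from cron. A minimal sketch: <code>ntp_check</code> is a hypothetical helper that reads <code>ntpq -p</code> output, skips the two header lines and flags peers whose offset or jitter (columns 9 and 10, in milliseconds) exceeds a limit. The 100 ms default is an arbitrary example value, not an NTP rule.</p>

```shell
#!/bin/sh
# Sketch: flag peers in "ntpq -p" output with a large offset or jitter.
# Columns: remote refid st t when poll reach delay offset jitter.
# The 100 ms default limit is an arbitrary example value.
ntp_check() {
    limit=${1:-100}
    awk -v limit="$limit" 'NR > 2 {
        off = $9
        if (off < 0) off = -off          # offset can be negative
        if (off > limit || $10 > limit)
            print "CHECK " $1 ": offset " $9 " ms, jitter " $10 " ms"
    }'
}

# usage: ntpq -p | ntp_check 100
```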
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2005/05/installing-ntp-time-synchronisation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perl HTML scraping part #1</title>
		<link>http://blog.forret.com/2005/01/perl-html-scraping-part-1/</link>
		<comments>http://blog.forret.com/2005/01/perl-html-scraping-part-1/#comments</comments>
		<pubDate>Fri, 21 Jan 2005 17:05:07 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2005/01/perl-html-scraping-part-1/</guid>
		<description><![CDATA[Here we are, back at the scene of the crime. Yes, I know it&#8217;s been a while. And the task of the day is: GOAL: make an HTML scraper, i.e. a script that grabs another URL and outputs the results to the screen TOOL: let&#8217;s say &#8230; Perl (in my case: Perl 5.8 on RedHat) [...]]]></description>
			<content:encoded><![CDATA[<p>Here we are, back at the scene of the crime. Yes, I know it&#8217;s been a while. And the task of the day is:</p>
<dl>
<dt>GOAL:</dt>
<dd>make an HTML scraper, i.e. a script that grabs another URL and outputs the results to the screen</dd>
<dt>TOOL:</dt>
<dd>let&#8217;s say &#8230; Perl (in my case: Perl 5.8 on RedHat)</dd>
<dt>INPUT:</dt>
<dd>a URL</dd>
<dt>OUTPUT:</dt>
<dd>the HTML code of that URL</dd>
</dl>
<p>The actual HTML retrieval is easy: you need <code>get()</code> from the LWP::Simple module:<br />
<code>use LWP::Simple;<br />
my $page = get($url);</code></p>
<p>Some remarks:</p>
<ul>
<li>Since you are generating a web page, you need the CGI module (to take care of the HTTP headers and the like).</li>
<li>The URL input parameter will be given as an HTTP querystring: <code>?url=http://www.example.com/path/page.htm</code>. When no url parameter is given, we generate a form where it can be filled in.</li>
<li>We measure the time it takes to retrieve the original page (with <code>time()</code>, so only to the nearest second).</li>
</ul>
<pre>#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
use LWP::Simple qw(!head);  # CGI already exports head(), so skip LWP's

my $query = CGI-&gt;new;
my $url   = $query-&gt;param('url');
my $debug = 0;              # set to 1 to print timing info above the page

print header();
if (defined $url and length($url) &gt; 0) {
    print getpage($url);
} else {
    showform();
}

# fetch $url; in debug mode, report how long it took and how big it was
sub getpage {
    my $url   = shift;
    my $time1 = time();     # whole seconds only
    debuginfo("Scraping &lt;a target=_blank href='" . $url . "'&gt;link&lt;/a&gt; ...");
    my $page  = get($url);  # undef if the request failed
    my $time2 = time();
    return "Could not retrieve the page." unless defined $page;
    debuginfo("Time taken was &lt;b&gt;" . ($time2 - $time1) . "&lt;/b&gt; seconds");
    debuginfo("Total bytes scraped: &lt;b&gt;" . length($page) / 1000 . "KB&lt;/b&gt;");
    return $page;
}

sub debuginfo {
    my $text = shift;
    print "&lt;small&gt;", $text, "&lt;/small&gt;&lt;br /&gt;\n" if $debug &gt; 0;
}

# show an input form when no url parameter was given
sub showform {
    print "&lt;html&gt;&lt;head&gt;";
    print "&lt;title&gt;SCRAPER&lt;/title&gt;";
    print "&lt;link rel=stylesheet type=text/css href=http://www.forret.com/blog/style.css&gt;";
    print "&lt;/head&gt;&lt;body&gt;&lt;center&gt;\n";
    print "&lt;form method=GET action='scrape.pl'&gt;";
    print "URL: &lt;input name=url type=text size=60 value=http://www.forret.com&gt;";
    print "&lt;input type=submit&gt;&lt;/form&gt;\n";
    print "&lt;/center&gt;&lt;/body&gt;&lt;/html&gt;\n";
}
</pre>
<p>Next step: making sure image <code>src=</code> and hyperlink <code>href</code> keep on working (so convert relative links to absolute links!).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2005/01/perl-html-scraping-part-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Squid cachemgr.cgi UI hack</title>
		<link>http://blog.forret.com/2004/11/squid-cachemgrcgi-ui-hack/</link>
		<comments>http://blog.forret.com/2004/11/squid-cachemgrcgi-ui-hack/#comments</comments>
		<pubDate>Mon, 08 Nov 2004 15:01:00 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/11/squid-cachemgrcgi-ui-hack/</guid>
		<description><![CDATA[Squid has a little system statistics viewer built-in: The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server. (from Squid FAQ) The only thing is &#8230; it&#8217;s so [...]]]></description>
			<content:encoded><![CDATA[<p>Squid has a little system statistics viewer built-in:</p>
<blockquote><p>The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server.<br />
(from <a href="http://www.squid-cache.org/Doc/FAQ/FAQ-9.html">Squid FAQ</a>)</p></blockquote>
<p>The only thing is &#8230; it&#8217;s so ugly! It uses plain HTML and cannot be customized, the FAQ says. However, there is a way to do it:</p>
<ol>
<li>copy <code>cachemgr.cgi</code> to <code>cachemgr2.cgi</code>, so that if you do something wrong, the original is not lost.</li>
<li>open the copy in a text editor. I used <code>vi</code>, but if you&#8217;re not used to working with it, use something else (emacs?).</li>
<li>in the binary file, look for text portions that look like HTML code</li>
<li>keeping in mind that the number of characters must remain the same, change the &lt;title&gt; and &lt;style&gt; to something that suits you. You will have to do this at 2 locations in the file: once for the homepage template and once for the template of the other pages.</li>
<li>suggestion: just let the CGI use a <code>style.css</code> file that you drop into the same folder:<br />
<code>&lt;link rel="stylesheet" type="text/css" href="style.css" /&gt;</code> and pad it with spaces to keep the same number of characters</li>
<li>verify that <code>cachemgr.cgi</code> and <code>cachemgr2.cgi</code> have the same number of bytes</li>
<li>now use <code>cachemgr2.cgi</code> to display your statistics.</li>
</ol>
<p>I did something a bit different (I wanted to use the CSS of my own website), so I&#8217;ll show you the difference between the two versions.<br />
To get the following comparison, I ran <code>strings cachemgr.cgi &gt; cachemgr.txt</code> (and the same for <code>cachemgr2.cgi</code>) to extract only the text parts, and then <code><b>diff</b> cachemgr.txt cachemgr2.txt</code> to compare them, since <code>diff</code> cannot usefully compare two binary files.<br />
<code><br />
173,174c173,174<br />
&lt; &lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;Cache Manager Interface&lt;/TITLE&gt;<br />
&lt; &lt;STYLE type="text/css"&gt;&lt;!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} --&gt;&lt;/STYLE&gt;&lt;/HEAD&gt;<br />
---<br />
&gt; &lt;HTML&gt;&lt;HEAD&gt;&lt;TITLE&gt;Cache Manager (pforret)&lt;/TITLE&gt;<br />
&gt; &lt;link rel="stylesheet" type="text/css" href="http://www.forret.com/forret/forret.css" /&gt; &lt;/HEAD&gt;<br />
199c199<br />
&lt; &lt;STYLE type="text/css"&gt;&lt;!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} TABLE{background-color:#333333;border:0pt;padding:0pt}TH,TD{background-color:#ffffff}--&gt;&lt;/STYLE&gt;<br />
---<br />
&gt; &lt;link rel="stylesheet" type=text/css href="http://www.forret.com/forret/forret.css"&gt;&lt;!-- TABLE{background-color:#333333;border:0pt;padding:0pt} TH,TD{background-color:#ffffff}--&gt;&lt;/STYLE&gt;<br />
</code></p>
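<p>The two fiddly parts of this hack are padding a replacement to the exact length of the text it replaces and checking afterwards that the file size did not change. A minimal sketch of both, with hypothetical helper names <code>pad_to_length</code> and <code>same_size</code>; it assumes plain ASCII text, since <code>${#text}</code> counts characters, not bytes, in multibyte locales.</p>

```shell
#!/bin/sh
# Sketch: helpers for editing text inside a binary without changing its size.
# pad_to_length: pad a replacement with trailing spaces to exactly $2 bytes
#                (ASCII assumed: ${#text} counts characters, not bytes).
# same_size:     succeed only if both files have the same byte count.
# Both names are hypothetical helpers.
pad_to_length() {
    text=$1
    len=$2
    if [ "${#text}" -gt "$len" ]; then
        echo "replacement is longer than $len bytes" >&2
        return 1
    fi
    printf "%-${len}s" "$text"
}

same_size() {
    [ "$(wc -c "$1" | awk '{print $1}')" -eq "$(wc -c "$2" | awk '{print $1}')" ]
}

# usage: same_size cachemgr.cgi cachemgr2.cgi && echo "sizes match"
```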
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/11/squid-cachemgrcgi-ui-hack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Probe disk performance (MRTG)</title>
		<link>http://blog.forret.com/2004/11/probe-disk-performance-mrtg/</link>
		<comments>http://blog.forret.com/2004/11/probe-disk-performance-mrtg/#comments</comments>
		<pubDate>Wed, 03 Nov 2004 13:41:28 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/11/probe-disk-performance-mrtg/</guid>
		<description><![CDATA[The hdparam can be used to monitor the throughput speed of a hard disk: # &#60;strong&#62;hdparm -tT /dev/hda&#60;/strong&#62; /dev/hda: Timing buffer-cache reads: 888 MB in 2.00 seconds = 444.00 MB/sec Timing buffered disk reads: 20 MB in 3.30 seconds = 6.06 MB/sec This would be an interesting performance metric to see plotted against time. So [...]]]></description>
			<content:encoded><![CDATA[<p>The <code>hdparm</code> command can be used to measure the read throughput of a hard disk:<br />
<code># <strong>hdparm -tT /dev/hda</strong></code><br />
<code>/dev/hda:<br />
Timing buffer-cache reads:   888 MB in  2.00 seconds = 444.00 MB/sec<br />
Timing buffered disk reads:   20 MB in  3.30 seconds =   6.06 MB/sec</code></p>
<p>This would be an interesting performance metric to see plotted against time. So let&#8217;s convert it to a format ready for MRTG.</p>
<ul>
<li>The only numbers we need are the last ones: resulting speed. This can be parsed from the output as follows:<br />
<code>#hdparm -tT /dev/hda | gawk -F = "/seconds/ { print $2}"</code>&#160;</p>
<pre>440.00 MB/sec   3.30 MB/sec</pre>
</li>
<li>if we could suppose that the results will always be in &#8220;MB/sec&#8221;, we could parse out the numbers with<br />
<code>(...) | gawk "{print $1}"</code><br />
and then add a line to our MRTG config files to adjust the units:<br />
<code>kMG[_]: M,G,T,P,X</code><br />
But let&#8217;s say that KB/sec or GB/sec speeds are possible.</li>
<li>One <code>gawk</code> can do the conversion trick:<br />
<code>#(...) | gawk "/GB/ {print $1*1000000000} /MB/ {print $1*1000000} /KB/ {print $1*1000}"</code>&#160;</p>
<pre>440000000 3300000</pre>
</li>
<li>To have a complete MRTG-ready output, we also add the uptime (as a human-readable string) on line 3 and the name of the MRTG target on line 4</li>
<li>Q: Do we need 2 <code>gawk</code>s one after the other? Can&#8217;t one do it?<br />
A: You could do it in 1, I guess, but the parsing would be more complex. I use 2 because the FS (field separator) changes: the first gawk uses the &#8216;=&#8217; character, the second uses the normal whitespace.</li>
</ul>
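<p>Put together, the steps above give a complete four-line MRTG probe. A minimal sketch, assuming MRTG&#8217;s external-script convention of value 1, value 2, uptime string and target name on four lines; <code>hdparm_to_mrtg</code> is a hypothetical helper that reads <code>hdparm -tT</code> output on stdin, uses plain <code>awk</code> (gawk behaves the same here) and keeps the decimal multipliers from above.</p>

```shell
#!/bin/sh
# Sketch: turn "hdparm -tT" output (on stdin) into the four lines MRTG reads
# from an external script: value 1, value 2, uptime string, target name.
# hdparm_to_mrtg is a hypothetical helper; multipliers are decimal as above.
hdparm_to_mrtg() {
    name=$1
    # lines 1 and 2: the speeds after "=", converted to bytes/sec
    awk -F '=' '/seconds/ { print $2 }' |
        awk '/GB/ { print $1 * 1000000000 } /MB/ { print $1 * 1000000 } /[kK]B/ { print $1 * 1000 }'
    # line 3: uptime as a human-readable string ("unknown" if uptime fails)
    { uptime 2>/dev/null || echo unknown; } | sed 's/.*up *//; s/, *[0-9]* users*,.*//'
    # line 4: the name MRTG shows for this target
    echo "$name"
}

# usage: hdparm -tT /dev/hda | hdparm_to_mrtg hda
```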
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/11/probe-disk-performance-mrtg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Date formatting in GAWK: boot time</title>
		<link>http://blog.forret.com/2004/11/date-formatting-in-gawk-boot-time/</link>
		<comments>http://blog.forret.com/2004/11/date-formatting-in-gawk-boot-time/#comments</comments>
		<pubDate>Tue, 02 Nov 2004 14:17:54 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/11/date-formatting-in-gawk-boot-time/</guid>
		<description><![CDATA[I have one server with apparently an exceptional stability: # uptime 3:45pm up 524 days, 1:22, 1 user, load average: 0.44, 0.16, 0.13 Unfortunately I know this is not correct (I remember rebooting it some weeks ago). So what are other ways to get the date/time of the last boot? Looking at the RedHat manuals, [...]]]></description>
			<content:encoded><![CDATA[<p>I have one server with apparently an exceptional stability:<br />
<code># uptime</code></p>
<pre>3:45pm  up 524 days,  1:22,  1 user,  load average: 0.44, 0.16, 0.13</pre>
<p>Unfortunately I know this is not correct (I remember rebooting it some weeks ago). So what are other ways to get the date/time of the last boot?</p>
<p>Looking at the <a href="http://www.redhat.com/docs/manuals/linux/RHL-7.3-Manual/ref-guide/s1-proc-topfiles.html">RedHat manuals</a>, the following thing should work too:<br />
<code># <strong>cat /proc/stat</strong><br />
cpu 33813143 210619911 30093342 59435750<br />
cpu0 33813143 210619911 30093342 59435749<br />
(...)<br />
btime 1096157569<br />
(...)</code></p>
<p>The <code>btime</code> gives us the last boot time in seconds since 1 Jan 1970. I can find and convert it with <code>gawk</code>:<br />
<code># <strong>gawk '/btime/ { print (systime() - $2) / (3600 * 24.0), "days -", strftime("%a %b %d %H:%M:%S %Z %Y", $2) }' /proc/stat</strong><br />
38.6473 days - Sun Sep 26 02:12:49 CEST 2004</code><br />
Which gives us an uptime of 38.6 days &#8211; that looks more like it!</p>
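<p>Another way of calculating the uptime: the four counters on the <code>cpu</code> lines are the jiffies (1/100 second here) spent in user, nice, system and idle mode, so on a single-CPU box their sum is the time since boot:<br />
<code># <strong>gawk '/cpu/ {print $1, ($2 + $3 + $4 + $5) / (3600 * 24 * 100)}' /proc/stat</strong><br />
cpu 38.6515<br />
cpu0 38.6515</code><br />
Confirmation of the previous measurement!</p>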
<p><code># <strong>cat /proc/uptime</strong><br />
45282758.17 663091.26</code><br />
The first number is the number of seconds since the last boot; the second one (idle time) we don&#8217;t need. What is that in days?<br />
<code># <strong>gawk '{print $1 / (3600 * 24.0)}' /proc/uptime</strong><br />
524.106</code></p>
<p>So this is where the wrong figure comes from, and I&#8217;ll ignore this source.</p>
<p>Remark: this server is one of my oldest and still runs <em>Red Hat Linux 7.2 (Enigma)</em>. This bug seems to have been fixed in later Red Hat releases, since none of my other servers show it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/11/date-formatting-in-gawk-boot-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Probe average cpu utilisation (MRTG)</title>
		<link>http://blog.forret.com/2004/10/probe-average-cpu-utilisation-mrtg/</link>
		<comments>http://blog.forret.com/2004/10/probe-average-cpu-utilisation-mrtg/#comments</comments>
		<pubDate>Thu, 21 Oct 2004 22:44:27 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/10/probe-average-cpu-utilisation-mrtg/</guid>
		<description><![CDATA[There are two main tools to keep track of your CPU usage: top and vmstat. top is an interactive tool: it shows you the CPU usage of each process, as well as overall statistics, updated every 5 seconds. It&#8217;s good for hands-on checking. #top 17:18:34 up 2 days, 8:14, 3 users, load average: 0.00, 0.00, [...]]]></description>
			<content:encoded><![CDATA[<p>There are two main tools to keep track of your CPU usage: <code>top</code> and <code>vmstat</code>.</p>
<ul>
<li>
<code>top</code> is an interactive tool: it shows you the CPU usage of each process, as well as overall statistics, updated every 5 seconds. It&#8217;s good for hands-on checking.<br />
<code><br />
# top<br />
 17:18:34  up 2 days,  8:14,  3 users,  load average: 0.00, 0.00, 0.00<br />
47 processes: 46 sleeping, 1 running, 0 zombie, 0 stopped<br />
CPU states:   0.1% user   0.1% system   0.0% nice   0.0% iowait  99.6% idle<br />
Mem:  1030872k av, 1022256k used,    8616k free,<br />
                         0k shrd,  104844k buff<br />
     777088k actv,      12k in_d,   22296k in_c<br />
Swap: 2048276k av,    8120k used, 2040156k free<br />
                                 640080k cached<br />
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND<br />
30776 root      19   0  1140 1140   852 R     0.9  0.1   0:00   0 top<br />
    1 root      15   0   504  464   436 S     0.0  0.0   0:03   0 init       (...)</code><br />
But suppose you only want one number (a percentage) back, so you can use it for logging.
</li>
<li>
<code>vmstat</code> will give you the following output:<br />
<code><br />
# vmstat<br />
procs                      memory      swap          io     system      cpu<br />
r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id<br />
0  0  0   7964   8804 104712 640224    0    0     2    16  129    27  0  0 100<br />
</code></p>
<p>You can run <code>vmstat 1 5</code> to get five consecutive measurements, one second apart. The number we want is the average CPU usage, i.e. 100% minus the idle percentage (the <code>id</code> column, which is field 16 in this version of <code>vmstat</code>). The following command does the job:<br />
<code># vmstat 1 5 | gawk 'NR &gt; 2 { n++; idle += $16 } END { print 100 - idle / n }'</code><br />
gives<br />
<code>0.4</code>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/10/probe-average-cpu-utilisation-mrtg/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Estimate # of lines in a log file</title>
		<link>http://blog.forret.com/2004/10/estimate-of-lines-in-a-log-file/</link>
		<comments>http://blog.forret.com/2004/10/estimate-of-lines-in-a-log-file/#comments</comments>
		<pubDate>Thu, 21 Oct 2004 12:30:27 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/10/estimate-of-lines-in-a-log-file/</guid>
		<description><![CDATA[Let&#8217;s say you need an (approximate) count of the number of lines in a huge file. The most obvious way of calculating this would be using wc, but this actually can be quite slow: # time wc -l /var/log/squid/access.log 2812824 /var/log/squid/access.log real 0m43.988s (counting is done at 64.000 lines/sec) Running wc without the -l (only [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s say you need an (approximate) count of the number of lines in a huge file. The most obvious way of calculating this would be using <code>wc</code>, but this actually can be quite slow:<br />
<code># time wc -l /var/log/squid/access.log<br />
2812824 /var/log/squid/access.log<br />
real    0m43.988s</code><br />
(about 64,000 lines per second)</p>
<p>Running <code>wc</code> without <code>-l</code> (count lines only) would be even slower, because it would also count words instead of just the LF (linefeed) characters. But <code>wc -c</code> is very fast! The filesystem keeps track of each file&#8217;s size in bytes, so the file does not even have to be read to give this number. Can we estimate the number of lines from the number of bytes?</p>
<p>For the type of file we are talking about here (a Squid log file), there is actually a way. The file is more or less &#8216;square&#8217;, meaning that every line is about the same length (it contains date, status, URL, &#8230;).<br />
If we take the beginning of the file (the first 10,000 lines):<br />
<code># head -10000 /var/log/squid/access.log | wc<br />
  10000  100000 1775257</code><br />
we see that a line is on average about 177 characters long.</p>
<p>The end of the file (the last 10,000 lines):<br />
<code># tail -10000 /var/log/squid/access.log | wc<br />
  10000  100000 2047887</code><br />
gives us a number of 204 chars/line.</p>
<p>Let&#8217;s take some more data and combine both:<br />
<code># ( head -50000 /var/log/squid/access.log ; tail -50000 /var/log/squid/access.log ) | wc<br />
 100000 1000000 19488905</code><br />
which gives us an average of 195 chars/line.</p>
<p>For a file size of 533,229,920 bytes (533 MB), that gives an estimate of 533,229,920 / 195 &#8776; 2,734,512 lines. The actual number of lines is 2,818,184, a 3% difference. In other words, we lose about 3% accuracy, but the calculation takes almost no CPU time instead of 45 seconds. That might be a trade-off you are willing to accept!</p>
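<p>The whole approach fits in a small function. This sketch makes the same assumption as above: all lines in the file are roughly the same length. The function name <code>estimate_lines</code> and the default sample size of 50,000 lines from each end are my own choices. If the file is shorter than two samples, head and tail simply overlap, and the average line length is still valid.</p>

```shell
#!/bin/bash
# Sketch: estimate the number of lines of a large, roughly "square" file
# (all lines about the same length) without reading it completely.
# Usage: estimate_lines FILE [SAMPLE]   (hypothetical helper)
estimate_lines() {
    local file=$1 sample=${2:-50000}
    local size sample_lines sample_bytes

    # Total size in bytes; for a regular file wc -c can usually take
    # this from the file's metadata instead of reading the contents.
    size=$(wc -c < "$file")

    # Line and byte counts of the first and last SAMPLE lines together.
    read -r sample_lines sample_bytes < <(
        { head -n "$sample" "$file"; tail -n "$sample" "$file"; } | wc -l -c
    )
    (( sample_lines > 0 )) || return 1   # empty file: nothing to estimate

    # size / (bytes per line), computed as size * lines / bytes so that
    # bash's integer arithmetic does not truncate the average line length.
    echo $(( size * sample_lines / sample_bytes ))
}
```

<p>The function uses integer arithmetic on purpose. The result is an estimate anyway, and bash has no floating point.</p>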
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/10/estimate-of-lines-in-a-log-file/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculate hit rate from a log file</title>
		<link>http://blog.forret.com/2004/10/calculate-hit-rate-from-a-log-file/</link>
		<comments>http://blog.forret.com/2004/10/calculate-hit-rate-from-a-log-file/#comments</comments>
		<pubDate>Thu, 21 Oct 2004 09:30:13 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/10/calculate-hit-rate-from-a-log-file/</guid>
		<description><![CDATA[You have a huge file that contains one line per request/transaction. Some of the lines are of one type (e.g. &#8216;HIT&#8217;), some of another (e.g. MISS). Let&#8217;s say you want to calculate the hitrate, but as fast as possible. We take a Squid log file of about 140MB. How long does it take to count [...]]]></description>
			<content:encoded><![CDATA[<p>You have a huge file that contains one line per request/transaction. Some of the lines are of one type (e.g. &#8216;HIT&#8217;), some of another (e.g. MISS). Let&#8217;s say you want to calculate the hitrate, but as fast as possible.<br />
We take a Squid log file of about 140MB. How long does it take to count how many lines it has?<br />
<code># time wc -l /var/log/squid/access.log<br />
845212 /var/log/squid/access.log<br />
real 0m6.523s</code> (about 21.4 MB/s or 130,000 lines/s)</p>
<p>And now let&#8217;s just filter out the lines containing &#8216;HIT&#8217; and count those:<br />
<code># time sh -c "grep -i HIT /var/log/squid/access.log | wc -l"</code><br />
Wow! This takes ages (I stopped it after 15 minutes) and the <code>grep</code> takes 100% CPU all the time. So let&#8217;s look for another solution.</p>
<p>Maybe <code>gawk</code>? First let&#8217;s see if it is much slower than <code>wc -l</code> for counting lines:<br />
<code># time gawk "END {print NR}" /var/log/squid/access.log<br />
845907<br />
real 0m26.129s</code> (5.3 MB/s or 32,000 lines/s &#8211; 4 times slower)<br />
And now let it count the hits too:<br />
<code># time gawk 'BEGIN {hit=0} /HIT/ {hit = hit+1} END {print hit/NR*100}' /var/log/squid/access.log<br />
84.5023<br />
real 0m32.836s</code> (4 MB/s or 25,000 lines/s &#8211; slow but acceptable)</p>
<p>Do we actually need to count the whole file? What if we just took the last (i.e. most recent) 100,000 lines? The result would better reflect the current hit rate, and the calculation time would be more predictable.<br />
<code># time sh -c "tail -100000 /var/log/squid/access.log | gawk 'BEGIN {hit=0} /HIT/ {hit = hit+1} END {print hit/NR*100}'"<br />
92.305<br />
real 0m3.332s</code> (30,000 lines/s)</p>
<p>It is actually a bit slower the first time you run it, probably because the file is not yet in the filesystem cache. So if you want your hit rate calculation to take less than 2 seconds, you could take the last 50,000 lines (about 1.7 seconds at 30,000 lines/s). Done!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/10/calculate-hit-rate-from-a-log-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Squid: list top X referers</title>
		<link>http://blog.forret.com/2004/10/squid-list-top-x-referers/</link>
		<comments>http://blog.forret.com/2004/10/squid-list-top-x-referers/#comments</comments>
		<pubDate>Tue, 19 Oct 2004 16:55:54 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://blog.forret.com/2004/10/squid-list-top-x-referers/</guid>
		<description><![CDATA[If your Squid server logs the referers of its request (i.e. 1. you&#8217;ve configured squid-cache with --enable-referer-log before compiling and 2. you&#8217;ve included a referer_log /var/log/squid/referer.log in your squid.conf file), you can easily show top 50 of most popular referers with a simple Bourne shell: #!/bin/bash this script is &#8216;top_referers.sh&#8217; (c) 2004 Peter Forret &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>If your Squid server logs the referers of its request (i.e.<br />
1. you&#8217;ve configured <a href="http://www.squid-cache.org">squid-cache</a> with <code>--enable-referer-log</code> before compiling and<br />
2. you&#8217;ve included a <code>referer_log /var/log/squid/referer.log</code> in your <code>squid.conf</code> file),<br />
you can easily show top 50 of most popular referers with a simple Bourne shell:<br />
</p>
<pre>#!/bin/bash
# this script is 'top_referers.sh'
# (c) 2004 Peter Forret &#8211; Open Source
REFERERS=/var/log/squid/referer.log
OUTPUT=/var/www/html/stats/referer.txt
MAXLINES=50
(
echo REPORT MADE AT `date`
echo =============================
# (...)
) &gt; $OUTPUT</pre>
<p>Then add it to your crontab:<br />
<code>10 * * * * /(path)/top_referers.sh</code><br />
and you have statistics that are updated every hour (at 10 minutes past the hour)!<br />
Add a little HTML formatting if you&#8217;re aesthetically demanding!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.forret.com/2004/10/squid-list-top-x-referers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

<!-- Dynamic Page Served (once) in 1.705 seconds -->

