Published on July 4, 2007 in Linux.
When you migrate web sites from one place to another and the URLs change, you don’t want to lose visitors who still use the old links. If your ‘old’ website ran on Apache, you can use its mod_alias/mod_rewrite functionality to redirect them automatically to the new URL. This involves adding redirect rules to the .htaccess file in the base folder of the redirected paths (the server configuration must allow this, e.g. with AllowOverride FileInfo). Some examples:
Generic structure of the .htaccess redirects
Redirect permanent /(old url) (new url)
Redirect ... (add all your one-2-one redirects here)
RedirectMatch permanent ^/old_stuff/.*html$ http://www.example.com/
RedirectMatch ... (add your catch-all redirects here)
RewriteEngine on
RewriteBase /blog/
RewriteRule ^([regex])$ http://blog.example.com/$1 [R,L]
RewriteRule ... (add all your variable redirects here)
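With a long list of one-to-one redirects, typing every Redirect line by hand gets tedious. Below is a minimal sketch of a hypothetical shell helper, make_redirects, that reads "old-path new-URL" pairs (one pair per line, an assumed input format) and prints the matching Redirect permanent lines. It also adds the leading slash that Redirect requires on the old path.

```sh
# Hypothetical helper: turn "old-path new-url" pairs on stdin into mod_alias lines.
make_redirects() {
  while read -r old new; do
    [ -z "$old" ] && continue            # skip blank lines
    case $old in
      /*) ;;                              # already starts with a slash
      *) old="/$old" ;;                   # Redirect needs a URL-path starting with /
    esac
    printf 'Redirect permanent %s %s\n' "$old" "$new"
  done
}
```

For example, make_redirects < redirects.txt >> .htaccess, where redirects.txt is a hypothetical file holding your URL pairs. Check the result by hand before uploading it.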
EXAMPLE: old Blogger site (on your own server) to new Wordpress site
I migrated a blog that Blogger published (via FTP) onto my own webspace to a blog run by Wordpress, and used the following rules to handle the redirections.
* HOMEPAGE:
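redirect / and /index.html to your new blog URL. A plain Redirect matches every URL that starts with the given path, so Redirect permanent / would also send all the other old URLs (archives, posts) to the new site with their old paths appended; anchored RedirectMatch rules only catch these two:
RedirectMatch permanent ^/$ http://blog.example.com/
RedirectMatch permanent ^/index\.html$ http://blog.example.com/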
* FEED:
redirect e.g. /atom.xml to your Feedburner feed
Redirect permanent /atom.xml http://feeds.feedburner.com/(exampleblog)
* ARCHIVES:
redirect e.g. /archive/2005_03_posts.html to the new Wordpress archives
RedirectMatch permanent /archive/([0-9][0-9][0-9][0-9])_([0-9][0-9])_.*$ http://blog.example.com/$1/$2/
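To check a RedirectMatch pattern before uploading it, you can run the same regular expression through sed -E. This is only a sketch: sed uses POSIX extended regular expressions while Apache 2 uses PCRE, but for simple patterns like this one they behave the same. The function name archive_target is made up for illustration.

```sh
# Mirror of the ARCHIVES RedirectMatch rule (hypothetical test helper).
# Prints the redirect target, or nothing if the path does not match.
archive_target() {
  printf '%s\n' "$1" |
    sed -n -E 's#^.*/archive/([0-9][0-9][0-9][0-9])_([0-9][0-9])_.*$#http://blog.example.com/\1/\2/#p'
}
```

For example, archive_target /archive/2005_03_posts.html prints http://blog.example.com/2005/03/.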
* POST PAGES:
This is tricky, because Blogger and Wordpress do not use exactly the same rules for constructing the text-like URL (the ‘post slug’). E.g. a post called how-to-podcast-with-blogger-and.html on my old Blogger site became how-to-podcast-with-blogger-and-smartcast/ on the new Wordpress one. So I used two types of rules:
a) redirecting individual pages
Redirect permanent /2004/10/how-to-podcast-with-blogger-and.html http://blog.example.com/2004/10/how-to-podcast-with-blogger-and-smartcast/
b) a generic rule for the others (this uses RewriteRule instead of RedirectMatch!): each page is redirected to a search on the Wordpress blog, within the correct month, for the first two words of the title:
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4 [R,L]
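You can check this rule offline by running the same regular expression through sed -E (POSIX ERE rather than Apache's PCRE, which makes no difference for this pattern). Inside .htaccess, RewriteRule sees the path relative to the directory holding the .htaccess file, without the leading slash, so the test input has none. post_search_target is a hypothetical name.

```sh
# Mirror of the generic post RewriteRule (hypothetical test helper):
# year/month/first-second-... -> search for the first two words in that month.
post_search_target() {
  printf '%s\n' "$1" |
    sed -n -E 's#^([0-9][0-9][0-9][0-9])/([0-9][0-9])/([a-z0-9]*)-([a-z0-9]*).*$#http://blog.example.com/\1/\2/?s=\3+\4#p'
}
```

For example, post_search_target 2004/10/how-to-podcast-with-blogger-and.html prints http://blog.example.com/2004/10/?s=how+to.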
This method is far from perfect, but it will bring visitors a lot closer to the right page. If you use pretty distinctive words for titles (e.g. “Myspace: bulletin and other spam”), chances are the right page shows up first. If you start all your posts with “The ten best ways to …” then you will need a more sophisticated rule, e.g. one using the 6th and 7th words:
RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-[a-z0-9]*-([a-z0-9]*)-([a-z0-9]*).*$ http://blog.example.com/$1/$2/?s=$3+$4 [R,L]
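Counting the skipped words by hand is error-prone, so here is a sketch of a hypothetical generator that writes the rule for any number of skipped words: build_skip_rule 5 reproduces the 6th-and-7th-word rule above, and build_skip_rule 0 the first-two-words rule. Keep in mind that a title with fewer words than the pattern asks for simply does not match, so the request falls through to your other rules.

```sh
# Hypothetical generator: skip N dash-separated words, search on the next two.
build_skip_rule() (
  skip=$1
  word='[a-z0-9]*'
  prefix=''
  i=0
  while [ "$i" -lt "$skip" ]; do
    prefix="${prefix}${word}-"           # one skipped word plus its dash
    i=$((i + 1))
  done
  # \$ keeps the literal $ that mod_rewrite needs
  printf '%s\n' "RewriteRule ^([0-9][0-9][0-9][0-9])/([0-9][0-9])/${prefix}(${word})-(${word}).*\$ http://blog.example.com/\$1/\$2/?s=\$3+\$4 [R,L]"
)
```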
Not losing the querystring
Redirect and RedirectMatch cannot redirect to a URL with a querystring (e.g. to newpage.php?param1=val1&param2=val2). For that you will need to use RewriteRule. An example: redirect all links like test.asp?param=value on the old domain to the new domain while keeping all querystring parameters:
RewriteRule ^tools/test\.asp$ http://web.example.com/tools/test.asp [L,QSA]
The pattern is matched against the URL path only, never against the querystring. QSA (query string append) keeps the existing querystring and adds it to the new URL, and L (last rule) stops looking for further rule matches.
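To make the QSA behaviour concrete: with QSA, if the substitution has no querystring, the original one is appended after a ?; if the substitution already has one, both are joined with &, new parameters first. The function below is an illustrative model of that rule written in shell, not Apache code; qsa_merge is a hypothetical name.

```sh
# Illustrative model of mod_rewrite's QSA flag (hypothetical helper).
# $1 = substitution URL, $2 = original querystring (may be empty)
qsa_merge() {
  target=$1
  qs=$2
  if [ -z "$qs" ]; then
    printf '%s\n' "$target"              # nothing to append
  else
    case $target in
      *\?*) printf '%s&%s\n' "$target" "$qs" ;;   # combine both querystrings
      *)    printf '%s?%s\n' "$target" "$qs" ;;   # start a new querystring
    esac
  fi
}
```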
Published on June 15, 2005 in Linux.
I wrote the following script to convert the forward DNS records (A records) in a /var/named/db.[domain] file into the correct format for a reverse DNS db.[subnet prefix] file (PTR records).
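#!/bin/sh
# (...)
DNSROOT=/var/named
PREFIX=$1     # subnet prefix, e.g. 192.168.110
DOMAIN=$2     # forward zone, e.g. internal.example.com
shift 2
DNSPRE=$DNSROOT/db.$PREFIX
DNSDOM=$DNSROOT/db.$DOMAIN
echo "; save this in $DNSPRE"
(
if [ -f "$DNSDOM" ] ; then
  # A records from the forward zone that point into this subnet,
  # reduced to "host IN A <last octet>" before printing them as PTR records
  grep "$PREFIX" "$DNSDOM" \
  | grep -w "A" \
  | sed "s/$PREFIX\.//g" \
  | gawk -v dom="$DOMAIN" -v src="$(basename "$DNSDOM")" \
      'BEGIN { OFS = "\t" } { print $4, "IN", "PTR", $1 "." dom ".", ";; FROM " src }'
fi
if [ -f "$DNSPRE" ] ; then
  # PTR records that already exist in the reverse zone
  grep -w "PTR" "$DNSPRE" \
  | gawk -v src="$(basename "$DNSPRE")" \
      'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, ";; FROM " src }'
fi
) \
| sort -n \
| uniq --check-chars=3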
You would call it as follows:
revdns.sh 192.168.110 internal.example.com > new.db.192.168.110
and then replace the records of the original db.192.168.110 with the records of the new file. The script still requires manual intervention (you cannot pipe the result straight into a live Bind config file), but it saves a lot of typing!
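The db.192.168.110 file is served by a reverse zone whose name is the prefix with its octets reversed, followed by .in-addr.arpa. A small sketch to derive that name; reverse_zone is a hypothetical helper, and it assumes a prefix of one to three whole octets.

```sh
# Hypothetical helper: 192.168.110 -> 110.168.192.in-addr.arpa
reverse_zone() {
  printf '%s\n' "$1" |
    awk -F. '{ s = $NF; for (i = NF - 1; i >= 1; i--) s = s "." $i; print s ".in-addr.arpa" }'
}
```

After reloading Bind you can spot-check a record with dig -x 192.168.110.201, which asks for the PTR record of 201.110.168.192.in-addr.arpa.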
Example of the output:
201 IN PTR james.internal.example.com. ;; FROM db.internal.example.com
202 IN PTR wilbur.internal.example.com. ;; FROM db.internal.example.com
216 IN PTR appprd1.internal.example.com. ;; FROM db.192.168.110
217 IN PTR appprd2.internal.example.com. ;; FROM db.192.168.110
218 IN PTR appprd3.internal.example.com. ;; FROM db.192.168.110
219 IN PTR appprd4.internal.example.com. ;; FROM db.192.168.110
220 IN PTR appprd5.internal.example.com. ;; FROM db.192.168.110
221 IN PTR appprd6.internal.example.com. ;; FROM db.192.168.110
Published on January 21, 2005 in Linux.
Here we are, back at the scene of the crime. Yes, I know it’s been a while. And the task of the day is:
- GOAL: make an HTML scraper, i.e. a script that grabs another URL and outputs the results to the screen
- TOOL: let’s say … Perl (in my case: Perl 5.8 on RedHat)
- INPUT: a URL
- OUTPUT: the HTML code of that URL
The actual HTML retrieval is easy: you need get() from the LWP::Simple module:
use LWP::Simple;
my $page = get($url);
Some remarks:
- Since you are generating a web page, you need the CGI module (to take care of the HTTP headers and such).
- The URL input parameter will be given as an HTTP querystring: ?url=http://www.example.com/path/page.htm. When no url parameter is given, we generate a form where it can be filled in.
- We measure the time it takes to retrieve the original page (in whole seconds, since Perl's time() has one-second resolution).
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
use LWP::Simple qw(!head);   # skip LWP's head(), which clashes with CGI's head()

my $query = CGI->new;
my $url   = $query->param('url');
my $debug = 0;

print header();
if (defined $url && length($url) > 0) {
    print getpage($url);
} else {
    showform();
}

sub getpage {
    my $url   = shift;
    my $time1 = time();
    debuginfo("Scraping <a target=_blank href='" . $url . "'>link</a> ...");
    my $page  = get($url);    # undef if the request failed
    my $time2 = time();
    return "Could not retrieve the page.\n" unless defined $page;
    debuginfo("Time taken was <b>" . ($time2 - $time1) . "</b> seconds");
    debuginfo("Total bytes scraped: <b>" . length($page) / 1000 . "KB</b>");
    return $page;
}

sub debuginfo {
    if ($debug > 0) {
        my $text = shift;
        print "<small>", $text, "</small><br />\n";
    }
}

sub showform {
    print("<html><head>");
    print("<title>SCRAPER</title>");
    print("<link rel=stylesheet type=text/css href=http://www.forret.com/blog/style.css>");
    print("</head><body><center>\n");
    print("<form method=GET action='scrape.pl'>");
    print("URL: <input name=url type=text size=60 value=http://www.forret.com>");
    print("<input type=submit></form>\n");
    print("</center></body></html>\n");
}
Next step: making sure image src= and hyperlink href keep on working (so convert relative links to absolute links!).
Published on November 8, 2004 in Linux.
Squid has a little system statistics viewer built-in:
The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server.
(from Squid FAQ)
The only thing is … it’s so ugly! It uses plain HTML and cannot be customized, the FAQ says. However, there is a way to do it:
- copy cachemgr.cgi to cachemgr2.cgi, so if you do something wrong, the original is not lost.
- open the copy in an editor that saves binary files unchanged. I used vi (vim -b opens a file in binary mode, so no line ending gets added or converted); if you’re not used to it, use something else (emacs?).
- in the binary file, look for text portions that look like HTML code.
- keeping in mind that the number of characters must stay the same, change the <title> and <style> to something that suits you. You will have to do this at 2 locations in the file: once in the homepage template and once in the template for the other pages.
- suggestion: just let the CGI use a style.css file that you drop into the same folder: <link rel="stylesheet" type="text/css" href="style.css" /> and fill up with spaces to keep the same number of characters.
- verify that cachemgr.cgi and cachemgr2.cgi have the same number of bytes.
- now use cachemgr2.cgi to display your statistics.
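The trickiest step is keeping the byte count identical. A minimal sketch of a hypothetical pad_to_length helper: it pads your replacement with spaces to the length of the original text, and refuses if the replacement is too long. It assumes plain ASCII text, where one character is one byte.

```sh
# Hypothetical helper: pad $2 with spaces to the byte length of $1.
pad_to_length() (
  old_len=$(printf '%s' "$1" | wc -c)
  new_len=$(printf '%s' "$2" | wc -c)
  old_len=$((old_len)); new_len=$((new_len))   # strip any padding from wc
  if [ "$new_len" -gt "$old_len" ]; then
    echo "replacement is $((new_len - old_len)) bytes too long" >&2
    exit 1
  fi
  printf "%-${old_len}s" "$2"            # left-aligned, space-padded
)
```

Afterwards, cmp -l cachemgr.cgi cachemgr2.cgi lists every byte that differs, and wc -c on both files confirms they have the same size.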
I did something a bit different (I wanted to use the CSS of my own website), so I’ll show you the difference between the two versions.
To get the following comparison, I ran strings on both files (strings cachemgr.cgi > cachemgr.txt, and the same for cachemgr2.cgi) to extract only the text parts, and then diff cachemgr.txt cachemgr2.txt to compare them. A plain diff of the two binary files would only report that they differ.
173,174c173,174
< <HTML><HEAD><TITLE>Cache Manager Interface</TITLE>
< <STYLE type="text/css"><!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} --></STYLE></HEAD>
---
> <HTML><HEAD><TITLE>Cache Manager (pforret)</TITLE>
> <link rel="stylesheet" type="text/css" href="http://www.forret.com/forret/forret.css" /> </HEAD>
199c199
< <STYLE type="text/css"><!-- BODY{background-color:#ffffff;font-family:verdana,sans-serif} TABLE{background-color:#333333;border:0pt;padding:0pt}TH,TD{background-color:#ffffff}--></STYLE>
---
> <link rel="stylesheet" type=text/css href="http://www.forret.com/forret/forret.css"><!-- TABLE{background-color:#333333;border:0pt;padding:0pt} TH,TD{background-color:#ffffff}--></STYLE>
Published on November 3, 2004 in Linux.
hdparm can be used to measure the throughput of a hard disk (-T times reads from the buffer cache, -t times buffered reads from the disk itself):
# hdparm -tT /dev/hda
/dev/hda:
Timing buffer-cache reads: 888 MB in 2.00 seconds = 444.00 MB/sec
Timing buffered disk reads: 20 MB in 3.30 seconds = 6.06 MB/sec
This would be an interesting performance metric to see plotted against time. So let’s convert it to a format ready for MRTG.
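MRTG can run an external command instead of querying SNMP; that command must print four lines: the first value, the second value, an uptime string and a target name. A minimal sketch, assuming the hdparm output format shown above, that maps the cache read speed and the disk read speed to the two values. hdparm_to_mrtg is a hypothetical name, and rounding to whole MB/sec is a choice made here to keep MRTG's input to plain integers.

```sh
# Hypothetical converter: hdparm -tT output on stdin -> MRTG's 4-line format.
# $1 = target name (line 4), $2 = uptime text (line 3)
hdparm_to_mrtg() {
  awk -v name="$1" -v up="$2" '
    /Timing .*cache/       { cache = int($(NF - 1) + 0.5) }   # MB/sec, rounded
    /Timing buffered disk/ { disk  = int($(NF - 1) + 0.5) }
    END { print cache + 0; print disk + 0; print up; print name }'
}
```

For example, hdparm -tT /dev/hda | hdparm_to_mrtg hda "$(uptime)" inside a wrapper script that MRTG calls via backticks in its Target line. Since these values are speeds rather than ever-increasing counters, the target would also need Options: gauge.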
Published on November 2, 2004 in Linux.
I have one server with apparently exceptional stability:
# uptime
3:45pm up 524 days, 1:22, 1 user, load average: 0.44, 0.16, 0.13
Unfortunately I know this is not correct (I remember rebooting it some weeks ago). So what are other ways to get the date/time of the last boot?
Looking at the RedHat manuals, the following thing should work too:
# cat /proc/stat
cpu 33813143 210619911 30093342 59435750
cpu0 33813143 210619911 30093342 59435749
(...)
btime 1096157569
(...)
The btime gives us the last boot time in seconds since 1 Jan 1970. I can find and convert it with gawk:
# gawk -v now="$(date +%s)" '/btime/ { print (now - $2) / (3600 * 24.0), "days -", strftime("%a %b %d %H:%M:%S %Z %Y", $2) }' /proc/stat
38.6473 days - Sun Sep 26 02:12:49 CEST 2004
Which gives us an uptime of 38.6 days: that looks more like it!
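The day count itself works in any awk; only the readable date needs gawk's strftime. A small sketch that takes btime and the current time as arguments, so it can be tested with fixed numbers (days_since_boot is a hypothetical name):

```sh
# Hypothetical helper: days elapsed between btime ($1) and now ($2), both epoch seconds.
days_since_boot() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (n - b) / 86400 }'
}
```

For example: days_since_boot "$(awk '/^btime/ { print $2 }' /proc/stat)" "$(date +%s)"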
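Another way of calculating the uptime is to add up the user, nice, system and idle time on the cpu lines, which this kernel counts in jiffies of 1/100 second:
# gawk '/cpu/ { print $1, ($2 + $3 + $4 + $5) / (3600 * 24 * 100) }' /proc/stat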
cpu 38.6515
cpu0 38.6515
Confirmation of the previous measurement!
# cat /proc/uptime
45282758.17 663091.26
The first number is the # of seconds since last boot. The other one (idle time) we don’t need. What is that in days?
# gawk '{ print $1 / (3600 * 24.0) }' /proc/uptime
524.106
This is where the wrong data is coming from! So I’ll ignore this data.
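To catch this kind of discrepancy automatically, you can compare the first number in /proc/uptime with the uptime derived from btime. A sketch with a hypothetical uptime_agrees function and an arbitrary default tolerance of 60 seconds:

```sh
# Hypothetical check: succeeds if /proc/uptime ($1) matches now ($3) - btime ($2)
# within a tolerance in seconds ($4, default 60).
uptime_agrees() {
  awk -v u="$1" -v b="$2" -v n="$3" -v tol="${4:-60}" \
    'BEGIN { d = u - (n - b); if (d < 0) d = -d; exit (d > tol) }'
}
```

For example: uptime_agrees "$(cut -d' ' -f1 /proc/uptime)" "$(awk '/^btime/ { print $2 }' /proc/stat)" "$(date +%s)" || echo "uptime and btime disagree"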
Remark: This server is one of my oldest ones and is still running Redhat 7.2 (Enigma). Looks like this bug was fixed in later versions of RedHat, since none of my other servers have it.
Published on October 21, 2004 in Linux.
There are two main tools to keep track of your CPU usage: top and vmstat.