I occasionally come across text files with weird squiggles or numbers where there should be characters. Usually it’s accented characters, but in extreme cases I’ve seen it happen with speech marks.
The problem, of course, is that the files are not ASCII, and text files don’t record which character set they were created with. So if the character set my system assumes doesn’t match the one the file was written in, the result is screwed-up letters.
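To make the failure concrete, here is a small Python 3 illustration of both directions of the mismatch. The strings are arbitrary examples chosen for illustration; only the standard `str.encode` and `bytes.decode` methods are used.

```python
# Arbitrary example text; any accented letter or curly quote behaves similarly.
smart_quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the usual "curly" apostrophe

# A UTF-8 file read as cp1252: each multi-byte UTF-8 sequence turns into
# several unrelated cp1252 characters.
utf8_read_as_cp1252 = "café".encode("utf-8").decode("cp1252")      # "cafÃ©"
quote_read_as_cp1252 = smart_quote.encode("utf-8").decode("cp1252")  # "â€™"

# A cp1252 file read as UTF-8: the lone 0xE9 byte ("é" in cp1252) is not valid
# UTF-8, so a strict decode raises UnicodeDecodeError, while errors="replace"
# substitutes U+FFFD REPLACEMENT CHARACTER.
cp1252_read_as_utf8 = b"caf\xe9".decode("utf-8", errors="replace")  # "caf\ufffd"
```

The `â€™` sequence is the classic symptom of curly quotes being misread, which is why speech marks can come out mangled too.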
To solve this I wrote a little Python script that converts a file to UTF-8, since that is nice and universal. You can specify the input codec with the -c option; if you don’t bother, it assumes the Windows-1252 code page (cp1252), as that is usually what it is in my experience. There is also a force option (-f) for when the file contains bytes that aren’t valid in the input codec but you want it to convert anyway; those bytes are simply dropped.
I thought about making it autodetect the codec of the input file, but it is a lot of work for little benefit. The current code works in 99% of cases for me.
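```python
#!/usr/bin/env python
# Convert a text file from a legacy codec (cp1252 by default) to UTF-8,
# writing the result next to the input as FILENAME.utf-8.
import codecs
import optparse
import sys

opts = optparse.OptionParser(usage='%prog [-f] [-c CODEC] FILENAME')
opts.add_option('-f', '--force', action='store_true', default=False,
                help='drop bytes that are not valid in the input codec')
opts.add_option('-c', '--codec', default='cp1252',
                help='codec of the input file [default: %default]')
(parsedOpts, args) = opts.parse_args()
if len(args) != 1:
    opts.error('expected exactly one FILENAME')
filename = args[0]

# Without --force a single undecodable byte aborts the conversion;
# with it, such bytes are silently dropped.
errors = 'ignore' if parsedOpts.force else 'strict'
try:
    # Decode the whole input before creating the output, so a decoding
    # error doesn't leave an empty .utf-8 file behind.
    with codecs.open(filename, 'r', encoding=parsedOpts.codec,
                     errors=errors) as textfile:
        text = textfile.read()
    with codecs.open(filename + '.utf-8', 'w', encoding='utf-8') as utffile:
        utffile.write(text)
except (IOError, LookupError, UnicodeError) as e:
    sys.stderr.write("Error converting to UTF-8, source file probably "
                     "doesn't use %s: %s\n" % (parsedOpts.codec, e))
    sys.exit(1)
```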
The above code is made available under the MIT license.
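For anyone who does want a little more than a fixed default, a cheap halfway house short of real autodetection is to try UTF-8 first and fall back to cp1252 only if that fails. This is a minimal sketch, not part of the script above; `decode_with_fallback` is a made-up name, and the approach rests on the assumption that cp1252 text containing accented characters is very unlikely to happen to be valid UTF-8.

```python
def decode_with_fallback(data, fallback="cp1252"):
    """Decode bytes as UTF-8 if they are valid UTF-8, otherwise with `fallback`.

    Returns a (text, codec_used) pair. Pure ASCII input is valid UTF-8, so it
    is reported as "utf-8" (it decodes identically under either codec).
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Still a strict decode: it raises if the bytes are invalid in the
        # fallback too (cp1252 leaves a few byte values, such as 0x81, undefined).
        return data.decode(fallback), fallback
```

To use it in the script, the input would be opened in binary mode (`open(filename, 'rb')`) and its contents passed to `decode_with_fallback` instead of being read through `codecs.open`.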
After decades of never losing any significant data, last night I deleted my Video directory for the second time in six months, along with my Music directory this time. Fortunately I had learned my lesson and had a complete backup. The first time, I lost a large number of non-critical videos I had downloaded off the internet (okay, it was porn. I said it, happy now?). This time I just lost a couple of episodes of This Week in Fun I hadn’t watched yet. If I can be bothered, I can easily re-grab them from ODTV.
I’m not sure how I accidentally deleted the stuff. I was moving my config files from Subversion to git using this method, and I must have accidentally rm -rf’d them somehow. That’s my problem: it’s incredibly easy to shoot yourself in the foot from the command line.
I’ve always used a combination of the GUI and the shell. My first proper computer was an Amiga 500+, which had an amazingly modern GUI while Windows was still stuck at 3.1, but also a pretty good shell interface: not quite up to Unix standards, but much better than DOS. So I’ve always used both, depending on what was easiest for the task.
When I switched to Windows (after my last Amiga died on me), I started using the GUI much more heavily, since DOS is so underpowered. When I switched to Linux a short time later, though, that reversed: the Linux GUI at the time wasn’t so hot, and the shell was incredibly powerful. So I currently default to the shell, which makes me nice and productive but also means I’m working without a safety net.
If recent experience is anything to go by, I either need to start relying on the GUI much more or I need to find myself an idiot-proof shell. Since I don’t think the latter is possible it looks like I’ll be GUI-focused from now on. This is going to be a hard transition for me, much like giving up a narcotic, but I think it is necessary.
I’m going to start this transition at the same time as I install KDE 4, which is looking excellent in my tests on my EeePC, by the way. I could almost switch now; there are just a couple of things (already in the works) that I need first, the main one being a plasmoid for controlling NetworkManager.
That way I only have to relearn how to do things in one GUI, rather than learning one way in KDE 3.5 and then a new way in 4. I’m not going to give up the shell completely, mind; I’ll just shift the balance to 80/20 in favour of the GUI instead of the other way round.
Wish me luck.
My cranky cynical old heart bursts with joy at the mere thought of Kevin Rose working the drive-through window of McDonald’s asking people if they “want fries with that”. In fact I’m fully prepared to strip down to a loincloth, dance and chant around a fire, and ritualistically sacrifice a live chicken if I honestly felt it would help.
via How to Abuse the New DiggBar for Fun and Profit.
[Image: KDE 4.2 desktop]
I’ve had a virtual machine running KDE4 from the Debian experimental repos for a while now. Every few months, when I notice an announcement of a point release, I update it and give it another try. Up till now I’ve always been disappointed in some way.
I first tried it on my EeePC 701 netbook back when 4.0 first launched. I was impressed that it ran at all on that hardware, even with some of the desktop effects enabled, on a machine that struggles to run Windows XP, let alone Vista. While performance was okay, it was horribly unstable. I can live with applications crashing (as long as I don’t lose data), but parts of the desktop going boom is unacceptable. I eventually gave up on it and put KDE 3.5 on the netbook instead.
That is when I set up a VM on my main Debian laptop, at that time running Lenny, so I could keep track of KDE’s progress. I’m not one of those people who baulk at change and object to learning new interfaces irrespective of whether they are better or worse than what they replace. On the other hand, I absolutely need my desktop to be rock-solid; I will not abide flakiness in any essential part of the OS. On the gripping hand, I want to be in control, which is why I use Linux and KDE in the first place: they let me configure the OS to work the way I want. I’d really like to run KDE4 on my production machines ’cos of the shiny, but until I can be sure of its stability I won’t risk it.
KDE 4.2 recently transitioned from Debian experimental to unstable, so I fired up the VM to give it another whirl. As I had already played with 4.2 a few weeks earlier, I was expecting the same nearly-there experience: things mostly working, but still the odd crash. I was wrong.
[Image: KDE 4.2 calculator plasmoid]
After two days of playing with it I have had but a single crash (kwallet), and that was just after the upgrade, so it can be dismissed as an artefact. Other than that it has been completely stable and usable. I have to say, I really like it. Some of the defaults are not to my taste, but they can all be changed with a bit of searching through the interfaces. KMail in particular defaulted to something horrid, but on the other hand it seems to be even more configurable than the 3.5 version, so it’s a win overall.
It’s the little things that make me want to switch, though: the calculator plasmoid that can be stuck on the panel for incredibly fast access, a dictionary right on the desktop, and many other useful widgets. Previous attempts at widgets on KDE (SuperKaramba, I’m looking at you) have been mediocre at best; glorified system monitors were the best you could hope for. Judging by the built-in examples, plasmoids seem to be the real deal, and if the community starts writing them we should be in for some real treats.
The application launchers are a little different, and I have to say that at the moment I prefer the old “start menu” style one; the fixed height of the new designs means I have to keep scrolling to find what I want, which is icky. But I’m going to give them a while. It may be one of those things you just have to get used to, and if it turns out I still hate them, the old style is still available.
Being in a VM means I can’t use the desktop effects, so I’m missing a lot of the fun stuff; the next step is to sacrifice my netbook again to try those out. I’m hopeful that by the time KDE 4.2 transitions to testing (squeeze), which my main laptop now runs, I will be completely happy to switch. It is certainly looking positive at the moment.
When I write a program I like to include a build number in the version. Usually I use version.revision.build. The version only ever gets bumped for what is practically a complete rewrite, and sometimes not even then. The revision gets bumped for new features or major bug fixes, and the build gets bumped on every compile.
This gives the huge advantage of being able to identify exactly which build a particular copy of a program came from, which makes tracking down bugs easier.
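As an illustration of that scheme, here is a minimal sketch of a version triple in Python. The `Version` class and its method names are hypothetical, and one detail is an assumption the text doesn’t settle: bumping the revision does not reset the build counter, so a build number on its own still identifies a single compile.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A version.revision.build triple such as 1.4.237 (hypothetical helper)."""

    version: int   # bumped only for near-complete rewrites
    revision: int  # bumped for new features or major bug fixes
    build: int     # bumped on every compile

    def __str__(self):
        return "%d.%d.%d" % (self.version, self.revision, self.build)

    def next_build(self):
        """Return the version to stamp on the next compile."""
        return self._replace(build=self.build + 1)

    def next_revision(self):
        """Bump the revision; the build counter is deliberately not reset."""
        return self._replace(revision=self.revision + 1)
```

Because it is a tuple, versions also compare in the expected order: `Version(1, 4, 237) > Version(1, 4, 236)`.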
For compiled code this works really well: I just have the build process increment the number somehow. For interpreted code such as Python, however, there is no build process. So how do I make sure the build number gets incremented every time? There is no way I would remember to do it manually.
When I was using Subversion I hacked something together using $LastChangedRevision$, so the build number was based on the most recent svn commit number. It worked well enough and I was happy.
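A minimal sketch of that kind of hack, reconstructing the general idea rather than the original code (`SVN_REVISION`, `build_from_keyword` and `BUILD` are made-up names). When a file’s svn:keywords property includes LastChangedRevision, Subversion rewrites `$LastChangedRevision$` to something like `$LastChangedRevision: 1234 $`. Note that the number is the last revision in which that particular file changed, so it only tracks the latest commit if that file changes in every commit.

```python
# Subversion expands the keyword in the constant below, but only if the
# file's svn:keywords property includes LastChangedRevision. Nothing else in
# the file should contain the dollar-delimited keyword, or it is expanded too.
SVN_REVISION = "$LastChangedRevision$"


def build_from_keyword(keyword):
    """Extract the revision number from an expanded keyword string.

    Returns 0 when the keyword has not been expanded (e.g. outside svn).
    """
    # Strip the surrounding dollars and spaces, then take what follows the
    # last colon (this also copes with the fixed-width "::" form).
    _, sep, value = keyword.strip("$ ").rpartition(":")
    return int(value) if sep else 0


BUILD = build_from_keyword(SVN_REVISION)
```

Expansion is switched on per file, e.g. `svn propset svn:keywords LastChangedRevision version.py` (the file name here is only an example).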
Now, however, I’m switching to git, which doesn’t expand revision keywords the way CVS and Subversion do, so I’m a bit stumped. I tried hacking together a script that auto-increments a number and then runs git commit, but not only is it logically flawed, it relies on me remembering to run it instead of the plain git commands, which is never going to be reliable.
Maybe I could do something with the current date, but again that would rely on me running something. Repo hooks are probably the answer somehow, but I’ve no idea how. The build number itself needs to be stored in git so that it syncs between computers. Can a git repo hook update a file and then commit that update back to itself automatically (without an infinite recursion destroying the repo)?
Update (2009-04-05): I’ve worked out how to generate an auto-incrementing number via git hooks. Use the following script as a pre-commit hook (save it as .git/hooks/pre-commit and make it executable with chmod +x). It’s set up for Python, but it could easily be adapted for whatever language you need.
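```bash
#!/bin/bash
#
# Called by git-commit with no arguments. The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# This script will automatically increment the __revision__ number in
# the file version.py on every git commit.
set -e

# Locate version.py at the top of the working tree.
f="$(git rev-parse --show-toplevel)/version.py"

# Extract N from a line of the form "__revision__ = N".
REVISION=$(sed -n 's/^__revision__ = \([0-9][0-9]*\)$/\1/p' "$f")
if [ -z "$REVISION" ]; then
    echo "pre-commit: no '__revision__ = N' line in $f" >&2
    exit 1
fi
REVISION=$((REVISION + 1))

# Rewrite the line via a temporary file, then stage the change so it
# goes into the commit being made.
sed -E "s/^(__revision__ = )[0-9]+$/\1$REVISION/" "$f" > "$f.new"
mv "$f.new" "$f"
git add "$f"
```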
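The hook relies on version.py containing a line of exactly the form `__revision__ = N`. As a sketch of that contract, here is the same increment step in Python; the sample file contents and the `bump_revision` helper are hypothetical illustrations, not part of the hook, but they can be handy for checking that a given version.py will be picked up.

```python
import re

# A hypothetical version.py in the layout the hook expects; only the
# __revision__ line is touched.
SAMPLE_VERSION_PY = '__version__ = "1.2"\n__revision__ = 41\n'

# Same pattern as the hook's sed expression, applied line by line.
_REVISION_LINE = re.compile(r"^(__revision__ = )(\d+)$", re.MULTILINE)


def bump_revision(text):
    """Return (new_text, new_revision) with the first __revision__ line incremented."""
    match = _REVISION_LINE.search(text)
    if match is None:
        raise ValueError("no '__revision__ = N' line found")
    new_revision = int(match.group(2)) + 1
    new_text = _REVISION_LINE.sub(
        lambda m: m.group(1) + str(new_revision), text, count=1)
    return new_text, new_revision
```

A program can then build its version string from the stored number, for example by importing `__revision__` from version.py.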