DO NOT WANT

Sometimes well-intended optimizations can get in the way, and this is one of them.

In general, when a file is read from or written to, the operating system keeps the data around in case it's needed again. This is in general a good idea, except when it's not. Sometimes you know that you won't need the cached data again any time soon, and it's better for the operating system to forget about it instead of letting the cache balloon and push out more useful, albeit less recently used, data. For example, when performing a backup, a large amount of data is read from the disk and then written out to (typically) another medium. It's unlikely that much of the just-backed-up data is going to be needed again, at least until the next backup window. Unfortunately, on Linux the interface to control this is fairly limited. One can write an LD_PRELOAD library to tell Linux to uncache files as they are closed:

#if 0
gcc $0 -shared -o $0.so -ldl -fPIC && LD_PRELOAD=$0.so exec "$@"
#else
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdlib.h>

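/* Interposed close(): drop the file's cached pages, then call the real close(). */
int close(int fd)
{
    static int (*close_func)(int) = NULL;

    /* Look up the real close() once; RTLD_NEXT skips this wrapper. */
    if (close_func == NULL)
        close_func = (int (*)(int))dlsym(RTLD_NEXT, "close");

    /* Offset 0, length 0: the advice covers the whole file. The result is
       ignored because this is only a hint. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    return close_func(fd);
}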
#endif

posix_fadvise(2) says the following about POSIX_FADV_DONTNEED:

POSIX_FADV_DONTNEED attempts to free cached pages associated with the specified region. This is useful, for example, while streaming large files. A program may periodically request the kernel to free cached data that has already been used, so that more useful cached pages are not discarded instead.
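The periodic request described in that passage could look roughly like the sketch below, for a program you are able to modify. The function name `stream_and_drop` and the chunk size are made up for illustration, and the "hand the data to the backup destination" step is left as a comment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read a file in chunk-sized pieces and ask the kernel
 * to drop each piece from the page cache once it has been consumed.
 * A chunk size that is a multiple of the page size works best, because
 * partial pages are not discarded.
 * Returns the number of bytes read, or -1 on error. */
long long stream_and_drop(const char *path, size_t chunk)
{
    assert(chunk > 0);

    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    off_t done = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        /* ... pass the n bytes in buf to the backup destination here ... */

        /* Only a hint, so the return value is deliberately ignored. */
        posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }

    close(fd);
    free(buf);
    return n < 0 ? -1 : (long long)done;
}
```

Something like `stream_and_drop("/path/to/big.file", 1 << 20)` would read the file in 1 MiB pieces, so at most about one chunk of it should linger in the cache at a time.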

The man page also says that a length of 0 means the advice applies through the end of the file. This is insufficient, however. If the program exits without closing its files, they will never be uncached. And when streaming large files, they won't be uncached until after the cache is already polluted. Those issues can be fixed by wrapping read, write, pread, pwrite and related functions, or by directly modifying the program when it's possible to do so. Still, the interface is unsatisfactory. For example, if a file was already cached before it's backed up, you don't want the backup process to uncache it. There's also no good way to use posix_fadvise to uncache written data, given what the manpage says:

Pages that have not yet been written out will be unaffected, so if the application wishes to guarantee that pages will be released, it should call fsync(2) or fdatasync(2) first.
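Taken literally, that recipe is roughly the following sketch. `write_sync_drop` is a hypothetical name, and the code assumes a regular file opened for writing. The fdatasync(2) call blocks until the data has reached the disk, which is exactly the cost in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at the current file offset, force
 * them to disk, then ask the kernel to drop those pages from the cache.
 * Returns 0 once the data is written and synced, -1 on error. */
int write_sync_drop(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);

    /* Remember where this write starts so the advice covers the same range. */
    off_t start = lseek(fd, 0, SEEK_CUR);
    if (start < 0)
        return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }

    /* Dirty pages are not affected by DONTNEED, so flush them first. */
    if (fdatasync(fd) != 0)
        return -1;

    /* Only a hint: a failure here does not affect the written data. */
    posix_fadvise(fd, start, (off_t)len, POSIX_FADV_DONTNEED);
    return 0;
}
```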

Calling fsync(2) or fdatasync(2) while doing large streaming writes is simply unacceptable. Instead, it would be much better if the advice could be given at file-open time and lasted for the lifetime of the file descriptor. The kernel could then take the hint and not cache accesses made through that file descriptor. O_DIRECT can achieve this, but sometimes you don't want the extra semantics and requirements that it brings.
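To give an idea of those requirements, here is a minimal sketch of reading a file with O_DIRECT. The name `read_direct` and the 4096-byte alignment are assumptions; the real alignment rules depend on the filesystem and device, and some filesystems reject O_DIRECT outright, in which case the open simply fails.

```c
#define _GNU_SOURCE          /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: 4096 is a multiple of the device's logical block size. */
#define DIRECT_ALIGN 4096

/* Hypothetical helper: read a whole file, bypassing the page cache.
 * The buffer address and the read size must both be suitably aligned.
 * Returns the number of bytes read, or -1 on error. */
long long read_direct(const char *path, size_t chunk)
{
    assert(chunk > 0 && chunk % DIRECT_ALIGN == 0);

    /* A plain malloc() buffer is not guaranteed to be aligned enough. */
    void *buf;
    if (posix_memalign(&buf, DIRECT_ALIGN, chunk) != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {            /* for example, the filesystem lacks O_DIRECT */
        free(buf);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0)
        total += n;          /* ... use the n bytes in buf here ... */

    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}
```

Compared with a per-descriptor "don't cache this" hint, every caller now has to care about buffer alignment and transfer sizes, which is the extra baggage.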

by khc on Tue Nov 8 00:41:41 2011 Permlink
Tags: computer

HTTP_REFERER

Google recently announced that it will enable secure search for all logged-in users. More HTTPS usage is in general a good thing, but it does have a side effect:

keyword: not provided

Apparently RFC 2616 specifies that:

Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

Makes sense.
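As a toy illustration of that rule (not how any particular browser implements it), the decision boils down to comparing the schemes of the two URLs. `should_send_referer` and `is_secure` are hypothetical names, and treating "starts with https://" as "secure" is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Simplification: a URL counts as secure if it starts with "https://".
 * URL schemes are case-insensitive, hence strncasecmp(). */
static int is_secure(const char *url)
{
    return strncasecmp(url, "https://", 8) == 0;
}

/* Returns 0 when the RFC 2616 rule says the Referer should be omitted,
 * that is, when going from a secure page to a non-secure one; 1 otherwise. */
int should_send_referer(const char *referring_url, const char *target_url)
{
    assert(referring_url != NULL && target_url != NULL);
    return !(is_secure(referring_url) && !is_secure(target_url));
}
```

So a click from an https:// search results page to a plain http:// site is exactly the case where the header goes missing.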

HTML5 also has a new rel="noreferrer" link type (with the correct spelling this time) that can be used to instruct the browser never to send a Referer header (HTTP_REFERER) to the destination URL.

For my own amusement, I can still correlate those visits from Google with the landing page to guess what people are interested in when they visit, so it's not too big of a deal.

by khc on Sun Oct 30 19:15:45 2011 Permlink
Tags: computer

Fun with Google Translate

A conversation at work reminded me to try out the Google Translate Android app's new Conversation Mode, which I did tonight. I spoke this conversation to the phone, first as the customs clerk in English and Jim in Chinese (Mandarin):

Me: Hello can I see your passport please
Android: 你好可以,我看你的護照,請
Should be: 您好,請出示您的護照

Me: 給你
Android: You
Should be: Here you are

Me: What is the purpose of your visit
Android: 您訪問的目的是什麽
Should be: 您旅行的目的是什麽

Me: 觀光
Android: Tourism
Should be: Sightseeing

Me: Where are you staying
Android: 你是在哪裡停留
Should be: 您將在哪兒住宿

Me: 中國大飯店
Android: China world hotel
Should be: China world hotel

Me: How long will you be staying in china
Android: 多久你會留在中國
Should be: 預計在中國停留幾天

Me: 六天
Android: Six days
Should be: Six days

Me: Okay have a nice trip
Android: 好有一個不錯的行程
Should be: 好,祝您玩得愉快

"Should be" is the reference translation on the website, which I have to say is sometimes a bit odd. The entire "conversation" took about six minutes, with most of the time spent on 「給你 (gěi nǐ)」. I noticed that when I tried hard to pronounce the tonal change in gěi, most of the time it would be recognized as two words (either as 「電影 (diàn yǐng)」 or 「捷運 (jié yùn)」). So I tried to drop the tonal change, and Google would translate it as "gay". That's another thing I noticed: Google tries too hard to recognize English phrases within Chinese, and there's no way to tell it "Yes, I really am speaking Chinese, only!". Let's not forget that it translated 「給你」 incorrectly. At least things are (sometimes) better when it has more context to work with, so longer sentences are (sometimes) both recognized and translated better (e.g. 「我有一樣東西給你」 is translated to "I have a thing for you"). Context is both a blessing and a curse, though: if what you are trying to say sounds similar to a common noun (which probably means it's searched for on Google often), then good luck trying to make Google return what you really mean.

There are other relatively minor problems as well. Most sentences took multiple tries to be recognized correctly. I am not sure whether it recognizes English better or my Mandarin is worse (probably both), but it has a much easier time "understanding" English sentences. If Google thinks it recognized a word correctly, there's no way to tell it otherwise with the built-in correction option (you can only edit the word out and start over). And if it recognized a Chinese word incorrectly, even if it's not sure (those are shown in blue), you can't correct it either.

Just for fun I tried to speak the conversation the other way. I am not going to post the conversation here, other than to say that Google insists 「預計 (yù jì)」 is "wiki" and I couldn't find a way to get past that. Did I say Google is trying too hard?

by khc on Mon Oct 17 23:29:24 2011 Permlink

New (work) laptop and other things

My work laptop's hard drive was unhealthy last Friday. At first I thought it could last a little longer, but when I returned to work Monday morning it was returning various read errors and Firefox wouldn't even start. I was able to start some more backups (as part of eating our own dogfood I'd started backing up some of the stuff to it, but not all), but eventually rsync was having trouble reading the files too.

The new laptop is a ThinkPad X220, which is a bit of an upgrade compared to the old X200 (quad core i5 instead of Core Duo). Unfortunately our PXE server only has Fedora Core 14, which isn't new enough to support the network card (which is a little problematic when you are trying to install over the network). I downloaded and extracted the Fedora Core 15 bits to the PXE server, but that didn't work either because of a kernel oops which caused udevd (iirc) to die. Gave up, went home, put the install image onto a USB drive and it installed fine from that.

Battery life is pretty good: the battery indicator said I had about 4 hours left, considering that I wasted a lot of time trying to do the install earlier and it was never plugged in. I mostly do my work on a shared server though, so the laptop is mostly just a dumb terminal.

GNOME Shell worked okay. After disabling the braindead multiple monitor/workspace preference it was even tolerable. FC16 is coming out in a month or so; hopefully the extensions that I rely on will still work. The new Evolution (3.0.3) also works much better than whatever I had. Not having to kill it 20 times a day is definitely an improvement. OpenVPN didn't work out of the box because of SELinux, and by default the firewall blocks mDNS. Strangely, port 22 is open by default, even though sshd isn't running by default. Coming from a Debian/Ubuntu world, that confused me a little.

In other news, I just bought a PS3 from Amazon and it should be here on Friday.

by khc on Tue Oct 4 23:50:47 2011 Permlink
Tags: computer

Follow up on my follow up on my follow up on my lyric display system

Updated lrc.pl again, this time with support for the MediaServer2 D-Bus interface that Rhythmbox 2.90.1 is using. It's a plugin that you have to enable in Rhythmbox, and the previous D-Bus interface is gone.

The update also includes some rudimentary LRC editing support, which I wrote a while ago but never uploaded.

by khc on Mon Jul 4 16:51:54 2011 Permlink
Tags: computer