Deciding on a name that does not give your offspring a penalty in life right from the start.

Part 0: postponing

Long before I knew my wife was pregnant, I was aware that if I ever had a kid, deciding on a name would be a tricky exercise. Think about it. If I asked you for 3 names you would not want to give your future kid, you could easily give me those and probably more. You would hesitate a little and then realize Adolf is a name, but not a name for your kid. It wouldn't take long before you started going over all your classmates from primary and secondary school and giving me more examples of names your kid should not get, simply because of bad associations. Coming up with viable options for your kid's name is a lot harder. This is why they sell books with many names and accompanying stories explaining the origin and meaning of the names. When we discovered that a baby was in the making, step 0 for me was: postpone thinking of names. We will just look at the echo (ultrasound) later, learn its gender and thus cut the effort in half already (yes, I have seen the stories that sometimes the echo interpreter is wrong and you still get the other option, but let's keep things simple for now).

Part 1: the 20 week echo

Copyright Cyanide and Happiness www.explosm.net

After roughly 20 weeks the build is sufficiently complete that you, no, that the echo specialist, can see what the gender of your offspring is. In our case, the lady doing the echo detected a little extra and therefore concluded: it's a boy. Now that we have this knowledge, we can start with the actual selection procedure, and we just halved the database of possible names. Not a bad start when it comes to filtering for options.

Speaking of databases, we actually need databases to start with. There are tons of websites out there with collections of names, but few of them actually offer usable, downloadable lists. I used a number of lists from Cotse.Net and I found a nice complete Dutch name database here. The Dutch database also features some statistics on how often each name was picked in recent years. Interesting, but for the selection procedure I would like to start with a list of just names. The following one-liner does exactly that for me:

cat jongensNamen.txt.org | cut -f 2 | cut -d ' ' -f 1 > jongensNamen_cut.txt
Then: merge all the name lists I have collected and remove all duplicates in the process. The easiest way to do that is to convert it all to lowercase (not all lists were using capitals) and then sort it:
cat *.txt | sed "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/" | sort | uniq > names.txt
This reduced the count from 351272 to 103845 for the ridiculous selection of lists I had chosen. Opening the resulting file I saw some strange names... Some of them weren't exactly ASCII compatible (e.g.: "Beno<C3><AE>" and no, my terminal is not UTF-8), so I decided my kid should be given an ASCII compatible name to make things easier in life. To achieve that I added the following to the previous command:
grep "^[abcdefghijklmnopqrstuvwxyz]*$"
to fix that; the count is now down to 94836.
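
The merge, lowercase and ASCII steps above can also be folded into one pipeline. What follows is a minimal sketch and a variation on the commands above, not the exact ones used here: it assumes tr for the lowercasing and sort -u in place of sort | uniq, it requires at least one letter per line (so empty lines are dropped too), and the function name merge_names is made up for illustration.

```shell
#!/bin/sh
# merge_names: hypothetical helper, not part of the original commands.
# Concatenates the name lists given as arguments, lowercases them,
# keeps only lines consisting purely of the letters a-z, and prints
# every surviving name exactly once, sorted.
merge_names() {
    cat "$@" \
        | LC_ALL=C tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C grep '^[a-z][a-z]*$' \
        | LC_ALL=C sort -u
}
```

It could be called as merge_names list1.txt list2.txt > names.txt, where the file names are placeholders. Forcing LC_ALL=C makes a-z mean exactly the 26 ASCII letters, so names containing bytes outside ASCII are filtered out regardless of the terminal's locale.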

The list now looked... reasonable. It is however clear there's some weird naming going on in this world. Highlights from page 1:

  • A. It is short. But... really?
  • Aardappel. Potato in Dutch... either I included the wrong word list or someone on this globe has a weird name. I decided to search and Google did not disappoint. There are 4 of them...
  • Aarsarbetskrafter. Errrr... this sounds like a Dutch/German/Something word and at least to me it says: "ass labour worker" or something along those lines.

More changes: it seems I'm going to make many more changes to my filtering, so it's time for a Makefile. The first change is replacing the long grep with egrep '^[[:lower:]]{3,12}$'. Egrep knows the lowercase characters as a character class, so I don't have to spell them out anymore. Secondly, the {3,12} tells egrep I'm not interested in any name shorter than 3 or longer than 12 characters; this should prevent names like "A" and names that won't fit in a database. We are now down to 99791 options. Wait wut... it went up. Note to self: copy-pasting a command line into a Makefile can lead to mistakes. A '$' has a special meaning in Makefiles, so what I needed to get a dollar into the grep command was actually $$. Now I'm down to 92515.

Time to make some more drastic changes. Let's just take the Dutch database as the starting point, because it contains every first name existing at the moment in the country. Chances are I don't want to use a name no one has ever used in the whole country. Step 2 will then become somehow verifying it is pronounceable in English and Romanian. The Makefile is helping here; one change later and I'm at 4130 names; that's more manageable.

Now the filtering: for every Dutch name candidate I'd like to check that it also exists outside the Dutch database; that should be a very rough indication of it being usable outside of this country. This does that trick:
-@for f in `cat dutch_names.txt`; do grep "^$$f$$" check_names.txt; done > names.txt
It needs a minus before the line in the Makefile, because grep returns a non-zero exit code when it cannot find a name; if that happens for the last name in the loop, make would otherwise stop with an error. Notice also the ^ and $ to make sure we are matching the whole name and not just a part of some other name.

Next up: my wife is Romanian, so I think the name should work in Romania as well. I'm not going to repeat what I did for English compatibility, because it seems Romania still mostly uses religious names that come with an associated "name day", which is a bit like a birthday, but on the day the saint associated with that name was born (or died?). What I can do however is make sure the Romanians understand it's a boy from the name, which boils down to: does it end with an 'a'? Then it's a girl; otherwise it's a boy. So let's remove all options ending with an 'a'; this removes 78 names, ranging from Abdalla, via Bora (a Volkswagen model...) and Tuna (the fish?) to Zakaria.

Related to filtering on the name ending: I also want to filter on the start, a.k.a. the initial(s). Why? It is not really convenient (snail mail wise etc.) to have the same initials twice in one house. So no names starting with my or my wife's initial.

We're not done yet :) After filtering on ASCII and usability in other language areas it is also time for a blacklist. There are some names you really do not want to handicap your future kid with. Adolf comes to mind. But also my own name: I don't want my kid to have the same name as me. So Adolf and a few other obvious ones go into this blacklist. More inspiration for the blacklist came from this website. The following added line does just that for me and even informs me of what nice names I'm actually rejecting.
-@for f in `cat names.txt`; do if grep -q "^$$f$$" blacklist.txt; then echo "rejected $$f"; else echo $$f >> candidates.txt; fi; done
We are now down to 1516 possibilities.
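
The ending and initial filters are described above only in words. Below is a minimal sketch of how they could look as small shell filters, assuming lowercase names with one name per line. The function names and the example initial are made up for illustration and do not come from the actual Makefile. The third helper shows an assumed alternative to the for loop for the cross-check: grep can read all patterns from a file and match them as whole lines in a single pass.

```shell
#!/bin/sh
# Hypothetical helpers; each reads names on stdin and writes the
# surviving names to stdout.

# Drop names ending in 'a' (read as a girl's name in Romania).
drop_a_endings() {
    grep -v 'a$'
}

# Drop names starting with any of the letters in $1.
# The letters are placeholders, e.g. "jm" rejects j... and m... names.
drop_initials() {
    grep -v "^[$1]"
}

# Keep only names that also occur as a complete line in the file $1.
# -F: fixed strings, -x: whole-line match, -f: read patterns from file.
keep_listed() {
    grep -Fxf "$1"
}
```

Chained together this could read drop_a_endings < names.txt | drop_initials "m" | keep_listed check_names.txt, with "m" standing in for the real initials. Unlike the loop with the blacklist, these filters do not report which names they reject.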