Posted on May 5, 2011

Machine Learning cheat sheet

A substantial part of a Machine Learning course I recently took involved learning and applying linear classifiers and clustering algorithms on smaller data sets. To summarise the most important material, I created a cheat sheet in LaTeX. I figured someone else might appreciate it as well, so why not make it available for more people than myself?

cheat sheet preview

.pdf can be downloaded here.
.tex-file is on Github here; feel free to modify or add information. Please let me know if you find mistakes!

Note that this document was really only created for my own study purposes, and hence might be of limited use for others. Hopefully not, though.

EDIT: Discussion on Hacker News: http://news.ycombinator.com/item?id=2515612

Posted on Apr 12, 2011

Implementing durability for in-memory databases, on SSDs

As the examination for a recently completed course in Database Systems Implementation, students had to implement a durable, high-throughput, in-memory key/value database for strings, coincidentally the same problem as this year’s SIGMOD programming contest. I thought I’d present aspects of my own implementation of durability, focusing on problems I encountered and how I solved them. I also relate parts of my solution to existing NoSQL databases, and discuss how SSDs can reduce the sophistication needed for a good-enough solution.

Perhaps the main problem of the programming contest lies in implementing the in-memory data structure itself, with issues such as concurrency control and efficient string comparisons. I will, however, assume such an arbitrary data structure, and focus this post on implementing durability.

Disclaimer
This is by no means efficient, beautiful or advanced, and ideas should probably not be copied without first dissecting them with experienced eyes. I learned as I proceeded, and I had essentially no prior knowledge of databases before this course and assignment. This post is written for inexperienced enthusiasts like myself; for others there is probably little new.
Continue Reading

Posted on Jan 31, 2010

White space time management

The naïve approach to graphic design is usually to include as much information as possible on a given canvas. White space is nothing more than unused space, ready to be filled with more graphics and copy. However, this is obviously a bad strategy since all it does is confuse the viewer and obscure the message. This is why Google became king of search, and why Apple keeps being awesome.

Similarly, the naïve time management strategy is to fit as much work and as many meetings into a schedule as possible. The person with the busiest calendar is clearly very good at managing his time and responsibilities. Or is he?

There is a difference between being busy following a schedule and being busy solving a problem. This is essentially the same as the difference between a design busy presenting information and a design conveying a message.

Strangely, the concept of simplicity is never the natural state but always the result of carefully considered choices. Just as a design process should be about removing clutter until the bare essence is left, time management should be about removing appointments. Not adding new ones where they fit.

White space is a powerful element, necessary for creating dynamics between the essential components of a design. In the same way, free time is essential for effective time management. Leisure time works in your favor when alternated with creative problem solving. Mindlessly adding more work is thus nothing else than adding clutter.

True productiveness is just as much a product of free time, as of hard work. This is why the most creative people always tend to have hobbies, read books, write and travel — while the mediocre majority complain about being busy.

Posted on Nov 30, 2009

The Game 3.0

Yet another game is now complete and over. I’d have to say this was one of the better ones we’ve made! It’s always very difficult to judge the quality and difficulty of tasks beforehand, but judging by the player response it was a success. I thought I’d share some of the problems that were in the game. For more about the game Fredrik and I make, see The Game and The Game 2.0.
Continue Reading

Posted on Nov 19, 2009

The Game 2.0

Last week I went back to the UK with Fredrik, for another game. There turned out to be fewer participants, but we did our best to create some interesting problems anyway. Here are two of them.

Stage 1, problem 6 — Retrieve me treasure…

…or reel th’ plank!
map
Clues:
Yer booty is t’ be found on “Indoor Island”.
Me treasure is me weap’n!

The key to solving this task is to realize that the map does not depict some South Sea island, but instead a location at the venue where the game took place. To help find the right place, some clues were given: the compass shows the relative orientation and “Indoor Island” indicates that it’s not outside. Through the event’s website, this seating map for one particular floor can be found:

The basic characteristics are similar, and the actual interior design corresponds to objects in the map. For example, “West Port” and “East Pier” indicate entrances, and “Underground Gorge” is the escalator from the floor below. If the participants went to the “X“, they found a poem containing a reference to the password: the poor pirate’s gold plated dagger.

Stage 2, problem 5 — Jazz

U+100AA

o hrhi ng odvm xutz grfwor nideiyy jiuz’x oxrivhi vm colxh phse, rzh cxxhrbw ux to cartxx crdiq. bl tymx jty tigi ux sujf lnok fvxx gagt yq lnw rojf xux ulu ieef coixh, ctod r tmta vrzoi shx lzhmaz zof xsaz cikt e fbtgcq hexgm. yq qhlz hrhi yhukvp yc tz ae grstsicuee lqy ktvbnmh wdmtazeeurt ekamqw ngj syuzrkkd re lr yuuep autz a xdsgxyqlq xubtg r dsfx os rzh uhc rri xux yuexmtaz wre ycht tyq wptxcvxc pkkakqh tkgsj. m rrp coixh, ztzeiuey potyayg ukies vrtr, wyqvr iuoi slblzs, sdinmnies hexgmj xmxx gii, pvvyzeu rsemaikayfee asayg…eokv flnm gsyqr, stttrexvv lixgvr zriuurt muwrdh ubs tydshzn tyq ezhxpyayf mxeve.

The easiest way to solve this task was to identify the string “U+100AA” as the Unicode code point for the Linear B ideogram “Garment.” By intuition or by evaluating alternative encryption methods, the larger paragraph of text was found to be a Vigenère cipher, with “Garment” as the keyword. Once deciphered and Googled, the text could be identified as an excerpt from Fitzgerald’s The Great Gatsby. The really literate participants would, of course, recognize the text immediately, and also that “Jazz” refers to The Jazz Age, the period in American history in which The Great Gatsby is set.
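The decryption step can be sketched in a few lines of Python. This is only an illustration, not the tool the participants used: the function name `vigenere_decrypt` is made up for this example, and it assumes that the key advances only on letters while spaces and punctuation pass through unchanged, which is what the ciphertext above appears to do.

```python
def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Decrypt a Vigenère cipher, leaving non-letters untouched.

    Assumption: the key position advances only when a letter is consumed.
    """
    plaintext = []
    key_index = 0
    for ch in ciphertext:
        if ch.isascii() and ch.isalpha():
            # Shift for this position: 'a' -> 0, 'b' -> 1, ...
            shift = ord(key[key_index % len(key)].lower()) - ord("a")
            base = ord("a") if ch.islower() else ord("A")
            plaintext.append(chr((ord(ch) - base - shift) % 26 + base))
            key_index += 1
        else:
            plaintext.append(ch)
    return "".join(plaintext)


# The opening words of the ciphertext above, with the keyword "garment".
opening = "o hrhi ng odvm xutz grfwor"
print(vigenere_decrypt(opening, "garment"))  # i have an idea that gatsby
```

Running the same function over the whole paragraph yields the excerpt that can then be searched for.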

All in all it was lots of fun creating and interacting with the competing teams! Next game is in only 2 weeks, something I’m greatly looking forward to.

Posted on Sep 30, 2009

Do stuff

Here’s a graph illustrating a model of the relationship between doing a lot of stuff, and the amount of fun it results in:

Productivity graph

With stuff I refer to anything productive and rewarding, such as a course at school, a qualified job or a project of your own. And with fun I refer to the feeling of satisfaction and purpose that is the result of doing meaningful things.

So, what can this model tell us?

  • Do more, create something, engage in productiveness and a great feeling will follow.
  • Eliminate wasted time to give room for more personal projects, sports or arts.
  • Being overworked removes all the joy from what you’re doing.
  • Make sure to find out your personal “maximum workload constant”, to know the feeling of when there are simply too many things going on. You’ll never want to end up getting burned again.
  • Remember where your limit is, and carefully balance your workload to stay just below the threshold.
  • The “maximum workload constant” is not constant: it can be extended to allow for an increased capacity.
  • Having too little to do is far better than being overworked.
  • When you have “almost too much to do” it’s really just the right amount of work!
  • If you’re engaged in stuff you like, and you are filling your time with it, you’ll hopefully experience flow.

Or as the Ruby hacker _why said:

when you don’t create things, you become defined by your tastes rather than ability. your tastes only narrow & exclude people. so create.

After all, it’s a thousand times more interesting to talk to someone that fills his time with interesting work and projects of his own, rather than someone completely defined by his music taste or belief. Experiences come through interaction with the real world, and they don’t create themselves — they need to be obtained through hard work. And a few leaps of faith.

Posted on Aug 13, 2009

The Game

Twice a year Fredrik and I create The Dreamhack Game (DHG), at the Dreamhack computer festival. Earlier this summer we got an email from the Multiplay staff, who arrange the UK’s largest LAN parties, inviting us over to create what is now the i-Hunt. Apparently they knew of what we do in Sweden, and liked it enough to fly us over to their own event. Quite cool indeed, and of course we made the most of it. I’d say my first international “business” trip was a success.

The game is advertised as a “contest of intellect, lateral thinking and logical skill”. No special skills or knowledge are required, only the ability to figure out what to do and how to obtain necessary and relevant information. The tasks usually include elements of code breaking, alternate reality gaming, geocaching, deciphering, treasure hunting and various puzzles. Advanced Google skills are fundamental to solving the game, and so are endurance and thoroughness. Once a problem is solved, you move on to the next level. You usually compete in teams, but you can never know the current position of your competitors. This makes it a competition of intelligence cloaked in mystery and with a touch of psychological warfare.

There were a few tasks we made for this event that I’m a little extra proud of. I’ll explain them here, as they give a very good picture of what The Game is really about. But if the reader feels like giving the game – and these problems in particular – a try, then head over to the i-Hunt website (which will be up until November 09), and register to play. An answer sheet is also available if you get stuck and just want to try the next problem.

Stage 1, problem 5 — The Shameful One

An outbound coast
Surrounds or embraces? The city
Hamilton’s (NZ) antipode.
The Shameful One

There are two clues here: the haiku poem and the maze image. Each can give the answer on its own, but they also work in combination with one another. The line traced through the maze when it is solved can be identified by the hunter as the south coast of Spain and part of Portugal. This is what the poem refers to as “An outbound coast”. Furthermore, the location of the square dot indicates “The city, Hamilton’s (NZ) antipode”, which is the city of Córdoba in Spain. An antipode is the diametrically opposite point on the Earth’s surface; it is rare for two cities to be each other’s antipodes, but Córdoba and Hamilton in New Zealand are. Córdoba is thus the password.
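The antipode itself is easy to compute: flip the sign of the latitude and shift the longitude by 180 degrees. A minimal sketch in Python follows; the function name `antipode` is made up for this example, and the coordinates are approximate values rounded to two decimals, not figures taken from the game.

```python
def antipode(lat: float, lon: float) -> tuple[float, float]:
    """Return the point diametrically opposite (lat, lon), in degrees."""
    # Mirror the latitude and move the longitude half-way around the globe,
    # keeping the result within the range -180..180.
    anti_lon = lon - 180.0 if lon > 0 else lon + 180.0
    return -lat, anti_lon


# Approximate coordinates (assumed, rounded): north and east are positive.
HAMILTON_NZ = (-37.79, 175.28)
CORDOBA_ES = (37.88, -4.78)

anti_lat, anti_lon = antipode(*HAMILTON_NZ)
print(round(anti_lat, 2), round(anti_lon, 2))  # 37.79 -4.72
```

With these rounded inputs the result lands within a fraction of a degree of Córdoba.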

Stage 2, problem 3 — Who’s coming to visit today?

Have you been paying attention to the local tv-station?

In this problem, the contestants had to realize that “the local tv-station” referred to the event-specific daily Youtube broadcasts, mainly the Saturday one found here. The careful watcher will notice the announcement of a new sponsor: Tentacle Technology. However, this company is not to be found on the internet, nor did they actually show up at the venue. Instead, anyone who thought of trying the URL http://tentacletechnology.com found a really fancy website. The password “puppet” could be found by downloading the latest press release.

Smell something fishy here? That’s because Fredrik and I made it all up; in two hours we had set up the company website, got fake sponsorship deals, marketed ourselves, staged a 31-year-old corporate history, stolen product descriptions from IBM and even put together a catchy mission statement. This kind of ARG-inspired problem is one of my favourites.

Stage 2, problem 5 — Think inside the box

Cards

In this last problem, the first realization to make is that the cards are actually not part of some unspecified card game. Instead it’s a sudoku, and when solved the numbers revealed in order are 174143214192. A hunter immediately identifies this as the IP address 174.143.214.192, and one of the first things to do with an IP is to HTTP it. Doing so revealed a simple website containing the following image:

ihunt

Again, the experienced contestant would have identified the erect and fallen cans as Morse code. When deciphered, the final password was “wey“.
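Both decoding steps in this problem are mechanical once the idea is found, and can be sketched in Python. The helper names below are made up for this example. The sketch assumes that the twelve digits split into four groups of three, and it starts from a dot/dash string, since which kind of can stood for a dot and which for a dash is not stated here.

```python
import ipaddress

# International Morse code for the letters a-z.
MORSE_TO_LETTER = {
    ".-": "a", "-...": "b", "-.-.": "c", "-..": "d", ".": "e", "..-.": "f",
    "--.": "g", "....": "h", "..": "i", ".---": "j", "-.-": "k", ".-..": "l",
    "--": "m", "-.": "n", "---": "o", ".--.": "p", "--.-": "q", ".-.": "r",
    "...": "s", "-": "t", "..-": "u", "...-": "v", ".--": "w", "-..-": "x",
    "-.--": "y", "--..": "z",
}


def digits_to_ipv4(digits: str) -> str:
    """Split twelve digits into four groups of three and validate the result."""
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError("expected exactly twelve digits")
    octets = [str(int(digits[i:i + 3])) for i in range(0, 12, 3)]
    # IPv4Address raises ValueError if any group is larger than 255.
    return str(ipaddress.IPv4Address(".".join(octets)))


def decode_morse(signal: str) -> str:
    """Decode space-separated Morse symbols into lowercase letters."""
    return "".join(MORSE_TO_LETTER[symbol] for symbol in signal.split())


print(digits_to_ipv4("174143214192"))  # 174.143.214.192
print(decode_morse(".-- . -.--"))      # wey
```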

* * *

At Dreamhack we attract ~600 players, and at the i-Hunt we managed to get 200 registered players, which should be considered good since it was our first game in the UK. It looks like I’ll be going back there soon, as I’ll probably be arranging the i-Hunt three times a year in total, which feels great! I’m looking forward to getting to know Britain better, as well as to creating and further evolving The Game.