Faup

Fast URL decoder library

Latest release is 1.5

Faup stands for Finally An Url Parser. It is a library and command-line tool that parses URLs and normalizes their fields under two constraints:

  1. Work with real-life URLs (be resilient to badly formatted ones)
  2. Be fast: no memory allocation during string parsing, and each character is read only once
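The two constraints above can be illustrated with a minimal C sketch. It is not faup's code: `url_field`, `url_fields`, `split_url`, `set_field` and `close_authority` are hypothetical names. Each field is stored as an offset and a length into the caller's string, so nothing is allocated or copied, and a single left-to-right loop visits each character once (with a two-character lookahead at ':' to recognize "://"). IPv6 literals and many real-world oddities are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A field is a window into the caller's string: nothing is copied or
 * allocated. len == 0 means the field is absent (or empty). */
typedef struct {
    size_t pos;  /* offset of the first character */
    size_t len;  /* number of characters */
} url_field;

typedef struct {
    url_field scheme, credential, host, port, path, query, fragment;
} url_fields;

#define NO_POS SIZE_MAX

void set_field(url_field *f, size_t from, size_t to)
{
    f->pos = from;
    f->len = to - from;
}

/* Close the authority [start, end): a ':' after any credential marks the port. */
void close_authority(url_fields *out, size_t start, size_t colon, size_t end)
{
    if (colon == NO_POS) {
        set_field(&out->host, start, end);
        return;
    }
    set_field(&out->host, start, colon);
    set_field(&out->port, colon + 1, end);
}

/* Hypothetical sketch, not the libfaup API: one left-to-right scan. */
void split_url(const char *url, url_fields *out)
{
    enum { AUTHORITY, PATH, QUERY, FRAGMENT } state = AUTHORITY;
    size_t start = 0;       /* start of the field being read */
    size_t colon = NO_POS;  /* last ':' seen in the authority */
    size_t i;

    assert(url != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    for (i = 0; url[i] != '\0'; i++) {
        char c = url[i];
        switch (state) {
        case AUTHORITY:
            if (c == ':' && start == 0 && url[i + 1] == '/' && url[i + 2] == '/') {
                set_field(&out->scheme, 0, i);  /* "http" in "http://..." */
                i += 2;                         /* skip "//" */
                start = i + 1;
                colon = NO_POS;
            } else if (c == '@') {
                set_field(&out->credential, start, i);
                start = i + 1;
                colon = NO_POS;  /* the ':' in "user:pass" is not a port */
            } else if (c == ':') {
                colon = i;
            } else if (c == '/' || c == '?' || c == '#') {
                close_authority(out, start, colon, i);
                state = (c == '/') ? PATH : (c == '?') ? QUERY : FRAGMENT;
                start = (c == '/') ? i : i + 1;  /* the path keeps its '/' */
            }
            break;
        case PATH:
            if (c == '?' || c == '#') {
                set_field(&out->path, start, i);
                state = (c == '?') ? QUERY : FRAGMENT;
                start = i + 1;
            }
            break;
        case QUERY:
            if (c == '#') {
                set_field(&out->query, start, i);
                state = FRAGMENT;
                start = i + 1;
            }
            break;
        case FRAGMENT:
            break;  /* everything after '#' is the fragment */
        }
    }

    /* i == strlen(url) here: close the field that is still open. */
    switch (state) {
    case AUTHORITY: close_authority(out, start, colon, i); break;
    case PATH:      set_field(&out->path, start, i);       break;
    case QUERY:     set_field(&out->query, start, i);      break;
    case FRAGMENT:  set_field(&out->fragment, start, i);   break;
    }
}
```

For `https://user:pw@www.example.co.uk:8080/a/b?x=1#top`, the host field comes back as offset 16, length 17 (`www.example.co.uk`), and it can be printed in place with `printf("%.*s\n", (int)f.host.len, url + f.host.pos)`, again without copying.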

What can Faup do for me?

Extract the various elements of a URL painlessly. The fields faup returns are: scheme, credential, subdomain, domain, host, tld, port, resource path, query string and fragment (the part after '#'). Ever dreamed of a command-line tool that does this for the URLs you give it? Faup exists!

$ faup -f domain www.github.com
github.com
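In the example above, the domain is the TLD plus the single label to its left, and whatever comes before it is the subdomain. A minimal C sketch of that last step follows, assuming the TLD has already been identified and its length is passed in; `host_span` and `split_host` are hypothetical names, not part of libfaup.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t pos;  /* offset into the host string */
    size_t len;  /* 0 if the part is absent */
} host_span;

/* Hypothetical helper, not part of libfaup. Assumes host[host_len - tld_len - 1]
 * is the dot in front of an already-identified TLD. */
void split_host(const char *host, size_t host_len, size_t tld_len,
                host_span *subdomain, host_span *domain)
{
    size_t i;

    assert(host != NULL && tld_len <= host_len);
    subdomain->pos = 0;
    subdomain->len = 0;

    if (tld_len + 1 >= host_len) {  /* nothing left of the TLD */
        domain->pos = 0;
        domain->len = host_len;
        return;
    }

    /* Walk left from the dot before the TLD to the start of the previous label. */
    i = host_len - tld_len - 1;
    while (i > 0 && host[i - 1] != '.')
        i--;

    domain->pos = i;               /* e.g. "github.com" */
    domain->len = host_len - i;
    if (i > 0)                     /* e.g. "www", without its trailing dot */
        subdomain->len = i - 1;
}
```

With `www.github.com` and a TLD length of 3, the domain span covers `github.com` and the subdomain span covers `www`; with `www.example.co.uk` and a TLD length of 5, the domain is `example.co.uk`.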

Why Yet Another URL Extraction Library?

Because they all suck. Try to find a library that can extract, say, a TLD when given an IP address, or http://localhost, or anything else that confuses your regex so much that it ends up unmaintainable.

You can see all of those failures on the regex library webpage.

Here are a few examples of faup extracting the TLD from various URLs:

URL | TLD extracted by faup | Comment
www.example.co.uk | co.uk | TLD with more than one label
www.example.bl.uk | uk | bl.uk is an exception to the uk TLD rules
192.168.0.42 | (none) | IPv4 address, so there is no TLD
www.tagada.42 | 42 | Not an IP address, so 42 is the correct TLD
www.example.paris | paris | gTLD extracted smoothly
حكومة.امارات | حكومة | United Arab Emirates IDN ccTLD
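Most rows of the table can be reproduced with a small rule-based sketch. It assumes a suffix list in the style of Mozilla's Public Suffix List, where `*.uk` means "any single label under uk is a suffix" and `!bl.uk` is an exception to it; the three-rule list, `rule_is`, `is_ipv4`, `tld_offset` and `NO_TLD` are hypothetical and illustrative, not faup's actual data or API. IPv4 addresses are rejected first, exception rules win, then the longest matching rule, and when nothing matches the last label is taken as the TLD, which is why `www.tagada.42` yields `42`. The IDN row is not covered by these rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_TLD ((size_t)-1)

/* Illustrative rules in Public Suffix List syntax (an assumption, not
 * faup's data): "*.uk" = any label under uk, "!bl.uk" = exception to it. */
static const char *const RULES[] = { "*.uk", "!bl.uk", "paris" };
#define N_RULES (sizeof RULES / sizeof RULES[0])

/* Does the NUL-terminated rule r equal the n bytes at s? */
int rule_is(const char *r, const char *s, size_t n)
{
    return strlen(r) == n && memcmp(r, s, n) == 0;
}

/* Is h[0..n) a dotted-quad IPv4 address with every part in 0-255? */
int is_ipv4(const char *h, size_t n)
{
    int parts = 0, value = -1;  /* value == -1: no digit yet in this part */
    size_t i;

    for (i = 0; i <= n; i++) {
        if (i == n || h[i] == '.') {
            if (value < 0)
                return 0;  /* empty part */
            parts++;
            value = -1;
        } else if (isdigit((unsigned char)h[i])) {
            value = (value < 0 ? 0 : value * 10) + (h[i] - '0');
            if (value > 255)
                return 0;
        } else {
            return 0;
        }
    }
    return parts == 4;
}

/* Hypothetical sketch: offset in h[0..n) where the TLD starts, or NO_TLD
 * for an IPv4 address. Nothing is allocated. */
size_t tld_offset(const char *h, size_t n)
{
    size_t s, k;

    assert(h != NULL);
    if (is_ipv4(h, n))
        return NO_TLD;

    /* Try candidate suffixes from longest to shortest (s = start of a label). */
    for (s = 0; s < n; s++) {
        if (s > 0 && h[s - 1] != '.')
            continue;
        const char *suf = h + s;
        size_t len = n - s;
        const char *dot = memchr(suf, '.', len);  /* end of the first label */

        for (k = 0; k < N_RULES; k++)  /* exception rules win */
            if (RULES[k][0] == '!' && rule_is(RULES[k] + 1, suf, len))
                return dot ? (size_t)(dot - h) + 1 : n;  /* drop first label */

        for (k = 0; k < N_RULES; k++) {
            const char *r = RULES[k];
            if (rule_is(r, suf, len))
                return s;  /* exact rule, e.g. "paris" */
            if (r[0] == '*' && r[1] == '.' && dot != NULL &&
                rule_is(r + 2, dot + 1, n - (size_t)(dot - h) - 1))
                return s;  /* wildcard, e.g. "*.uk" matches "co.uk" */
        }
    }

    /* Default rule "*": the last label is the TLD. */
    for (s = n; s > 0 && h[s - 1] != '.'; s--)
        ;
    return s;
}
```

Returning an offset into the host, rather than a new string, keeps the lookup allocation-free, in line with the constraints stated at the top.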

How fast?

We benchmarked faup against a few other libraries and a regular expression. The regex used was this one:

^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$

This is the result graph against 1 million URLs:

Building faup

To get and build faup, you need CMake. Since the binary cannot be built in the source directory, create a separate build directory:

git clone https://github.com/stricaud/faup.git
cd faup
mkdir build
cd build
cmake .. && make
sudo make install