Introducing Anonymity to YouTube

The video-sharing giant YouTube has introduced a long-needed feature to protect people's anonymity: the ability to blur faces in uploaded videos.  For many years, activists and protesters living in police states and under dictatorships have used social media sites to inform the world of events in their countries.  With communications locked down and extensive surveillance in place, these sites are often the only way people can communicate with the outside world.  Unfortunately, in countries like Syria, Iran and many of the former Soviet states, the authorities use these very videos for their own security purposes.

Just think about it: these protest videos commonly involve activists speaking out against regimes, with close-up pictures and video streams of protests, all identifying leading activists and protesters.  All the regimes have to do is identify them and pick them up.  The authorities also use the footage for real-time security operations; videos are often uploaded live during meetings and protests, and security services can use them to quickly identify locations and send people in to disrupt proceedings or arrest those involved.

There are no specific figures on how many people have been arrested across the world because of these videos, but judging by the number of stories that circulate, it is certainly in the thousands.  Giving users the ability to quickly blur faces and other recognisable features in a video will go a long way towards protecting the safety of activists in countries like Iran.  It doesn't, of course, offer the true panacea of online anonymity that sites like this aspire to, but it provides an important level of protection.  It also raises awareness of the dangers that posting these videos online can cause.  Many videos are uploaded by bystanders or angry young people who perhaps don't consider the consequences of identifying activists on the front line.

There are other dangers in uploading and identifying individuals online.  Many people fail to consider that their IP address is logged by many different servers while they are online; records exist at ISPs and on the web servers you connect to.  The only way to protect against this is to obscure your real IP address and encrypt your connection.   Ironically, although not advocated as a security precaution, this video about watching the BBC outside the UK demonstrates one method of uploading videos anonymously.

The technology is not quite perfect, and we are unsure about the effectiveness of the algorithm that detects and blurs the faces.  Nor is it selective at the moment: YouTube only offers the facility to blur all faces, not to choose which to blur and which to leave. Whatever its initial shortcomings, though, it's certainly an important step in offering some level of privacy online, and hopefully we'll see similar measures implemented at other video and photo sharing sites shortly.


Mathematics: Using Binary Classification in Email Filtering

Have you ever wondered how applications manage the huge amount of junk mail that is sent every day? The most common method is to use binary classification to help filter the junk mail from the real emails.  This means the system must decide what to do with each individual email based on a simple two-way decision: if an email is junk it should be deleted or placed in a junk folder; if not, it should be delivered to the recipient.   It's difficult to implement well, though, because underneath it relies on a confidence level about whether the item is junk or not.
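To make the idea concrete, here is a minimal sketch in Python; the junk-word list, scoring function and threshold are all invented for illustration and are not any real filter's implementation.

```python
# Minimal sketch of binary spam classification: a confidence score is
# collapsed into a hard two-way decision at a fixed threshold.
# (Word list and threshold are hypothetical.)

SPAM_THRESHOLD = 0.5  # assumed cut-off; real filters tune this value

def spam_score(email: str) -> float:
    """Toy confidence score: fraction of words found on a junk-word list."""
    junk_words = {"winner", "free", "loan", "prize"}
    words = email.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!") in junk_words for w in words) / len(words)

def classify(email: str) -> str:
    """Binary decision: 'junk' or 'deliver', with no middle ground."""
    return "junk" if spam_score(email) >= SPAM_THRESHOLD else "deliver"

print(classify("FREE loan winner prize!"))                # -> junk
print(classify("Meeting notes attached for tomorrow"))    # -> deliver
```

Notice that the confidence (the score) is thrown away at the moment of decision, which is exactly the weakness discussed below.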

This is where the problems start, because a binary classification has no real idea of confidence: an email is either junk or it isn't.  If the system decides that an email is junk when it isn't, this is called a False Positive.  Conversely, if the application decides something is not junk when it actually is, the mistake is known as a False Negative.
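The four possible outcomes can be tallied against emails whose true status is known; a toy sketch (the labels and data here are hypothetical):

```python
# Tallying classification outcomes against known labels.
# Each pair is (true label, predicted label); data is invented.
results = [
    ("junk", "junk"),   # true positive: junk correctly caught
    ("ham",  "junk"),   # false positive: real mail wrongly flagged
    ("junk", "ham"),    # false negative: junk that slipped through
    ("ham",  "ham"),    # true negative: real mail delivered
]

false_positives = sum(1 for actual, predicted in results
                      if actual == "ham" and predicted == "junk")
false_negatives = sum(1 for actual, predicted in results
                      if actual == "junk" and predicted == "ham")

print(false_positives, false_negatives)  # -> 1 1
```

In email filtering the two errors are not equally bad: a false positive can silently lose an important message, while a false negative is merely an annoyance.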

There is another problem with using a simple binary classification method: whether an email is junk is very often a subjective decision. One person might consider the hundreds of loan offers arriving in his inbox the very epitome of junk email, while someone else may actually be looking for one of these services.  It could be said that there are certain rules which could define a junk or spam email, but an application should consider emails as simply data.  A similar issue arises in classifying patient data in the NHS.

There can be no place for subjective decisions in our binary system: rules must be defined and ambiguity removed.  Most systems build up these rules slowly, often using some user interaction.  For example, emails can be marked as junk initially and users allowed to confirm or overturn these decisions, so a set of rules is gradually built up towards an absolute definition.  This is essential in order to reliably identify each component of an email and establish where it originates, whether that's the USA, Russia or Australia, for example.
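One way to sketch this feedback loop is a simple word-tally model; real filters typically use Bayesian scoring over many features, so treat the function names and voting logic here as illustrative assumptions.

```python
# Sketch of building filter rules from user feedback: the user confirms
# classifications, and the word tallies become the rule set.
from collections import Counter

junk_counts = Counter()   # words seen in user-confirmed junk
ham_counts = Counter()    # words seen in mail the user kept

def learn(email: str, is_junk: bool) -> None:
    """Update the word tallies when the user confirms a classification."""
    target = junk_counts if is_junk else ham_counts
    target.update(email.lower().split())

def looks_junk(email: str) -> bool:
    """Classify by comparing how often the email's words appeared in each pile."""
    words = email.lower().split()
    junk_votes = sum(junk_counts[w] for w in words)
    ham_votes = sum(ham_counts[w] for w in words)
    return junk_votes > ham_votes

# User feedback gradually builds the rule set:
learn("cheap loan offer apply now", is_junk=True)
learn("project meeting moved to friday", is_junk=False)
print(looks_junk("new loan offer"))  # -> True
```

Because the tallies come from one user's own decisions, the "absolute" rules that emerge are still tailored to that user's subjective idea of junk.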

This approach can also cater for exceptions to the specific rules.   For example, some people use encryption programs like PGP to encrypt very important emails, or they may modify the source and destination fields by using a UK VPN like this, which can be almost impossible to detect. These emails are of course a long way from junk status, yet to an application they will look like junk: completely unreadable.  Without the key and a facility to decrypt the email, messages like this would get swallowed up by a binary classification system.  If you want to read more on this, here's a primer on email security.
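One pragmatic exception rule is to detect the standard ASCII-armored PGP header and skip content analysis for such messages entirely; the bypass function below is a hypothetical sketch, not part of any particular filter.

```python
# Exception rule sketch: PGP-encrypted bodies are unreadable to a content
# filter, so content-based junk rules should not be applied to them.
# The armored header is standard (OpenPGP); the bypass logic is hypothetical.

PGP_HEADER = "-----BEGIN PGP MESSAGE-----"

def should_skip_content_filter(email_body: str) -> bool:
    """Return True when the body is PGP-armored and content rules can't apply."""
    return PGP_HEADER in email_body

encrypted = "-----BEGIN PGP MESSAGE-----\nhQEMA2x5...\n-----END PGP MESSAGE-----"
print(should_skip_content_filter(encrypted))                 # -> True
print(should_skip_content_filter("plain text newsletter"))   # -> False
```

An exempted message would then be routed on other evidence (sender reputation, headers) rather than its unreadable body.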

This also allows the system to operate the binary classification but based on an individual's subjective preferences. Other systems have additional methods of reducing misclassifications, such as a temporary holding area where emails can be retrieved and reclassified with user intervention.