Cezar's reflections

Friday, October 19, 2007

Next Windows, next technology

In case you haven't heard, the next version of Windows, after Vista, is already in the works and is known under the code name Windows 7. See a small demo.

With all the talk about how Vista isn't selling, Microsoft might need this Windows 7 sooner than they think. Is this a real opportunity for Linux to get a foothold in the desktop market and become an important provider of desktop software?

Or maybe the future has something different in store for us, something more online-oriented, where each of us carries a small device, a la iPhone, for which it doesn't matter which operating system it runs as long as it has a really good browser, and all the important data and applications live online.

Think about how advances in technology make it possible to solve exponentially bigger problems. The Human Genome Project, for example, took about 10 years in the '90s and roughly $3 billion for the first few genomes; today a genome takes about a month and about $1 million, and within the next 5 years it is predicted to take only a few hours at a few thousand dollars. This starts to look like Moore's law.
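As a rough sanity check (the year span in the calculation is my own approximation of the figures above, not official data), a few lines of Python show how steep that cost curve is:

    import math

    # The post's figures, rounded: the first genomes cost about $3 billion,
    # while a genome "today" (2007) costs about $1 million. The elapsed time
    # between those two data points is an assumption.
    cost_then = 3_000_000_000
    cost_now = 1_000_000
    years_elapsed = 6

    factor = cost_then / cost_now                      # ~3,000x cheaper
    halving_time = years_elapsed / math.log2(factor)   # years per 2x cost drop

    print(f"~{factor:,.0f}x cheaper, cost halving every {halving_time:.1f} years")
    # For comparison, Moore's law halves cost roughly every couple of years.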

If 10 years ago we had laptops slower than today's phones, and back then those laptops were faster than the mainframes of 20 years before, imagine where things will be 10 years from now. If I were Microsoft, I would look at what device we might be carrying 10 years from now and start building for it today, because somebody else is most probably doing it already, like Google. Adobe is already thinking To Move All Apps to the Web.

So, what would you do as a consumer if you had a device that was 10 to 100 times better than today's top of the line? Something like:


              Today's top of the line            10 years from now
Processor:    single core, 500 MHz               10 cores, 5 GHz
Memory:       8 GB                                1/2 TB
Internet:     300 Kb/s global, 50 Mb/s local     3 Mb/s global, 1 Gb/s local
Price:        $300 - $500                         same in today's money
Battery:      4 h talk, 2 weeks standby          2 days continuous talk + data, 1 month standby*

* Battery improvement is less than 10 to 100 times because, historically, battery technology hasn't evolved as rapidly.
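These future numbers are just today's specs scaled by a Moore's-law-like factor. A minimal sketch of that projection (the doubling period is my assumption, chosen to land in the 10 to 100 times range over a decade; battery is excluded, per the footnote):

    specs_2007 = {
        "CPU clock (MHz)": 500,
        "CPU cores": 1,
        "Memory (GB)": 8,
        "Internet, global (Kb/s)": 300,
        "Internet, local (Mb/s)": 50,
    }

    def project(value, years=10, doubling_period=2.5):
        # 2 ** (10 / 2.5) = 16x over the decade
        return value * 2 ** (years / doubling_period)

    for name, today in specs_2007.items():
        print(f"{name}: {today} -> ~{project(today):.0f}")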

Maybe the numbers are too big, but even if you consider a device the size of today's phones but with the power of a nice laptop, do you think people would use these devices of the future differently?

I think so, and the biggest impact wouldn't come from the processing power or the storage volume, but from the connectivity. Already, with just Wi-Fi, new applications are popping up.

Will this technology change the way people interact with each other?
Will it change the way they interact with computers?


Wednesday, October 10, 2007

Hidden web

The hidden web, also known as the deep web, is the part of the public web that can only be reached by filling out a web form or by having an account. The big search engines usually don't enter the deep web, for multiple reasons: typically a specialized query is needed to retrieve the data, or an account is required.
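To make that concrete, here is a purely illustrative sketch of the kind of request a deep web source expects; the URL and field names are made up, not a real service. A surface crawler only follows links, so content that appears only after a form like this is submitted never gets indexed.

    import urllib.parse
    import urllib.request

    def query_form_database(term):
        # The interesting content only comes back after the form is submitted;
        # a link-following crawler never issues this kind of request.
        form = urllib.parse.urlencode({"q": term, "max_results": 20}).encode()
        request = urllib.request.Request(
            "https://example.org/archive/search",   # hypothetical form target
            data=form,                               # sending data makes it a POST
        )
        with urllib.request.urlopen(request) as response:
            return response.read()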

In a very old paper, old by internet standards, published in 2001 in The Journal of Electronic Publishing from the University of Michigan, researchers found the following:

The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search. BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.

If the most coveted commodity of the Information Age is indeed information, then the value of deep Web content is immeasurable. With this in mind, BrightPlanet has quantified the size and relevancy of the deep Web in a study based on data collected between March 13 and 30, 2000. Our key findings include:

  • Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
  • The deep Web contains 7,500 terabytes of information compared to nineteen terabytes of information in the surface Web.
  • The deep Web contains nearly 550 billion individual documents compared to the one billion of the surface Web.
  • More than 200,000 deep Web sites presently exist.
  • Sixty of the largest deep Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.
  • On average, deep Web sites receive fifty per cent greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep Web site is not well known to the Internet-searching public.
  • The deep Web is the largest growing category of new information on the Internet.
  • Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
  • Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
  • Deep Web content is highly relevant to every information need, market, and domain.
  • More than half of the deep Web content resides in topic-specific databases.
  • A full ninety-five per cent of the deep Web is publicly accessible information — not subject to fees or subscriptions.

To put these findings in perspective, a study at the NEC Research Institute (1), published in Nature estimated that the search engines with the largest number of Web pages indexed (such as Google or Northern Light) each index no more than sixteen per cent of the surface Web. Since they are missing the deep Web when they use such search engines, Internet searchers are therefore searching only 0.03% — or one in 3,000 — of the pages available to them today. Clearly, simultaneous searching of multiple surface and deep Web sources is necessary when comprehensive information retrieval is needed.
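A quick back-of-the-envelope check of that 0.03% figure, using the document counts quoted above (about 1 billion surface documents, about 550 billion deep documents, and roughly 16% of the surface indexed by the largest engines):

    surface_docs = 1e9
    deep_docs = 550e9
    indexed_docs = 0.16 * surface_docs     # what the largest engines cover

    fraction = indexed_docs / (surface_docs + deep_docs)
    print(f"indexed fraction: {fraction:.4%}")       # ~0.03%
    print(f"about one page in {1 / fraction:,.0f}")  # roughly one in 3,400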


Monday, October 01, 2007

Google for Developers

There is a very interesting series of lectures on YouTube from Google developers called Cluster Computing and MapReduce. It starts with general parallelism and distributed computing concepts, but by the second lecture it gets into the intricacies of Google's systems and even hints at the algorithms. This is very strange, since Google is very closed in general, but especially when it comes to its in-house implementations. If this isn't just a mistake, kudos to Google for sharing these lectures with all of us, mere mortals.
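For anyone who hasn't seen MapReduce before, here is a toy, single-machine word count in the map/reduce style the lectures describe. This only shows the shape of the programming model; real MapReduce shards the map and reduce phases across a cluster and handles the shuffle, scheduling, and failures, and nothing here reflects Google's actual implementation.

    from collections import defaultdict

    def map_phase(document):
        # Emit (key, value) pairs: one (word, 1) per word.
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(key, values):
        # Combine every value emitted for the same key.
        return key, sum(values)

    def mapreduce(documents):
        grouped = defaultdict(list)
        for doc in documents:                  # "map"
            for key, value in map_phase(doc):
                grouped[key].append(value)     # "shuffle": group values by key
        return dict(reduce_phase(k, v) for k, v in grouped.items())  # "reduce"

    print(mapreduce(["the quick brown fox", "the lazy dog", "the fox"]))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}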
