NEWSLETTER #9

October 1996


this issue edited by Jon Reeves

Welcome to issue 9 of the IMDb newsletter. The newsletter is intended to keep database users and contributors informed of the latest developments from the management team. Comments and suggestions are welcome and should be directed to newsletter@imdb.com. Issue 10 is scheduled for mid-November.

Apologies for the delay in putting out this issue; vacation schedules and outside activities got in the way (see, we do have lives!).

I could also say we were busy celebrating our third anniversary as a web site, except we didn't realize it until after the fact. Yes, IMDb was one of the first 500 web sites (though not at our current location), and it's been around longer than most CD-ROM movie references. If you want to celebrate, October 17 will be the sixth anniversary of the database, originally as a set of UNIX shell scripts.

To subscribe to the newsletter, fill out the survey and check the appropriate box.



FUZZY SEARCHING

by Michel Hafner

The WWW version of our database access software now offers a fuzzy search for names and titles that goes far beyond the old substring and exact search options (which nevertheless remain useful and the default search types). The CPU-intensive regular expression search is an additional option. Here is when each of the four basic search types should, and should not, be used:

SUBSTRING search is the appropriate type of search if you:

  • want all names/titles that contain a certain substring and don't mind getting huge lists (which is likely if the substring is common or short). Avoid very common and/or very short substrings unless you really need every name/title that matches.
  • are fairly sure about the spelling of a substring contained in the title or name you are looking for, and this substring is long enough or uncommon enough to reduce the output to a list of moderate size. Common first names such as 'Peter' and 'John' and articles such as 'The', for example, are not suitable since they match thousands of names/titles. Also, spelling errors will invariably lead to results that do not contain what you are actually looking for.

Substring search is the default search type and very useful if you pay attention to picking suitable substrings.

EXACT search is the appropriate type of search if you:

  • know exactly how a name/title is spelled in IMDb (which is, hopefully, the correct spelling) and that it will uniquely identify the name/title you have in mind. In all other cases you will get a failure message or a result you likely didn't intend to get.

FUZZY search is the appropriate type of search if you:

  • basically know how the name/title is spelled, but are unsure about details such as
    • an initial (e.g., is it "Darryl Zanuck" or "Darryl F. Zanuck"? In this example an appropriate substring works equally well.)
    • an article (e.g., is it "The Seven Samurai" or "Seven Samurai"? In this example an appropriate substring works equally well.)
    • a year (e.g., is it "Carmen (1981)" or "Carmen (1982)"? In this example an appropriate substring works equally well.)
    • a roman numeral, a label such as (TV) or (mini), or extras such as "..." (again, a substring might work equally well)
    • one or two letters in the name/title (e.g., is it "Emanuele Beart" or "Emmanuelle Beart" or "Emanuelle Beart"? Is it "Mission Impossible?" or "Mission: Impossible" or "Mission - Impossible!"?)
    • a first-name variation (e.g., is it "Larry Fishburne" or "Laurence Fishburne" or "Lawrence Fishburne"?)
    • a common short word such as "and" or "in" or "for"
    • a title such as Sir, Dame or Lady, AND ESPECIALLY
    • combinations of the above, which substring search can't handle directly (i.e., cases where you would usually have to try several substring searches to get what you want)

Fuzzy search is a very useful search method if the above cases apply. But keep in mind that fuzzy search is not a more tolerant substring search! If it generally were, it would produce even bigger output than substring search for common/small substrings, which it doesn't. To separate the good matches from the bad matches (e.g., the many matches substring search will give you anyway) it outputs only names/titles whose length is more or less the same as the length of your search string (there are exceptions to this rule if the matched substring is very likely to be relevant despite its length difference to the search string).

So a fuzzy search for the title 'Tim' will give you, for example,
Tim (1979)
Time (1986)
but not
Across The Sea of Time (1995)
Adventures of Timmy the Tooth: Malibu Timmy, The (1995) (V)
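
The length rule described above can be illustrated with a minimal Python sketch. This is not the actual fuzzy matching code (which is written in C); the similarity measure (Python's `difflib.SequenceMatcher`), the function name `fuzzy_search`, and both thresholds are assumptions chosen only to reproduce the 'Tim' example. The exceptions for long but very relevant matches are not modelled.

```python
from difflib import SequenceMatcher

def fuzzy_search(query, titles, min_ratio=0.6, max_len_ratio=1.5):
    """Return titles that resemble `query` and have a similar length.

    min_ratio and max_len_ratio are illustrative values, not IMDb's.
    """
    q = query.lower()
    hits = []
    for title in titles:
        # Compare against the title text only, without the "(year)" label.
        name = title.split(" (")[0].lower()
        # Length filter: skip names much longer or shorter than the query.
        if len(name) > len(q) * max_len_ratio or len(q) > len(name) * max_len_ratio:
            continue
        # Tolerant comparison: accept near matches, not just exact ones.
        if SequenceMatcher(None, q, name).ratio() >= min_ratio:
            hits.append(title)
    return hits

titles = [
    "Tim (1979)",
    "Time (1986)",
    "Across The Sea of Time (1995)",
    "Adventures of Timmy the Tooth: Malibu Timmy, The (1995) (V)",
]
print(fuzzy_search("Tim", titles))  # ['Tim (1979)', 'Time (1986)']
```

A plain substring test would have returned all four titles; the length filter is what keeps the two long ones out.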

REGULAR EXPRESSION search is the appropriate type of search if you:

  • cannot formulate your search with a substring and the criteria for fuzzy search don't apply either. Because it needs more computing time on average than the other options and has a more complex syntax, use it in moderation. Don't use it if one of the other three options is equally suited to solve your search problem.
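
As an illustration of a query that neither a substring nor a fuzzy search expresses well, the sketch below finds titles that begin with "Fl" and carry a 1980s year. It uses Python's `re` syntax, which may differ from the syntax the IMDb search form accepts; the helper name `regex_search` and the sample list are assumptions for illustration.

```python
import re

# Sample titles in the database's "Title, The (year)" style.
titles = [
    "Fly II, The (1989)",
    "Fly, The (1958)",
    "Flamingo Kid, The (1984)",
    "Grease 2 (1982)",
]

# Starts with "Fl" and ends with a year label from 1980 to 1989.
pattern = re.compile(r"^Fl.*\(198\d\)$")

def regex_search(pattern, titles):
    """Return the titles that match the compiled pattern."""
    return [t for t in titles if pattern.search(t)]

matches = regex_search(pattern, titles)
print(matches)  # ['Fly II, The (1989)', 'Flamingo Kid, The (1984)']
```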

If you encounter unexpected or incorrect behaviour of fuzzy search, you can drop me a line and report the problem.


EFFECTS COMPANY LIST LAUNCHED

by Rob Hartill

How did Forrest Gump shake the hand of JFK? It was just an illusion of course, and in this case it was created by the good folks of ILM. You might think that ILM (Industrial Light and Magic) do the effects for all of Hollywood's blockbusters these days. Not quite: there are many other "Special Effects Companies" out there doing a great job, and to acknowledge their ever-growing contribution to today's movies the IMDb now records their credits.

See the extended search form for a good starting place to search or browse this new section.

[We've also added Norwegian and Italian title aka lists at the FTP sites.]


UPDATE CYCLE CHANGE

by Col Needham

The database continues to grow at an amazing rate despite the fact that we now have complete information for thousands of movies and people. I'm pleased to say that contributors are finding new areas to research and expand the database (particularly silent movies and non-US releases). However, new data has to be processed and validated and this takes an increasing amount of time as the database grows.

Previously, additions were distributed to the database editors on Friday for processing over the weekend, ready for the site update late on Sunday. Distribution has now been moved to Thursday to allow more time, and this has worked out very well. The best-case processing time moves from 2 days to 3 days (and the worst case from 9 to 10 days), but we can still keep on top of everything in weekly cycles.

As a side effect of this, the deadline for the template additions interface has been moved from Thursday to Wednesday to allow time for the templates to be processed.
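
The best-case and worst-case figures quoted above follow from simple weekly arithmetic, sketched here in Python. The function name `processing_days` and the day numbering are assumptions for illustration: an addition that arrives just before distribution waits only until the Sunday update, while one that just misses distribution waits a full extra week.

```python
def processing_days(distribution_day, update_day, cycle=7):
    """Best and worst case days from submission to the site update.

    Days are numbered Monday=0 .. Sunday=6.
    """
    # Submitted just before distribution: wait until the next update.
    best = (update_day - distribution_day) % cycle
    # Submitted just after distribution: wait one full cycle longer.
    worst = best + cycle
    return best, worst

THURSDAY, FRIDAY, SUNDAY = 3, 4, 6
print(processing_days(FRIDAY, SUNDAY))    # old schedule: (2, 9)
print(processing_days(THURSDAY, SUNDAY))  # new schedule: (3, 10)
```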

Please keep the new information pouring in and help the database to grow.


PLOT SUMMARIES WANTED

by Col Tinto

As ever, we need your summaries! Starting to scrape the bottom of the 'most voted for' movies list now, but here are another 20 popular movies which as yet don't have summaries.


Night Shift (1982)
Johnny Dangerously (1984)
Doctor Detroit (1983)
Taps (1981)
Hairspray (1988)
Force 10 from Navarone (1978)
Revenge of the Nerds II: Nerds in Paradise (1987)
River Runs Through It, A (1992)
Jabberwocky (1977)
Highlander III: The Sorcerer (1994)
Postman Always Rings Twice, The (1981)
Fly II, The (1989)
Police Academy 6: City Under Siege (1989)
Fly, The (1958)
House of the Spirits, The (1993)
Flamingo Kid, The (1984)
Grease 2 (1982)
Hackers (1995)
Action Jackson (1988)
Caddyshack II (1988)


NEW ADDITIONS GUIDE

by Col Needham

A new version of the complete database additions guide was published at the end of August. A copy is available by sending e-mail to the IMDb mail-server at with the subject: HELP ADD FULL or alternatively via FTP:

ftp://uiarchive.cso.uiuc.edu/pub/info/imdb/tools/additions-guide.gz

or any of the other IMDb ftp sites.

There are several changes, most notably a new policy on uncredited appearances. All uncredited appearances must now be tagged with the attribute (uncredited), whether it is a cameo by a major star in a recent movie or a bit player in an older movie where usually only the principal cast is credited. Use of this attribute will automatically trigger the removal of the cast order number, thus fixing the problem highlighted by Rod Crawford in the previous newsletter.
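
The effect of the (uncredited) attribute can be sketched as follows. This is only an illustration, not the editors' processing code: the sample credit line, the `<n>` notation for the cast order number, and the function name `normalize_credit` are all assumptions; consult the additions guide for the real submission format.

```python
import re

UNCREDITED = "(uncredited)"
# Assumed notation: the cast order number appears as <n> at the end of a credit.
ORDER_RE = re.compile(r"\s*<\d+>")

def normalize_credit(line):
    """Drop the cast order number from a credit tagged (uncredited)."""
    if UNCREDITED in line:
        return ORDER_RE.sub("", line).rstrip()
    return line

# Hypothetical credit lines, for illustration only.
print(normalize_credit("Some Film (1996)  [Man in Bar] (uncredited)  <12>"))
print(normalize_credit("Some Film (1996)  [The Hero]  <1>"))
```

The first line loses its order number; the second, a normal credit, passes through unchanged.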


HOT SEARCHES

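Here are the most popular searches people have done lately, based on total pages for the week ending September 28. Each entry shows the current rank followed by last month's rank; a dash means no previous rank is shown.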

Titles:

  1. 1. Independence Day (1996)
  2. - 2 Days in the Valley (1996)
  3. 3. Star Trek: First Contact (1996)
  4. 10. Star Wars (1977)
  5. 2. Striptease (1996)
  6. 5. The Rock (1996)
  7. 7. Twister (1996)
  8. - Trainspotting (1995)
  9. 4. Mission: Impossible (1996)
  10. 11. Braveheart (1995)
  11. 8. Heaven's Prisoners (1996)
  12. - The First Wives Club (1996)
  13. - Last Man Standing (1996/I)
  14. 12. Pulp Fiction (1994)
  15. - Naniwa Ereji (1936)
  16. - ...och alla dessa kvinnor (1944)
  17. 20. Terminator 2: Judgment Day (1991)
  18. 15. Jurassic Park (1993)
  19. 18. The Cool Surface (1994)
  20. - A Time to Kill (1996)

ID4 continues strong, though without the commanding lead it had last month. Star Wars climbs from number 10. Besides the usual new releases, there are surprising showings by Naniwa Ereji (aka Osaka Trilogy) and a 1944 Swedish title.

People:

  1. 3. Teri Hatcher
  2. 5. Jenny McCarthy
  3. 4. Tom Cruise
  4. 2. Demi Moore
  5. 1. Pamela Anderson
  6. 11. Sandra Bullock
  7. 12. Bo Derek
  8. 8. Shannon Tweed
  9. - Alison Armitage
  10. 10. Mel Gibson
  11. 18. Alyssa Milano
  12. 7. Kim Basinger
  13. - Elizabeth Berkley
  14. 13. Helen Hunt
  15. 15. Brad Pitt
  16. 6. Groucho Marx
  17. - Heather Locklear
  18. - Michelle Pfeiffer
  19. - Sean Connery
  20. 20. Arnold Schwarzenegger

Margaret Colin drops off the top 150 completely (how quickly they forget); Jeff Goldblum plummets to #102. And Will Smith may have helped him save the world, but he's already down to #39. Kevin Costner, Sharon Stone, and Jennifer Connelly just missed the cut this month. Michelle Pfeiffer's latest movie opens in a couple weeks; should raise her standing even more. At least Sean Connery gives hope to us balding males. Number 1 Hatcher was visited about 5 times as often as Arnold. Joan Bud (who?) was #30.


HOT MOVIES

by Col Needham

Movies opening in the US in August/September sorted by number of votes (to September 26th):

  Distribution  Votes  Average  Title
  1000000102      220      6.5  Escape from L.A. (1996)
  0.00000125      202      8.8  Emma (1996)
  1100010101      199      5.5  Island of Dr. Moreau, The (1996)
  0000001212      196      7.6  Tin Cup (1996)
  2000000003      173      5.8  Crow: City of Angels, The (1996)
  0010011111      125      6.4  Chain Reaction (1996)
  0000011101      111      6.8  Jack (1996)
  0000000113       81      7.5  Matilda (1996)
  0000001112       73      6.9  Fan, The (1996)
  00.0.01123       70      8.0  Freeway (1996)

Movies opening in the US in August/September sorted by average votes (to September 26th):

  Distribution  Votes  Average  Title
  0.00000125      202      8.8  Emma (1996)
  0.0...1123       58      8.6  Fly Away Home (1996)
  0.0..01024       39      8.5  Basquiat (1996)
  ..00100014       34      8.1  Alaska (1996)
  0000000122       61      8.1  Spitfire Grill, The (1996)
  00.0.01123       70      8.0  Freeway (1996)
  0..0001113       39      7.9  First Kid (1996)
  0000001212      196      7.6  Tin Cup (1996)
  0000000113       81      7.5  Matilda (1996)
  0.000.11.4       38      7.5  Maximum Risk (1996)

IMDb IN THE NEWS

by Jon Reeves

Just a few of the traditional media outlets that have mentioned us lately:

The Net (US). Newsday. Yahoo! Internet Life (September *and* October). Boston Globe. Utne Reader. Sight and Sound. KFBK Radio, Sacramento. P.O.V. Magazine. I-way 500 (best Leisure site). WebSight (months ago; we just found out). BBC Radio 1. Library Journal.

We're particularly proud of the review in Yahoo! Internet Life (Sept.), where the two best known US movie reviewers, Roger Ebert and Gene Siskel, both gave us a thumbs up.

We've also won several new awards. See selections from the gallery here.

NetBest Awards (finalist). Awesome Universal t@p 500 WebSites. Access to the World Cool Link of the Week. Computer Currents Interactive Link of the Week. Komputer Klinic Kool Site. (WFMM) Cool Site o' the Day. USA Today Hot Site. P.O.V. Top 100 (#53). I-way 500 (best Leisure site). Top Shopping Site: All Internet Shopping Directory.

And a web-related mention of note: the hot100 list shows us as the eighth hottest site on the whole net.


WEB SERVER CHANGES

by Rob Hartill

Since the last newsletter, a lot of coffee has been consumed and, in between the trips to the kettle, some new code has been added. Elsewhere in this newsletter you can read about Michel Hafner's new fuzzy matching code (written in C, you know! How did he slip that by my perl-only filter?). Other changes include much more online checking of web-based submissions, to try to clean more of the data we receive before it even leaves your web browser. All the extra checks and warnings might frustrate at first, but we hope they remind you how best to submit clean data that can be added sooner... all those warnings used to be fixed by us manually :-(

The old-style quiz has been put to rest and replaced with a new quiz. At the moment it comes in two varieties: (1) a name-guessing game based on hangman and (2) a multiple-choice quick quiz (in the same style as the old quiz) with questions that will be designed to tease and educate. If you have some devious questions that you'd like to add to the quiz, please send them to me.

The main search form now allows searching of 'business', 'goofs', 'technical' and 'trivia' under the 'word search' section, so now you can search for your favourite type of goof or studio filming locations, etc.

Behind the scenes, our servers became HTTP/1.1 compliant thanks to our developers' version of Apache 1.2, and our Perl became fuel injected thanks to Doug MacEachern's "mod_perl_fast" Apache plug-in module, which embeds a Perl interpreter into our Apache server.

Look out for translated versions of key pages in the very near future. Using Apache's language negotiation feature we'll soon be serving up some pages in French, German and Italian (to begin with). For users with browsers capable of specifying a preferred language (e.g. Netscape 3 for Win/Mac [not for Unix! sigh]) the new pages will magically appear if you prefer a language other than English and if we have a translation available. Everyone else will continue to see English.
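
The selection rule described above (serve a translation when the browser prefers a language we have, otherwise English) can be sketched in Python. This is a simplified illustration, not Apache's negotiation algorithm: it reads the browser's `Accept-Language` request header, and the function name `pick_language` and the set of available translations are assumptions. Wildcards such as `*` are not handled.

```python
def pick_language(accept_language, available, default="en"):
    """Choose a page language from an Accept-Language header value."""
    prefs = []
    for position, item in enumerate(accept_language.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # no q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((-quality, position, tag.strip().lower()))
    # Highest quality first; header order breaks ties.
    for neg_quality, _, tag in sorted(prefs):
        if neg_quality == 0:
            continue  # q=0 marks a language the user does not accept
        primary = tag.split("-")[0]
        if tag in available:
            return tag
        if primary in available:
            return primary
    return default

translations = {"en", "fr", "de", "it"}  # assumed set of available pages
print(pick_language("fr, en;q=0.5", translations))  # fr
print(pick_language("ja, en;q=0.8", translations))  # en
print(pick_language("", translations))              # en
```

A browser that sends no language preference, or only languages without a translation, falls through to the English default.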


XREGAL UPDATES

by Lachlan Wetherall

Since the last newsletter, versions 1.1 and 1.2 of xregal have been released. Xregal is an X11 hypertext interface for the Internet Movie Database when it is installed locally on a Unix host. Apart from a number of bug fixes the main features added from version 1.0 to 1.2 are:

  • The display now looks a lot nicer, with centered titles, automatically wrapped paragraphs, hanging indents, bulleted lists, and italicized subheadings.
  • There are many more options for changing the appearance and behaviour of xregal through command line parameters and/or X resources.
  • Text can be selected with mouse button 1, in order to paste data to other X applications.
  • It is now possible to save an ASCII version of the displayed data to a file.
  • There is a new 'search for character name' menu option.

For a full list of changes, consult the ChangeLog file. Version 1.2 of xregal requires the moviedb3.2g package to be installed first. Both xregal and moviedb3.2g are available from the usual IMDb ftp sites:

ftp://uiarchive.cso.uiuc.edu/pub/info/imdb/tools
ftp://ftp.digital.com.au/pub/imdb/tools

The latest development version of xregal is always available from the xregal home page.

If you have any suggestions on improvements for xregal, drop me an e-mail. Bug reports are especially welcomed and acted on speedily.


DATABASE STATISTICS

by Jon Reeves

This is a regular section giving information about the current size and growth of the IMDb. We receive between 30,000 and 40,000 additions every week from users all over the world.

   Number of filmography entries: 1,194,654
   Number of people covered:        337,775
   Number of movies covered:         84,196

   Size of the database (Mb):            97

Recent milestones:

  • Over 1,000,000 lines of additions (all categories) for the year
  • Over 500,000 actor entries
  • Over 1000 titles with MPAA ratings reasons
  • Over 2000 mini-biographies
  • Over 3000 titles on the business-info list
  • Over 15,000 plot summaries
  • Over 25,000 running time entries
  • Over 35,000 language entries
  • Genres for over 41,000 movies
  • Over 80,000 movies
  • Over 100,000 titles (including aka's and TV series)

FUTURE DEVELOPMENTS

This is a regular section listing some enhancements we're currently looking at. Please bear in mind that some of these may take quite a while to come to fruition or even fail to materialize because the original volunteer decides not to proceed.

  • Washed Update: Greg Bulmash's column on forgotten stars from the '70s will soon be available first on the IMDb web site.
  • a list of other companies; currently, except for production and special effects companies, the database records only the names of individual people.
  • outline list: a "one line" plot summary, short enough to display on the main title page.
  • a "crew completion" list, similar to the cast completion list.
  • a list of "influential scenes"... the scenes that launched a thousand spoofs, became the director's trademark, changed cinema forever, launched a star.
  • a separate list of films in production, with their current status.
  • a complete rewrite of the additions interface. The survey results suggest that many people struggle with the clumsy interface currently in place. Taking the comments from the survey and our own ideas we are completely rewriting the additions interface. This is a *major* undertaking and will take some time to complete. We're confident the results will be worth the wait!! :-)
  • full support for accented characters (ISO 8859-1) without losing people that can't type them. Implementation in progress.
  • proper handling of writer credit order.
  • a locally installable MS-Windows interface to the database is under final testing for those of you who want to reduce your phone bills!
  • enhanced awards section for the database covering more international festivals, national film institutes etc.
  • general support for alternate titles in languages other than English and the language of the original country.
  • a movie recommendation service that will use your vote records to suggest other movies you might enjoy. Initially available via an E-mail interface. Time to check you're up-to-date with your voting!

Academy Awards and Oscar are registered trademarks of the Academy of Motion Picture Arts and Sciences.