NEWSLETTER #9
October 1996
this issue edited by Jon Reeves
Welcome to issue 9 of the IMDb newsletter. The newsletter is intended to
keep database users and contributors informed of the latest developments
from the management team. Comments and suggestions are welcome and should
be directed to newsletter@imdb.com. Issue 10 is scheduled for mid-November.
Apologies for the delay in putting out this issue; vacation schedules and
outside activities got in the way (see, we do have lives!).
I could also say we were busy celebrating our third anniversary as a
web site, except we didn't realize it until after the fact. Yes, IMDb
was one of the first 500 web sites (though not at our current location),
and it's been around longer than most CD-ROM movie references. If you
want to celebrate, October 17 will be the sixth anniversary of the
database, originally as a set of UNIX shell scripts.
To subscribe to the newsletter, fill out the survey
and check the appropriate box.
by Michel Hafner
The WWW version of our database access software now offers a fuzzy
search for names and titles that goes far beyond the old substring
and exact search options (which nevertheless remain useful and the
default search types). The CPU-intensive regular expression search
is an additional option. The four basic search types are best (not)
used in the following cases:
SUBSTRING search is the appropriate type of search if you:
- want all names/titles that have a certain substring and don't mind
getting huge lists (which is likely if the substring is sufficiently
common or short). It should not be used with too common and/or short
substrings unless you really need all these names/titles that match.
- are fairly positive about the spelling of a substring contained in
a title or name you are looking for and this substring is long or
uncommon enough to reduce the output to a list of moderate
size. Common first names such as 'Peter' and 'John' and articles such
as 'The', for example, are not suitable since they match thousands of
names/titles. Also, spelling errors will invariably lead to results
that do not contain what you are actually looking for.
Substring search is the default search type and very useful if you pay
attention to picking suitable substrings.
EXACT search is the appropriate type of search if you:
- know exactly how a name/title is spelled in IMDb (which is, hopefully, the
correct spelling) and that it will uniquely identify the name/title
you have in mind. In all other cases you will get a failure message
or a result you likely didn't intend to get.
FUZZY search is the appropriate type of search if you:
- basically know how the name/title is spelled, but are unsure about
  details such as
  - an initial (e.g., is it "Darryl Zanuck" or "Darryl F. Zanuck"? (in this
    example an appropriate substring works equally well))
  - an article (e.g., is it "The Seven Samurai" or "Seven Samurai"? (in this
    example an appropriate substring works equally well))
  - a year date (e.g., is it "Carmen (1981)" or "Carmen (1982)"? (in this
    example an appropriate substring works equally well))
  - a roman numeral, a label such as (TV) or (mini) or stuff such as "..."
    (again a substring might work equally well)
  - one or two letters in the name/title (e.g., is it "Emanuele Beart" or
    "Emmanuelle Beart" or "Emanuelle Beart"? Is it "Mission Impossible?" or
    "Mission: Impossible" or "Mission - Impossible!"?)
  - a first name variation (e.g., is it "Larry Fishburne" or "Laurence
    Fishburne" or "Lawrence Fishburne"?)
  - a common short word such as "and" or "in" or "for"
  - a title such as Sir, Dame or Lady, AND ESPECIALLY
  - combinations of the above which substring search can't handle directly
    (e.g., you usually have to try several substring searches to get what
    you want)
Fuzzy search is a very useful search method if the above cases
apply. But keep in mind that fuzzy search is not a more tolerant
substring search! If it generally were, it would produce even
bigger output than substring search for common/small substrings,
which it doesn't. To separate the good matches from the bad matches
(e.g., the many matches substring search will give you anyway) it
outputs only names/titles whose length is more or less the same as
the length of your search string (there are exceptions to this rule
if the matched substring is very likely to be relevant despite its
length difference to the search string).
So a fuzzy search for the title 'Tim' will give you, for example,
Tim (1979)
Time (1986)
but not
Across The Sea of Time (1995)
Adventures of Timmy the Tooth: Malibu Timmy, The (1995) (V)
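To see why 'Tim' finds the short titles but not the long ones, here is a
minimal Python sketch. It is not the real fuzzy matcher: it assumes a
similarity score from Python's difflib as a stand-in, and the function
names and the 0.6 cutoff are invented for this example.

```python
import re
from difflib import SequenceMatcher


def bare_title(title):
    """Drop parenthesised suffixes such as '(1995)' or '(V)'."""
    return re.sub(r"\s*\([^)]*\)", "", title).strip()


def fuzzy_titles(query, titles, cutoff=0.6):
    """Keep titles whose similarity to the query is at least `cutoff`.

    SequenceMatcher.ratio() is 2*M/T, where M counts matching characters
    and T is the combined length of both strings, so a long title that
    merely contains the query scores low.
    """
    q = query.lower()
    hits = []
    for title in titles:
        score = SequenceMatcher(None, q, bare_title(title).lower()).ratio()
        if score >= cutoff:
            hits.append(title)
    return hits


titles = [
    "Tim (1979)",
    "Time (1986)",
    "Across The Sea of Time (1995)",
    "Adventures of Timmy the Tooth: Malibu Timmy, The (1995) (V)",
]
print(fuzzy_titles("Tim", titles))  # ['Tim (1979)', 'Time (1986)']
```

A plain substring test would return all four titles; scoring against the
whole title is what lets the length difference count against a match.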
REGULAR EXPRESSION search is the appropriate type of search if you:
- cannot formulate your search with a substring and the criteria for fuzzy
search don't apply either. Since it needs more computing time than
the other options on average and features a more complex syntax it's
to be used with moderation. Don't use it if one of the other three
options is equally suited to solve your search problem.
If you encounter unexpected or incorrect behaviour of fuzzy search,
you can drop me a line and report the problem.
by Rob Hartill
How did Forrest Gump shake the hand of JFK? It was just an illusion,
of course, and in this case it was created by the good folks of ILM.
You might think that ILM (Industrial Light and Magic) do the effects
for all of Hollywood's blockbusters these days. Well, not quite: there
are many other "Special Effects Companies" out there doing a great
job, and to acknowledge their ever-growing contribution to today's movies
the IMDb now records their credits.
See the extended search form for
a good starting place to search or browse this new section.
[We've also added Norwegian and Italian title aka lists at the FTP sites.]
by Col Needham
The database continues to grow at an amazing rate despite the fact
that we now have complete information for thousands of movies and
people. I'm pleased to say that contributors are finding new areas to
research and expand the database (particularly silent movies and non-US
releases). However, new data has to be processed and validated and this
takes an increasing amount of time as the database grows.
Previously, additions were distributed to the database editors on Friday
for processing over the weekend, ready for the site update late on
Sunday. This has now been moved to Thursday in order to allow more time
and has worked out very well. It means that the best case processing
time moves from 2 days to 3 days (and the worst from 9 to 10 days), but
we can still keep on top of everything in weekly cycles.
As a side effect of this, the deadline for the template additions
interface has been moved from Thursday to Wednesday to allow time for
the templates to be processed.
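The 2/3 and 9/10 day figures follow from simple weekday arithmetic. Here
is a small Python sketch, assuming (for illustration only) that the wait
is counted in whole days from the hand-off to the editors until the Sunday
update; the function name is our own.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def processing_days(handoff_day, update_day="Sun"):
    """Return (best, worst) waits in days for a weekly cycle.

    Best case: the addition arrives just before the hand-off.
    Worst case: it arrives just after, so it waits one extra week.
    """
    best = (DAYS.index(update_day) - DAYS.index(handoff_day)) % 7
    return best, best + 7


print(processing_days("Fri"))  # (2, 9)  old schedule
print(processing_days("Thu"))  # (3, 10) new schedule
```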
Please keep the new information pouring in and help the database to grow.
by Col Tinto
As ever, we need your summaries! Starting to scrape the bottom of the
'most voted for' movies list now, but here are another 20 popular movies
which as yet don't have summaries.
Night Shift (1982)
Johnny Dangerously (1984)
Doctor Detroit (1983)
Taps (1981)
Hairspray (1988)
Force 10 from Navarone (1978)
Revenge of the Nerds II: Nerds in Paradise (1987)
River Runs Through It, A (1992)
Jabberwocky (1977)
Highlander III: The Sorcerer (1994)
Postman Always Rings Twice, The (1981)
Fly II, The (1989)
Police Academy 6: City Under Siege (1989)
Fly, The (1958)
House of the Spirits, The (1993)
Flamingo Kid, The (1984)
Grease 2 (1982)
Hackers (1995)
Action Jackson (1988)
Caddyshack II (1988)
by Col Needham
A new version of the complete database additions guide was published at
the end of August. A copy is available by sending e-mail to the IMDb
mail-server with the subject: HELP ADD FULL
or alternatively via FTP:
ftp://uiarchive.cso.uiuc.edu/pub/info/imdb/tools/additions-guide.gz
or any of the other IMDb ftp sites.
There are several changes, but most notably a new policy on uncredited
appearances. All uncredited appearances must now be tagged with the
attribute (uncredited), whether it is a cameo from a major star in a
recent movie or a bit player in an older movie where usually only the
principal cast are credited. Use of this attribute will automatically
trigger the removal of the cast order number, thus fixing the problem
highlighted by Rod Crawford in the previous newsletter.
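As a rough illustration of the rule, here is a Python sketch that drops
the cast order number whenever a credit carries the (uncredited)
attribute. The line layout used here (title, attribute, [role], <order>)
and the function name are assumptions made for the example; consult the
additions guide for the real format.

```python
import re

# Hypothetical layout: a trailing <N> holds the cast order number.
ORDER_NUMBER = re.compile(r"\s*<\d+>\s*$")


def normalize_credit(line):
    """Remove the trailing order number from an uncredited appearance."""
    if "(uncredited)" in line:
        return ORDER_NUMBER.sub("", line)
    return line


print(normalize_credit("Some Movie (1996)  (uncredited)  [Man in Bar]  <12>"))
print(normalize_credit("Some Movie (1996)  [Hero]  <1>"))
```

The first line loses its <12>; the second, being a credited role, is left
untouched.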
Here are the most popular searches people have done lately, based on total
pages for the week ending September 28.
Titles (current position, with previous position in brackets; "-" = not
previously listed):
 1. Independence Day (1996) [1]
 2. 2 Days in the Valley (1996) [-]
 3. Star Trek: First Contact (1996) [3]
 4. Star Wars (1977) [10]
 5. Striptease (1996) [2]
 6. The Rock (1996) [5]
 7. Twister (1996) [7]
 8. Trainspotting (1995) [-]
 9. Mission: Impossible (1996) [4]
10. Braveheart (1995) [11]
11. Heaven's Prisoners (1996) [8]
12. The First Wives Club (1996) [-]
13. Last Man Standing (1996/I) [-]
14. Pulp Fiction (1994) [12]
15. Naniwa Ereji (1936) [-]
16. ...och alla dessa kvinnor (1944) [-]
17. Terminator 2: Judgment Day (1991) [20]
18. Jurassic Park (1993) [15]
19. The Cool Surface (1994) [18]
20. A Time to Kill (1996) [-]
ID4 continues strong, though without the commanding lead it had last month.
Star Wars climbs from number 10. Besides the usual new releases, there were
surprising showings by Naniwa Ereji (aka Osaka Elegy) and a 1944 Swedish
title.
People (current position, with previous position in brackets; "-" = not
previously listed):
 1. Teri Hatcher [3]
 2. Jenny McCarthy [5]
 3. Tom Cruise [4]
 4. Demi Moore [2]
 5. Pamela Anderson [1]
 6. Sandra Bullock [11]
 7. Bo Derek [12]
 8. Shannon Tweed [8]
 9. Alison Armitage [-]
10. Mel Gibson [10]
11. Alyssa Milano [18]
12. Kim Basinger [7]
13. Elizabeth Berkley [-]
14. Helen Hunt [13]
15. Brad Pitt [15]
16. Groucho Marx [6]
17. Heather Locklear [-]
18. Michelle Pfeiffer [-]
19. Sean Connery [-]
20. Arnold Schwarzenegger [20]
Margaret Colin drops off the top 150 completely (how quickly they forget);
Jeff Goldblum plummets to #102. And Will Smith may have helped him save the
world, but he's already down to #39. Kevin Costner, Sharon Stone, and
Jennifer Connelly just missed the cut this month. Michelle Pfeiffer's latest
movie opens in a couple weeks; should raise her standing even more. At least
Sean Connery gives hope to us balding males. Number 1 Hatcher was visited
about 5 times as often as Arnold. Joan Bud (who?) was #30.
by Col Needham
Movies opening in the US in August/September sorted by number of
votes (to September 26th):
Movies opening in the US in August/September sorted by average
votes (to September 26th):
by Jon Reeves
Just a few of the traditional media outlets that have mentioned us lately:
The Net (US).
Newsday.
Yahoo! Internet Life (September *and* October).
Boston Globe.
Utne Reader.
Sight and Sound.
KFBK Radio, Sacramento.
P.O.V. Magazine.
I-way 500 (best Leisure site).
WebSight (months ago; we just found out).
BBC Radio 1.
Library Journal.
We're particularly proud of the review in Yahoo! Internet Life (Sept.),
where the two best known US movie reviewers, Roger Ebert and Gene Siskel,
both gave us a thumbs up.
We've also won several new awards. See selections from the gallery here.
NetBest Awards (finalist).
Awesome Universal t@p 500 WebSites.
Access to the World Cool Link of the Week.
Computer Currents Interactive Link of the Week.
Komputer Klinic Kool Site.
(WFMM) Cool Site o' the Day.
USA Today Hot Site.
P.O.V. Top 100 (#53).
I-way 500 (best Leisure site).
Top Shopping Site: All Internet Shopping Directory.
And a web-related mention of note: the hot100 list
shows us as the eighth hottest site on the whole net.
by Rob Hartill
Since the last newsletter, a lot of coffee has been consumed and in
between the trips to the kettle some new code has been added. Elsewhere
in this newsletter you can read about Michel Hafner's new fuzzy matching
code (written in C, you know! How did he slip that by my Perl-only filter?).
Other changes include much more online checking of web-based submissions
to try to clean more of the data we receive before it even leaves your
web browser. All the extra checks and warnings might frustrate at first,
but we hope they remind you how best to submit clean data that can be
added sooner... all those warnings used to be fixed by us manually :-(
The old style quiz has been put to rest and replaced with a new quiz.
At the moment it comes in two varieties: (1) a name-guessing game based on
hangman and (2) a multiple choice quick quiz (in the same style
as the old quiz) with questions that will be designed to tease and
educate. If you have some devious
questions that you'd like to add to the quiz, please send them to me.
The main search form now allows
searching of 'business', 'goofs', 'technical' and 'trivia' under the
'word search' section, so now you can search for your favourite type
of goof or studio filming locations, etc.
Behind the scenes, our servers became HTTP/1.1 compliant thanks to our
developers' version of Apache 1.2, and our Perl became fuel injected thanks
to Doug MacEachern's "mod_perl_fast" Apache plug-in module, which embeds
a Perl interpreter into our Apache server.
Look out for translated versions of key pages in the very near future.
Using Apache's language negotiation feature we'll soon be serving up
some pages in French, German and Italian (to begin with). For users with
browsers capable of specifying a preferred language (e.g. Netscape 3
for Win/Mac [not for Unix! sigh]) the new pages will magically appear
if you prefer a language other than English and if we have a translation
available. Everyone else will continue to see English.
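For the curious, the idea behind language negotiation can be sketched in a
few lines of Python. This is a simplified illustration of how a server
might read the browser's Accept-Language header, not Apache's actual
algorithm; the function names and the set of available languages are
assumptions based on the languages mentioned above.

```python
def parse_accept_language(header):
    """Return (language, q) pairs, most preferred first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        q = 1.0  # a missing q value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so ties keep the order given in the header
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def pick_language(header, available=("en", "fr", "de", "it"), default="en"):
    """Choose the most preferred language we have a translation for."""
    for lang, q in parse_accept_language(header):
        if q > 0 and lang in available:
            return lang
    return default


print(pick_language("fr, en;q=0.5"))            # fr
print(pick_language("ja, de;q=0.8, en;q=0.5"))  # de
print(pick_language(""))                        # en
```

A browser that sends no preference, or only languages we have not
translated, falls through to the English default.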
by Lachlan Wetherall
Since the last newsletter, versions 1.1 and 1.2 of xregal have
been released. Xregal is an X11 hypertext interface for the Internet Movie
Database when it is installed locally on a Unix host. Apart from a
number of bug fixes the main features added from version 1.0 to 1.2 are:
- The display now looks a lot nicer, with centered titles, automatically
wrapped paragraphs, hanging indents, bulleted lists, and italicized
subheadings.
- There are many more options for changing the appearance and behaviour of
xregal through command line parameters and/or X resources.
- Text can be selected with mouse button 1, in order to paste data to other
X applications.
- It is now possible to save an ASCII version of the displayed data to a file.
- There is a new 'search for character name' menu option.
For a full list of changes, consult the ChangeLog file.
Version 1.2 of xregal requires the moviedb3.2g package to be installed
first. Both xregal and moviedb3.2g are available from the usual IMDb
ftp sites:
ftp://uiarchive.cso.uiuc.edu/pub/info/imdb/tools
ftp://ftp.digital.com.au/pub/imdb/tools
The latest development version of xregal is always available from the
xregal home page.
If you have any suggestions on improvements for xregal,
drop me an e-mail.
Bug reports are especially welcomed and acted on speedily.
by Jon Reeves
This is a regular section giving information about the current size
and growth of the IMDb. We receive between 30,000 and 40,000 additions
every week from users all over the world.
Number of filmography entries: 1,194,654
Number of people covered: 337,775
Number of movies covered: 84,196
Size of the database (Mb): 97
Recent milestones:
- Over 1,000,000 lines of additions (all categories) for the year
- Over 500,000 actor entries
- Over 1000 titles with MPAA ratings reasons
- Over 2000 mini-biographies
- Over 3000 titles on the business-info list
- Over 15,000 plot summaries
- Over 25,000 running time entries
- Over 35,000 language entries
- Genres for over 41,000 movies
- Over 80,000 movies
- Over 100,000 titles (including aka's and TV series)
This is a regular section listing some enhancements we're currently
looking at. Please bear in mind that some of these may take quite
a while to come to fruition or even fail to materialize because the
original volunteer decides not to proceed.
- Washed Update: Greg Bulmash's column on forgotten stars from
the '70s will soon be available first on the IMDb web site.
- a list of other companies; currently, except for production and
special effects companies, the database records only the names of
individual people.
- outline list: a "one line" plot summary, short enough to display
on the main title page.
- a "crew completion" list, similar to the cast completion list.
- a list of "influential scenes"... the scenes that launched a thousand
spoofs, became the director's trademark, changed cinema forever,
launched a star.
- a separate list of films in production, with their current status.
- a complete rewrite of the additions interface. The survey results
suggest that many people struggle with the clumsy interface currently
in place. Taking the comments from the survey and our own ideas we
are completely rewriting the additions interface. This is a *major*
undertaking and will take some time to complete. We're confident
the results will be worth the wait!! :-)
- full support for accented characters (ISO 8859-1) without losing
people that can't type them. Implementation in progress.
- proper handling of writer credit order.
- a locally installable MS-Windows interface to the database is
under final testing for those of you who want to reduce your
phone bills!
- enhanced awards section for the database covering more
international festivals, national film institutes etc.
- general support for alternate titles in languages other than
English and the language of the original country.
- a movie recommendation service that will use your vote records to
suggest other movies you might enjoy. Initially available via an
E-mail interface. Time to check you're up-to-date with your voting!
Academy Awards and Oscar are registered trademarks of the Academy of Motion
Picture Arts and Sciences.