User:MosheZadka/Thoughts

I've been on Wikiquote for a while. I've created quite a few pages, edited many, and deleted more than I wanted to. I've thought a lot about how Wikiquote works.

  • Who are our competitors? Even Bartlett's (our classic competitor), and certainly all quotation sites, use a database schema on the order of (Quote, Origin, Keywords). [This is unnormalized SQL; deal with it.] [Yes, it's possible to have paper databases. We call them "indices".]
    • Contrast: Wikipedia's competitors are basically long reams of text, with fairly little metadata. The category system alone makes Wikipedia better organized than Britannica. Wikibooks' competitors are also long reams of text, with even less organization on average.
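
To make the (Quote, Origin, Keywords) schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, sample rows and the find_by_keyword helper are all made up for illustration; no real quotation site is claimed to work exactly this way.

```python
import sqlite3

# Hypothetical, deliberately unnormalized table mirroring (Quote, Origin, Keywords).
# Keywords are stored as one comma-separated string, which is what makes it unnormalized.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (quote TEXT, origin TEXT, keywords TEXT)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [
        ("Example quotation one.", "Example Author", "example,first"),
        ("Example quotation two.", "Example Book", "example,second"),
    ],
)


def find_by_keyword(conn, keyword):
    """Return (quote, origin) pairs whose keyword list contains `keyword`."""
    rows = conn.execute(
        "SELECT quote, origin, keywords FROM quotes ORDER BY rowid"
    ).fetchall()
    return [(q, o) for q, o, k in rows if keyword in k.split(",")]
```

Lookup by keyword or by origin is trivial with such a table, which is exactly the kind of organization a free-form wiki page does not give us for free.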

So it seems Wikiquote uses a fairly different style from its competitors: free-form pages rather than database records.

  • Advantages:
    • Freedom: With just wide community approval (or even without, in experimental stages) we can add notes, note types, and formatting options. We used this to invent the Sourced/Attributed/Misattributed distinction, to add "Quotes about" to it (while we still debate the correct heading), and so on.
  • Disadvantages:
    • Organization has to be done by hand.

How can we keep the advantages and make the disadvantages hurt less? By adopting standard formatting. This is why I've been pushing for heavy-handed standardization. The goal is to make it possible to write a program which will do something like:

  • Read all pages: remove stub notices, ignore pages with other types of maintenance tags.
  • Use a heuristic to figure out what the page style is: hopefully, the heuristic will be able to use just the category system. [This is why categories are important to me.]
  • Parse a page into individual quotations. [The formatting must be standard enough for a dumb program.]
  • Spit out whatever we want it to: fortune file, SQL dumps, XML dumps etc.
    • For example, it might be nice to spit out "fortune file with all quotes from people or books, with added quotation marks and attribution, but only those which are sourced." Or "fortune file with all quotes from TV episodes which have fewer than 4 lines of dialogue."
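
The steps above can be sketched roughly as follows, for the simplest case of a page about a person. This is only an illustration under loud assumptions: that quotes are single-level "*" bullets with the source on a following "**" bullet, that sections use "== Sourced ==" style headings, and that the maintenance tag names listed below are the ones in use (they are placeholders). The function names parse_page and to_fortune are hypothetical. The only outside fact relied on is that fortune files separate entries with a line containing just "%". The category-based style heuristic is skipped here.

```python
import re

# Assumed conventions, not a description of every real page.
HEADING = re.compile(r"^=+\s*(.+?)\s*=+\s*$")
STUB = re.compile(r"\{\{[^{}]*stub[^{}]*\}\}", re.IGNORECASE)
MAINTENANCE_TAGS = ("{{cleanup", "{{delete")  # placeholder tag names


def parse_page(title, wikitext):
    """Split one person page into quotation records; [] if it should be ignored."""
    lowered = wikitext.lower()
    if any(tag in lowered for tag in MAINTENANCE_TAGS):
        return []  # ignore pages carrying other maintenance tags
    wikitext = STUB.sub("", wikitext)  # remove stub notices
    quotes, section, last = [], None, None
    for line in wikitext.splitlines():
        line = line.rstrip()
        heading = HEADING.match(line)
        if heading:
            section, last = heading.group(1), None
        elif line.startswith("**"):
            # A second-level bullet is taken to be the source of the quote above it.
            if last is not None and last["source"] is None:
                last["source"] = line[2:].strip()
        elif line.startswith("*"):
            last = {"text": line[1:].strip(), "author": title,
                    "section": section, "source": None}
            quotes.append(last)
    return quotes


def to_fortune(quotes, sourced_only=True):
    """Render records as fortune-file text, adding quotation marks and attribution."""
    out = []
    for q in quotes:
        if sourced_only and q["section"] != "Sourced":
            continue
        out.append('"{}"\n    -- {}\n%\n'.format(q["text"], q["author"]))
    return "".join(out)
```

A dumb program like this only works if every page really follows the assumed layout, which is the whole argument for standardization. Other outputs (SQL dumps, XML dumps) would just be different renderers over the same list of records.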

Wikiquote is not at a level at which this is possible, but it is close. I think it would be nice to bring it closer.