News
MathML on the Clipboard
One of the more interesting applications coming with Windows 7 is the Math Input Panel. It is designed for pen input on a tablet-style device and recognizes mathematical expressions with impressive accuracy. Although designed for a tablet, it also works pretty well if you are just “writing” the expression with a finger on a small laptop trackpad, which is how I have been using it.
The Math Input Panel has a very simple interface with virtually no customisation options. It offers no way of saving the generated expressions; it just offers a simple Insert button that tries to insert the math expression at the insertion point in the currently open application. This works well for Word 2007, which accepts MathML from the clipboard, transparently converts it to its internal form and renders it. More generic tools that could use the MathML, such as XML editors, do not accept MathML from the clipboard in this way, however. Unlike MathPlayer or Word, the Math Input Panel doesn't also place the XML markup on the clipboard as a plain-text fallback. Marko Panic, the program manager for the development of this tool, confirmed to me that this was a design decision, as they didn't want the end user to be faced with raw XML. This is not unreasonable, but not what I wanted personally (I like to see my XML raw :-). Marko confirmed that the MathML is on the clipboard and that it should be possible to extract it with a few lines of code, or, if I wanted more extensive customisation, that documentation of the API offered by the underlying DLL was available at
http://msdn.microsoft.com/en-us/library/dd317324(VS.85).aspx
and
http://msdn.microsoft.com/en-us/library/dd317311(VS.85).aspx.
I decided to brush up my C# forms programming and produced a small form that shows any MathML on the clipboard. The main code (everything apart from the boilerplate Visual Studio files) is available on Google Code. While it's particularly useful for seeing the MathML generated by the Math Input Panel, it also works with other applications that place MathML on the clipboard, notably MathPlayer and Word.
While searching Google for some programming tips for my form, I came across a very similar blog posting from last year. That form differed in some ways, though (it displayed the IE folding tree view of the XML), so I went ahead and finished mine. The screenshot shows the Math Input Panel interpreting my appalling handwriting, and the mmlclipboard form displaying the generated MathML.
QMHE 2010 - the 6th International Seminar - Quality Management in Higher Education
TICE 2010 - 7th Conference Information and Communication Tools for learning and training
4th Baltic Young PhD Conference on Learning in Networks Collocated with the BaSoTI 2010 Summer School
DeLFI 2010 - 8th e-Learning Conference on Computer Science of the Gesellschaft für Informatik
Joint Conference Interaktive Kulturen (Interactive Cultures)
EUROSCOLA 2010 Competition - 16th Edition
Workshop: Group awareness in knowledge convergence and activity organization in CSCL
ICL2010 - 13th International Conference on Interactive Computer aided Learning, Academic and Corporate E-Learning in a Global Context
2010-01-18: Prof. Sherry Mantyka will visit ActiveMath from March 15 to April 2, 2010
2009-12-24: Sergey has received EU Marie Curie International Incoming Fellowship
2009-10-15: Visiting Researcher from Russia
2009-10-01: A new researcher joined ActiveMath lab
And I, for one, don't think some link websites are OK…
I've had this post in my head for a while. For about as long as it's been since I signed the manifesto in defence of fundamental rights on the internet (and that was already about a month and a half ago)… Today I've finally decided to talk about what I think should be punishable online. Let's take it step by step…
First of all, I have nothing against P2P, mainly for two reasons:
- Whoever shares something, anything digital, on a P2P network doesn't do it for profit (and, in practice, invests upload bandwidth in the effort, which, in this country at least, costs a fortune). [In fact, some people do try to cash in: those who share password-protected files and try to extract a ransom for the password... but the 'community' takes care of 'lynching' them appropriately (or at least the communities I move/have moved in do)]
- While I do think that record labels and film distributors lose revenue through P2P,
  - I am also sure that nobody believes their loss figures (after all, if Pixbox offers its entire catalogue for 6 euros a month, the industry can hardly argue that anyone who downloads music harms it by more than those 6 euros a month, unless they sue Pixbox in the same way, minus whatever cut they get)
  - the people I worry about are the creators, not the intermediaries. And creators don't seem to be doing so badly lately
  - although the record and DVD industries don't like to be reminded of it, there are industries that suffer more from the effects of ‘piracy’: at the very least the video game industry and the triple-X sub-industry. And curiously, you don't hear them hiding behind the feeble excuse of P2P to ask for help from the public purse, or from the executive, the legislature or the judiciary: they devote their efforts, far more intelligently, to finding new distribution channels, new business models… and to going after the industrial pirates.
And that's where the issue of link websites gets to me (which, as Miquel Peguera pointed out, are not a crime, and will remain legal as long as Spanish intellectual property law isn't changed).
- Link websites are not P2P: they are centralized, with nothing peer-to-peer about them, and they have one or more people responsible for them.
- Link websites do make money (or, at the very least, it's easy to see how they could).
- No P2P user can do enough damage to the industry to make it so much as flinch, but a link website can (or at least that's what your humble and poorly informed servant believes).
So are all link websites harmful enough to deserve administrative closure? No, certainly not. To begin with, it is essential to respect the rights guaranteed to us by the Constitution and the rest of the laws in force. And nothing that implies closing a website should be done without going through the judicial system. Naturally. However boorish (third definition in the DRAE) certain legislators may turn out to be. And getting worked up because someone might try to slip something like that into a supposedly harmless law seems perfectly natural to me.
Now, I don't know who said that, when something looks as if it was done out of sheer malice and you have to choose between incompetence and bad faith, you should always lean towards the former, but they were absolutely right. In this case, I don't doubt it, there was a more than notable dose of bad faith, supplied by the ‘lobby’ of the “cultural industries” (if those two words together aren't the best possible example of an oxymoron, I don't know what is (I'm sure intelligent military men exist)). But that was the bad faith (and the necessary ignorance) of trying to put an end to P2P, not of attacking freedom of expression: the fact that the wording of the famous “Draft Sustainable Economy Law” (Anteproyecto de Ley de Economía Sostenible) allows it to be used against that fundamental right is an accident caused by the incompetence of (almost?) everyone involved in the mess. I know perfectly well that this can't be proven (the culprits will be the first to defend their competence, demonstrating their lack of it along the way), but since everyone is entitled to an opinion, I'll keep mine ;-).
So… how do we solve it? Confessing once again my almost total ignorance of the subject (which I fear is not much greater than that of many of those who have already given their opinion on it, especially those who have made the most noise), I am strongly attracted to the concept of “safe harbor” included in Title II of the (rightly) much-criticized Digital Millennium Copyright Act, which shields service providers from liability if they commit to behaving as ‘safe harbors’ and diligently block content that infringes intellectual property law when notified of such infringement (with the expected guarantees for the other side to respond). If something like this were [properly] introduced into Spanish law, link websites would quickly split into those ‘specializing in material beyond intellectual property law’ (who would be playing with fire) and everyone else (allow me to opine, once again, that everyone else would turn out to be very few). And to make life miserable for the webmaster in question, the industry would only need to post a sniper (a moderately trained clerk on a thousand euros a month will do) on the browser's F5 key: our hypothetical webmaster is nobody's fool and knows perfectly well when the ‘torrent’ in question is Alejandro Sanz's latest album (and so had better take the link down right away) and when it is potentially more harmful material that is nevertheless beyond the reach of copyright law.
Legislation like this (that is, outlawing a certain type of link website and protecting “safe harbors”) would not stop P2P (I believe I've already said I have nothing against it), nor [assuming good drafting and subsequent good enforcement, which is no small assumption] would it threaten freedom of expression. But it would take the air out of the “industrial pirates'” business. And that, what can I say, doesn't seem bad to me…
RichEdit Versions 1.0 through 3.0
Digging through old doc files, I ran across the following summary of RichEdit up through Version 3.0. It’s more detailed than my post on RichEdit Versions, so it might be of interest to history buffs, anyhow. And it does describe the riched20.dll that still ships with Windows, mostly for purposes of backward compatibility. I wrote this document back in 1998 in preparing for an internal seminar on RichEdit 3.0. It even mentions that RichEdit 3.x would be an ideal development environment for WYSIWYG editing of built-up mathematical expressions! Sure hit that nail on the head. Naturally the statement “there are three main versions of RichEdit” is quite out of date.
What is RichEdit?
There are three main versions of RichEdit: 1.x, 2.x, and 3.0. Since all are being used, it makes sense to group the RichEdit features as they were introduced by these three versions. In general, RichEdit adds selective character and paragraph formatting along with embedded objects to the plain text editing facilities well-known in system edit controls.
A RichEdit instance consists of a single story, galley-like text that can be exported and imported using plain text or RTF. Each version of RichEdit is a superset of the preceding one, except that only FE builds of RE 1.0 have a vertical text option (a relatively elegant vertical option could be added to RE 3.0 if there’s sufficient demand).
RichEdit 1.0 was originally developed for rich-text email. Major differences between the various builds of RE 1.x and RE 2.0 are that the latter is based on Unicode, is a single world-wide binary (not including BiDi, Thai or Indic scripts), has multilevel undo, has a powerful set of COM interfaces, and is substantially more Word compatible. RE 2.1 adds BiDi capabilities.
Major differences between RE 2.x and RE 3.0 include the latter's better performance, richer text, outline view, zoom, font binding, more powerful IME support, and rich complex script support (BiDi, Indic, and Thai). RE 3.0 is a single, scalable, world-wide binary that offers high performance and substantial Word compatibility in a small package.
RichEdit 2.0 also includes simpler plain-text and single-line controls. RE 3.0 adds rich/plain ListBox and ComboBox controls.
RichEdit 1.0 Features
1. Text Entry/Selection. Mostly standard (system-edit control) selection and entry of text. Selection bar support. Word-wrap and auto-word-select options. Single, double, and triple click selection.
2. ANSI (SBCS and MBCS) editing. No Unicode
3. Basic set of character/paragraph formatting properties
4. Character formatting properties: font facename and size, bold, italic, solid underline, strikeout, protected, link, offset, and text color.
5. Paragraph formatting properties: start indent, right indent, subsequent line offset, bullet, alignment (left, center, right), and tabs.
6. Find forward: includes case-insensitive and match-whole-word options.
7. Message-based interface: almost a superset of the system edit-control message set plus the two OLE interfaces, IRichEditOle and IRichEditOleCallback.
8. OLE embedded objects: requires client collaboration based on IRichEditOle and IRichEditOleCallback interfaces.
9. Right-button menu support: needs IRichEditOleCallback interface.
10. Drag & Drop editing.
11. Notifications: WM_COMMAND messages sent to client plus a number of others. Superset of common-control notifications
12. Single-level undo/redo.
13. Simple vertical text (Far East builds only)
14. IME support. (Far East builds only)
15. WYSIWYG editing using printer metrics. This is needed for WordPad, in particular.
16. Cut/Copy/Paste/StreamIn/StreamOut with plain text (CF_TEXT) or RTF with and without objects.
17. C code base
18. Different builds for different scripts.
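Item 6 just names the find options. To make their meaning concrete, here is a minimal C++ sketch of a forward search with case-insensitive and match-whole-word options. It is an illustration only, not RichEdit's code: `FindForward` and its word-character rule (letters, digits, underscore) are hypothetical, and case folding is ASCII-only. In the control itself you'd send EM_FINDTEXT or EM_FINDTEXTEX with flags such as FR_MATCHCASE and FR_WHOLEWORD.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {
// Hypothetical word-character rule for this sketch: ASCII letters, digits, '_'.
bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Compare two characters, folding ASCII case unless matchCase is set.
bool EqualChar(char a, char b, bool matchCase) {
    if (matchCase) return a == b;
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}
}  // namespace

// Returns the index of the first match of `what` at or after `start`,
// or std::string::npos if there is none.
std::size_t FindForward(const std::string& text, const std::string& what,
                        std::size_t start, bool matchCase, bool wholeWord) {
    if (what.empty() || what.size() > text.size()) return std::string::npos;
    for (std::size_t i = start; i + what.size() <= text.size(); ++i) {
        std::size_t k = 0;
        while (k < what.size() && EqualChar(text[i + k], what[k], matchCase)) ++k;
        if (k != what.size()) continue;  // mismatch at this position
        if (wholeWord) {
            // A whole-word hit must not touch word characters on either side.
            std::size_t end = i + what.size();
            bool leftOk = (i == 0) || !IsWordChar(text[i - 1]);
            bool rightOk = (end == text.size()) || !IsWordChar(text[end]);
            if (!leftOk || !rightOk) continue;
        }
        return i;
    }
    return std::string::npos;
}
```

For example, searching "Cooler cool COOL" for "cool" finds the start of "Cooler" by default, but with the whole-word option it skips ahead to the standalone word.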
RichEdit 2.x Additions
1. Unicode. Big effort needed to maintain compatibility with existing non-Unicode documents, i.e., ability to convert to/from non-Unicode plain and rich text. Substantial effort needed to run correctly on Win95.
2. General international support. General line breaking algorithm (extension of Kinsoku rules), simple font linking, keyboard font switching.
3. FE support. E.g., Level 2 and 3 IME support
4. Find Up as well as down.
5. BiDi support (RichEdit 2.1)
6. Multilevel undo. Extensible undo architecture that allows client to participate in app-wide undo model.
7. Magellan mouse support
8. Dual-font support. Keyboard can automatically switch fonts when active font is inappropriate for current keyboard, e.g., Kanji characters in Times New Roman.
9. Smart font apply. Font change request doesn’t apply Western fonts to FE characters.
10. Improved display. An off-screen bitmap is used when multiple fonts occur on the same line. This allows, for example, the last letter of the word “cool” not to be chopped off.
11. Transparency support. Also in windowless mode.
12. System selection colors. Used for selecting text
13. AutoURL recognition
14. Word edit UI compatibility. Selection, cursor-keypad semantics.
15. Word standard EOP (end-of-paragraph mark: CR). Can also handle CRLF
16. Plain-text controls as well as rich-text. Single character format and single paragraph format.
17. Single-line controls as well as multiline. Truncate at first end-of-paragraph and no word wrap.
18. Accelerator and Password Controls.
19. Scalable architecture to reduce instance size.
20. Windowless operation and interfaces (ITextHost/ITextServices). Added primarily for Forms^3.
21. COM dual interfaces: TOM (Text Object Model). This powerful set of interfaces is described separately.
22. CHARFORMAT2. Added font weight, background color, locale ID, underline type, superscript/subscript (in addition to offset), disabled effect. For RTF roundtripping only, added amount to space between letters, twip size above which to kern character pair, animated-text type, various effects: font shadow/outline, all caps, small caps, hidden, embossed, imprint, and revised.
23. PARAFORMAT2. Added space before/after and Word line spacings. For RTF roundtripping only, added shading weight/style, numbering start/style/tab, border space/width/sides, tab alignment/leaders, various Word paragraph effects: RTL paragraph, keep, keep-next, page-break-before, no-line-number, no-widow-control, do-not-hyphenate, side-by-side.
24. More RTF roundtripping. All of Word’s FormatFont and FormatParagraph properties.
25. Improved OLE support.
26. Code Stability and stabilization. E.g., parameter and object validation, function invariants, re-entrancy guards, object stabilization, etc.
27. Strong testing infrastructure including extensive regression tests and Genesis testing. Shipped with no priority 1 or 2 bugs and not many postponed bugs.
28. Improved Performance. Smaller working set, faster load and redisplay times, etc.
29. C++ code base. The code is written in C++. Provided a solid foundation on which to build RichEdit 3.0.
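Items 15 and 17 above describe how end-of-paragraph (EOP) marks are handled: CR is the Word-standard EOP, CRLF is also accepted, and single-line controls truncate at the first EOP. A minimal sketch of that behavior, assuming plain 8-bit strings and that only CRLF pairs need collapsing (lone LFs are left alone); the helper names are hypothetical, not RichEdit APIs:

```cpp
#include <cstddef>
#include <string>

// Collapse each CRLF pair into the single CR used as the EOP mark.
// Lone CRs pass through unchanged; lone LFs are not treated as EOPs here.
std::string CollapseCrlf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out += in[i];
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
            ++i;  // skip the LF that follows a CR
    }
    return out;
}

// Single-line control behavior: keep only the text before the first EOP.
std::string SingleLineText(const std::string& in) {
    std::string text = CollapseCrlf(in);
    std::size_t eop = text.find('\r');
    return eop == std::string::npos ? text : text.substr(0, eop);
}
```

Storing a single-character EOP keeps character-position arithmetic simple: every paragraph ends in exactly one character, whatever the source file used.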
RichEdit 3.0 Feature Additions
1. Zoom. The zoom factor is given by a ratio of two longs.
2. Paragraph numbering (single-level). Numeric, upper/lower alphabetic or Roman numeral.
3. Simple tables (no wrap inside cells). Limited UI: no resizing, but can delete/insert rows. With LineServices, can align columns centered, flush right, and decimal. Cells are simulated by tabs, so text tabs and carriage returns are replaced by blanks.
4. Normal and heading styles. Built-in normal style and heading styles 1 through 9 are supported by the EM_SETPARAFORMAT and TOM APIs.
5. Outline view (similar to Word’s). Supports normal style and headings 1 through 9. Can collapse to heading level n, promote/demote headings/text, move paragraphs up/down. Can persist collapse status.
6. More underline types (dashed, dash-dot, dash-dot-dot, dot)
7. Underline coloring. Underlined text can be tagged with one of 15 document choices for underline colors.
8. Hidden text. Marked by CHARFORMAT2 attribute. Handy for roundtripping of information that ordinarily shouldn’t be displayed.
9. More default hot keys, which act as Word’s default hot keys act. E.g., European accent dead keys (US keyboards only) and outline-view hot keys. Number hot key (Ctrl+L) cycles through numbering options available, starting with bullet.
10. Smart-quotes (toggled on/off by Ctrl+") for US keyboards.
11. Soft hyphens. (0xAD in plain text; \- in RTF).
12. Italics Caret/Cursor. Also hand cursor over URLs.
13. LineServices Option: RichEdit 3.0 can use Office’s LineServices component for line breaking and display. This elegant option was added primarily to facilitate handling complex scripts (BiDi, Indic, and Thai). In addition a number of improvements occur for simple scripts, e.g., center, right, and decimal tabs, fully justified text, underline averaging giving a uniform underline even when adjacent text runs have different font sizes. It opens the door to incorporating LineServices FE enhancements, such as Ruby, Warichu, Tatenakayoko, and vertical text. LineServices also paves the way for WYSIWYG editing of built-up mathematical expressions and RichEdit 3.x looks like the ideal development environment for this.
14. Complex Script Support: RichEdit 3.0 will support BiDi (text with Arabic and/or Hebrew mixed with other scripts), Indic (Indian scripts like Devanagari), and Thai. For support of these complex scripts, the LineServices and NT Uniscribe components are used, which run on Win95 and later OSs.
15. Font binding: RichEdit 3.0 will automatically choose an appropriate font for characters that clearly do not belong to the current charset stamp. This is done by assigning charsets to runs and associating fonts with those charsets. Please see the section on Font Binding below.
16. Charset-specific plain-text read/write options, notably ability to read a file using one charset and write it with a different one.
17. UTF-8 RTF. Used preferentially for cut/copy/paste and optionally externally, this file format is substantially more compact than ordinary RTF, faster, and is completely faithful to Unicode.
18. Office 9 IME support (MSIME98). This more powerful IME capability has been factored out into an independent module (see RichEdit Architectural Improvements). Features include:
a. Reconversion - In the past, the user had to delete the final string first and then type in a new string to get to the correct candidate. This feature enables the user to convert the final string back to composition mode, allowing easy selection of a different candidate string.
b. Document feed - This feature provides IME98 with the text for the current paragraph, which helps IME98 to do more accurate conversion during typing.
c. Mouse Operation - This feature allows the user to have better control over the candidate and UI windows during typing.
d. Caret position - This feature provides the current caret and line information, which IME98 uses to position UI windows (e.g., candidate list).
19. AIMM support. Users can invoke the IE/AIMM object, which enables users to enter Far East characters on US systems (NT4.0 & Win95).
20. More RTF round tripping.
21. Improved 1.0 compatibility mode, e.g., MBCS to/from Unicode character-position (cp) mappings. Is being used to emulate RE 1.0 in NT 5.
22. Increased Freeze Control. The display can be frozen over multiple API calls and then unfrozen to display the updates.
23. Increased Undo Control. Undo can be suspended and resumed (needed for IME).
24. Increase/Decrease Font Size. Increases or decreases font size to one of six standard values (12, 28, 36, 48, 72, 80 pts).
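Font binding (item 15) starts by deciding which characters clearly don't belong to the current charset. The sketch below shows only the run-splitting idea: classify each code point into a rough script class and merge neighbors with the same class into runs that a client could then map to fonts. The `Script` classes and code-point ranges are simplifying assumptions, not RichEdit's charset tables, and neutral characters (spaces, digits, punctuation) are simply lumped in with Latin, which a real implementation would handle more carefully.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical script classes standing in for Windows charsets.
enum class Script { Latin, Greek, Cyrillic, Cjk, Other };

Script Classify(char32_t ch) {
    if (ch < 0x250) return Script::Latin;  // ASCII through Latin Extended-B
    if (ch >= 0x370 && ch < 0x400) return Script::Greek;
    if (ch >= 0x400 && ch < 0x530) return Script::Cyrillic;
    if ((ch >= 0x3040 && ch < 0x3100) ||   // hiragana and katakana
        (ch >= 0x4E00 && ch < 0xA000))     // CJK unified ideographs
        return Script::Cjk;
    return Script::Other;
}

struct Run {
    std::size_t start;   // index of the first character in the run
    std::size_t length;  // number of characters in the run
    Script script;       // class shared by every character in the run
};

// Split text into maximal runs of a single script class.
std::vector<Run> SplitIntoRuns(const std::u32string& text) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        Script s = Classify(text[i]);
        if (!runs.empty() && runs.back().script == s)
            ++runs.back().length;      // extend the current run
        else
            runs.push_back({i, 1, s});  // start a new run
    }
    return runs;
}
```

For "abc" followed by two Greek letters and an "x", this yields three runs (Latin, Greek, Latin), so only the middle run would need a font that can display Greek.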
RichEdit 3.0 Architectural Improvements
1. Input module: IME has been factored out into separate generally usable input module that supports the latest Office 9 IMEs. RichEdit 3.0 itself knows nothing of IMEs! In principle other IME clients can use this input module. Did need to add some methods to RichEdit’s object model (the approach is discussed in a separate section).
2. Virtual Win32 Environment: OS-dependent calls have been separated out into a class of their own. RE 3.0 works in a virtual Win32 with some multilingual enhancements. Most calls are static, so no runtime overhead is encountered. Facilitates building RichEdit with different OSs, e.g., Windows CE.
3. Factored Rich Text status: allows aspects of rich text to be used with plain-text semantics. E.g., multiple fonts, coloring, and underlining. Useful for font binding and IME highlighting. Plain text UI remains the same, so EM_SETCHARFORMAT and EM_SETPARAFORMAT apply to whole control.
4. Dual Line Methods. Lines can be broken, queried, and displayed with or without LineServices. Simple text can be handled with small instance size and higher speed. More sophisticated text can use the elegant LineServices component.
RichEdit 3.0 Performance Improvements and Maintenance
1. Many performance/size improvements.
a) reduced the internal versions of RichEdit 2.0’s character and paragraph formatting structures (CHARFORMAT2 and PARAFORMAT2) to 1/3 of their size and generalized them. It is now easy to add properties to these important structures, although the additions typically won't be available through the message interface.
b) reduced size of many other structures as well.
c) declared constant data structures const, so that they are included in the code segment and are shared by all active processes.
d) reduced the number of system calls by more caching of frequently used data
e) eliminated redundant code.
2. Faster startup time: most initialization is postponed to the creation of the first control. C runtime is no longer needed.
3. Cleaned up code base. Used the same notation (Hungarian, etc.) for local variable names throughout. Added many new comments and improved many old comments. Counts are now LONGs rather than the nefarious DWORDs, which might be described as “wishful thinking”! Eliminated evolutionary dead code. Simplified C++ model: no more multiple inheritance and almost no operator overloading (except for new and assignment).
4. Numerous bug fixes. Eliminated some memory leaks and reference counting errors. Fixed various bugs postponed from RichEdit 2.x.
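Item 3’s remark that DWORD counts are “wishful thinking” is easy to demonstrate: an unsigned count can't go negative, so an overshoot silently wraps around instead. Here's a small portable sketch; `DwordCount` and `LongCount` are hypothetical stand-ins that assume DWORD and LONG are 32-bit unsigned and signed integers, as they are on Windows.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Windows DWORD and LONG types.
using DwordCount = std::uint32_t;
using LongCount = std::int32_t;

// Characters left after consuming `used` of `count` characters.
DwordCount RemainingDword(DwordCount count, DwordCount used) {
    return count - used;  // wraps around modulo 2^32 if used > count
}

LongCount RemainingLong(LongCount count, LongCount used) {
    return count - used;  // simply goes negative if used > count
}

// The natural "nothing left?" guard works with signed counts...
bool ExhaustedLong(LongCount count, LongCount used) {
    return RemainingLong(count, used) <= 0;
}

// ...but on an unsigned result "<= 0" can only mean "== 0", so an
// overshoot (used > count) is never detected.
bool ExhaustedDword(DwordCount count, DwordCount used) {
    return RemainingDword(count, used) == 0;
}
```

With 3 characters and 5 consumed, the LONG version reports -2 and the guard fires; the DWORD version reports 4294967294 and the guard stays silent.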
RichEdit 3.0 Rich System Controls
1. System edit-control mode that emulates the OS edit controls more accurately.
2. ListBox and ComboBox controls similar to system versions, but supporting Unicode and font binding on Win95 as well as on NT. These controls can be made rich, opening the door to substantially more elegant dialogs.
What RichEdit 3.0 Isn't
1. Native HTML control. There are HTML ↔ RTF converters that can be used with RichEdit. There’s the Trident control, which is substantially bigger. We have a prototype for direct HTML I/O that uses the TOM interfaces, but it hasn’t been tested adequately for general use. This prototype only roundtrips HTML that RichEdit understands.
2. ActiveX control. We have a prototype RichEdit ActiveX control (ATL), but it too hasn’t undergone testing. Note there is a RichEdit 1.0 ActiveX control and in the future there may be a VB control based on RichEdit 3.0.
3. MFC RichEdit class. Note there is a RichEdit 1.0 MFC class.
4. Multistory editor (like Word). Each RichEdit instance corresponds to a single story. Word has many stories, e.g., body text, header, footer, footnote, textbox. A RichEdit instance can be used for any one of those, but to handle more, you need one instance for each story.
RichEdit Clients
RichEdit Client                          Version
Office 97 SDM                            2.x
Office 9 SDM (3.0)                       3.0
Office Binder                            2.0, 3.0
Office 9 Command Bars (3.0)              3.0
Word 97 (non-SDM dialogs)                2.x
Default Exchange Client                  1.0
Outlook 97 body/to/from/subject/notes    2.x
Outlook 9 body/to/from/subject/notes     3.0
Pocket Word 2.0                          3.0-
WordPad (Win95)                          1.x
WordPad (Win98)                          2.0
WordPad (NT 5.0)                         3.0
MFC RichText Control                     1.0
VB RichText Control                      1.0
Forms^3 97 edit engine                   2.0
Forms^3 9 edit engine                    3.0
Layout Control Pack for IE               2.0
FrontPage source viewer                  2.0
Windows SDK                              1.0
Project 98                               2.0
Publisher 98                             ???
Comic Chat                               1.0?
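How to Create a RichEdit Instance (1)
HMODULE hRE = LoadLibrary(TEXT("riched20.dll"));   // returns a module handle, not an HRESULT
if(!hRE)
    return FALSE;                                   // riched20.dll could not be loaded
hwndRE = CreateWindow(TEXT("RichEdit20W"), TEXT(""),
    dwStyle,
    rc.left, rc.top,
    rc.right - rc.left, rc.bottom - rc.top, hwndParent,
    NULL, hinst, NULL);
... // Send messages to hwndRE
DestroyWindow(hwndRE);                              // destroy the control before unloading its DLL
FreeLibrary(hRE);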
How to Create a RichEdit Instance (2)
A RichEdit control is based on an ITextHost object interacting with an ITextServices object. The latter doesn’t have a window of its own. The CreateWindow() call above creates an ITextHost object, which, in turn, creates an ITextServices object.
Alternatively, you can create an ITextHost object directly that, in turn, creates as many ITextServices objects as you desire. This is the way Forms^3 uses RichEdit for dialogs. It’d also be a great way to make a table object, for which each cell would have its own ITextServices object.
To create an ITextServices object, call the following function (its signature is a bit complicated, since it allows the object to be aggregated):
STDAPI CreateTextServices(
IUnknown *punkOuter, // Outer unknown, may be NULL
ITextHost *phost, // Client's ITextHost; must be valid
IUnknown **ppUnk); // Private IUnknown of text services engine
For example,
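IUnknown *pUnk = NULL;
if(FAILED(CreateTextServices(NULL, this, &pUnk)))   // 'this' is our ITextHost
    return FALSE;
HRESULT hr = pUnk->QueryInterface(IID_ITextServices, (void **)&_pserv);
pUnk->Release();                                    // _pserv, if obtained, holds its own reference
if(FAILED(hr))
    return FALSE;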
You can then use the _pserv pointer to call any ITextServices method, including TxSendMessage(), which is a faster way to send messages to the control than the system SendMessage(). A word of warning, though: CreateWindow() and the usual message interface are substantially easier to use, since you don’t have to implement an ITextHost object. As shown below, if all you want to do is use some ITextServices methods, you can get an ITextServices interface to a control created by CreateWindow().
How to use RichEdit
There are five main ways to use a RichEdit 2.x or 3.0 control:
1. Messages
2. ITextServices methods
3. Keyboard input including cut/copy/paste
4. File read/write (plain text or RTF)
5. TOM (Text Object Model) methods
The most familiar ways (messages and keyboard) are useful, but may not have the performance or functionality that you need. We describe each of these approaches in the remainder of this talk.
For ordinary keyboard input (not IME), RichEdit acts very similarly to Word. Word has more hot keys, but the cursor keypad and letter/punctuation keys work essentially the same way. Ditto for mouse operations.
RichEdit Message Interface
There are many RichEdit messages. In addition to the system edit control messages defined in winuser.h, there are many new messages defined in richedit.h. All edit messages handled by RichEdit (specifically by ITextServices::TxSendMessage()) are listed below. System edit and RichEdit 1.0 messages are defined in the system SDK. RichEdit 2.0 and 3.0 messages aren’t documented in my copy of the SDK, but should be documented on http://richedit sometime soon, and in the SDK sometime later. Note that a number of RichEdit 1.0 messages have been generalized in later versions. E.g., EM_STREAMIN/OUT take an optional codepage value (which can be 1200, i.e., Unicode, or CP_UTF8, i.e., UTF-8). RichEdit only understands enough about IME messages to know to invoke the IME input module (see Input Module). Hence not all IME messages are listed below.
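As an illustration of the generalized EM_STREAMIN/EM_STREAMOUT, the code page travels in the high word of wParam together with the SF_USECODEPAGE flag (RichEdit 3.0 and later). The sketch below only computes that wParam, so it compiles without the Windows SDK; the constant values are copied from my understanding of richedit.h and winnls.h and should be checked against your own headers. In real code you would pass the result to SendMessage() along with an EDITSTREAM.

```cpp
#include <cstdint>

// Assumed values mirroring the SDK headers (verify against your SDK).
constexpr std::uint32_t kSfText = 0x0001;         // SF_TEXT
constexpr std::uint32_t kSfRtf = 0x0002;          // SF_RTF
constexpr std::uint32_t kSfUseCodePage = 0x0020;  // SF_USECODEPAGE (RichEdit 3.0+)
constexpr std::uint32_t kCpUtf8 = 65001;          // CP_UTF8
constexpr std::uint32_t kCpUnicode = 1200;        // UTF-16LE code page

// wParam for EM_STREAMIN/EM_STREAMOUT: code page in the high word,
// SF_USECODEPAGE plus the format flag in the low word.
constexpr std::uint32_t StreamFormat(std::uint32_t format, std::uint32_t codePage) {
    return ((codePage & 0xFFFFu) << 16) | kSfUseCodePage | format;
}

// UTF-8 RTF, the compact format described in RichEdit 3.0 item 17.
constexpr std::uint32_t kUtf8Rtf = StreamFormat(kSfRtf, kCpUtf8);
```

Since 65001 is 0xFDE9, the UTF-8 RTF wParam comes out as 0xFDE90022.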
System edit control messages not handled by RichEdit
EM_GETHANDLE EM_SETHANDLE
EM_FMTLINES EM_SETTABSTOPS
WM_GETFONT
System edit control messages handled by RichEdit
EM_GETFIRSTVISIBLELINE EM_GETLINE
EM_GETLINECOUNT EM_GETMODIFY
EM_GETSEL EM_GETTHUMB
EM_GETWORDBREAKPROC EM_LIMITTEXT
EM_LINEFROMCHAR EM_LINEINDEX
EM_LINELENGTH EM_LINESCROLL
EM_REPLACESEL EM_SCROLL
EM_SETMODIFY EM_SETSEL
EM_SETTARGETDEVICE EM_SETWORDBREAKPROC
EM_UNDO
WM_CHAR WM_CLEAR
WM_CONTEXTMENU WM_COPY
WM_CUT WM_DESTROYCLIPBOARD
WM_DROPFILES WM_ERASEBKGND
WM_GETTEXT WM_GETTEXTLENGTH
WM_HSCROLL WM_IME_CHAR
WM_INPUTLANGCHANGE WM_INPUTLANGCHANGEREQUEST
WM_KEYDOWN WM_KEYUP
WM_KILLFOCUS WM_LBUTTONDBLCLK
WM_LBUTTONDOWN WM_LBUTTONUP
WM_MBUTTONDBLCLK WM_MBUTTONDOWN
WM_MBUTTONUP WM_MOUSEACTIVATE
WM_MOUSEMOVE WM_MOUSEWHEEL
WM_NCMBUTTONDOWN WM_PASTE
WM_RBUTTONDBLCLK WM_RBUTTONDOWN
WM_RBUTTONUP WM_RENDERALLFORMATS
WM_RENDERFORMAT WM_SETFOCUS
WM_SETFONT WM_SETTEXT
WM_SETTINGCHANGE WM_SIZE
WM_SYSCHAR WM_SYSCOLORCHANGE
WM_SYSKEYDOWN WM_TIMER
WM_UNDO WM_VSCROLL
RichEdit 1.0 messages
EM_CANPASTE EM_CHARFROMPOS
EM_DISPLAYBAND EM_EXGETSEL
EM_EXLIMITTEXT EM_EXLINEFROMCHAR
EM_EXSETSEL EM_FINDTEXT
EM_FINDTEXTEX EM_FINDWORDBREAK
EM_FORMATRANGE EM_GETEVENTMASK
EM_GETCHARFORMAT EM_GETLIMITTEXT
EM_GETOLEINTERFACE EM_GETOPTIONS
EM_GETPARAFORMAT EM_GETSELTEXT
EM_GETTEXTRANGE EM_GETWORDBREAKPROCEX
EM_HIDESELECTION EM_PASTESPECIAL
EM_POSFROMCHAR EM_REQUESTRESIZE
EM_SCROLLCARET EM_SELECTIONTYPE
EM_SETBKGNDCOLOR EM_SETCHARFORMAT
EM_SETEVENTMASK EM_SETOLECALLBACK
EM_SETOPTIONS EM_SETPARAFORMAT
EM_SETTARGETDEVICE EM_SETWORDBREAKPROCEX
EM_STREAMIN EM_STREAMOUT
RichEdit 2.0 messages
EM_SETUNDOLIMIT EM_REDO
EM_CANREDO EM_GETUNDONAME
EM_GETREDONAME EM_STOPGROUPTYPING
EM_SETTEXTMODE EM_GETTEXTMODE
EM_AUTOURLDETECT EM_GETAUTOURLDETECT
EM_SETPALETTE EM_GETTEXTEX
EM_GETTEXTLENGTHEX EM_SHOWSCROLLBAR
EM_FINDTEXTW EM_FINDTEXTEXW
Far East specific messages (some are RE 1.0)
EM_GETPUNCTUATION EM_SETPUNCTUATION
EM_GETWORDWRAPMODE EM_SETWORDWRAPMODE
EM_GETIMECOLOR EM_SETIMECOLOR
EM_GETIMEOPTIONS EM_SETIMEOPTIONS
EM_GETLANGOPTIONS EM_SETLANGOPTIONS
EM_CONVPOSITION EM_GETIMECOMPMODE
RichEdit 3.0 messages
FE messages
EM_GETIMEMODEBIAS EM_SETIMEMODEBIAS
EM_RECONVERSION
BiDi specific messages
EM_GETBIDIOPTIONS EM_SETBIDIOPTIONS
Extended edit style specific messages
EM_GETEDITSTYLE EM_SETEDITSTYLE
Outline view message
EM_OUTLINE
Message for getting and restoring scroll pos
EM_GETSCROLLPOS EM_SETSCROLLPOS
Zoom and increment/decrement fontsize
EM_GETZOOM EM_SETZOOM
EM_SETFONTSIZE
LineServices messages
EM_GETTYPOGRAPHYOPTIONS EM_SETTYPOGRAPHYOPTIONS
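EM_SETZOOM takes the zoom factor as a ratio of two longs (numerator in wParam, denominator in lParam). The sketch below validates and applies such a ratio. The rules it encodes, 0/0 meaning “no zoom” and a valid ratio lying strictly between 1/64 and 64, are my recollection of the EM_SETZOOM documentation and should be treated as assumptions; `IsValidZoom` and `ApplyZoom` are hypothetical helpers, not RichEdit functions.

```cpp
#include <cstdint>

// 0/0 turns zoom off; otherwise require 1/64 < num/den < 64 (assumed rule).
bool IsValidZoom(std::int32_t num, std::int32_t den) {
    if (num == 0 && den == 0) return true;  // "no zoom"
    if (num <= 0 || den <= 0) return false;
    std::int64_t n = num, d = den;          // widen to avoid overflow
    return n * 64 > d && n < d * 64;        // compare without dividing
}

// Scale a non-negative length by num/den, rounding to nearest.
// Invalid ratios and 0/0 leave the length unscaled.
std::int64_t ApplyZoom(std::int64_t length, std::int32_t num, std::int32_t den) {
    if (!IsValidZoom(num, den) || (num == 0 && den == 0)) return length;
    return (length * num + den / 2) / den;
}
```

Keeping the factor as a ratio of integers rather than a floating-point value lets a zoom such as 2/3 be applied exactly, with a single rounding at the end.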
RichEdit RTF
The RTF control words recognized by RichEdit are given below. Not all of these control words are fully implemented, but almost all are round tripped.
adeff, animtext, ansi, ansicpg, b, bgbdiag, bgcross, bgdcross, bgdkbdiag, bgdkcross, bgdkdcross, bgdkfdiag, bgdkhoriz, bgdkvert, bgfdiag, bghoriz, bgvert, bin, blue, box, brdrb, brdrbar, brdrbtw, brdrcf, brdrdash, brdrdashsm, brdrdb, brdrdot, brdrhair, brdrl, brdrr, brdrs, brdrsh, brdrt, brdrth, brdrtriple, brdrw, brsp, bullet, caps, cbpat, cell, cellx, cf, cfpat, clbrdrb, clbrdrl, clbrdrr, clbrdrt, collapsed, colortbl, cpg, cs, deff, deflang, deflangfe, deftab, deleted, dibitmap, disabled, dn, embo, emdash, emspace, endash, enspace, expndtw, f, fbidi, fchars, fcharset, fdecor, fi, field, fldinst, fldrslt, fmodern, fname, fnil, fonttbl, footer, footerf, footerl, footerr, footnote, fprq, froman, fs, fscript, fswiss, ftech, ftncn, ftnsep, ftnsepc, green, header, headerf, headerl, headerr, highlight, hyphpar, i, impr, info, intbl, keep, keepn, kerning, lang, lchars, ldblquote, li, line, lnkd, lquote, ltrch, ltrdoc, ltrmark, ltrpar, macpict, noline, nosupersub, nowidctlpar, objattph, objautlink, objclass, objcropb, objcropl, objcropr, objcropt, objdata, object, objemb, objh, objicemb, objlink, objname, objpub, objscalex, objscaley, objsetsize, objsub, objw, outl, page, pagebb, par, pard, piccropb, piccropl, piccropr, piccropt, pich, pichgoal, picscalex, picscaley, pict, picw, picwgoal, plain, pmmetafile, pn, pndec, pnindent, pnlcltr, pnlcrm, pnlvlblt, pnlvlbody, pnlvlcont, pnqc, pnqr, pnstart, pntext, pntxta, pntxtb, pnucltr, pnucrm, protect, pwd, qc, qj, ql, qr, rdblquote, red, result, revauth, revised, ri, row, rquote, rtf, rtlch, rtldoc, rtlmark, rtlpar, s, sa, sb, sbys, scaps, sect, sectd, shad, shading, sl, slmult, strike, stylesheet, sub, super, tab, tb, tc, tldot, tleq, tlhyph, tlth, tlul, tqc, tqdec, tqr, trbrdrb, trbrdrl, trbrdrr, trbrdrt, trgaph, trleft, trowd, trqc, trqr, tx, u, uc, ul, uld, uldash, uldashd, uldashdd, uldb, ulhair, ulnone, ulth, ulw, ulwave, up, utf, v, viewkind, viewscale, wbitmap, wbmbitspixel, wbmplanes, wbmwidthbytes, wmetafile, xe, zwj, zwnj.
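The names above are RTF control words. In RTF, a control word is a backslash followed by ASCII letters and an optional signed decimal parameter, and a single trailing space is consumed as its delimiter. The sketch below shows how a reader might tokenize one control word and check it against a supported list. It is a minimal illustration, not RichEdit's RTF reader, and the sorted `kKnown` table is a hypothetical subset of the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <string_view>

// Illustrative sketch, not RichEdit's RTF reader.
// One parsed control word such as "\fs24" or "\par".
struct ControlWord {
    std::string name;              // letters only, e.g. "fs"
    std::optional<long> parameter; // signed numeric parameter, if present
    std::size_t length;            // characters consumed, including a delimiting space
};

// Parses the control word that starts at text[pos] (a backslash). Returns
// std::nullopt if no letters follow. Control symbols such as "\~" or "\'e9"
// are outside the scope of this sketch.
std::optional<ControlWord> parseControlWord(std::string_view text, std::size_t pos) {
    if (pos >= text.size() || text[pos] != '\\') return std::nullopt;
    std::size_t i = pos + 1;
    const std::size_t nameStart = i;
    while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i]))) ++i;
    if (i == nameStart) return std::nullopt;
    ControlWord cw{std::string(text.substr(nameStart, i - nameStart)), std::nullopt, 0};

    // Optional parameter: an optional '-' followed by one or more digits.
    std::size_t j = i;
    const bool negative = j < text.size() && text[j] == '-';
    if (negative) ++j;
    const std::size_t digitStart = j;
    long value = 0;
    while (j < text.size() && std::isdigit(static_cast<unsigned char>(text[j]))) {
        value = value * 10 + (text[j] - '0');
        ++j;
    }
    if (j > digitStart) {
        cw.parameter = negative ? -value : value;
        i = j;
    }
    // A single space after the control word is a delimiter and is consumed.
    if (i < text.size() && text[i] == ' ') ++i;
    cw.length = i - pos;
    return cw;
}

// Hypothetical subset of the supported words, kept sorted for binary search.
bool isKnownControlWord(const std::string& name) {
    static const std::string kKnown[] = {"b", "f", "fs", "i", "li", "par", "pard", "rtf", "u", "ul"};
    return std::binary_search(std::begin(kKnown), std::end(kKnown), name);
}
```

For example, parsing `\fs24 Hello` yields the name `fs` with parameter 24 and consumes six characters, including the delimiting space.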
ITextServices Windowless Interface
As described above, you can get an ITextServices interface by calling CreateTextServices(), but this requires that you implement your own ITextHost object. If you create the rich edit control with CreateWindow() instead, you can still use ITextServices methods by querying the control's OLE interface:
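IUnknown *punk = NULL;
ITextServices *pserv = NULL;
HRESULT hr = E_NOINTERFACE;
SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&punk);
if(punk)
{
    hr = punk->QueryInterface(IID_ITextServices, (void **)&pserv);
    punk->Release();
    if(SUCCEEDED(hr) && pserv)
    {
        // ... use pserv methods here ...
        pserv->Release();
    }
}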
All ITextServices methods are declared simply as returning HRESULT. This differs from standard COM interface methods, which are declared as HRESULT STDMETHODCALLTYPE. The methods are:
TxSendMessage(msg, wparam, lparam, plresult)
TxDraw(dwDrawAspect, lindex, pvAspect, ptd, hdcDraw,
    hicTargetDev, lprcBounds, lprcWBounds, lprcUpdate,
    pfnContinue, dwContinue, lViewId)
TxGetHScroll(plMin, plMax, plPos, plPage, pfEnabled)
TxGetVScroll(plMin, plMax, plPos, plPage, pfEnabled)
OnTxSetCursor(dwDrawAspect, lindex, pvAspect, ptd,
    hdcDraw, hicTargetDev, lprcClient, x, y)
TxQueryHitPoint(dwDrawAspect, lindex, pvAspect, ptd,
    hdcDraw, hicTargetDev, lprcClient, x, y, pHitResult)
OnTxInPlaceActivate(prcClient)
OnTxInPlaceDeactivate()
OnTxUIActivate()
OnTxUIDeactivate()
TxGetText(pbstrText)
TxSetText(pszText)
TxGetCurTargetX(pX)
TxGetBaseLinePos(pPos)
TxGetNaturalSize(dwAspect, hdcDraw, hicTargetDev, ptd, dwMode,
    psizelExtent, pwidth, pheight)
TxGetDropTarget(ppDropTarget)
OnTxPropertyBitsChange(dwMask, dwBits)
TxGetCachedSize(pdwWidth, pdwHeight)
Getting to the TOM Interfaces
// Skeleton function to manipulate text using the TOM ITextRange interface
HRESULT Manipulate(HWND hedit)
{
    IUnknown *punk = NULL;
    ITextDocument *pdoc = NULL;
    ITextRange *prg = NULL;
    SendMessage(hedit, EM_GETOLEINTERFACE, 0, (LPARAM)&punk);
    if(!punk)
        return E_NOINTERFACE;       // not a rich edit control, or no OLE interface

    HRESULT hr = punk->QueryInterface(IID_ITextDocument, (void **)&pdoc);
    if(SUCCEEDED(hr) && pdoc)
    {
        hr = pdoc->Range(0, 0, &prg);   // degenerate range at the start of the story
        if(SUCCEEDED(hr) && prg)
        {
            // ... manipulate text through prg here ...
            prg->Release();
        }
        pdoc->Release();
    }
    punk->Release();
    return hr;
}
Font Binding
RichEdit 3.0 assigns a charset to plain-text characters depending on their context. For example, Hangul characters get HANGUL_CHARSET, nonneutral ANSI characters always get ANSI_CHARSET, Han (Chinese) characters get SHIFTJIS_CHARSET if kana characters are found nearby and GB2312_CHARSET if not, Greek characters get GREEK_CHARSET, and so on. Note that we use Unicode internally, so this use of charset differs from its original role in font specifications. But charset is a pretty good match for what we want, which is a script, and our CHARFORMAT has a well-defined place for the charset. It also helps with some anomalies in Win95, where we can't always use Unicode.
Neutral characters such as blanks and digits get a charset according to their context. For example, a blank surrounded by characters of the same charset gets that charset. More generally, neutrals and digits in BiDi text are assigned charsets in a way based on the Unicode BiDi algorithm.
Once charsets are assigned, we scan the text forward and backward from the insertion point to find the nearest fonts that have been used for those charsets. If no font is found for a charset, we use the font chosen by the client for that charset. If the client hasn’t specified a font for the charset, we use the default Office 9 font for that charset. If the client wants some other font, it can always change it, but the hope is that this approach will work most of the time. Our current default font choices are based on the following table:
CodePage  Languages            Font facename    Size (pt)
125x      Western, CE, ME...   Times New Roman  10
932       Japanese             MS Mincho        10.5
949       Korean               Batang           10
936       Simplified Chinese   MS Song          10
950       Traditional Chinese  New MingLiU      10
874       Thai                 Tahoma           8
Hence, in our default font-binding table (whose entries have a charset, facename, and size), ANSI_CHARSET matches all eight 125x charsets, while each other charset matches its font on a one-to-one basis. More precisely, we use the ANSI_CHARSET choice whenever no other alternative is found. The client will be able to specify a finer granularity than this, e.g., a specific font for ARABIC_CHARSET runs, a specific Greek font for Greek runs, etc. This finer granularity will also be used if a font with the desired charset stamp is found somewhere in the document before the area being font-bound.
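The resolution order and the ANSI_CHARSET fallback can be sketched as a simple lookup. This is an illustrative model only, not RichEdit's code. The `Charset` enum, `FontChoice` struct, and functions are hypothetical, and only the charsets from the default table are modeled. A font already used nearby for the charset wins, then the client's font for that charset, then the default table.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical charset identifiers; the real GDI constants live in wingdi.h.
// The first eight correspond to code pages 1250-1257.
enum class Charset { Ansi, EastEurope, Russian, Greek, Turkish, Hebrew, Arabic, Baltic,
                     ShiftJis, Hangul, Gb2312, ChineseBig5, Thai };

struct FontChoice {
    std::string facename;
    double sizePt;
};

// The eight 125x charsets all share the ANSI_CHARSET entry.
bool is125xCharset(Charset cs) {
    return cs >= Charset::Ansi && cs <= Charset::Baltic;
}

// Default font-binding table from the text above.
FontChoice defaultFont(Charset cs) {
    static const std::map<Charset, FontChoice> kDefaults = {
        {Charset::Ansi, {"Times New Roman", 10}},
        {Charset::ShiftJis, {"MS Mincho", 10.5}},
        {Charset::Hangul, {"Batang", 10}},
        {Charset::Gb2312, {"MS Song", 10}},
        {Charset::ChineseBig5, {"New MingLiU", 10}},
        {Charset::Thai, {"Tahoma", 8}},
    };
    auto it = kDefaults.find(is125xCharset(cs) ? Charset::Ansi : cs);
    // Defensive fallback: use the ANSI_CHARSET choice when nothing else matches.
    return it != kDefaults.end() ? it->second : kDefaults.at(Charset::Ansi);
}

// Nearby font in the document, then the client's font, then the default table.
FontChoice bindFont(Charset cs,
                    const std::optional<FontChoice>& nearbyFont,
                    const std::map<Charset, FontChoice>& clientFonts) {
    if (nearbyFont) return *nearbyFont;
    auto it = clientFonts.find(cs);
    if (it != clientFonts.end()) return it->second;
    return defaultFont(cs);
}
```

With no nearby or client fonts, a Greek run falls back to Times New Roman through the shared ANSI_CHARSET entry. Once the client assigns a Greek font, that finer-grained choice is used instead.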
XSS in some WordPress themes
A couple of days ago a co-worker warned me that the Mosaic website, where I serve “more or less” as technical lead, had an XSS (code injection) problem in its search form.
Alarmed, I quickly updated WordPress to version 2.9.1, but that didn't fix the problem. The test was easy: entering this simple script in the search form
<script>alert("hola");</script>
opened an alert dialog box.
Today, with time to spare, I set about investigating. The bug occurs only in some WordPress blogs, not in all of them, so it is not a problem in the content management system itself.
After some testing and a few changes, the cause turned up. It is a problem in some WordPress themes, and it is very easy to fix. In the search form of the vulnerable themes we can see something like this:
<label for="s"><input type="text" name="s" id="s" size="50" maxlength="200" value="<?php echo get_search_query(); ?>" /></label>
The problem is the PHP echo. Removing it removes the problem. Easy :)
Update: As Javier and Oscar point out in the comments, the problem is not so much the echo (which displays the search string) as the fact that get_search_query() is not properly filtered.
So, as Javier suggests, rather than removing the echo, the more elegant solution is <?php echo htmlentities(get_search_query()); ?>
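Escaping works because the browser then treats the query as text instead of markup. The minimal sketch below shows the kind of conversion htmlentities() performs on the characters that matter inside a quoted HTML attribute. It is illustrative only: it handles just `&`, `<`, `>`, and both quote characters, whereas PHP's function converts many more.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Illustrative subset of what PHP's htmlentities() does: escape the characters
// that let user input break out of HTML text or a quoted attribute value.
// '&' is escaped too, so text that already looks like an entity stays literal.
std::string escapeHtml(std::string_view input) {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#039;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Applied to the test payload, `<script>alert("hola");</script>` becomes `&lt;script&gt;alert(&quot;hola&quot;);&lt;/script&gt;`. The search box then shows the text instead of running the script.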
Special Capabilities of a Math Font
A fairly common inquiry is how a program can use and access the many special glyph variants of a math font. It’s clearly a much more intricate interaction than that found in most text applications. This post outlines how the Office math layout software interacts with the Cambria Math font and, in principle, with any other math font that has similar capabilities. More specifically, this post describes the functionality of the special library, mathfont.dll, which is shipped with Office 2007/2010. This library, in turn, interacts with the OpenType and OpenType-like tables in a math font.
Cambria Math and the math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for much of the Unicode math character set. It was designed with ClearType and excellent screen readability in mind and enables the best screen-resolution display of math text available today.
The specialized math tables include values that control glyph placements in math zones. Many math constants are defined to handle displacements such as axis height, fraction rule thickness, etc. The math tables are formalized as OpenType tables, although they are not yet part of the OpenType standard. Refinements include entries for positioning subscripts/superscripts horizontally using cut-ins and italic corrections. The cut-in tables allow automatic positioning of subscripts and superscripts horizontally better than un-tweaked TeX. Math characters have four cut-in values, one for each corner, allowing sub/superscripts to be kerned with their bases. Other table entries give larger glyph variants for operators like the integral sign, square root, and stretchy characters such as brackets and arrows.
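The per-corner cut-ins can be pictured as a step function of height. This sketch follows the structure of the MathKern table as published in the OpenType MATH specification: a sorted list of correction heights, plus one more kern value than there are heights. The kern for a given height comes from the band that contains it. The `MathKern` struct is a simplified in-memory form, not the binary layout or mathfont.dll's API, and the numbers in the usage note are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical in-memory form of one corner's kern data.
// correctionHeights is sorted ascending; kernValues has one more entry.
struct MathKern {
    std::vector<int> correctionHeights;
    std::vector<int> kernValues;
};

// Kern for a glyph corner at the given height (same units as the heights).
// Heights below the first entry use kernValues[0]; heights at or above the
// last entry use the final kern value.
int kernAtHeight(const MathKern& mk, int height) {
    assert(mk.kernValues.size() == mk.correctionHeights.size() + 1);
    // The number of correction heights <= height selects the band.
    auto band = std::upper_bound(mk.correctionHeights.begin(),
                                 mk.correctionHeights.end(), height)
                - mk.correctionHeights.begin();
    return mk.kernValues[static_cast<std::size_t>(band)];
}

// One MathKern per corner, matching the four cut-in values per character.
struct GlyphCutIns {
    MathKern topRight, topLeft, bottomRight, bottomLeft;
};
```

For example, with heights {100, 300} and kerns {-50, -20, 0}, a height of 0 gives -50, 100 and 299 give -20, and 500 gives 0.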
The math tables are organized as a hierarchy accessed via the OpenType ID “MATH”. The names of the tables in the hierarchy are MathConstants, MathGlyphInfo, MathItalicsCorrectionInfo, MathTopAccentAttachment, ExtendedShapeCoverage, MathKernInfo, MathKern, MathVariants, MathGlyphConstruction, and GlyphAssembly. The MathConstants table includes parameters such as the following em-size-dependent sub/superscript values:
LONG lSubscriptShiftDown;
LONG lSubscriptTopMax;
LONG lSubscriptBottomDropMin;
LONG lSuperscriptShiftUp;
LONG lSuperscriptShiftUpCramped;
LONG lSuperscriptBottomMin;
LONG lSuperscriptTopRiseMin;
LONG lSubSuperscriptMinGap;
LONG lSuperscriptBottomMaxWithSubscript;
LONG lSpaceAfterScript;
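To illustrate how some of these constants interact, the sketch below computes baseline shifts for a subscript/superscript pair in uncramped style. It relies on the meanings these parameters have in the OpenType MATH specification: standard shifts, a minimum height for the superscript bottom, a maximum height for the subscript top, a minimum gap between the two scripts, and a cap on how far the superscript bottom may rise to open that gap. It ignores the base-glyph-dependent drop constants, kerning, and cramped styles. The struct and function names are hypothetical, not mathfont.dll's API. Ascents and descents are positive distances from each script's own baseline.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical subset of the MathConstants values listed above (design units).
struct ScriptConstants {
    long lSubscriptShiftDown;
    long lSubscriptTopMax;
    long lSuperscriptShiftUp;
    long lSuperscriptBottomMin;
    long lSubSuperscriptMinGap;
    long lSuperscriptBottomMaxWithSubscript;
};

struct ScriptShifts {
    long supShiftUp;   // superscript baseline above the base baseline
    long subShiftDown; // subscript baseline below the base baseline
};

ScriptShifts placeScripts(const ScriptConstants& k, long supDescent, long subAscent) {
    // Superscript: standard shift, raised if its ink bottom would sit too low.
    long up = std::max(k.lSuperscriptShiftUp, k.lSuperscriptBottomMin + supDescent);
    // Subscript: standard shift, lowered if its ink top would sit too high.
    long down = std::max(k.lSubscriptShiftDown, subAscent - k.lSubscriptTopMax);

    // Gap between the superscript bottom and the subscript top.
    long gap = (up - supDescent) - (subAscent - down);
    if (gap < k.lSubSuperscriptMinGap) {
        long needed = k.lSubSuperscriptMinGap - gap;
        // First raise the superscript, but not past the allowed bottom height...
        long room = std::max(0L, k.lSuperscriptBottomMaxWithSubscript - (up - supDescent));
        long raise = std::min(needed, room);
        up += raise;
        // ...then push the subscript down by whatever is still missing.
        down += needed - raise;
    }
    return {up, down};
}
```

With the made-up constants {150, 350, 360, 110, 150, 350}, a tall subscript (ascent 450) under a superscript with no descent cannot raise the superscript any further. The subscript therefore drops from 150 to 240 to open the 150-unit gap.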
Cambria Math contains full sets of glyph variants that have heavier weights so that when scaled down to the script and scriptscript levels the stem widths match those of the text-level glyphs. The prime (U+2032) and multiple prime characters need to be superscripted and scaled down accordingly. The dotless i and j glyph variants are used in the bases of accent objects. Accents over larger bases are given by special flattened and/or widened glyph variants.
Brackets, braces, parentheses and other stretchy characters have a number of larger glyph variants, and arbitrarily large sizes can be built from glyph assemblies. When an assembly is displayed, its pieces are clipped to prevent overlap, since overlaps create ClearType artifacts.
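The choice between a ready-made larger variant and an assembly can be sketched as follows. This is a simplified model, not mathfont.dll's algorithm. Each variant is reduced to a glyph ID and its size along the stretch direction, an assembly is reduced to its fixed pieces plus one repeatable extender, and connector overlaps are ignored. All names and values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified variant and assembly descriptions (design units).
struct Variant {
    std::uint16_t glyphId;
    int advance; // size along the stretch direction
};

struct Assembly {
    int fixedAdvance;    // total size of the non-repeating pieces
    int extenderAdvance; // size of one extender piece
};

struct Stretched {
    bool usesAssembly;
    std::uint16_t glyphId; // meaningful when !usesAssembly
    int extenderCount;     // meaningful when usesAssembly
    int totalAdvance;
};

// Variants are assumed sorted by increasing advance. The first one that is
// big enough wins; otherwise, repeat the extender until the size is reached.
Stretched stretchTo(const std::vector<Variant>& variants,
                    const Assembly& pieces, int desired) {
    for (const Variant& v : variants) {
        if (v.advance >= desired) return {false, v.glyphId, 0, v.advance};
    }
    assert(pieces.extenderAdvance > 0);
    int missing = desired - pieces.fixedAdvance;
    int count = missing > 0
        ? (missing + pieces.extenderAdvance - 1) / pieces.extenderAdvance // round up
        : 0;
    return {true, 0, count, pieces.fixedAdvance + count * pieces.extenderAdvance};
}
```

With variants of size 1000, 1500, and 2400 and an assembly of 1800 fixed units plus 400-unit extenders, a request for 1200 picks the 1500 variant. A request for 3000 needs three extenders.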
One choice not handled by the math font tables is the display of the double-struck italic characters U+2145 to U+2149 (differential D and d, exponential e, and imaginary i and j). Depending on a document setting, software can display these characters as themselves (useful for patent applications), as the corresponding math italic letters, or as the corresponding ASCII letters. Serif italic glyphs are used for these in most math publications, but serif upright glyphs are used in some European math publications and math calculation engines. Using the differential d (U+2146) automatically introduces a small space between it and the preceding character if that character is alphabetic.
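The three display options and the spacing rule can be sketched as a character mapping. The `DoubleStruckDisplay` enum stands in for the document setting and is hypothetical. The targets are the standard Unicode math italic letters (U+1D437, U+1D451, U+1D452, U+1D456, U+1D457) and the ASCII letters D, d, e, i, j. For brevity, the letter test before a differential d only checks ASCII letters.

```cpp
#include <cassert>

// Hypothetical stand-in for the document setting.
enum class DoubleStruckDisplay { AsIs, MathItalic, Ascii };

// Map U+2145..U+2149 (D, d, e, i, j) to the form chosen by the setting.
char32_t displayChar(char32_t ch, DoubleStruckDisplay mode) {
    static const char32_t kAscii[] = {U'D', U'd', U'e', U'i', U'j'};
    static const char32_t kMathItalic[] = {0x1D437, 0x1D451, 0x1D452, 0x1D456, 0x1D457};
    if (ch < 0x2145 || ch > 0x2149 || mode == DoubleStruckDisplay::AsIs) return ch;
    unsigned index = static_cast<unsigned>(ch - 0x2145);
    return mode == DoubleStruckDisplay::Ascii ? kAscii[index] : kMathItalic[index];
}

// A differential d (U+2146) gets a small space before it when the preceding
// character is alphabetic (ASCII-only check in this sketch).
bool needsSpaceBeforeDifferential(char32_t prev, char32_t ch) {
    bool prevIsLetter = (prev >= U'A' && prev <= U'Z') || (prev >= U'a' && prev <= U'z');
    return ch == 0x2146 && prevIsLetter;
}
```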
An OpenType table or feature is identified by a 32-bit constant equal to its four-character name read as a little-endian 32-bit integer. For example, the “MATH” table is identified by the value 0x4854414D. In C/C++, you can use the macro
#define MakeTag(a, b, c, d) (((unsigned long)(d) << 24) | ((unsigned long)(c) << 16) | ((unsigned long)(b) << 8) | (unsigned long)(a))
#define tagMATH MakeTag('M','A','T','H')
to create such IDs if you don’t want to type the ASCII values of the letters directly. Note that these IDs are case sensitive. In particular, “MATH” identifies the overall math table hierarchy, and “math” identifies the math script, which is used for math glyph-variant features such as subscripts, superscripts, and dotless i's.
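A compile-time equivalent of the macro makes the byte order and the case sensitivity explicit. The `makeTag` and `tagToString` helpers are illustrative and not part of any Windows header. The values follow directly from the macro.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same packing as MakeTag: the first character goes in the low byte, so on a
// little-endian machine the tag's bytes in memory read "MATH".
constexpr std::uint32_t makeTag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           static_cast<std::uint32_t>(static_cast<unsigned char>(a));
}

constexpr std::uint32_t kTagMATH = makeTag('M', 'A', 'T', 'H');       // table hierarchy
constexpr std::uint32_t kTagMathScript = makeTag('m', 'a', 't', 'h'); // math script
static_assert(kTagMATH == 0x4854414D, "matches the value given in the text");
static_assert(kTagMATH != kTagMathScript, "tags are case sensitive");

// Recover the four characters, low byte first (illustrative helper).
std::string tagToString(std::uint32_t tag) {
    std::string s;
    for (int shift = 0; shift < 32; shift += 8) {
        s += static_cast<char>((tag >> shift) & 0xFF);
    }
    return s;
}
```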
mathfont.dll functions
The following table describes the functions exported by mathfont.dll. All functions return an HRESULT. Some entries in the table refer to the “current font metrics”. These metrics depend on the font height (point size), the script level (0 for text size, 1 for script size, and 2 for scriptscript size or deeper nesting), and the device mode (reference or presentation).
mathfont.dll function (OpenType tables used): purpose
GetMathConstants (MATH): Get pointer to math constants
GetMathGlyphItalicsCorrection (MATH): Get italic correction for a glyph at current font metrics
GetMathGlyphTopAccentAttachment (MATH): Get top accent attachment displacement for a glyph at current font metrics
GetMathGlyphIsExtendedShape (MATH): In [left]sub/sup math objects, determine whether the adjacent base glyph is extended, i.e., stretched vertically
GetMathGlyphKerning (MATH): Get kerning for a given corner and height of a glyph at current font metrics
GetMathGlyphVariant (MATH): Get a possibly stretched glyph variant or set of glyphs for a glyph of desired size at current font metrics
GetMathGlyphVariantItalicsCorrection (MATH): Get italic correction for a vertically stretched glyph (or set of glyphs) at current font metrics
GetMathGlyphScriptShape (GDEF, GSUB): Get glyph variant for script or scriptscript size (via the “ssty” feature for the “math” script and “dflt” language)
GetMathGlyphDotlessForm (GDEF, GSUB): Get dotless glyph variant for i- or j-like glyphs (via the “dtls” feature for the “math” script and “dflt” language)
GetMathGlyphAccentFlattenedShape (GDEF, GSUB): Get flattened accent glyph variant if the base height exceeds the x-height (via the “flac” feature for the “math” script and “dflt” language)
GetMathFontTextMetrics (OS/2): If the font is a math font, get the math font text ascent, descent, and line gap at current font metrics
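One way to picture how the current font metrics depend on the script level is through the two scale-down percentages that the OpenType MATH specification defines among its math constants (scriptPercentScaleDown and scriptScriptPercentScaleDown). This sketch assumes simple linear scaling of the base font height and uses made-up percentages, not Cambria Math's values. As in the text, levels deeper than 2 are treated as scriptscript size. Device mode is not modeled.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical copy of the two scale-down constants (percent of base size).
struct ScaleConstants {
    int scriptPercentScaleDown;       // used at script level 1
    int scriptScriptPercentScaleDown; // used at script level 2 and deeper
};

// Effective font height in points for a given script level.
double scriptLevelHeight(double baseHeightPt, int level, const ScaleConstants& k) {
    level = std::clamp(level, 0, 2); // deeper nesting stays at scriptscript size
    if (level == 0) return baseHeightPt;
    int percent = (level == 1) ? k.scriptPercentScaleDown
                               : k.scriptScriptPercentScaleDown;
    return baseHeightPt * percent / 100.0;
}
```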
Right to Left Math Zone Considerations
Right-to-left math requires mirroring the images of parentheses, integrals, square roots, arrows, etc. Many such mirror images can be obtained by using the corresponding Unicode characters. For example, the mirror image of a left parenthesis is a right parenthesis and vice versa. Such glyph variants are automatically returned by the Uniscribe function ShapeString() if SCRIPT_ANALYSIS::fRtl = TRUE. But for many characters, such as integral signs and square roots, Unicode has no mirror-image counterpart. Furthermore, using glyph variants for these characters seems to make more sense than adding characters to serve as the mirror images. Other approaches include using world transforms and mirrored bitmaps. But these approaches don’t solve the problem that the desired right-to-left character sometimes isn’t a perfect mirror image of the left-to-right one, e.g., the contour integral.
In principle (and in a prototype I’m working on), the glyph variant approach works by following the ShapeString() call with a call to Uniscribe’s ScriptSubstituteSingleGlyph() specifying tagScript as "math", tagLangSys as "dflt", and tagFeature as "rtlm". Here "math" identifies the script as math, "dflt" specifies the default language, and "rtlm" requests right-to-left mirroring. If no such special mirrored glyph exists, the call does nothing. In particular, if the appropriate mirrored glyph is given by a Unicode character, the call does nothing, so the ShapeString() call can be followed by the ScriptSubstituteSingleGlyph() call and never result in “double mirroring”.
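The two-step approach, and why it cannot double-mirror, can be modeled with plain lookups. In this sketch glyphs are represented by their code points. A small pair table stands in for the Unicode mirror pairs that the shaping call applies, and the `rtlm` map stands in for the font's feature lookup. The private glyph value used for the mirrored integral is invented. This is a model of the logic, not Uniscribe code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Glyph = std::uint32_t; // code points stand in for glyph IDs in this model

// Step 1 (shaping): characters with a Unicode mirror-image partner.
Glyph unicodeMirror(Glyph g) {
    static const std::map<Glyph, Glyph> kPairs = {
        {U'(', U')'}, {U')', U'('}, {U'[', U']'}, {U']', U'['},
        {U'{', U'}'}, {U'}', U'{'},
    };
    auto it = kPairs.find(g);
    return it != kPairs.end() ? it->second : g;
}

// Step 2 ("rtlm"): substitutions only for glyphs without a Unicode partner.
// Hypothetical font data: the integral sign maps to an invented mirrored glyph.
Glyph rtlmSubstitute(Glyph g) {
    static const std::map<Glyph, Glyph> kRtlm = {{0x222B, 0xF0000}};
    auto it = kRtlm.find(g);
    return it != kRtlm.end() ? it->second : g; // no entry: the call does nothing
}

// Because parentheses have no rtlm entry, step 2 never undoes step 1.
std::vector<Glyph> mirrorRun(std::vector<Glyph> run) {
    for (Glyph& g : run) g = rtlmSubstitute(unicodeMirror(g));
    return run;
}
```

Mirroring the run "(∫)x" swaps the parentheses in step 1, substitutes the integral in step 2, and leaves the letter alone.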
If you want a complete specification of the math tables, please email me. Hopefully someday the specification will be available as part of the official OpenType standard. The mathfont.dll code was written by Sergey Malkin.

