Revision of HTML'izing the Homeowner's Manual from 2007, November 5 - 8:32pm


Our property manager has published our Homeowner's Manual in a form that is not well suited for use on the web. Here, we are working to HTML'ize the homeowner's manual to make it searchable and overall more accessible and usable.

The document came to us (from Piedmont Management?) as a PDF, but the PDF contains images of text rather than the text itself. So Dave passed it through an OCR program to try to extract the text. We can then use that text to generate the HTML and, once we're satisfied, publish it on the Windsor's official site.

See Susan's comment for its current (2007-11-05) state. (Note: that comment was posted there by accident. Discussion of the HTML'izing process should happen here; comments on the manual itself can happen there.)

Attachments (with size):
Source PDF of 2007-10-09 Windsor Homeowner's Manual (1.03 MB)
Rough Text by OCR of 2007-10-09 Windsor Homeowner's Manual (66.22 KB)

Comments

susanmcclendon's picture

Volunteering

I see in the website To Do List the topic "HTMLize the Homeowners' Manual." If no one else has taken this on, I would like to volunteer to do it. I have a lot of experience with straight HTML but none with content-management systems like the one the web site uses.

I could get started right away on the content, then explore with your help how to implement it. Please let me know how/whether to proceed.

Dave Allen Barker Jr's picture

Please Do!

> I see in the website To Do List the topic "HTMLize the Homeowners' Manual." If no one else has taken this on, I would like to volunteer to do it. I have a lot of experience with straight HTML but none with content-management systems like the one the web site uses.

Excellent, thanks for your help!

I just got started the other day (but haven't updated the To Do list yet). Because the Windsor's website is so deficient, I've started the work on my testbed site, so the HTML'ized version (in progress) can be uploaded to the official site when it's done. That way we can use my site's collaborative features like revision control and comments to keep from stepping on each other's toes as we work.

> I could get started right away on the content, then explore with your help how to implement it. Please let me know how/whether to proceed.

Unfortunately, the provided PDF doesn't contain text, but images of pages of text. So I ran it through an OCR program and extracted some fairly accurate, though unformatted, text. All that's left to do is correct the OCR errors and HTML'ize it: remove the carriage returns and formatting (formatting to be implemented in CSS later) and apply semantic HTML.
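The cleanup step described above could be partly automated. What follows is only a minimal sketch, not the process actually used here: it assumes the OCR output separates paragraphs with blank lines, and the function name reflow_to_html is made up for illustration. It normalizes the carriage returns, joins hard-wrapped lines into paragraphs, and wraps each paragraph in a p element. Correcting OCR errors would still have to be done by hand.

```python
import html
import re


def reflow_to_html(raw_text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs and wrap each in <p>.

    Assumption: paragraphs in the OCR text are separated by blank lines.
    """
    # Normalize Windows/old-Mac line endings (the "carriage returns") to "\n".
    text = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat a run of one or more blank lines as a paragraph break.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse line breaks and repeated spaces inside one paragraph.
        joined = " ".join(block.split())
        if joined:
            # Escape &, < and > so the text is safe to place inside HTML.
            paragraphs.append("<p>" + html.escape(joined, quote=False) + "</p>")
    return "\n".join(paragraphs)
```

Headings, lists and other semantic markup would still need to be applied manually after a pass like this.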

So, if you create an account on my website (let me know if it gives you any trouble, it's been a while since people have signed up), and start editing from where the HTML ends and the plain text begins, that'd be great.

Thanks.

Thank you! Hopefully I wasn't too confusing.

I've been having Website Committee meetings on an "as needed" basis. If you'd like to get started with this in person, let me know.

susanmcclendon's picture

More Questions

Ok, I'm logged in and 'legal' and I see how to edit text.

I can see that it would be much easier to work offline than try to edit in that small window -- would this be ok?

Also, may I propose classes and ids to be styled? and CSS?

Are you envisioning this as a single, long page, or can it be divided into multiple pages? If so, how is that accomplished?

Dave Allen Barker Jr's picture

More Answers

> Ok, I'm logged in and 'legal' and I see how to edit text.

Cool.

> I can see that it would be much easier to work offline than try to edit in that small window -- would this be ok?

Of course. Just save the work you do offline back to the webpage so we don't duplicate work: copy from the edit window into your offline tools, do your work, then copy and paste it back into the edit window (replacing what was there) and submit the webpage changes.

> Also, may I propose classes and ids to be styled? and CSS?

That would be great!

So far I've just been wrapping the sections identified in the table of contents in a div with an id of the form "in", where n is a number (for example, "i1"), to be linked to from the table of contents later.
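To make that id scheme concrete, here is a rough sketch of how the section divs and a matching table of contents could be generated together. Everything in it is an assumption for illustration: the function name build_manual, the use of h2 for section headings, and the list markup for the table of contents are not taken from the actual page.

```python
import html


def build_manual(sections):
    """Return (toc_html, body_html) for a list of (title, body_html) pairs.

    Each section is wrapped in a div whose id is "i" followed by its
    1-based position, and the table of contents links to those ids.
    """
    toc_items = []
    section_divs = []
    for n, (title, body) in enumerate(sections, start=1):
        section_id = f"i{n}"  # "i1", "i2", ...
        safe_title = html.escape(title, quote=False)
        toc_items.append(f'<li><a href="#{section_id}">{safe_title}</a></li>')
        # The h2 heading is an assumption; any heading level could be used.
        section_divs.append(
            f'<div id="{section_id}">\n<h2>{safe_title}</h2>\n{body}\n</div>'
        )
    toc = "<ul>\n" + "\n".join(toc_items) + "\n</ul>"
    return toc, "\n".join(section_divs)
```

Calling it with hypothetical sections such as build_manual([("Introduction", "<p>Welcome.</p>")]) gives a table of contents entry that links to #i1 and a div with id="i1" holding that section.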

Maybe you could list, in an HTML comment at the top, the ids and classes you've created and what their purpose is (if it's not obvious)?

For the CSS, maybe just put it in a style element at the top for now. Although the resulting page won't be valid HTML, I think most browsers will render it anyway. Once we're closer to finished, we can move it out to its own file to be linked in, or something else. What do you think?

> Are you envisioning this as a single, long page, or can it be divided into multiple pages? If so, how is that accomplished?

For now I was just thinking a single page. That simplifies our work a little, since it's difficult to work with multiple documents on the current official Windsor website, and users can search through a single page in their browsers (using Find).

If we do decide to divide it into multiple pages, in what way do you think we should do it? By section? Subsection? By page number?

It's so great to have someone else to work with!

susanmcclendon's picture

CSS and Officialness

I suggest that what I do is make the single file work like a regular web page, with CSS styles embedded at the top and a Table of Contents that jumps to bookmark anchors within the same file. I did think that dividing it up might make maintenance easier but that can wait.

Another issue that will come up is which version is the "official" version, the PDF or the web page -- does Maria send that PDF to new owners? If so, we need to have an easy way to keep them in sync -- that may be the deciding factor in whether to divide it or not. I just tried printing a multiple-page web page to PDF -- that's exactly what to do.

Does the content management system have any default styles?

I'm going to be on vacation tomorrow, at home. This will be an entertaining way to spend some of that time instead of just vegging out.

Dave Allen Barker Jr's picture

Re: CSS and Officialness

> I suggest that what I do is make the single file work like a regular web page, with CSS styles embedded at the top and a Table of Contents that jumps to bookmark anchors within the same file. I did think that dividing it up might make maintenance easier but that can wait.

I agree.

> Another issue that will come up is, which version is the "official" version, the pdf or the web page --

I understand your point. The PDF would be closer to official, being generated directly from the source (a printout or fax of an MS Word document possibly?). And since our HTML is a bit of a hack, it would be even less official.

> does Maria send that pdf to new owners?

I'm not sure.

> If so, we need to have an easy way to keep them in sync --

We should! I asked Maria some time ago about gaining access to the originals for HTML'izing.

> that may be the deciding factor in whether to divide it or not. I just tried printing a multiple page web page to pdf -- that's exactly what to do.

Not sure I follow you.

> Does the content management system have any default styles?

The TOPS IWSS styles are an ugly combination of definitions declared in the style element of the head and pre-CSS inline styling by attributes. Styles seem to be specified for generic elements (no classes or ids), with the exception of the navigation list. AtHomeNet allows choosing from pre-defined styles through their Website Style Selector, but does not allow customization beyond that (at our service level).

> I'm going to be on vacation tomorrow, at home. This will be an entertaining way to spend some of that time instead of just vegging out.

To our website's benefit!

susanmcclendon's picture

new html file

Eeek -- I posted a "comment" at the end of the Homeowner's Manual page instead of using this method, please see that. Susan

Dave Allen Barker Jr's picture

The Referred Comment

"Eeek"! :^)

The comment Susan is referring to should be replied to here.

One can provide links by simply mentioning a URL like this, http://5.0ne.org/node/116#comment-59 (that URL can be found from the comment's title), or by using a standard HTML anchor like this (which looks like "<a href="http://5.0ne.org/node/116#comment-59">standard html anchor like this</a>" when you type it).

Dave Allen Barker Jr's picture

The Communications Committee Has Been Notified

Wow, Susan, that's some great work you've done, thank you!

I've forwarded the issue list you sent me privately to the head of our parent committee, the Communications Committee, to understand the process by which those issues can be addressed.

Dave Allen Barker Jr's picture

The Issues Are Now Online

The issues Susan found are now online as well.

I've yet to hear from the Communications Committee on how we should move forward.