Turn Aggregator items into true Drupal Nodes

Drupal 8 should address the longstanding issue with items from the core RSS aggregator module existing as simple DB objects rather than nodes, which limits what you can do with them in many ways.

The core Aggregator is really a big deal for me. The DB created from these items is one of the biggest draws to my site. So I have tried many things to work around the limitations of DB objects.

I flirted with the idea of dumping the table to a flat file then importing it. That would work, can be CRONed and all that. But I really want to keep my solution specific to Drupal and not engage in a one-off like that.

The core mod has a categorization interface that I really like. It allows me to go through 100s of items every day pretty quickly and assign relevant topics to one of about 20 categories.

So, this is what I did to get the aggregator db items into nodes.

I created a View of the Aggregator items and an RSS feed based on that View. Then I used Feeds Importers to import that RSS feed into a content type that I created just for that purpose. This works, and you can get quite a bit of data from the importer, but it isn’t as flexible as I wanted. I simply could not get all the fields that I wanted to come across, even when forcing fields. Forcing fields gets the data across, but I always got undesirable formatting for the links, or the links would come across malformed, with the name of my site prepended to the URL, for example.

I really can’t use the Feeds Importers module to import all of the items natively either. It will work, and you can build an interface of sorts in Views using VBO to assign taxonomy terms to the items, but it is kludgy and doesn’t scale well. It can still be a viable method if you don’t have a lot of volume.

So, I want to use the core module for its stability and ease of categorization, but I need to be able to do things like allow users to see the items and make comments on them. I say comments but I really mean take notes. So, I used a Views Content Pane (with override URL and AJAX enabled) with a Node Add/Edit Variant in a Panel Page to allow users to see the aggregated items, drag URLs from the View, take notes and alter the View (that’s where AJAX is magic) from the same place. Here is a screenshot of what it looks like.

Image

Then, I created another View of the Content Type “Research”, which is used to hold the notes, links and such. I created a simple Content Pane View that shows the title of the Nodes (I use the Private module to keep these posts completely private) and enabled AJAX so that the View Content Pane refreshes without having to refresh the whole page. Now I have a very nice research interface for my members with an easy way to reference my data.

Aggregators, CRON Jobs and Drupal cleanup.

This was a really involved project. If you use the Aggregator Core module a lot, take a look. I depend on Aggregators more than anything right now and have really had to do some involved work with it. Read on:

I have aggregator needs that the core module doesn’t quite give me, but it does work pretty well. Here is what I collect:

  • 50+ feeds from various newspapers culled hourly resulting in several hundred articles per day.
  • Each RSS source is categorized (automatically, by default in drupal) as z-Uncategorized which corresponds to a CID (in the drupal DB) of 22. 
  • As the articles come in, I review and categorize them. I have a shortcut to the z-uncategorized category of items. That gives me all the new items, regardless of the source in one place where I can categorize them quickly by clicking on the categorize tab provided by Core. I keep about 10% of the stories that come in.
  • Because the newspapers maintain articles in their RSS feeds for a period of time beyond my control, they are re-added to Drupal’s DB whenever the feed is pulled, but now listed with two categories. There are now two entries for each of these stories with the same IID but a different CID. It looks like the image below: there is the default z-uncat… category and the Juvenile category that I chose before the feed was queried again.
  • Even though this looks like one record, it is really two different records in the tables. So, if I look at the aggregator_category_item table, I can see two records for the one IID: one with a CID of 22 (the default, z-Uncategorized) and the other with whatever I assigned it to. So, I can run a query and delete all rows with category 22. But, until the newspaper removes the story from THEIR feed, it continues to come through.
  • I perform a nightly clean up where I delete all the 22s. This occurs when the papers are slow and new items have all been categorized by me.
  • Eventually (after a few days for most news sources) the stories are removed from the papers’ RSS feeds and do not get repopulated in Drupal with the default of CID 22. So then I am left with a nice single record in the category that I have assigned it to. By cleaning up every night, I get rid of stale 22s as the newspaper removes them from their RSS feed and I don’t have to think about whether they still have it or not.

Image

This is the cron job that I have to do the clean up.

0 22 * * * /usr/bin/mysql --defaults-file="/home/xxxx/.my.cnf_cron" -e "DELETE FROM drupal.aggregator_category_item WHERE aggregator_category_item.cid = 22" >>/dev/null 2>&1

The .my.cnf_cron file contains the authentication information:

[client]
host=localhost
user=crondel
password=*****

The user and password belong to a MySQL user I created specifically for this job.

The 0 22 * * * means that it will run at 10 PM EST every night. EST because that is the time zone for the server.

Here are the specific rights for the crondel account on the Drupal DB, which is named drupal.

GRANT USAGE ON *.* TO 'crondel'@'localhost' IDENTIFIED BY PASSWORD '*6E52D2AA6010C379DE1AE3BC559E2416A9A5C513';
GRANT SELECT, DELETE ON `drupal`.`aggregator_category_item` TO 'crondel'@'localhost';

The account needs SELECT rights to execute the WHERE condition of the SQL statement in addition to the DELETE FROM on the specific table in the DB.
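To make the effect of the nightly DELETE easier to see, here is a minimal sketch that replays it against a toy copy of the table. It uses Python's built-in sqlite3 instead of MySQL, and the table layout (just iid and cid) and the sample rows are simplified assumptions for illustration, not a dump of my real data. The function names are made up for this example.

```python
import sqlite3

UNCATEGORIZED_CID = 22  # the default z-Uncategorized category described above


def build_demo_db():
    """Create an in-memory stand-in for aggregator_category_item (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aggregator_category_item ("
        "iid INTEGER NOT NULL, cid INTEGER NOT NULL, PRIMARY KEY (cid, iid))"
    )
    rows = [
        (101, 22), (101, 7),  # categorized by hand, then re-added by a feed refresh
        (102, 22),            # new item, not reviewed yet
        (103, 5),             # categorized, already gone from the paper's feed
    ]
    conn.executemany(
        "INSERT INTO aggregator_category_item (iid, cid) VALUES (?, ?)", rows
    )
    return conn


def doubled_items(conn):
    """Return the iids that carry both the default category and another one."""
    sql = (
        "SELECT iid FROM aggregator_category_item "
        "GROUP BY iid "
        "HAVING SUM(cid = ?) > 0 AND SUM(cid <> ?) > 0 "
        "ORDER BY iid"
    )
    params = (UNCATEGORIZED_CID, UNCATEGORIZED_CID)
    return [row[0] for row in conn.execute(sql, params)]


def nightly_cleanup(conn):
    """Run the same DELETE as the cron job and return the number of rows removed.

    Note that this also removes the default row of items that have not been
    reviewed yet, which is why the job should run after categorizing is done.
    """
    cur = conn.execute(
        "DELETE FROM aggregator_category_item WHERE cid = ?",
        (UNCATEGORIZED_CID,),
    )
    conn.commit()
    return cur.rowcount
```

Calling doubled_items() before the cleanup lists the stories that show up with two categories; after nightly_cleanup() only the hand-assigned rows remain.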

You might ask, why not do all this with Feeds? Well, I did try to do it with Feeds. I spent quite a bit of time with it. But Feeds grabs each RSS item as a node, and I could not figure out an easy way to categorize the hundreds of stories per day when they all come in as nodes. And since this DB will eventually be huge, with 100k+ stories in a searchable archive, I think that it may be easier to keep it this way. I just had to figure out what to do with the extra 22s. And this solution seems to work.

Ug. This was a pain. And if you want to know more about the subject or I have been unclear, let me know and I’ll try to clarify.

Feeds

well, i ran into a major issue with the core aggregator module. i can’t use the categories item as a means of building a view for aggregator items. it is listed there and you would think it should work, but it doesn’t. and it really pissed me off yesterday.

so, i’m dumping the core aggregator to use the feeds module. hopefully, that will be better.

Import/export Aggregator feeds part II

well, i now have it running in prod and there were a few minor differences that i wanted to document.

since i don’t have an X window type desktop, i had to access phpmyadmin via a remote web browser. so, i had to change the file /etc/conf.d/phpmyadmin.conf in three places to allow connections from all instead of just localhost. i was then able to access the web gui, import the file, and test the aggregator feeds. i had to change some permissions as well to be able to access the file so i could edit it remotely. i could have done it locally via the vi editor, but i hate that thing. it sucks. so after the changes were made i made sure to change the perms back as well as change the file itself so that attempting to access the phpmyadmin gui from the remote host yielded a “forbidden” error again.

import/export aggregator sources – drupal 7 core module

So i have my three environments setup and i am moving module settings from sandbox to quality. i have dozens of aggregator sources that i need to move. and i don’t want to reenter them manually. so i am using phpmyadmin to export the table to an sql file and then import it.

well, i made this work. it was a bit of a pain but not too bad. i had to install phpmyadmin on my sandbox server and export the proper table (aggregator_feed) to a sql file. i then sneakernetted the file to the new server. i had to install phpmyadmin on that server too. that was a bit more work. once i had it installed i was unable to access it without changing the root mysql user from ‘blank’ (no password) to ‘something’ (one of my password defaults). then i was able to import the sql file into the drupal db. since this is a core mod, i didn’t have to do anything to get the file to import properly. then, since i had changed the password of the user that drupal runs under, i had to change that in the settings.php file used by drupal. done and done and it only took a couple hours.

11/20/12

well, i have made some progress on the aggregator stuff. but now i have to move on.

Superfish menus

CSS stuff for the pixture theme

Language

Drupal News Aggregator Services

well, i’m trying out different news aggregators. there aren’t too many options but i think that the big problem here is that i am trying to make this easy on myself and not wanting to do the (often) hard work of learning something about drupal. i like drupal and it is as powerful as it gets but it can also be a complete and total pain in the ass when it comes to making it do what you want. that is why i have decided as a long term project to learn css and then php. that’s the best way to really become proficient at drupal.

ok, activity stream seems to be a good way to aggregate content on an individual user basis. not helpful for me. that was easy. i believe that feeds is going to be the way to go. but i have to spend some time with it and learn how to make it work properly.

New Aggregators and Drush’s ‘archive-backup’

i am working with the different aggregators that are available for drupal. it is really kinda slow going though because the documentation just isn’t that good. and i suppose that what i want to do is rather involved.

here are my requirements

  1. aggregate news items (RSS Feeds) from about 30 sources
  2. Place them in one big list with category choices in each post for categorization
  3. or have them categorized automatically based on a defined keyword algorithm

i think that is it. number three is the real key, i believe. so that is what i am working on in regards to drupal these days
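to make number three a little more concrete, here is a rough sketch of what a keyword-based categorizer could look like. it is plain python rather than a drupal module, and the keyword lists, the "Courts" category and the function name are all made up for illustration. it just counts whole-word keyword hits in the title and description and picks the category with the most hits.

```python
import re

# Hypothetical keyword lists. "Juvenile" and "z-Uncategorized" are example
# category names; "Courts" and all keywords are invented for this sketch.
KEYWORDS = {
    "Juvenile": ["juvenile", "youth", "minor"],
    "Courts": ["court", "judge", "trial"],
}
DEFAULT_CATEGORY = "z-Uncategorized"


def categorize(title, description=""):
    """Return the category whose keywords appear most often in the item text.

    Matching is on whole lowercase words, so "courts" does not match "court".
    Items with no keyword hits fall back to the default category.
    """
    words = re.findall(r"[a-z']+", f"{title} {description}".lower())
    scores = {}
    for category, keywords in KEYWORDS.items():
        hits = sum(words.count(keyword) for keyword in keywords)
        if hits:
            scores[category] = hits
    if not scores:
        return DEFAULT_CATEGORY
    # On a tie, the category listed first in KEYWORDS wins.
    return max(scores, key=scores.get)
```

anything that matches nothing stays in the default category, so it could still be categorized by hand later.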

all in all it is going pretty well though. but what i suspect is going to happen is that I am going to have to write the module for no. 3 myself. and that is cool except that i don’t know how. but, that is where i see my career going at this point. i need to learn programming. so, i am going to try to study php for an hour a day.