Author Topic: Text file/Excel problem. Any ideas?  (Read 4374 times)

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Text file/Excel problem. Any ideas?
« on: October 12, 2013, 04:06:49 AM »
I have a large text file of product descriptions by part number that I got from my distributor. I need to bring it into Excel and save it as a csv file.

After the first try, I discovered that Excel had cut short the descriptions on a lot of products. It didn't have to do with the length of the description, as some products had many sentences, and others got cut off after one or two. It must have something to do with the formatting of the text file.

I looked at the text file, and the full descriptions are there. I tried using an existing csv file with long descriptions, eliminating all but three or four rows just to make it manageable, and then importing the product description text file. It still cut off the descriptions, and on the same products. It's the same ones each time.

Does anyone have any idea as to what the problem might be, and if there's a way to solve it?

TechMan

  • Administrator
  • Senior Member
  • *****
  • Posts: 10,562
  • Yes, your moderation has been outsourced.
Re: Text file/Excel problem. Any ideas?
« Reply #1 on: October 12, 2013, 04:44:40 AM »
Can you post an example from the txt file and the excel file  (one that works and one that doesn't.)
Quote
Hawkmoon - Never underestimate another person's capacity for stupidity. Any time you think someone can't possibly be that dumb ... they'll prove you wrong.

Bacon and Eggs - A day's work for a chicken; A lifetime commitment for a pig.
Stupidity will always be its own reward.
Bad decisions make good stories.

Quote
Viking - The problem with the modern world is that there aren't really any predators eating stupid people.

Hawkmoon

  • friend
  • Senior Member
  • ***
  • Posts: 27,342
Re: Text file/Excel problem. Any ideas?
« Reply #2 on: October 12, 2013, 10:38:37 AM »
Excel has a limit on the size of a body of text it will paste into a cell. I have one spread sheet that I update by copying an old worksheet to a new one as a template, then entering data into the new copy (whose tab gets renamed for the date). One cell is a paragraph of text, and it doesn't come in in tact -- it's truncated. Excel gives me a warning -- I then have to go back to the master tab, go to that cell, highlight the full text in the edit window at the top of the screen, then go back to the new copy and paste it.

For me it's only a nuisance, since it only affects one cell. If it's affecting a bunch of entries and you have to read each one to find out which are affected, it's a problem. It sounds a bit like perhaps Access would be a better application than Excel.

The warning I get:

Quote
The sheet you are copying has cells that contain more than 255 characters. When you copy the entire sheet, only the first 255 characters in each cell are copied. To copy all of the characters, copy the cells to a new sheet instead of copying the entire sheet.
« Last Edit: October 12, 2013, 10:42:27 AM by Hawkmoon »
- - - - - - - - - - - - -
100% Politically Incorrect by Design

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #3 on: October 12, 2013, 11:19:47 AM »
I can try posting the files. They're about 3 megs each. There's 53,000 rows each.

Some of the descriptions are being cut off after one to three sentences. Some go on for 200 words with no problem. There's no pattern as to the length of the description. The ones being cut off don't have any difference in formatting, at least that I can see.


Tallpine

  • friends
  • Senior Member
  • ***
  • Posts: 23,172
  • Grumpy Old Grandpa
Re: Text file/Excel problem. Any ideas?
« Reply #4 on: October 12, 2013, 11:22:11 AM »
What is the separation character for the input text file?

Freedom is a heavy load, a great and strange burden for the spirit to undertake. It is not easy. It is not a gift given, but a choice made, and the choice may be a hard one. The road goes upward toward the light; but the laden traveller may never reach the end of it.  - Ursula Le Guin

GigaBuist

  • friends
  • Senior Member
  • ***
  • Posts: 4,345
    • http://www.justinbuist.org/blog/
Re: Text file/Excel problem. Any ideas?
« Reply #5 on: October 12, 2013, 01:07:06 PM »
The trick is probably in making sure you've the field separators, line separators, and (possibly optional) enclosure marks.  I wouldn't be surprised if you have description fields that look like this.

"This is a sentence.  This is a second one.  This will Import fine for MonkeyLeg."

"This describes a 3" ... and nobody's reading anymore because it cut off."

Or something like that.  You might have newlines in your descriptions too.

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #6 on: October 12, 2013, 01:45:00 PM »
The trick is probably in making sure you've the field separators, line separators, and (possibly optional) enclosure marks.  I wouldn't be surprised if you have description fields that look like this.

"This is a sentence.  This is a second one.  This will Import fine for MonkeyLeg."

"This describes a 3" ... and nobody's reading anymore because it cut off."

Or something like that.  You might have newlines in your descriptions too.

Tallpine, I've tried different characters. The one I use most often is a comma.

I've looked at the cut off sentences. There's not double quotes in most of them. A couple have embedded videos that have brackets and quotes. Those are easy enough to get rid of.

I'm thinking there's hard line breaks that I can't see in the text file.

AZRedhawk44

  • friends
  • Senior Member
  • ***
  • Posts: 13,987
Re: Text file/Excel problem. Any ideas?
« Reply #7 on: October 12, 2013, 01:54:31 PM »
Dick, send me the txt file.  Let a DBA hack at it for a bit with SQL Integration Services... much more powerful than Excel's data import capabilities.
"But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist."
--Lysander Spooner

I reject your authoritah!

Tallpine

  • friends
  • Senior Member
  • ***
  • Posts: 23,172
  • Grumpy Old Grandpa
Re: Text file/Excel problem. Any ideas?
« Reply #8 on: October 12, 2013, 02:14:58 PM »
Tallpine, I've tried different characters. The one I use most often is a comma.

...

For the input file?

Seems like the options are: comma, space, and tab.  The first two obviously won't work right if you have multiword text in a field ("This product is new, and you will really like it ...").

A hard line break should indicate a new record (row in excel).

I do this stuff all the time for configuration files.  A long time ago I wrote a C++ CFile derivation that had functions to read up to a specified "stop" character, etc for parsing text files directly.
Freedom is a heavy load, a great and strange burden for the spirit to undertake. It is not easy. It is not a gift given, but a choice made, and the choice may be a hard one. The road goes upward toward the light; but the laden traveller may never reach the end of it.  - Ursula Le Guin

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #9 on: October 12, 2013, 03:45:10 PM »
Sorry. Brain dead. I  use tab and quotation mark.

Tallpine

  • friends
  • Senior Member
  • ***
  • Posts: 23,172
  • Grumpy Old Grandpa
Re: Text file/Excel problem. Any ideas?
« Reply #10 on: October 12, 2013, 05:15:25 PM »
Sorry. Brain dead. I  use tab and quotation mark.

Like: "123456"<tab>"This product is wonderful!"<cr>     ???

I'm thinking there is a tab embedded in some of the product descriptions.  =|
Freedom is a heavy load, a great and strange burden for the spirit to undertake. It is not easy. It is not a gift given, but a choice made, and the choice may be a hard one. The road goes upward toward the light; but the laden traveller may never reach the end of it.  - Ursula Le Guin

AZRedhawk44

  • friends
  • Senior Member
  • ***
  • Posts: 13,987
Re: Text file/Excel problem. Any ideas?
« Reply #11 on: October 12, 2013, 05:20:41 PM »
Like: "123456"<tab>"This product is wonderful!"<cr>     ???

I'm thinking there is a tab embedded in some of the product descriptions.  =|

Or quotes inside of the description field.  Or CR/LF inside the description field.
"But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist."
--Lysander Spooner

I reject your authoritah!

Tallpine

  • friends
  • Senior Member
  • ***
  • Posts: 23,172
  • Grumpy Old Grandpa
Re: Text file/Excel problem. Any ideas?
« Reply #12 on: October 12, 2013, 05:44:49 PM »
Or quotes inside of the description field.  Or CR/LF inside the description field.

So it's a "text file problem" - not an "Excel problem"  :angel:

(I have no clue how Excel exports an <Alt-Enter> within a cell  ???)
Freedom is a heavy load, a great and strange burden for the spirit to undertake. It is not easy. It is not a gift given, but a choice made, and the choice may be a hard one. The road goes upward toward the light; but the laden traveller may never reach the end of it.  - Ursula Le Guin

Hawkmoon

  • friend
  • Senior Member
  • ***
  • Posts: 27,342
Re: Text file/Excel problem. Any ideas?
« Reply #13 on: October 12, 2013, 05:44:59 PM »
What program are you using to manipulate/check the text file? If you use Word, you can click to paragraph icon in the menu bar to display formatting codes such as spaces and line breaks. I've found in working with files sent to me by other people that there is some kind of "soft" line break that displays a different code symbol than a hard line break. I have never figured out what creates these "soft" line breaks, because I have never EVER seen one show up in any document that I have created -- but they exist, and may be related to your problem.

I'm still using Word 2003. For me, the "display formatting codes" icon is a typical backwards 'P' paragraph symbol that shows up on the upper menu bar, right next to the window for the zoom level. If you use Word 2007 or 2010, I don't know where the icon will be but I'm sure display codes is still available.

{Edit to add} Now y'all got me curious, so I went searching. The funky symbol I sometimes encounter is a "manual line break," and it's created by the combination of SHIFT+ENTER. The symbol that appears on screen for this is an 'L' shaped arrow, such as appears on many ENTER keys (what a coincidence!). In a relatively simple Word document, it seems to behave the same as a standard line break but it may do other things if it's in a file that gets imported into a different application.
« Last Edit: October 12, 2013, 05:56:00 PM by Hawkmoon »
- - - - - - - - - - - - -
100% Politically Incorrect by Design

Hawkmoon

  • friend
  • Senior Member
  • ***
  • Posts: 27,342
- - - - - - - - - - - - -
100% Politically Incorrect by Design

never_retreat

  • Head Muckety Muck
  • friend
  • Senior Member
  • ***
  • Posts: 3,158
Re: Text file/Excel problem. Any ideas?
« Reply #15 on: October 12, 2013, 07:32:50 PM »
Don't open it directly with excel. Open excel and do an import, you can then tell it what to do with each column.
Works good especially if you have part numbers that begin with a zero. Excel loves to dump those.
I needed a mod to change my signature because the concept of "family friendly" eludes me.
Just noticed that a mod changed my signature. How long ago was that?
A few months-mods

Phyphor

  • friend
  • Senior Member
  • ***
  • Posts: 2,330
Re: Text file/Excel problem. Any ideas?
« Reply #16 on: October 12, 2013, 09:24:44 PM »
What program are you using to manipulate/check the text file? If you use Word, you can click to paragraph icon in the menu bar to display formatting codes such as spaces and line breaks. I've found in working with files sent to me by other people that there is some kind of "soft" line break that displays a different code symbol than a hard line break. I have never figured out what creates these "soft" line breaks, because I have never EVER seen one show up in any document that I have created -- but they exist, and may be related to your problem.

I'm still using Word 2003. For me, the "display formatting codes" icon is a typical backwards 'P' paragraph symbol that shows up on the upper menu bar, right next to the window for the zoom level. If you use Word 2007 or 2010, I don't know where the icon will be but I'm sure display codes is still available.

{Edit to add} Now y'all got me curious, so I went searching. The funky symbol I sometimes encounter is a "manual line break," and it's created by the combination of SHIFT+ENTER. The symbol that appears on screen for this is an 'L' shaped arrow, such as appears on many ENTER keys (what a coincidence!). In a relatively simple Word document, it seems to behave the same as a standard line break but it may do other things if it's in a file that gets imported into a different application.

Assuming this is a standard ASCII derived text file, it could be ASCII code 10 (Line Feed) or 13 (Carriage Return, AKA Enter key).
I'd kind of like to see this text file myself, load it into a hex editor and see precisely what's going on.

"You know what's messed-up about taxes?
You don't even pay taxes. They take tax.
You get your check, money gone.
That ain't a payment, that's a jack." - Chris Rock "Bigger and Blacker"
He slapped his rifle. "This is one of the best arguments for peace there is. Nobody wants to shoot if somebody is going to shoot back. " Callaghen, Callaghen, Louis La'mour

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #17 on: October 12, 2013, 09:37:52 PM »
I imported the text file into Access, then exported that to Excel. Everything is working fine.

I forgot about that one.

Hawkmoon

  • friend
  • Senior Member
  • ***
  • Posts: 27,342
Re: Text file/Excel problem. Any ideas?
« Reply #18 on: October 13, 2013, 12:54:01 AM »
Good work-around, but it's too bad it doesn't tell you what's actually happening to create the glitch.
- - - - - - - - - - - - -
100% Politically Incorrect by Design

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #19 on: October 13, 2013, 01:45:15 AM »
Broke out the champagne too soon. it's there, but there's still long descriptions (such as for Leupold scopes) that are seriously truncated. Cordex has been generously helping with this, and he's going to try to fix the file. I don't think the distributor will want to put any more time into it.

Tallpine

  • friends
  • Senior Member
  • ***
  • Posts: 23,172
  • Grumpy Old Grandpa
Re: Text file/Excel problem. Any ideas?
« Reply #20 on: October 13, 2013, 10:35:38 AM »
Don't open it directly with excel. Open excel and do an import, you can then tell it what to do with each column.
Works good especially if you have part numbers that begin with a zero. Excel loves to dump those.

I usually do Ctrl-A/Ctrl-C in Notepad, and then paste into cell A-1 and Excel does the rest.

But then I am generally working with in-house text files to begin with, that don't have the Gettysburg Address in one of the fields  ;/
Freedom is a heavy load, a great and strange burden for the spirit to undertake. It is not easy. It is not a gift given, but a choice made, and the choice may be a hard one. The road goes upward toward the light; but the laden traveller may never reach the end of it.  - Ursula Le Guin

AZRedhawk44

  • friends
  • Senior Member
  • ***
  • Posts: 13,987
Re: Text file/Excel problem. Any ideas?
« Reply #21 on: October 13, 2013, 10:37:01 AM »
Broke out the champagne too soon. it's there, but there's still long descriptions (such as for Leupold scopes) that are seriously truncated. Cordex has been generously helping with this, and he's going to try to fix the file. I don't think the distributor will want to put any more time into it.

I believe that Excel will store up to 32KB in a cell, but it will only display in the UI up to 1024 characters.


ETA:  While it may not be displaying in the UI, you can confirm that the data is truly present by re-exporting the XLS (or XLSX) file back to CSV.  If it's the same size as the previous one, it has all the data and didn't truncate any of it.  Or, spot-check known records with extra long descriptions after exporting back to CSV.
« Last Edit: October 13, 2013, 10:45:23 AM by AZRedhawk44 »
"But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist."
--Lysander Spooner

I reject your authoritah!

cordex

  • Administrator
  • Senior Member
  • *****
  • Posts: 8,682
Re: Text file/Excel problem. Any ideas?
« Reply #22 on: October 13, 2013, 11:08:58 AM »
Here is where ML stands at the moment:
Removed all non-ASCII characters from the file.
Replaced all double quotes within HTML tags with single quotes (I don't think there is any nesting to worry about there).
Replaced all double quotes within HTML content with &quot;
Deleted two records with more than 32k of horrific extraneous HTML. If he wants to sell the FLIR we can handle those two manually.
Reexported the file in tab delimited format.

I toasted the double quotes entirely (except as text identifiers) because different systems escape them differently: Excel uses "" while almost everyone else uses \"

ML, if you need me to combine it with your descriptions I would be happy to do so in order to give you a file ready to import.

Monkeyleg

  • friend
  • Senior Member
  • ***
  • Posts: 14,589
  • Tattaglia is a pimp.
    • http://www.gunshopfinder.com
Re: Text file/Excel problem. Any ideas?
« Reply #23 on: October 13, 2013, 11:58:28 AM »
That sounds like a lot of work, Cordex. I can combine my hand-written descriptions with what you've done. I can't ask you to do any more.

MechAg94

  • friend
  • Senior Member
  • ***
  • Posts: 33,869
Re: Text file/Excel problem. Any ideas?
« Reply #24 on: October 13, 2013, 12:04:54 PM »
Don't open it directly with excel. Open excel and do an import, you can then tell it what to do with each column.
Works good especially if you have part numbers that begin with a zero. Excel loves to dump those.
I think there is an option to format each column a specific way.  I've had the opposite problem.  Imported big column of numbers and realize that Excel doesn't consider all the numbers to be numbers and skips them for totals and averages. 
“It is much more important to kill bad bills than to pass good ones.”  ― Calvin Coolidge